This document describes an internal feature for working with DOMDocument when Laminas is not present.

As of netcurl 6.1.5, we have started experimenting with DOMDocument again. Because we read HTML to generate RSS links, DOMDocument and DOMElement play an important part in fetching content; the alternative, regex-based fetching, would require us to parse HTML tags and items manually. In the test suite and in this example we use a stored HTML page from moviezine, which does not generate RSS data itself. At the time of writing, our goal is to fetch all articles from their autogenerated news list. We know that the content is stored in two kinds of elements, each of which has classes applied to it.

Features are available from the master branch.

In this particular case we use XPath, since that is what the tasks (NETCURL-339 / #5) are based on. The elements we want to look for are:

Class XPath	Description
//*[@class="inner_article"]/a	This is the container for featured articles and will in our case return three articles.
//*[@class="articles_wrapper"]/a	After the featured articles, each article container has this class as the "main" class.

...

Class XPath (Sub)	Description
/*[contains(@class, "subtitle")]	This class, subtitle, contains the shorter title of the article.
/*[contains(@class, "lead")]	This class, lead, is the longer article text under the bolded titles for each element.
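The two tables above can be tried out with PHP's built-in DOM extension alone. The following is a minimal sketch, not netcurl's own implementation: the HTML fixture is a hypothetical stand-in for the stored page, and prefixing each sub-expression with "." (so that it is evaluated relative to the matched link) is an assumption about how the class XPath and the sub XPath are meant to be combined.

```php
<?php
// Hypothetical miniature of the stored page: one featured and one regular article.
$html = <<<'HTML'
<html><body>
<div class="inner_article"><a href="/article/1"><span class="subtitle">Featured</span><span class="lead">Featured lead.</span></a></div>
<div class="articles_wrapper"><a href="/article/2"><span class="subtitle small">Regular</span><span class="lead">Regular lead.</span></a></div>
</body></html>
HTML;

$document = new DOMDocument();
// Collect libxml warnings internally instead of emitting them while loading.
$previous = libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($document);

// Class XPath expressions (article containers) from the first table.
$containerPaths = [
    '//*[@class="inner_article"]/a',
    '//*[@class="articles_wrapper"]/a',
];
// Sub-expressions from the second table.
$subPaths = [
    'subtitle' => '/*[contains(@class, "subtitle")]',
    'lead' => '/*[contains(@class, "lead")]',
];

$found = [];
foreach ($containerPaths as $containerPath) {
    foreach ($xpath->query($containerPath) as $anchor) {
        $row = ['href' => $anchor->getAttribute('href')];
        foreach ($subPaths as $name => $subPath) {
            // "." makes the expression relative to the current link element.
            $match = $xpath->query('.' . $subPath, $anchor)->item(0);
            $row[$name] = $match !== null ? trim($match->textContent) : null;
        }
        $found[] = $row;
    }
}

print_r($found);
```

Each entry in $found then carries the href of the link plus the text of its subtitle and lead elements, or null where a sub-element is missing.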

...

After this extraction, we can collect each article element into a more "human friendly" array and start rendering content.

Code Block

language	php
theme	Emacs

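// $nodeInfo is the raw node info extracted from the page.
$articles = [];
foreach ($nodeInfo as $node) {
    // Link target, read from the main node of the subtitle element.
    $href = GenericParser::getValuesFromXPath($node, ['subtitle', 'mainNode', 'href']);
    // Short title and longer lead text, read from the sub nodes.
    $hrefText = GenericParser::getValuesFromXPath($node, ['subtitle', 'subNode', 'value']);
    $description = GenericParser::getValuesFromXPath($node, ['lead', 'subNode', 'value']);
    // Key the result by href so that duplicate links collapse into one entry.
    if (!empty($href)) {
        $articles[$href] = [
            'title' => $hrefText,
            'description' => $description,
        ];
    }
}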

The example above produces the following array. From here on, the data is much easier to handle. The main reason for keying the array by href in this example is to avoid duplicate hrefs.

[Image: the resulting articles array]
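The deduplication itself can be illustrated without any parser involved. This is a minimal sketch using hypothetical sample rows rather than real moviezine data; it only shows that assigning by key lets a repeated href replace the earlier entry instead of adding a second one.

```php
<?php
// Hypothetical sample rows; the second and third share the same href.
$rows = [
    ['href' => '/article/1', 'title' => 'First', 'description' => 'Lead one'],
    ['href' => '/article/2', 'title' => 'Second', 'description' => 'Lead two'],
    ['href' => '/article/2', 'title' => 'Second (duplicate)', 'description' => 'Lead two again'],
    ['href' => '', 'title' => 'No link', 'description' => 'Skipped'],
];

$articles = [];
foreach ($rows as $row) {
    // Skip entries that have no link to key on.
    if (empty($row['href'])) {
        continue;
    }
    // Assigning by key means a repeated href overwrites the earlier entry.
    $articles[$row['href']] = [
        'title' => $row['title'],
        'description' => $row['description'],
    ];
}

print_r(array_keys($articles));
```

Note that with this approach the last occurrence of an href wins. If the first occurrence should be kept instead, an isset() check before the assignment would do it.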

What if the above is too much work?

It can also be handled in a single call, as long as the page is as "standard" as possible. Changes and adaptations may follow.

This is a one-shot that executes all of the above actions in one call.

Code Block

language	php
theme	Emacs

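$nodeList = GenericParser::getContentFromXPath(
    // Stored HTML page used by the test suite.
    file_get_contents(__DIR__ . '/templates/domdocument_mz.html'),
    // Class XPath expressions for the article containers.
    [
        '//*[@class="inner_article"]/a',
        '//*[@class="articles_wrapper"]/a',
    ],
    // Sub-expressions for the elements inside each container.
    [
        'subtitle' => '/*[contains(@class, "subtitle")]',
        'lead' => '/*[contains(@class, "lead")]',
    ],
    // Values to extract.
    ['href', 'value'],
    // Which node each element's values are read from.
    ['subtitle' => 'mainNode', 'lead' => 'subNode'],
);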

On success, the returned nodeList contains two variables:

Variable	Description
nodeInfo	Raw node info.
rendered	The rendered array.

[Image]
