Hi! How can I count the words in an HTML page with DOMDocument?

For example, if the input is something like:

    <div> Hello something open. <a href="open.php">click</a> 
    lorem ipsum <a href="open.php">here</a></div>

The output should be:

    Number Word
    1 Hello
    2 something
    3 open
    4 click
    5 lorem
    6 ipsum
    7 here

And what if I need only the link text?

    click 4
    here 7

+2  A: 

If you need this for the entire document, it is likely easier to run strip_tags on the markup and then str_word_count on the result.
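A minimal sketch of that approach, assuming the markup from the question is available as a string. The variable names `$html`, `$words` and `$numbered` are only illustrative. Re-indexing from 1 is an optional step added here so the keys match the numbering in the question. Keep in mind that strip_tags does not insert whitespace where it removes a tag, so words in adjacent elements with no space between them (for example `<p>one</p><p>two</p>`) end up joined.

```php
<?php
// Sample markup from the question (assumed to be available as a string).
$html = '<div> Hello something open. <a href="open.php">click</a> '
      . 'lorem ipsum <a href="open.php">here</a></div>';

// Remove all tags, then split the remaining text into an array of words.
$words = str_word_count(strip_tags($html), 1);

// Re-index from 1 so the keys match the numbering used in the question.
// The guard avoids calling array_combine with mismatched arrays on empty input.
$numbered = $words ? array_combine(range(1, count($words)), $words) : [];

print_r($numbered);
```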

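If you have to do this with DOM, you can collect the text nodes with XPath and count the words in them:

    $str = <<<HTML
    <div> Hello something open. <a href="open.php">click</a>
    lorem ipsum <a href="open.php">here</a></div>
    HTML;

    $dom = new DOMDocument;
    $dom->loadHTML($str);

    // Select every text node in the document.
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//text()');

    // Join the node values with a space so words from adjacent nodes stay separate.
    $textNodeContent = '';
    foreach ($nodes as $node) {
        $textNodeContent .= " $node->nodeValue";
    }

    // Format 1 returns the words as an array instead of only their number.
    print_r(str_word_count($textNodeContent, 1));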

The XPath expression //text() selects only the text nodes in the document. To get just the link texts, use //a/text() as the expression instead.
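As a rough sketch of the link-only variant, assuming the same sample markup: the names `$html` and `$linkWords` are illustrative only. Note that //a/text() matches text nodes that are direct children of a link; if the link text is wrapped in another element such as `<b>`, //a//text() would be needed instead.

```php
<?php
// Sample markup from the question (assumed to be available as a string).
$html = '<div> Hello something open. <a href="open.php">click</a> '
      . 'lorem ipsum <a href="open.php">here</a></div>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

$linkWords = [];
// Only text nodes that are direct children of an <a> element are selected.
foreach ($xpath->query('//a/text()') as $node) {
    // A link text may hold several words, so split each node separately.
    foreach (str_word_count($node->nodeValue, 1) as $word) {
        $linkWords[] = $word;
    }
}

print_r($linkWords);
```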

Gordon
Thanks. And what should I do if I want to count the words before each link? For example, 3 words before the first link and 6 words before the second.
turbod
@turbod In that case you'd have to iterate over the nodes with DOM.
Gordon
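One possible way to do that iteration, offered as a sketch rather than the answerer's own code: walk the text nodes in document order, keep a running word count, and record the count whenever a text node sits directly inside a link. The array keys `text`, `wordsBefore` and `position` are made up for this example, and the check on the parent node assumes the link text is a direct child of the `<a>` element.

```php
<?php
// Sample markup from the question (assumed to be available as a string).
$html = '<div> Hello something open. <a href="open.php">click</a> '
      . 'lorem ipsum <a href="open.php">here</a></div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$wordCount = 0; // words seen so far, in document order
$links = [];    // one entry per link text node

foreach ($xpath->query('//text()') as $node) {
    $words = str_word_count($node->nodeValue, 1);

    // Assumption: the link text is a direct child of the <a> element.
    if ($node->parentNode->nodeName === 'a') {
        $links[] = [
            'text'        => trim($node->nodeValue),
            'wordsBefore' => $wordCount,
            'position'    => $wordCount + 1, // number of the link's first word
        ];
    }

    $wordCount += count($words);
}

print_r($links);
```

For the sample markup this reports 3 words before "click" and 6 words before "here", which also gives the positions 4 and 7 asked for in the question.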