I have the following (PHP) code that traverses an entire DOM document to collect all of the text nodes. It's a bit of an ugly solution, and I'm sure there must be a better way... so, is there?

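$skip = false;      // true when we have just climbed back up to a parent
$node = $document;
$nodes = array();
while ($node) {
    // Collect text nodes (XML_TEXT_NODE === 3).
    if ($node->nodeType == XML_TEXT_NODE) {
        $nodes[] = $node;
    }
    if (!$skip && $node->firstChild) {
        // Descend into the children first.
        $node = $node->firstChild;
    } elseif ($node->nextSibling) {
        // No unvisited children: move on to the next sibling.
        $node = $node->nextSibling;
        $skip = false;
    } else {
        // Last sibling: climb to the parent without descending again.
        $node = $node->parentNode;
        $skip = true;
    }
}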

Thanks.

+2  A: 

You could have a look at phpQuery, which lets you use jQuery-style selectors.

Chad Birch
Ideally I'm looking for something that works just with the existing DOM functions, without any need for additional libraries.
Jack Sleight
+3  A: 

The XPath expression you need is //text(). Try using it with xpath_eval (part of the PHP 4 DOM XML extension; the PHP 5 DOM extension uses DOMXPath::query instead). For example:

$xpath = $doc->xpath_new_context();
$textnodes = $xpath->xpath_eval('//text()');
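
Since the question's code uses PHP 5 DOM properties such as nodeType and firstChild, here is a minimal sketch of the same //text() query through DOMXPath. The sample markup and the variable names are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$markup = '<div><p>Hello <b>world</b></p><p>Bye</p></div>';

$document = new DOMDocument();
$document->loadXML($markup);

$xpath = new DOMXPath($document);

// //text() selects every text node in the document, in document order.
$nodes = array();
foreach ($xpath->query('//text()') as $textNode) {
    $nodes[] = $textNode;
}

// Print the content of each collected text node, one per line.
foreach ($nodes as $textNode) {
    echo $textNode->nodeValue, "\n";
}
```

Like the hand-written loop in the question, this also returns whitespace-only text nodes; if those are unwanted, the expression //text()[normalize-space()] should filter them out.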
Rob Kennedy
Perfect, thanks! :-)
Jack Sleight
A: 

Will preg_split work for you?

$textNodes = preg_split('/<[^>]+>/', $documentContent, -1, PREG_SPLIT_NO_EMPTY);
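
As a rough sketch of how a tag-splitting pattern behaves, the snippet below runs preg_split over a small string. The sample content is an assumption for illustration; note that the result is an array of plain strings rather than DOM nodes.

```php
<?php
// Hypothetical sample content, used only for illustration.
$documentContent = '<p>Hello <b>world</b>!</p>';

// Split on anything that looks like a tag and drop the empty pieces.
$textNodes = preg_split('/<[^>]+>/', $documentContent, -1, PREG_SPLIT_NO_EMPTY);

print_r($textNodes);
```

For this input the pieces left between the tags are "Hello ", "world" and "!".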
meouw