I'm trying to use XPath in PHP and I get too many elements. This is my code:

libxml_use_internal_errors(true);
$document = new DOMDocument;
$document->strictErrorChecking = false;
$document->loadHTML($text);
$xpath = new DOMXPath($document);
$placeholders = $xpath->query('//div[starts-with(@class, "waf-ph-")]');
print '$placeholders->length: ' . $placeholders->length;

There is only ONE element that matches the query. One. Not a single one more. But here's my output:

$placeholders->length: 7

I'm using loadHTML because I won't have full control over the input when I'm done coding and I can't guarantee standards-compliant XHTML. I do intend to use Tidy, but I'm trying not to rely on it yet. But there is only ONE div that matches the XPath.

Further investigation seems to indicate that it is the same element that has been found seven times.

What's going on?

Edit: the source of the DOM file is an HTML file that somewhere contains the following (this is a dummy address):

<div class="waf-ph-https\:\/\/aserver\.com\/apath\/app\.php5">
  <p class="notification">This is to be substituted.</p>
</div>

The string "waf-ph-" is found nowhere else in the file.

Edit:

Trying the following:

foreach ($placeholders as $node) print $document->saveXML($node);

returns the text of the above DIV seven times.

+1  A: 

If I use the snippet you give, I get one result.

For this XML:

$text = <<< XML
<root>
    <div class="waf-ph-1"></div>
    <div class="waf-ph-2"></div>
    <div class="waf-ph-3"></div>
    <div class="waf-ph-4"></div>
</root>
XML;

you will get four matches for the given XPath.

For this XML:

$text = <<< XML
<root>
    <div class="waf-ph-1"></div>
    <div class="wbf-ph-2"></div>
    <div class="wcf-ph-3"></div>
    <div class="wdf-ph-4"></div>
</root>
XML;

you will get only one. Your code is correct, so the problem must be in your HTML. Also note that //div will match any <div> regardless of its position in the document. The following XML will also return four matching nodes for your code:

$text = <<< XML
<root>
    <div class="waf-ph-1">
        <div class="waf-ph-2">
            <div class="waf-ph-3">
                <div class="waf-ph-4">
    </div></div></div></div>
</root>
XML;
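
The three cases above can be checked in one script. This is only a minimal sketch, assuming the DOM extension is enabled. `countPlaceholders` is a hypothetical helper name, the samples are condensed to one line each, and it uses `loadXML` instead of the question's `loadHTML` because the samples are well-formed XML.

```php
<?php
// Hypothetical helper: counts <div> elements whose class attribute
// starts with "waf-ph-", using the same XPath as the question.
function countPlaceholders(string $xml): int
{
    $document = new DOMDocument();
    $document->loadXML($xml);
    $xpath = new DOMXPath($document);
    return $xpath->query('//div[starts-with(@class, "waf-ph-")]')->length;
}

// Four sibling divs, all matching.
$siblings = '<root><div class="waf-ph-1"/><div class="waf-ph-2"/><div class="waf-ph-3"/><div class="waf-ph-4"/></root>';
// Four sibling divs, only the first matching.
$oneMatch = '<root><div class="waf-ph-1"/><div class="wbf-ph-2"/><div class="wcf-ph-3"/><div class="wdf-ph-4"/></root>';
// Four nested divs, all matching.
$nested = '<root><div class="waf-ph-1"><div class="waf-ph-2"><div class="waf-ph-3"><div class="waf-ph-4"/></div></div></div></root>';

echo countPlaceholders($siblings), "\n"; // 4
echo countPlaceholders($oneMatch), "\n"; // 1
echo countPlaceholders($nested), "\n";   // 4
```

If the real input reports more matches than expected, running the same helper on a trimmed copy of that input may help narrow down where the extra nodes come from.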
Gordon
What you describe is exactly what my code was supposed to find. Thank you for telling me that my code is correct. I was going nuts! I'm not sure what's wrong then. ("It's never the compiler. Well, almost never.") It is going through Tidy... I'll check whether Tidy duplicates my div, though I don't see why it would. Changing the HTML file that I feed to the XPath query is slow and difficult for me: it's on a restricted part of the server, so I just took for granted that my code was wrong somehow. I guess I'll look into Tidy then. Even if the HTML is converted to XML, it's not likely to be THAT far off, is it?
eje211
@eje211 @ircmaxell gave a great suggestion in the comments below your question: output the found nodes. Do `foreach($placeholders as $node) { echo $document->saveXml($node); }` to print the XML to see what it found. Or simply output the whole document to see what DOM made of it.
Gordon
I didn't think of doing a saveXML. I was thinking of print_r, which isn't very useful here. My bad. I'll do it right now.
eje211
A: 

ircmaxell's comment about using spl_object_hash() really solved my problem and showed that, for once, the compiler (or interpreter) really was at fault. He should get credit for this question. Short of that, I'm writing this answer to credit him.
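
The comment itself is not reproduced here, so the following is only a sketch of how spl_object_hash() might be used for this kind of check, not necessarily what was suggested. The `$html` sample is an assumption standing in for the real input, and `getNodePath()` is added as a second way to see whether the results point at the same place in the tree.

```php
<?php
// Assumed sample input; the real HTML from the question is not shown in full.
$html = '<html><body><div class="waf-ph-demo"><p class="notification">This is to be substituted.</p></div></body></html>';

libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);
$placeholders = $xpath->query('//div[starts-with(@class, "waf-ph-")]');

// Keep every node referenced while comparing: spl_object_hash() values
// can be reused once an object has been freed.
$nodes = iterator_to_array($placeholders);

$seen = [];
foreach ($nodes as $node) {
    $hash = spl_object_hash($node);
    $seen[$hash] = ($seen[$hash] ?? 0) + 1;
    // Print the hash next to the node's location in the tree.
    echo $hash, ' ', $node->getNodePath(), "\n";
}

echo 'result nodes: ', count($nodes), ', distinct objects: ', count($seen), "\n";
```

If the number of distinct objects is lower than the number of result nodes, the same node is being returned more than once rather than the document containing several matching elements.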

eje211