Hi there,

I'm trying to use DOMDocument and XPath to search an HTML document using PHP. I want to search by a number such as '022222', and it should return the value of the corresponding h2 tag. Any thoughts on how this would be done?

The HTML document can be found at http://pastie.org/1211369


How about this?

$sxml = simplexml_load_string($data);
$find = "022222";

print_r($sxml->xpath("//li[.='".$find."']/../../../div[@class='content']/h2"));

It returns:

Array
(
    [0] => SimpleXMLElement Object
        (
            [0] => Item 2
        )

)

//li[.='xxx'] locates the li you're searching for. Then ../ is applied three times to step up from the li to its ul, then to the phone div, and then to the enclosing item div, before descending into the content div selected by div[@class='content']. Finally, h2 selects the heading child.
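Here is a self-contained version of the SimpleXML call above, as a sketch. The question never shows $data, so the sample markup below is an assumption modelled on the wrapped document in the next answer. $find is also assumed to contain no apostrophe, because it is pasted straight into the XPath string literal.

```php
<?php
// Hypothetical sample data: a trimmed stand-in for the pastie snippet,
// wrapped in a single <html> root so it is well-formed XML.
$data = <<<XML
<html>
  <div class="item">
    <div class="content"><h2>Item 1</h2></div>
    <div class="phone">
      <ul class="phone-single"><li>01234 567890</li></ul>
    </div>
  </div>
  <div class="item">
    <div class="content"><h2>Item 2</h2></div>
    <div class="phone">
      <ul class="phone-multiple"><li>022222</li><li>033333</li></ul>
    </div>
  </div>
</html>
XML;

$sxml = simplexml_load_string($data);
$find = "022222"; // assumed apostrophe-free: it is spliced into an XPath '...' literal

// li -> ul -> div.phone -> div.item via three ../ steps, then down to div.content/h2
$matches = $sxml->xpath("//li[.='" . $find . "']/../../../div[@class='content']/h2");

// xpath() returns an array of SimpleXMLElement; casting to string gives the h2 text
$title = $matches ? (string) $matches[0] : null;
echo $title, "\n";
```

Looking up "033333" instead returns the same heading, since both numbers sit in the same list.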

Just FYI, here's how to do it using DOM:

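$dom = new DOMDocument();
$dom->loadXML($data); // $data must be well-formed XML for loadXML()

$find = "022222";

$xpath = new DOMXPath($dom);
// Same expression as above: find the li, climb to the item div, descend to the h2
$res = $xpath->evaluate("//li[.='".$find."']/../../../div[@class='content']/h2");

if ($res->length > 0) {
    $node = $res->item(0);
    echo $node->firstChild->wholeText."\n"; // text of the first matching h2
}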
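Because the question is about an HTML document, loadXML() may reject the real page if its markup is not well-formed XML. A hedged alternative is DOMDocument::loadHTML(), which uses libxml's more forgiving HTML parser. The fragment below is a hypothetical stand-in for the real page, and the XPath is unchanged from the answer. loadHTML() places the fragment inside generated html/body elements, which does not matter here because the expression starts from //li.

```php
<?php
// Hypothetical tag-soup fragment: no single root element and unclosed <br> tags,
// so loadXML() would reject it.
$html = '<div class="item"><div class="content"><h2>Item 1</h2><br></div>'
      . '<div class="phone"><ul class="phone-single"><li>01234 567890</li></ul></div></div>'
      . '<div class="item"><div class="content"><h2>Item 2</h2><br></div>'
      . '<div class="phone"><ul class="phone-multiple"><li>022222</li><li>033333</li></ul></div></div>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);            // lenient HTML parser; adds html/body wrappers
libxml_clear_errors();

$find = "022222"; // assumed apostrophe-free: it is spliced into an XPath '...' literal
$xpath = new DOMXPath($dom);
$res = $xpath->query("//li[.='" . $find . "']/../../../div[@class='content']/h2");

// textContent concatenates all descendant text of the h2
$title = $res->length > 0 ? $res->item(0)->textContent : null;
echo $title, "\n";
```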
Emil H
Brilliant, works perfectly. I had no idea you could traverse the DOM using ../. Thanks!
RichW
@RichW, You're welcome. :)
Emil H
I want to search by a number such as '022222', and it should return the value of the corresponding h2 tag. Any thoughts on how this would be done?

The HTML document can be found at http://pastie.org/1211369

To start with, the text at the provided link is not a well-formed XML or XHTML document (among other things, it has no single root element), so it cannot be loaded as XML and queried with XPath directly.

Therefore I have wrapped it in an <html> element.
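To see why the wrapping step is needed, here is a small PHP sketch. The two-item snippet is a hypothetical stand-in for the pastie content. Because the bare snippet has several top-level div elements, simplexml_load_string() fails on it, while the same text wrapped in an <html> element loads fine.

```php
<?php
// Hypothetical snippet: sibling <div> elements with no single root,
// which is not a well-formed XML document.
$snippet = '<div class="item"><div class="content"><h2>Item 1</h2></div></div>'
         . '<div class="item"><div class="content"><h2>Item 2</h2></div></div>';

libxml_use_internal_errors(true); // keep libxml parse errors out of the output

// Fails: libxml reports extra content after the first root element, so this is false
$bare = simplexml_load_string($snippet);

// Succeeds: <html> now acts as the single document element
$wrapped = simplexml_load_string('<html>' . $snippet . '</html>');

libxml_clear_errors();

var_dump($bare === false);                      // bool(true)
var_dump($wrapped instanceof SimpleXMLElement); // bool(true)
echo count($wrapped->div), "\n";                // 2 item divs under <html>
```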

Against this XML document, one XPath expression that selects exactly the wanted text node is:

/*/div[div/ul/li = '022222']/div[@class='content']/h2/text()

Among other advantages, this XPath expression doesn't use any reverse axes (such as the parent axis, abbreviated ..) and is therefore more readable.

The complete XML document on which this XPath expression is evaluated is the following:

<html>
 <div class="item">
    <div class="content"><h2>Item 1</h2></div>
    <div class="phone">
        <ul class="phone-single">
            <li>01234 567890</li>
        </ul>
    </div>
 </div>

 <div class="item">
    <div class="content"><h2>Item 2</h2></div>
    <div class="phone">
        <ul class="phone-multiple">
            <li>022222</li>
            <li>033333</li>
        </ul>
    </div>
 </div>

 <div class="item">
    <div class="content"><h2>Item 3</h2></div>
    <div class="phone">
        <ul class="phone-single">
            <li>02345 678901</li>
        </ul>
    </div>
 </div>

 <div class="item">
    <div class="content"><h2>Item 4</h2></div>
    <div class="phone">
        <ul class="phone-multiple">
            <li>099999999</li>
            <li>088888888</li>
        </ul>
    </div>
 </div>
</html>
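To try the forward-only expression from PHP, here is a minimal sketch using DOMXPath against a two-item excerpt of the document above. The excerpt is trimmed for brevity; the full four-item document gives the same result, because only the second item contains 022222. Since the expression ends in text(), it returns text nodes, so the code reads nodeValue directly.

```php
<?php
// Excerpt of the wrapped document above (first two items only).
$xml = <<<XML
<html>
 <div class="item">
    <div class="content"><h2>Item 1</h2></div>
    <div class="phone">
        <ul class="phone-single"><li>01234 567890</li></ul>
    </div>
 </div>
 <div class="item">
    <div class="content"><h2>Item 2</h2></div>
    <div class="phone">
        <ul class="phone-multiple"><li>022222</li><li>033333</li></ul>
    </div>
 </div>
</html>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Forward-only path: keep the item div whose phone list contains the number
// (a node-set = string comparison is true if ANY li matches), then walk down to the h2 text.
$nodes = $xpath->query("/*/div[div/ul/li = '022222']/div[@class='content']/h2/text()");

$found = [];
foreach ($nodes as $node) {
    $found[] = $node->nodeValue;
}
echo implode("\n", $found), "\n";
```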
Dimitre Novatchev
Another great example, Dimitre, thanks for this. I see what you mean about it being more readable. The reason the XML wasn't well-formed is that it's simply a snippet of a document; the actual XML is a very messy document that seemed pointless to post here.
RichW
+1. That's a much cleaner XPath expression than mine. It's been a while since I worked regularly with XPath, so it seems I've forgotten a bit. :)
Emil H