I am currently working on a developer tracker for a game without using regex. I have hit a roadblock when trying to parse certain parts of the HTML.

What I am trying to parse:

<td class="alt1" id="td_post_139718"> 
<!-- message, attachments, sig --> 
        <!-- icon and title --> 
        <div class="smallfont"> 
            <img class="inlineimg" src="images/icons/icon1.gif" alt="Default" border="0" /> 
            <strong>Re: TERA's E3 2010 Coverage</strong> 
        </div> 

My Code:

$titleArray = array();
        foreach($idArray as $id) {
            $title = $dom->getElementById('td_post_'.$id);
            $smallFont = $title->getElementsByTagName("div");
            echo $smallFont->nodeValue;
        }

It yields:

Notice: Undefined property: DOMNodeList::$nodeValue in C:\wamp\www\crawler\crawler.php on line 71

Notice: Undefined property: DOMNodeList::$nodeValue in C:\wamp\www\crawler\crawler.php on line 71

Notice: Undefined property: DOMNodeList::$nodeValue in C:\wamp\www\crawler\crawler.php on line 71

I am trying to find the text within a <div> that is within a <td> whose id is dynamic.

I've tried all sorts of combinations to get it to work, but I haven't been able to achieve it.

+1  A: 

DOMElement::getElementsByTagName returns a node list (a DOMNodeList), not a single element. You have to iterate through it to retrieve the individual <div>s. Example:

foreach ($title->getElementsByTagName("div") as $smallFont) {
    echo htmlspecialchars($smallFont->nodeValue), "<br />";
}

You can also use the textContent property instead. See e.g. this discussion.
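
For completeness, here is a minimal, self-contained sketch of the same idea applied to the markup from the question. The function name `collectTitles`, the wrapping `<table>` and closing tags, and the choice to read the `<strong>` element rather than the whole `<div>` are assumptions made for illustration, not part of the original code.

```php
<?php
// Sample markup from the question, with a wrapping table and closing tags
// added (an assumption) so the fragment is complete.
$html = <<<HTML
<table><tr>
<td class="alt1" id="td_post_139718">
    <div class="smallfont">
        <img class="inlineimg" src="images/icons/icon1.gif" alt="Default" border="0" />
        <strong>Re: TERA's E3 2010 Coverage</strong>
    </div>
</td>
</tr></table>
HTML;

// Hypothetical helper: maps each post id to the text of the first <strong>
// inside its td_post_<id> cell. Ids with no matching cell are skipped.
function collectTitles(DOMDocument $dom, array $ids): array
{
    $titles = array();
    foreach ($ids as $id) {
        $cell = $dom->getElementById('td_post_' . $id);
        if ($cell === null) {
            continue; // no cell with that id in this document
        }
        // getElementsByTagName returns a DOMNodeList, so iterate over it.
        foreach ($cell->getElementsByTagName('strong') as $strong) {
            $titles[$id] = trim($strong->textContent);
            break; // only the first <strong> is wanted here
        }
    }
    return $titles;
}

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

foreach (collectTitles($dom, array('139718')) as $id => $title) {
    echo $id, ': ', htmlspecialchars($title), "\n";
}
```

Checking the result of `getElementById` for `null` avoids a fatal error when a post id is not present on the page.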

Artefacto
Note that it returns a `DOMNodeList` even if there's only one node in the list. You can iterate over the node list with `foreach`, or use the `item` method: `echo $smallFont->item(0)->nodeValue;`
Charles
This method did work after I added a couple more iterations to separate it from the message data below this code. Do you think it would be more efficient to use cURL with regex or to run 2 iterations of DOM?
Honzo
It would be more **correct** to use DOM, as regular expressions do not have enough sophistication to work correctly with HTML.
Artefacto
A: 

getElementsByTagName returns a DOMNodeList, not a single node. You'll have to access the individual node from the list before trying to access nodeValue:

echo $smallFont->item(0)->nodeValue;
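
One caveat: `item()` returns `null` when the index is out of range, so reading `nodeValue` directly fails if the element contains no `<div>`. A small guard might look like the sketch below. The helper name `firstDivText` and the sample markup are made up for illustration, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: text of the first <div> under $parent, or null if none.
function firstDivText(DOMElement $parent): ?string
{
    $divs = $parent->getElementsByTagName('div'); // a DOMNodeList
    $first = $divs->item(0);                      // null when the list is empty
    return $first === null ? null : trim($first->nodeValue);
}

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML(
    '<body><div class="smallfont"><strong>Re: TERA\'s E3 2010 Coverage</strong></div>'
    . '<p>no divs here</p></body>'
);
libxml_clear_errors();

$body = $dom->getElementsByTagName('body')->item(0);
$para = $dom->getElementsByTagName('p')->item(0);

var_dump(firstDivText($body)); // the title text
var_dump(firstDivText($para)); // NULL, because the <p> contains no <div>
```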
Dexter