I'm trying to grab a specific bit of raw text from a website. Using this site and other sources, I learned how to grab specific images using SimpleXML and XPath.

However, the same approach doesn't appear to work for grabbing raw text. Here's what is NOT working right now.

// first I set the xpath of the div that contains the text I want
$xpath = '//*[@id="storyCommentCountNumber"]';

// then I create a new DOM Document
$html = new DOMDocument();

// then I fetch the file and parse it (@ suppresses warnings).
@$html->loadHTMLFile($url);

// then convert DOM to SimpleXML
$xml = simplexml_import_dom($html);   

// run an XPath query on the div I want using the previously set xpath
$commcount = $xml->xpath($xpath);
print_r($commcount);

Now, when I'm grabbing an image, $commcount comes back as an array that contains the image's source in it somewhere.

In this case, I want it to contain the raw text inside the "storyCommentCountNumber" div. But that text doesn't appear in the print_r output, just the name of the div.

What am I doing wrong? I can kind of see that this approach is only for grabbing HTML elements and the bits inside of them, not raw text. How do I get the text inside that div?

Thanks!

A: 

Try checking this page out.

:)

Salty
+1  A: 

Can you include a sample of the HTML (maybe including a few lines before and after the element you are selecting) and the output from print_r()?

You might try the following to see if it helps you out:

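// xpath() returns an array of matches; take the first one if there is any
if ( count($commcount) > 0 ) {
    // asXML() serializes the matched element, including the text inside it
    $divContent = $commcount[0]->asXML();
    print $divContent;
}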
Beau Simensen
+2  A: 

One thing to note is that when you use print_r or var_dump on SimpleXML objects, you won't see the text of the object (or sometimes the attributes). To see everything, output the full XML string using $variable->asXML().

To get the text, cast the SimpleXML object to a string. This automatically pulls out the element's inner text.

 /* remember $commcount is always an array from the xpath */
 foreach($commcount as $str)
 {
     echo (string)$str;
 }

Hopefully the above can give you a start.
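
For reference, here is a minimal, self-contained sketch of the string cast. The inline markup and the value 42 are made up for illustration, since the real page isn't shown in the question; only the id comes from the original code.

```php
<?php
// Hypothetical stand-in for the fetched page (assumption: the real markup is unknown).
$markup = '<html><body><div id="storyCommentCountNumber">42</div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($markup);

// Same steps as in the question: DOM -> SimpleXML -> XPath.
$xml = simplexml_import_dom($doc);
$matches = $xml->xpath('//*[@id="storyCommentCountNumber"]');

$commentCount = null;
if (!empty($matches)) {
    // Casting the SimpleXMLElement to a string yields the text inside the element.
    $commentCount = trim((string) $matches[0]);
}

echo $commentCount, "\n";
```

As far as I know, the cast only returns the text sitting directly inside the element, not text nested in child elements, so a div with further markup inside it may need asXML() or the DOM API instead.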

null
There's no need for the cast. Echo does that already.
Ionuț G. Stan
+1  A: 

I know you are trying to use SimpleXML, but I would think that grabbing raw text would be easier with a regular expression.
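
A rough sketch of what that could look like, assuming the div holds no nested div elements and the id is written with double quotes. The markup string is a made-up stand-in; in practice it would come from something like file_get_contents($url). Regular expressions are fragile against real-world HTML, so treat this as a quick hack rather than a robust parser.

```php
<?php
// Hypothetical stand-in for the downloaded page source.
$markup = '<p>Story</p><div id="storyCommentCountNumber">42</div><p>Footer</p>';

// Capture everything between the opening tag with the given id and the next </div>.
$pattern = '~<div[^>]*\bid="storyCommentCountNumber"[^>]*>(.*?)</div>~s';

$text = null;
if (preg_match($pattern, $markup, $m)) {
    // Drop any inline tags and surrounding whitespace from the captured content.
    $text = trim(strip_tags($m[1]));
}

echo $text, "\n";
```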

unclerojelio
A: 

The raw text inside the div is not part of the div element itself; rather, it lives in a child node of the div, normally the first one. There should be a text node within the div that contains the data you are looking for.
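
To illustrate, here is a small sketch using the DOM API directly (DOMXPath instead of SimpleXML). The markup is a made-up example, not the real page: it puts a span after the number to show the difference between the first text node and the combined text of the whole div.

```php
<?php
// Hypothetical fragment; loadHTML() wraps it in <html><body> automatically.
$markup = '<div id="storyCommentCountNumber">17 <span>comments</span></div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);

$finder = new DOMXPath($doc);
$div = $finder->query('//*[@id="storyCommentCountNumber"]')->item(0);

// The first child of the div is a text node (DOMText) holding the raw text.
$firstChild = $div->firstChild;
$directText = trim($firstChild->nodeValue);

// textContent concatenates the text of the div and all of its descendants.
$allText = $div->textContent;

echo $directText, "\n";
echo $allText, "\n";
```

If only the text node is wanted, appending /text() to the XPath expression should select it directly.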

acrosman