Hi there!

I'm working on a piece of code that should get the contents of one very specific HTML tag from a given HTML document.

$html = "<html>..........truncated.........<div>blablabla<br />xy</div>.....";
$dom = new DOMDocument();
$dom->loadHTML($html);

$divs = $dom->getElementsByTagName('div');

// nodeValue returns only the text of the node; inner tags such as <br /> are stripped
echo $divs->item(0)->nodeValue.'<br>';

The HTML code is just an example, but it shows exactly the problem I'm experiencing: I want to get the content of this DIV and I NEED the inner tags to be kept! What nodeValue (as well as "textContent") does is return the contents of the correct node with all inner tags stripped (http://docs.php.net/manual/en/class.domnode.php).
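To make the behaviour concrete, here is a minimal sketch that swaps the truncated document for a small, made-up HTML string (an assumption for illustration only). Both properties return just the text of the div, so the `<br />` disappears and the two text pieces run together:

```php
<?php
// Assumed sample markup standing in for the truncated document above.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div>blablabla<br />xy</div></body></html>');

$div = $dom->getElementsByTagName('div')->item(0);

// Both properties concatenate the text descendants only; the <br /> element is dropped.
echo $div->nodeValue, "\n";
echo $div->textContent, "\n";
```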

I'm out of ideas on how to get this right at the moment... What I need is the equivalent of JavaScript's "innerHTML" or something similar, but I can't find such a method :(

How do I get this right?
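One possible way to get an innerHTML equivalent with plain DOMDocument, sketched under a few assumptions: `innerHTML()` is a hypothetical helper name, the sample HTML is made up, and the code assumes PHP 7 or later (for the return type; the node argument to DOMDocument::saveHTML() has existed since PHP 5.3.6). It serializes each child of the node separately and concatenates the results, so the node's own tag is left out:

```php
<?php
// Hypothetical helper: rough equivalent of JavaScript's innerHTML.
// Assumes PHP 7+ and DOMDocument::saveHTML() accepting a node (PHP 5.3.6+).
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // Serialize just this child (text node or element, tags included).
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Assumed sample markup standing in for the real document.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div>blablabla<br />xy</div></body></html>');
$div = $dom->getElementsByTagName('div')->item(0);

echo innerHTML($div), "\n";
```

Since saveHTML() writes HTML rather than XHTML, the `<br />` from the input comes back as `<br>`.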

A: 

Have you seen phpQuery? It might be too much for what you're trying to accomplish, but it's worth taking a look at.

Marko
A:

This solution looks promising:

http://www.linked.com.mt/blog/code/php/php-domnode-tostring-xml/

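// $myDomNode is the DOMNode to serialize, e.g. $divs->item(0)
$temp_doc = new DOMDocument('1.0', 'UTF-8');
// deep-copy the node (TRUE = include all descendants) into the temporary document
$temp_node = $temp_doc->importNode($myDomNode, TRUE);
$temp_doc->appendChild($temp_node);
// the result includes the node's own tag (outer HTML), e.g. <div>...</div>
$my_node_as_string = $temp_doc->saveHTML();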
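A shorter variant, assuming PHP 5.3.6 or later where DOMDocument::saveHTML() accepts the node to serialize, so no temporary document is needed. As with the temporary-document approach, this gives the outer HTML: the `<div>` tag itself is included. The sample HTML string is made up for illustration:

```php
<?php
// Assumed sample markup; requires PHP 5.3.6+ for saveHTML($node).
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div>blablabla<br />xy</div></body></html>');
$div = $dom->getElementsByTagName('div')->item(0);

// Serialize only this node (outer HTML, including the <div> wrapper).
$outer = $dom->saveHTML($div);
echo $outer, "\n";
```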
Andrew67
A:

DOM is only good at parsing well-formed and 100% valid XML, so unless you're using 100% valid XHTML, it's going to fail.

What you want to use is the PHP Simple HTML DOM Parser library.

There are a great many tutorials on that site to help you with what you need.

hopeseekr