That's a common problem with DOM: you have to do a bit more work if you want to get the content of a tag together with the markup of all its children.
Basically, you have to loop over the child nodes of the node you've matched with your XPath query and serialize each of them.
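To make the problem concrete, here is a minimal sketch of what happens when you read nodeValue directly. The markup is a shortened, illustrative version of the example used in this answer, and the variable names are my own.

```php
<?php
// Illustrative markup, shortened from the example in this answer.
$dom = new DOMDocument();
$dom->loadHTML('<div class="text"><p>Capture this <strong>text</strong></p></div>');

$div = $dom->getElementsByTagName('div')->item(0);

// nodeValue concatenates the text of all descendants;
// the <p> and <strong> tags themselves are dropped.
var_dump($div->nodeValue);
```

The text survives, but the markup around it is lost, which is why the child nodes have to be serialized one by one.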
There is a solution proposed in one of the user notes on the manual page of the DOMElement class -- see the note linked in the code comment below.
Integrating this solution into the code you already have should give you something like this. First, the declaration of the HTML string, with sub-tags:
$html = <<<HTML
<div class="main">
<div class="text">
<p>
Capture this <strong>text</strong> <em>1</em>
</p>
<p>
And some other <strong>text</strong>
</p>
</div>
</div>
HTML;
And, to extract the data from that HTML string, you can use something like this:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
    $innerHTML = '';
    // see http://fr.php.net/manual/en/class.domelement.php#86803
    $children = $tag->childNodes;
    foreach ($children as $child) {
        $tmp_doc = new DOMDocument();
        $tmp_doc->appendChild($tmp_doc->importNode($child, true));
        $innerHTML .= $tmp_doc->saveHTML();
    }
    var_dump(trim($innerHTML));
}
The only thing that has changed is the content of the foreach loop: instead of just using $tag->nodeValue, you have to iterate over the child nodes and serialize each one.
This gives me the following output:
string '<p>
Capture this <strong>text</strong> <em>1</em>
</p>
<p>
And some other <strong>text</strong>
</p>' (length=150)
This is the full content of the <div> tag that was matched, and all its children -- including the tags.
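As a possible alternative, and not what the user note does: since PHP 5.3.6, DOMDocument::saveHTML() accepts an optional node argument, so the temporary document can be skipped. Below is a minimal sketch under that assumption; the innerHTML() helper name is hypothetical, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Assumes PHP >= 5.3.6 for saveHTML($node); the type hints need PHP 7+.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // Serialize each child through the document that owns it,
        // so no temporary DOMDocument or importNode() call is needed.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return trim($html);
}

$dom = new DOMDocument();
$dom->loadHTML('<div class="main"><div class="text"><p>Capture this <strong>text</strong> <em>1</em></p></div></div>');
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//div[@class="main"]/div[@class="text"]') as $tag) {
    echo innerHTML($tag), "\n";
}
```

Whitespace in the result may differ slightly from the temporary-document version, so compare the two on your own markup before switching.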
Note: there are often interesting ideas and solutions in the user notes of the manual ;-)