I am parsing an HTML document with XPath and I want to keep all the inner HTML tags.
The HTML in question is an unordered list with many list elements.
<ul id="adPoint1"><li>Business</li><li>Contract</li></ul>
I am parsing the document using the following PHP code:
$dom = new DOMDocument();
@$dom->loadHTML($output);
$this->xpath = new DOMXPath($dom);
$testDom = $this->xpath->evaluate("//ul[@id='adPoint1']");
$test = $testDom->item(0)->nodeValue;
echo htmlentities($test);
For some reason the output always has the HTML tags omitted from it. I assume that this is because XPath was not intended to be used in this way, but is there any way around this?
I would really like to continue using XPath, as I already use it for parsing other areas of the page (single a href elements) without a problem.
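To make the behaviour easy to reproduce, here is a minimal self-contained sketch of the code above. The `$output` string is only a stand-in for the real page, and I use a plain `$xpath` variable instead of `$this->xpath` so it runs outside my class.

```php
<?php
// Stand-in for the fetched page (assumption: the real $output holds the full document).
$output = '<ul id="adPoint1"><li>Business</li><li>Contract</li></ul>';

$dom = new DOMDocument();
@$dom->loadHTML($output);           // @ hides libxml warnings, as in my original code
$xpath = new DOMXPath($dom);

// Select the list and read its nodeValue, exactly as in the snippet above.
$testDom = $xpath->evaluate("//ul[@id='adPoint1']");
$test = $testDom->item(0)->nodeValue;

// Prints the concatenated text of the list items, with no <li> markup.
echo htmlentities($test), "\n";
```

With this input I get `BusinessContract`, so the text is there but the `<li>` tags are gone.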
EDIT: I know that there is a better way to get this data, by iterating through the child elements of the UL. There is a more complicated part of the page that I also want to parse (a block of JavaScript), but I am trying to provide an easier-to-understand example.
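For completeness, this is roughly what I mean by iterating through the child elements. It is only a sketch against the sample list; `$output` and `$items` are names I made up for the example, and it does not help with the script block below.

```php
<?php
// Stand-in for the fetched page (assumption: same sample list as above).
$output = '<ul id="adPoint1"><li>Business</li><li>Contract</li></ul>';

$dom = new DOMDocument();
@$dom->loadHTML($output);
$xpath = new DOMXPath($dom);

// Select each <li> directly instead of reading the whole <ul> at once.
$items = [];
foreach ($xpath->query("//ul[@id='adPoint1']/li") as $li) {
    $items[] = $li->nodeValue;      // text of one list element
}

print_r($items);
```

This gives me one array entry per list element (`Business`, `Contract`), which is fine for the list but not for a case where I need the raw markup as a single string.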
The actual block of code that I want is:
<script language="javascript">document.write(rot_decode('<u7>Pbagnpg Qrgnvyf</u7><qy vq="pbagnpgQrgnvyf"><qg>Cu:</qg><qq>(58) 0078 8455</qq></qy>'));</script>
The problem with this block is that the output omits all the closing tags but keeps the opening tags. I'm guessing this is because XPath is trying to parse the inner elements rather than just treating the script content as a string.
If I try to select the script element with:
$testDom = $this->xpath->evaluate("//div[@id='businessDetails']/script");
$test = $testDom->item(0)->nodeValue;
echo htmlentities($test);
my output is the following, which you can see is missing all the closing tags:
document.write(rot_decode('<u7>Pbagnpg Qrgnvyf<qy vq="pbagnpgQrgnvyf"><qg>Cu:<qq>(58) 0078 8455'));