I'm using XPath to select a section of an HTML page. However, when I extract the node, I get only the text surrounding the inline HTML tags, not the tags themselves or the words inside them.
Sample HTML:
<body>
<div>
At first glance you may ask, “what <i>exactly</i>
do you mean?” It means that we want to help <b>you</b> figure...
</div>
</body>
I have the following XPath:
/body/div
I get the following:
At first glance you may ask, “what do you mean?” It means that we want to help figure...
I want:
At first glance you may ask, “what <i>exactly</i> do you mean?” It means that we want to help <b>you</b> figure...
Notice that the sample HTML contains <i> and <b> tags in the content. The words within those tags are "lost" when I extract the content.
I'm using SimpleXML in PHP, if that makes a difference.
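A minimal sketch that reproduces the behaviour described above. The loading call (simplexml_load_string), the string cast, and the variable names are assumptions for illustration, not taken from my actual code:

```php
<?php
// Sample markup from the question, held in a nowdoc (assumes a UTF-8 source file).
$html = <<<'XML'
<body>
<div>
At first glance you may ask, “what <i>exactly</i>
do you mean?” It means that we want to help <b>you</b> figure...
</div>
</body>
XML;

// Assumption: the document is parsed with simplexml_load_string().
$xml = simplexml_load_string($html);

// xpath() returns an array of SimpleXMLElement objects; take the first match.
$nodes = $xml->xpath('/body/div');
$div = $nodes[0];

// Casting the element to a string yields only the text directly inside <div>,
// so the words wrapped in <i> and <b> are missing from $text.
$text = (string) $div;
echo $text;
```

The XPath itself matches the <div> element, child tags included; the words seem to disappear at the point where the element is converted to a string.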