Let's say the HTML contains 15 table tags, and before each table there is a div tag with some text inside. I need to get the text from the div tag that comes directly before the 10th table tag in the HTML markup. How would I do that?

The only way I can think of is to use explode('<table', $html) to split the HTML into parts and then extract the last div tag from the 9th value of the exploded array with a regular expression. Is there a better way?

I'm reading through the PHP DOM documentation, but I cannot see any method there that would help me with this task.

A: 

If it is valid XHTML, you can use SimpleXML or DOM.
If not, use SimpleHTML.

There are plenty of usage examples for all three on SO.

You do not want to use regex or string functions on HTML.

Gordon
A: 

You load your HTML into a DOMDocument and query it with this XPath expression:

//table[10]/preceding-sibling::div[1]

This would work for the following layout, where the divs and tables are siblings under the same parent:

<div>Some text.</div>
<table><!-- #1 --></table>
  <!-- ...nine more... -->
<div>Some other text.</div> <!-- this would be selected -->
<table><!-- #10 --></table>
  <!-- ...four more... -->

XPath is capable of doing really complex node lookups with ease. If the above expression does not yet work for you, probably very little is required to make it do what you want.
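In PHP, running that expression could look roughly like the sketch below. The helper name textBeforeNthTable and the sample markup are made up for illustration, and the sample has only three div/table pairs, so it asks for table 3 rather than 10. Note that //table[N] means "the N-th table child of its parent"; if your tables are nested at different depths, (//table)[N] selects the N-th table in document order instead.

```php
<?php
// Hypothetical helper: text of the nearest <div> sibling before the N-th <table>.
function textBeforeNthTable(string $html, int $n): ?string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them;
    // real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // %d guarantees that only an integer ends up inside the expression.
    $nodes = $xpath->query(sprintf('//table[%d]/preceding-sibling::div[1]', $n));

    if ($nodes === false || $nodes->length === 0) {
        return null; // no such table, or no div before it
    }
    return trim($nodes->item(0)->textContent);
}

// Assumed sample layout: div/table pairs as siblings inside <body>.
$sampleHtml = '<body>'
    . '<div>First</div><table><tr><td>1</td></tr></table>'
    . '<div>Second</div><table><tr><td>2</td></tr></table>'
    . '<div>Third</div><table><tr><td>3</td></tr></table>'
    . '</body>';

echo textBeforeNthTable($sampleHtml, 3), PHP_EOL; // prints "Third"
```

For the layout in the question you would call it with 10 instead of 3.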

HTML is structured data represented as a string, which is substantially different from being just a string. Don't give in to the temptation of processing it with string handling functions like explode(), or even regex.

Tomalak
A: 

And if you don't feel like learning XPath, you can use the same old-school DOM walking techniques you would use with JavaScript in the browser.

document.getElementsByTagName('table')[9]

Then crawl your way up the .previousSibling values until you find one that isn't a TextNode and is a DIV.

You just have to look for the PHP DOM way of doing it.
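A rough PHP equivalent of that walk might look like the sketch below. The function name divBeforeNthTable and the sample markup are assumptions made for the example; only the DOMDocument calls are real API. Unlike the JavaScript snippet, PHP's node list is indexed with item(), which is also zero-based.

```php
<?php
// Hypothetical helper: walk backwards from the N-th <table> (document order)
// to the nearest preceding sibling that is a <div> element.
function divBeforeNthTable(DOMDocument $doc, int $n): ?DOMElement
{
    // item() is zero-based, so the 10th table is item(9); null if it does not exist.
    $table = $doc->getElementsByTagName('table')->item($n - 1);
    if ($table === null) {
        return null;
    }

    for ($node = $table->previousSibling; $node !== null; $node = $node->previousSibling) {
        // Skips text nodes, comments, and any element that is not a div.
        if ($node instanceof DOMElement && strtolower($node->nodeName) === 'div') {
            return $node;
        }
    }
    return null;
}

// Assumed sample markup with whitespace and a comment between the siblings.
$markup = "<body>\n"
    . "<div>First</div>\n<table><tr><td>1</td></tr></table>\n"
    . "<div>Second</div>\n<!-- a comment -->\n<table><tr><td>2</td></tr></table>\n"
    . "</body>";

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true); // keep parser warnings quiet
$dom->loadHTML($markup);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$div = divBeforeNthTable($dom, 2);
echo $div !== null ? trim($div->textContent) : '(not found)', PHP_EOL; // prints "Second"
```

Keep in mind that getElementsByTagName() also counts tables nested inside other tables, so "the 10th table" here means the 10th in document order.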

I've found that PHP's DOMDocument works pretty OK with not-so-perfect HTML. Once you have the DOM, I think you can even pass it into SimpleXML and work with it XML-style, even though the original HTML/XHTML string wasn't perfect.
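As a small illustration of that hand-off, the sketch below parses slightly sloppy markup with DOMDocument and then converts it with simplexml_import_dom(). The sample markup and variable names are invented for the example, and how well this copes with badly broken pages will depend on the input.

```php
<?php
// Assumed sample: no <html>/<body> wrapper, unquoted attributes, unclosed <br>.
$sloppyHtml = '<div class=note>First<br></div><table><tr><td>1</td></tr></table>'
    . '<div class=note>Second<br></div><table><tr><td>2</td></tr></table>';

$document = new DOMDocument();
$previous = libxml_use_internal_errors(true); // keep parser warnings quiet
$document->loadHTML($sloppyHtml);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Hand the repaired tree over to SimpleXML; the result wraps the root <html> element.
$xml = simplexml_import_dom($document);

$secondText = null;
if ($xml instanceof SimpleXMLElement) {
    // SimpleXML can run the same XPath expression and returns an array of elements.
    $matches = $xml->xpath('//table[2]/preceding-sibling::div[1]');
    if (!empty($matches)) {
        // Casting an element to string yields its own text content.
        $secondText = trim((string) $matches[0]);
    }
}

echo $secondText ?? '(not found)', PHP_EOL; // prints "Second"
```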

thinsoldier