views:

45

answers:

2

I have the following HTML snippet:

<tr>
<td class="1">...</td>
<td class="2">...</td>
<td class="3">...</td>
<td class="4">...</td>
</tr>
etc...

I have N rows, and each row contains 4 TDs, each with a unique class. I would like a simple way to split out all the rows, and the TDs within them by class, so I can choose which data to use.

I expect the easiest way to achieve this would be regex (maybe two): one to split up the TRs, then another to split up the TDs (preferably by class).

Thanks

+3  A: 

Obligatory: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

Use a DOMParser or SimpleHTMLDom for HTML.
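To illustrate the parser approach in general terms, here is a minimal sketch using Python's standard-library `html.parser`. It is not the DOMParser or SimpleHTMLDom API. It walks the markup tag by tag and groups each row's cells by their `class`, which gives the question's "rows and TDs by class" layout. `RowCellParser` is a hypothetical name, and the sketch assumes every cell in a row has a distinct `class`, as in the question.

```python
from html.parser import HTMLParser


class RowCellParser(HTMLParser):
    """Hypothetical helper: collect each <tr> as a dict of {td class: cell text}."""

    def __init__(self):
        super().__init__()
        self.rows = []            # one dict per <tr>, in document order
        self._cell_class = None   # class of the <td> currently open, if any
        self._buffer = []         # text fragments seen inside that <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_cell()    # a new row implicitly ends any open cell
            self.rows.append({})
        elif tag == "td":
            self._close_cell()    # tolerate a missing </td>
            self._cell_class = dict(attrs).get("class")
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("td", "tr"):
            self._close_cell()

    def handle_data(self, data):
        if self._cell_class is not None:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._close_cell()        # flush a cell left open at end of input

    def _close_cell(self):
        # Store the open cell (if any) in the current row, keyed by its class.
        if self._cell_class is not None and self.rows:
            self.rows[-1][self._cell_class] = "".join(self._buffer).strip()
        self._cell_class = None
        self._buffer = []


sample = """
<table>
<tr><td class="1">a</td><td class="2">b</td><td class="3">c</td><td class="4">d</td></tr>
<tr><td class="1">e</td><td class="2">f</td><td class="3">g</td><td class="4">h</td></tr>
</table>
"""

parser = RowCellParser()
parser.feed(sample)
parser.close()
print([row["2"] for row in parser.rows])  # ['b', 'f']
```

Keying each row by class means you pick columns by name (`row["2"]`) rather than by position. Cells without a `class` are ignored.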

Mike B
The DOMParser worked a treat.
Chris
+1  A: 

Regex isn't typically a good way to parse HTML. I would recommend using SimpleXML (http://www.php.net/manual/en/book.simplexml.php) and running XPath queries on the data.
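The XPath idea can be sketched in Python with the standard-library `xml.etree.ElementTree`. This is not SimpleXML, and ElementTree supports only a subset of XPath. One query selects every row and a second selects the cell with a given class. Like any XML parser, it requires well-formed markup. The sample data below is made up to match the question's shape.

```python
import xml.etree.ElementTree as ET

# Sample data shaped like the question's snippet; it is well-formed XML.
snippet = """<table>
<tr><td class="1">a</td><td class="2">b</td><td class="3">c</td><td class="4">d</td></tr>
<tr><td class="1">e</td><td class="2">f</td><td class="3">g</td><td class="4">h</td></tr>
</table>"""

root = ET.fromstring(snippet)

# ".//tr" selects every row; "td[@class='3']" picks that row's cell with class 3.
third_column = [tr.find("td[@class='3']").text for tr in root.iterfind(".//tr")]
print(third_column)  # ['c', 'g']

# An XML parser rejects markup that is not well-formed, such as an unclosed <td>.
try:
    ET.fromstring("<table><tr><td class='1'>x</tr></table>")
    well_formed = True
except ET.ParseError as err:
    well_formed = False
    print("not well-formed:", err)
```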

Michael
Using SimpleXML gives me loads of errors due to incorrectly formatted HTML and inline JavaScript. How would one get around this?
Chris
Hmm, I don't know of a good PHP library that handles errors well; in Python I'd look to Beautiful Soup or lxml. When I've run into this, I have (without pride) used regex. The SimpleHTMLDom library that Mike B suggested above claims to handle invalid HTML, so it may be worth a look.
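For contrast with an XML parser, here is a sketch using Python's standard-library `html.parser`, which Beautiful Soup can use as a backend. It is a Python illustration, not a PHP library. It does not require closing tags, accepts unquoted attributes, and treats `<script>` contents as raw text. A naive regex, by contrast, also matches tag-like strings inside the script. `TdClassCollector` and the `messy` input are hypothetical.

```python
import re
from html.parser import HTMLParser


class TdClassCollector(HTMLParser):
    """Hypothetical helper: record the class attribute of every <td> start tag."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.classes.append(dict(attrs).get("class"))


# Made-up "messy" input: inline script, an unquoted attribute, no closing </td> or </tr>.
messy = """<script>if (a < b) { document.write("<td>"); }</script>
<table><tr><td class=1>a<td class="2">b</table>"""

collector = TdClassCollector()
collector.feed(messy)
collector.close()
print(collector.classes)  # ['1', '2']

# A naive regex also matches the "<td>" string inside the script.
naive = re.findall(r"<td\b[^>]*>", messy)
print(len(naive))  # 3
```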
Michael