Possible Duplicate:
RegEx match open tags except XHTML self-contained tags

Hello,

I'm stuck on a weird regex problem. I'm parsing an HTML table in PHP.

The regex I'm using: <td[^>]*>(h.*?)</td>

<td>other data</td> <td>other data</td><td>Data_needed</td> <td>--</td>

but it's matching all the other data too.

Now I want it to match only <td>Data_needed</td> <td>--</td>

I tried some regexes, which give output like

other data</td> <td>other data</td><td>Data_needed</td> <td>--

starting from the first <td> and running to the last </td>,

but I want Data_needed from <td>Data_needed</td> <td>--</td>

Thanks for any help.

+8  A: 

Do not use regex for parsing HTML or XML (including XHTML). Ever.

Use an HTML or XML parser instead. A quick search for "php html parsing" turned up this tool, Simple HTML DOM, as the first hit. PHP also has DOM and SAX tools built in.
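As a rough sketch of the built-in DOM route, the snippet below loads the sample row from the question with DOMDocument and uses an XPath query to pick the cell that sits right before the `--` cell. The surrounding `<table><tr>` wrapper and the variable names are assumptions added for illustration, and the "cell before `--`" rule is only a guess at how the wanted cell can be identified on the real page.

```php
<?php
// Sample row from the question; the <table><tr> wrapper is an assumption
// so that the fragment parses as a table row.
$html = '<table><tr><td>other data</td> <td>other data</td>'
      . '<td>Data_needed</td> <td>--</td></tr></table>';

$doc = new DOMDocument();
// Real-world HTML is often malformed; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Assumption: the wanted cell is the <td> immediately before the <td> whose text is "--".
$cells = $xpath->query('//td[normalize-space(.) = "--"]/preceding-sibling::td[1]');

// Take the first hit, or null if the page has no such cell.
$dataNeeded = $cells->length > 0 ? trim($cells->item(0)->textContent) : null;
echo var_export($dataNeeded, true), PHP_EOL;
```

If the real page marks the cell some other way (a class, a column position, a header label), only the XPath expression should need to change.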

Thomas Owens
Obligatory link: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
Oded
@Oded: I was going to dig that up. Thanks. I made my first sentence link to that post.
Thomas Owens
@Thomas, "dig that up". Hello, fellow http://digg.com user.
shamittomar
I specifically need regex in this case. Thanks for your answer.
Kevin
@Kevin: No, you don't need or want regex when parsing HTML. If you were given that as a requirement, that needs to be addressed immediately - you are using the wrong tools for the job.
Thomas Owens
See, I just want one thing from the whole page. There are lots of <td>s, but I don't want all of them. How can I define which one I need and which ones I don't? I already know about these things, but no one cares to come up with a proper regex match, lol.
Kevin
+3  A: 

You can use the Simple HTML DOM for that instead.

A HTML DOM parser written in PHP5+ lets you manipulate HTML in a very easy way!

Sarfraz
I specifically need regex in this case. Thanks for your answer.
Kevin
Suggested third party alternatives to [SimpleHtmlDom](http://simplehtmldom.sourceforge.net/) that actually use [DOM](http://php.net/manual/en/book.dom.php) instead of String Parsing: [phpQuery](http://code.google.com/p/phpquery/), [Zend_Dom](http://framework.zend.com/manual/en/zend.dom.html), [QueryPath](http://querypath.org/) and [FluentDom](http://www.fluentdom.org).
Gordon
Thanks, Gordon, but I'm not parsing the whole page and all its elements. I just need one thing from the entire page :)
Kevin
A: 

General HTML parsing shouldn't be done using regex, but if your HTML is simple and not nested, you can try

.*<td[^>]*>(.*?)</td>\s*<td>--</td>
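For what it's worth, here is one way a pattern along these lines might be wired into `preg_match`. It is only a sketch: it assumes the wanted cell holds plain text with no nested tags and is directly followed by the `--` cell, and the multi-line sample string is made up for illustration. Two things commonly trip up such a pattern in PHP: using `/` as the delimiter collides with the `/` in `</td>`, and `.` does not match newlines unless the `s` modifier is set. The variant below sidesteps both by using `#` as the delimiter and `[^<]*` in place of `.*?`.

```php
<?php
// Hypothetical input: same cells as in the question, but spread over several lines.
$html = "<td>other data</td> <td>other data</td>\n<td>Data_needed</td>\n<td>--</td>";

// '#' is the delimiter, so the '/' in '</td>' needs no escaping.
// [^<]* cannot run past a tag boundary, so the capture stays inside one cell.
// \s* allows whitespace (including newlines) between the two cells.
// The 'i' modifier makes the tag names case-insensitive.
$pattern = '#<td[^>]*>([^<]*)</td>\s*<td[^>]*>--</td>#i';

if (preg_match($pattern, $html, $m)) {
    $dataNeeded = trim($m[1]);
} else {
    $dataNeeded = null; // no cell followed by a "--" cell was found
}
echo var_export($dataNeeded, true), PHP_EOL;
```

This will silently miss the cell as soon as it contains markup such as `<b>` or `<a>`, which is the usual reason a parser is the safer choice.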
Scott Evernden
Well, it's not working :(
Kevin