Hi
I want to parse content from
<td>content</td>
and
<td *?*>content</td>
and
<td *specific td class*>content</td>
How can I do this in PHP with a regex and preg_match?
If you have an HTML document, you really shouldn't use regular expressions to parse it: HTML is just not "regular" enough for that.
A far better solution is to load your HTML document with a DOM parser, for instance DOMDocument::loadHTML.
XPath queries on the loaded document often do a really great job!
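As a rough illustration of the DOM approach suggested above, here is a minimal sketch. The sample markup, the variable names, and the class name "specific_class" are assumptions made up for the example; adapt them to your real document.

```php
<?php
// Hypothetical sample markup; "specific_class" is an assumed class name.
$html = '<table><tr>'
      . '<td>plain</td>'
      . '<td id="x">with attribute</td>'
      . '<td class="specific_class">with class</td>'
      . '</tr></table>';

// Collect libxml warnings internally instead of printing them,
// since real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Every <td>, whatever attributes it carries.
$all = [];
foreach ($xpath->query('//td') as $td) {
    $all[] = trim($td->textContent);
}

// Only <td> elements whose class attribute contains the assumed class token.
$query = '//td[contains(concat(" ", normalize-space(@class), " "), " specific_class ")]';
$specific = [];
foreach ($xpath->query($query) as $td) {
    $specific[] = trim($td->textContent);
}

print_r($all);
print_r($specific);
```

The concat/normalize-space test is there so that a cell with several classes (class="a specific_class b") still matches, while a class such as "specific_class_2" does not.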
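For <td>content</td>:
<td>([^<]*)</td>
For <td *specific td class*>content</td>:
<td[^>]*class=\"specific_class\"[^>]*>([^<]*)<
(The quotes are escaped so the pattern can sit inside a double-quoted PHP string. Both patterns assume the cell content contains no nested tags.)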
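If you do go the regex route for simple, well-formed cells, patterns like these could be used with preg_match_all roughly as follows. This is only a sketch: the sample input and the class name "specific_class" are assumptions, and it will not cope with nested tags inside a cell.

```php
<?php
// Hypothetical sample input; "specific_class" is an assumed class name.
$html = <<<HTML
<td>plain</td>
<td id="x">with attribute</td>
<td class="specific_class">with class</td>
HTML;

// Any <td>, with or without attributes.
// \b keeps "<td" from matching a longer tag name such as <tdx>.
preg_match_all('~<td\b[^>]*>([^<]*)</td>~i', $html, $m);
$all = $m[1];

// Only <td> tags whose attribute list contains class="specific_class".
preg_match_all('~<td\b[^>]*\bclass="specific_class"[^>]*>([^<]*)</td>~i', $html, $m);
$specific = $m[1];

print_r($all);
print_r($specific);
```

Using ~ as the delimiter avoids having to escape the / in </td>, and single-quoted strings avoid escaping the double quotes.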
I think this sums it up pretty well.
In short, don't use regular expressions to parse HTML. Instead, look at the DOM classes, especially DOMDocument::loadHTML.
@OP, here's one way
$str = <<<A
<td>content</td>
<td *?*>content</td>
<td *specific td class*>content</td>
<td *?*> multiline
content </td>
A;
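// Split on the closing tag; each piece then ends with one cell's content.
$s = explode("</td>", $str);
foreach ($s as $b) {
    // Strip everything up to and including the opening <td ...> tag.
    // [^>]* stops at the tag's own ">", so a ">" in the content is not swallowed.
    $b = preg_replace("/.*<td[^>]*>/", "", $b);
    print $b . "\n";
}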
output
$ php test.php
content
content
content
multiline
content