I have a situation. I read an HTML page into PHP using this piece of code: $body = file_get_contents('index.htm');

The index.htm file contains a piece of HTML like the one below. Depending on certain criteria, it sometimes needs to be removed and other times needs to stay.

<td><table><tr><td></td></tr></table></td>

How do I remove the whole table section between the td tags using PHP?

+2  A: 

If you are lucky enough that your page is well-formed XML, you could load it into a DOM and remove the <table> from the DOM. Otherwise, a regular expression should be easy as long as you don't have nested <table>s (in which case it's still possible, but trickier).
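
A minimal sketch of the DOM route, in case it helps. It swaps in DOMDocument::loadHTML(), which tolerates ordinary HTML, for the strict XML case described above. The sample markup in $body is made up and stands in for the contents of index.htm, and the XPath expression assumes the table to remove is a direct child of a td.

```php
<?php
// Hypothetical sample standing in for file_get_contents('index.htm').
$body = '<table><tr><td>keep</td><td><table><tr><td></td></tr></table></td></tr></table>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($body);

$xpath = new DOMXPath($doc);

// Copy the matches into an array first, then detach every <table>
// that sits directly inside a <td>.
$nested = iterator_to_array($xpath->query('//td/table'));
foreach ($nested as $table) {
    $table->parentNode->removeChild($table);
}

// saveHTML() returns the whole document, including the doctype and
// the <html>/<body> wrappers that the parser adds around a fragment.
$result = $doc->saveHTML();
```

Because this works on the parsed tree, attributes and nested tables do not need special handling. The trade-off is that the output is re-serialized, so whitespace and the surrounding wrappers may differ from the original file.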

Mike McQuaid
You can get at nesting too if you use recursion lookaround.
eyelidlessness
Er, recursion *and* lookaround. Of course those aren't strictly "regular", they're PCRE extensions.
eyelidlessness
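
For what it's worth, here is a rough sketch of what recursion plus lookaround could look like with PCRE in PHP. The pattern and the sample string are illustrative assumptions, not something taken from the question. (?R) re-enters the whole pattern to consume a nested table, and the negative lookahead keeps the plain-text branch from running past any opening or closing table tag.

```php
<?php
// Hypothetical sample: a table nested inside the table to be removed.
$html = '<td><table><tr><td><table><tr><td>x</td></tr></table></td></tr></table></td><td>y</td>';

$pattern = '~<table\b[^>]*>'       // opening tag, attributes allowed
         . '(?:'
         . '(?>(?:(?!</?table\b).)+)' // text that contains no table tag (atomic group)
         . '|(?R)'                 // or a complete nested table (recursion)
         . ')*'
         . '</table>~is';          // i: ignore case, s: "." also matches newlines

// Remove each outermost table together with everything nested in it.
$result = preg_replace($pattern, '', $html);
```

On very large pages a pattern like this may hit PCRE's backtracking or recursion limits, in which case preg_replace() returns null, so checking the return value is advisable.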
+1  A: 

One way to do it can be

$str = '<td><table><tr><td></td></tr></table></td>';
preg_match('/(<td>)(<table>.*<\/table>)(<\/td>)/',$str,$matches);

the resulting array

Array
(
    [0] => <td><table><tr><td></td></tr></table></td>
    [1] => <td>
    [2] => <table><tr><td></td></tr></table>
    [3] => </td>
)

can be used to recreate the

 '<td></td>'

without the table section
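
A short sketch of that last step, assuming the same $str as above. The $body variable is a made-up stand-in for the full page, used only to show how the rebuilt cell could be swapped back in.

```php
<?php
$str = '<td><table><tr><td></td></tr></table></td>';

// Hypothetical surrounding markup standing in for the whole page.
$body = '<tr>' . $str . '</tr>';

if (preg_match('/(<td>)(<table>.*<\/table>)(<\/td>)/', $str, $matches)) {
    // Glue the opening and closing td back together, skipping group 2 (the table).
    $rebuilt = $matches[1] . $matches[3];
    // Replace the full match inside the page with the emptied cell.
    $body = str_replace($matches[0], $rebuilt, $body);
} else {
    // Nothing matched, so there is nothing to rebuild.
    $rebuilt = $str;
}
```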

Anonymous
Your solution does not work if the TD has attributes, or the TABLE.
Christian Toma
You are right, but he didn't specify in the question that he had attributes. However, the regexp can be adapted to deal with variable attributes, or (as I suppose) the HTML that he needs to remove is always of the same kind, so he can hardcode the attributes in the regexp.
Anonymous
+1  A: 

You can remove the table between td's using a regular expression replacement.

$html=preg_replace('/<td([^>]*)><table[^>]*>.*<\/table><\/td>/', '<td$1></td>', $html);

This also works if you have attributes in your <td> or in your <table>.

I tried it myself (RegEx Tester) and it works; I hope it also works for you.
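
If the markup spans several lines, a variant along these lines might be worth trying. It is only a sketch: the function name stripTableInTd and the $shouldRemove flag are hypothetical, the flag standing in for whatever criteria decide whether the table goes. The s modifier lets "." match newlines, and the lazy .*? stops at the first closing table tag, which also means it will not cope with nested tables.

```php
<?php
// Hypothetical helper; the name and the $shouldRemove flag are illustrative only.
function stripTableInTd(string $html, bool $shouldRemove): string
{
    if (!$shouldRemove) {
        return $html; // criteria not met: leave the page untouched
    }

    // i: ignore tag case; s: "." matches newlines; .*? ends at the first </table>.
    $pattern = '/<td([^>]*)>\s*<table[^>]*>.*?<\/table>\s*<\/td>/is';
    $result = preg_replace($pattern, '<td$1></td>', $html);

    // preg_replace() returns null on error; fall back to the input in that case.
    return $result ?? $html;
}

// Made-up sample with attributes and line breaks.
$html = "<td class=\"a\">\n<table border=\"1\"><tr><td></td></tr></table>\n</td>";

$stripped  = stripTableInTd($html, true);
$untouched = stripTableInTd($html, false);
```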

Christian Toma