Let's say this is my HTML:

<ul>
    <li><div style="width: 10em;">Hello</div><div class="ble"></div></li>
</ul>

I want to get this:

<ul>
    <li>Hello</li>
</ul>

As you can see, all div opening and closing tags were removed but not their content!

This is what I have so far:

$patterns = array();
$patterns[0] = '/<div.*>/';
$patterns[1] = '/</div>/';
$replacements = array();
$replacements[2] = '';
$replacements[1] = '';
echo preg_replace($patterns, $replacements, $html);
A:

Replace '/<div.*>/' with '/<div.*?>/'. The ? makes the * non-greedy, so the match stops at the first > encountered.

Also, you need to escape the forward slash in your pattern for matching the closing tag, because / is also the pattern delimiter. Use:

'/<\/div>/';
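
Putting both fixes together, a minimal sketch of the corrected call might look like the following. The `$html` value is copied from the question, and a single empty string is passed as the replacement, which preg_replace applies to every pattern in the array:

```php
<?php
// Sample input copied from the question.
$html = '<ul>
    <li><div style="width: 10em;">Hello</div><div class="ble"></div></li>
</ul>';

$patterns = array(
    '/<div.*?>/',  // lazy quantifier: stop at the first > after <div
    '/<\/div>/',   // forward slash escaped because / is the delimiter
);

// A single replacement string is used for all patterns.
$result = preg_replace($patterns, '', $html);

echo $result;
```

With this input, the list item should come out as `<li>Hello</li>`.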
Gopi
I get this error: Warning: preg_replace() [function.preg-replace]: Unknown modifier 'd' in...
Richard Knop
@Richard, escape the forward slash in `'/</div>/'`: `'/<\/div>/'`, or use a different delimiter: `'#</div>#'`
Bart Kiers
@Richard, I'm afraid I can't help you with that, as I am not a PHP guy.
Gopi
@Bart K. That worked :)
Richard Knop
@Gopi, just add the backslash as Bart K. suggested and I will accept your answer.
Richard Knop
@Richard - so edited.
Dominic Rodger
A: 

I would start by replacing both <div[^>]*> and </div[^>]*> with nothing. Though I know little about the specific PHP regex engine, the following sed command worked fine:

pax> cat qq.in
<ul>
    <li><div style="width: 10em;">Hello</div><div class="ble"></div></li>
</ul>

pax> cat qq.in | sed -e 's/<div[^>]*>//g' -e 's/<\/div>//g'
<ul>
    <li>Hello</li>
</ul>

In fact, you should be able to combine that into one regex </?div[^>]*>:

pax> cat qq.in | sed -r -e 's_</?div[^>]*>__g'
<ul>
    <li>Hello</li>
</ul>
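
The same combined expression should carry over to PHP's preg_replace. This is only a sketch, not something tested alongside the sed runs above; it assumes the PCRE functions and uses # as the delimiter so the slash in the closing tag needs no escaping:

```php
<?php
// Sample input copied from the question.
$html = '<ul>
    <li><div style="width: 10em;">Hello</div><div class="ble"></div></li>
</ul>';

// One pattern covers both opening and closing div tags:
// /? makes the slash optional, [^>]* consumes any attributes.
$result = preg_replace('#</?div[^>]*>#', '', $html);

echo $result;
```

Because [^>]* cannot run past a >, no lazy quantifier is needed here.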
paxdiablo