This question is related to a similar case, namely http://stackoverflow.com/questions/2488950/removing-inline-styles-using-php

The solution there does not strip non-style attributes, e.g. <font face="Tahoma" size="4">

But let's say I have a mixed bag of inline styles and other attributes, like this:

<ul style="padding: 5px; margin: 5px;">
    <li style="padding: 2px;"><div style="border:2px solid green;">Some text</div></li>
    <li style="padding: 2px;"><font face="arial,helvetica,sans-serif" size="2">Some text</font></li>
    <li style="padding: 2px;"><font face="arial,helvetica,sans-serif" size="2">Some text</font></li>  
</ul>

What regExp is needed to achieve this result?

<ul>
    <li><div>Some text</div></li>
    <li><font>Some text</font></li>
    <li><font>Some text</font></li>  
</ul>

Thanks for reading the question, any help is appreciated.

+3  A: 

As usual, regex isn't ideal for parsing HTML; it's very possible you'd be better off with an actual HTML parser.
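For comparison, here is a minimal sketch of the parser route using PHP's bundled DOM extension. The helper name strip_attributes() is hypothetical, and the sketch assumes the fragment is plain ASCII like the example in the question (non-ASCII input would need extra encoding handling). The serializer may also normalize whitespace or markup slightly, so treat the output as equivalent HTML rather than a byte-for-byte copy.

```php
<?php
// Hypothetical helper: remove every attribute from every element in an HTML fragment.
function strip_attributes(string $html): string
{
    // Collect parser warnings internally instead of emitting them
    // (real-world fragments are rarely perfectly valid HTML).
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment in <body> so its nodes are easy to find again after parsing.
    $doc->loadHTML('<body>' . $html . '</body>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select all attribute nodes in the document and detach each one from its element.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//@*') as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$original = '<ul style="padding: 5px; margin: 5px;">'
    . '<li style="padding: 2px;"><font face="arial,helvetica,sans-serif" size="2">Some text</font></li>'
    . '</ul>';

echo strip_attributes($original), "\n";
```

Because the parser understands quoting, attribute values that happen to contain < or > are not mistaken for tags.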

That said...

$noattributes = preg_replace('/<(\w+) [^>]+>/', '<$1>', $original);

...will replace any opening tag that contains attributes with the corresponding tag without attributes. It may, however, also hit "tags" that appear inside quoted attribute values of other tags (and thus are not actually tags themselves). It also affects self-closing tags written with a space (it'll turn <br /> into <br>), though tags with no space between the tag name and the slash (<br/>) are left untouched.
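To make those caveats concrete, here is a small sketch that runs the same pattern over a few hand-picked inputs. The input strings are made up for illustration; the last one uses a > inside a quoted attribute value, which is a close cousin of the "tag inside an attribute" problem.

```php
<?php
// Same pattern as above: a tag name, one space, then anything up to the next ">".
$pattern = '/<(\w+) [^>]+>/';

// Intended case: the attributes are dropped.
$plain = preg_replace($pattern, '<$1>', '<li style="padding: 2px;">Some text</li>');
// $plain is '<li>Some text</li>'

// Self-closing tag with a space: " /" is treated like attributes and removed.
$spaced = preg_replace($pattern, '<$1>', 'line<br />break');
// $spaced is 'line<br>break'

// Self-closing tag without a space: no match, so it is left alone.
$compact = preg_replace($pattern, '<$1>', 'line<br/>break');
// $compact is 'line<br/>break'

// A ">" inside a quoted value ends the match early and leaves debris behind.
$broken = preg_replace($pattern, '<$1>', '<a title="x > y">link</a>');
// $broken is '<a> y">link</a>'

foreach ([$plain, $spaced, $compact, $broken] as $result) {
    echo $result, "\n";
}
```

The last case is the kind of input where a real parser is the safer choice.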

Amber
Like so? $formatted = preg_replace('<(\w+) [^>]+>,'<$1>', $text);
bakkelun
See my edited version; you have to remember to delimit the regex.
Amber
I agree, using an HTML parser is better.
TravisO
Yes, of course. The thing is, I'm not parsing an entire XML/HTML document. I'm using XPath to retrieve the section I need, but the description for each item can contain some HTML (like the example provided). Using a regex on this section shouldn't cost much performance-wise, should it?
bakkelun
Probably not. If you're going to use the same regex multiple times, PHP's PCRE extension usually caches the compiled form of the pattern for you, so there's not much of a hit.
Amber