How can I remove the style attribute from an HTML string with a regular expression?

For example, given the following inline HTML string:

<html><body style="background-color:yellow"><h2 style="background-color:red">This is a heading</h2><p style="background-color:green">This is a paragraph.</p></body></html>

After applying the regular expression, the result should look like:

<html><body ><h2 >This is a heading</h2><p >This is a paragraph.</p></body></html>

A: 

You simply need to replace the style attributes with an empty string. Here's an example of how to do so in PHP:

$text = preg_replace('/\s+style="[^"]*"/', '', $text);
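For readers not using PHP, the same substitution can be sketched in Python with the standard `re` module, where `re.sub` plays the role of `preg_replace`. The variable names are illustrative only. Because the pattern also consumes the whitespace before `style`, the output contains `<body>` rather than the `<body >` shown in the question.

```python
import re

html = ('<html><body style="background-color:yellow">'
        '<h2 style="background-color:red">This is a heading</h2>'
        '<p style="background-color:green">This is a paragraph.</p>'
        '</body></html>')

# Same pattern as the PHP line: leading whitespace, then style="...".
# [^"]* stops at the closing double quote, so single-quoted or unquoted
# style values are NOT matched by this sketch.
cleaned = re.sub(r'\s+style="[^"]*"', '', html)
print(cleaned)
# <html><body><h2>This is a heading</h2><p>This is a paragraph.</p></body></html>
```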
reko_t
A: 

It has been pointed out many times that regexes are, in most cases, not suitable for processing HTML, so you should say which language you plan to implement this in.

However, a regex like this will match the heading's opening tag so you can replace it:

<h2\s+style="background-color:red">
// replace with
<h2>

The regex for the paragraph tag is analogous (replace 'h2' with 'p' and 'red' with 'green').
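A Python sketch of the tag-specific patterns described above, followed by a generalized variant that is an assumption not found in the answer: it captures the tag name in a group and writes it back with `\1`, so one pattern covers `body`, `h2` and `p`. Both variants only match when `style` is the tag's sole attribute, since the pattern expects `>` right after the quoted value.

```python
import re

html = ('<html><body style="background-color:yellow">'
        '<h2 style="background-color:red">This is a heading</h2>'
        '<p style="background-color:green">This is a paragraph.</p>'
        '</body></html>')

# Literal, tag-specific patterns: one substitution per tag/value pair.
specific = re.sub(r'<h2\s+style="background-color:red">', '<h2>', html)
specific = re.sub(r'<p\s+style="background-color:green">', '<p>', specific)
# <body> keeps its style here because no pattern was written for it.

# Assumed generalization: capture any tag name and re-emit it via \1.
# Only matches tags whose single attribute is a double-quoted style.
general = re.sub(r'<(\w+)\s+style="[^"]*">', r'<\1>', html)
print(general)
```

The tag-specific form is brittle because it hard-codes each colour value; the backreference form trades that for the single-attribute restriction.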

splash
A: 

You can't parse HTML with regular expressions because HTML is not a regular language.

Of course, you can cut corners at your own peril, for example by searching for style\s*=\s*"[^"]*" and replacing it with nothing, but that will remove every occurrence of style="anything" anywhere in your text, including the visible text content, not just inside tags.
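To make the caveat concrete, here is a Python sketch (standard library only) that runs the shortcut regex on a paragraph whose visible text happens to contain style="x", and compares it with a parser-based alternative built on `html.parser.HTMLParser`. `StyleStripper` and `strip_style_attrs` are hypothetical names made up for this illustration, not an existing API. The parser version is an approximation: `HTMLParser` lowercases tag and attribute names, and this sketch re-quotes every kept attribute value in double quotes, so the output is not byte-for-byte identical to the input.

```python
import re
from html import escape
from html.parser import HTMLParser


class StyleStripper(HTMLParser):
    """Re-emits the markup it parses, minus any style attribute on start tags."""

    def __init__(self):
        # Report &...; references separately so they can be passed through as-is.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _start(self, tag, attrs, end=""):
        kept = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
            if name != "style"  # attribute names arrive lowercased
        )
        return f"<{tag}{kept}{end}>"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._start(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._start(tag, attrs, " /"))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_style_attrs(markup):
    parser = StyleStripper()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


tricky = '<p style="color:red">Write style="x" to set it</p>'
naive = re.sub(r'style\s*=\s*"[^"]*"', '', tricky)
parsed = strip_style_attrs(tricky)
print(naive)   # <p >Write  to set it</p>   (text content damaged)
print(parsed)  # <p>Write style="x" to set it</p>
```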

Tim Pietzcker