Hi,

I've been trying to write a regular expression that removes any attributes present in HTML tags, but I'm having trouble getting it to work, and Google doesn't seem to provide any answers either.

Basically, my input string looks something like:

<p style="font-family:Arial;" class="x" onclick="doWhatever();">this text</p>
<img style="border:0px" src="pic.gif" />

and I would like to remove the attributes inside each tag (apart from src on the image) to produce a string like:

<p>this text</p>
<img src="pic.gif" />

Does anybody know a regex for doing this? I'm using Regex.Replace in C# by the way.

Thanks,

Tim

+2  A: 

There are really excellent tools for handling this sort of task in .NET without having to resort to the regex hammer. They will also be more reliable than a regular-expression-based solution.

I'd suggest that you take a look at HTML Agility Pack.
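
To make that suggestion concrete, here is a rough sketch of how it might look. It assumes the HtmlAgilityPack NuGet package is referenced; the class name HapAttributeStripper and the choice to keep only src (taken from the example output in the question) are illustrative assumptions, not part of the library.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // external NuGet package "HtmlAgilityPack"

// Hypothetical helper, named here for illustration only.
public static class HapAttributeStripper
{
    public static string Strip(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Visit every element node in the parsed fragment.
        foreach (HtmlNode node in doc.DocumentNode.Descendants()
                     .Where(n => n.NodeType == HtmlNodeType.Element))
        {
            // Copy the attributes to a list so the collection is not
            // modified while it is being enumerated.
            foreach (HtmlAttribute attr in node.Attributes.ToList())
            {
                // Assumption: src is the only attribute worth keeping.
                if (!string.Equals(attr.Name, "src", StringComparison.OrdinalIgnoreCase))
                {
                    attr.Remove();
                }
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

The serialized markup may differ slightly from the input (for example in how void elements such as img are closed), so it is worth checking the output against whatever consumes it.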

Mark Byers
Thanks Mark, the HTML Agility Pack sounds like it'll do what I want, even though it's more code.
tt83
A: 

Either that, or use jQuery's each() to go through all the HTML elements and remove their attributes, or remove them from one particular element only. Why would you be doing that anyway?

c0mrade
tt83
+1  A: 

HTML is most easily handled through a DOM, but if you really want to do this with a regex, you could probably take advantage of the fact that you want to remove all attributes, i.e. leave nothing but the tag itself. IMO you should use a DOM parser instead.
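
For what it's worth, a regex version along those lines could look roughly like the sketch below. It uses Regex.Replace with a match evaluator to rebuild each opening tag from its name alone. Since the desired output in the question keeps src on the image, the sketch re-adds a src attribute when it finds one; that choice, and the AttributeStripper name, are assumptions made for illustration. It will misbehave on attribute values that contain ">", on comments, and on script or style content, which is exactly why a DOM parser is the safer route.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named here for illustration only.
public static class AttributeStripper
{
    // Opening or self-closing tag: group 1 = tag name,
    // group 2 = raw attribute text, group 3 = optional "/".
    private static readonly Regex Tag = new Regex(
        @"<([a-zA-Z][a-zA-Z0-9:-]*)(?=[\s/>])([^>]*?)\s*(/?)>",
        RegexOptions.Compiled);

    // A src attribute with a double-quoted, single-quoted, or bare value.
    // The lookbehind avoids matching names such as data-src.
    private static readonly Regex Src = new Regex(
        @"(?<![\w-])src\s*=\s*(""[^""]*""|'[^']*'|[^\s>]+)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string Strip(string html)
    {
        return Tag.Replace(html, m =>
        {
            string name = m.Groups[1].Value;
            Match src = Src.Match(m.Groups[2].Value);

            // Keep src (if present) and drop every other attribute.
            string kept = src.Success ? " src=" + src.Groups[1].Value : "";
            string close = m.Groups[3].Value == "/" ? " /" : "";
            return "<" + name + kept + close + ">";
        });
    }
}
```

Calling AttributeStripper.Strip on the two input lines from the question should give the two output lines the question asks for; closing tags such as </p> are left alone because the pattern only matches tags whose name follows "<" directly.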

pthulin