Consider the following markup

<p align=center width='100' height=\"200\" attr=test>aasasd</p>

To make this markup valid, I want to wrap the attribute values in quotes where they are required.

For the example above, I want to add quotes so that the markup becomes:

<p align="center" width='100' height="200" attr="test">aasasd</p>

Does anyone know any regex patterns for this purpose?

I'm using C#.

EDIT: It looks like I might have to do this another way. Can someone provide a regular expression that matches these unquoted values:

align=center 
attr=test

Thanks

+2  A: 

Regex is probably not the right approach to this problem. Have a look at tidyfornet, a .NET wrapper for HTML Tidy, a tool that generates valid HTML/XHTML from tag soup.

Asaph
I thought so. Apart from external libraries, is it possible to do this with, let's say, an XSLT transformation?
@Wololo: No. XSLT requires valid XML as input. The transformer would fail on the unquoted attributes.
Asaph
Thanks for your help so far. Unfortunately, we can't install third-party libraries on our client machines. In fact, I was looking for a regex pattern to match the unquoted values. The markup can be anything; align=center was just an example :)
@Wololo: I removed the regex from my answer. It would be impractical to come up with an effective regex to match unquoted attributes in tag soup. There are simply too many weird possibilities. It's unfortunate that you cannot install third-party libraries on your client's machine. This is a serious limitation. It means you'll have to reinvent a few wheels (a situation no programmer wants to be in). I would do what you can to make the case for HTML Tidy. Could you use it as a web service? That would avoid installing a third-party library directly on the server (albeit in a pretty cheesy way).
Asaph
A: 

Something like this should work: /=('|\\"|\s*)([\w])*('|\\"|\s*)\b/
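
For C# specifically, here is a rough sketch of one way to apply the same idea without third-party libraries. It is not a general tag-soup fixer: it assumes the quotes in the input are plain (not backslash-escaped), that no quoted value contains `>`, and that comments and script blocks are not a concern. The class name `AttributeQuoter` and the method `QuoteAttributes` are made up for this example. The trick is to let the pattern consume already-quoted values first, so only the bare ones end up in the `bare` group and get rewritten.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class AttributeQuoter
{
    // One opening tag such as <p align=center width='100'>.
    // Assumption: no '<' or '>' appears inside an attribute value.
    private static readonly Regex Tag = new Regex(@"<[A-Za-z][^<>]*>");

    // name=value where the value is double-quoted, single-quoted, or bare.
    // Quoted alternatives come first so their contents are skipped as a unit;
    // the lookahead stops a bare value before whitespace, '>' or '/>'.
    private static readonly Regex Attr = new Regex(
        @"(?<name>[\w:-]+)\s*=\s*(?:""[^""]*""|'[^']*'|(?<bare>[^\s""'<>=]+?)(?=\s|/?>))");

    public static string QuoteAttributes(string html)
    {
        // Only text inside opening tags is touched; element content is left alone.
        return Tag.Replace(html, tag =>
            Attr.Replace(tag.Value, a =>
                a.Groups["bare"].Success
                    ? a.Groups["name"].Value + "=\"" + a.Groups["bare"].Value + "\""
                    : a.Value)); // already quoted: keep as-is
    }
}
```

Calling `AttributeQuoter.QuoteAttributes(...)` on the markup from the question (with ordinary quotes around `200`) should quote `align` and `attr` and leave `width` and `height` unchanged. Anything messier than that, such as unbalanced quotes, is where a real parser like HTML Tidy becomes the safer choice.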

dimus