Hey guys,

I need to run a string of HTML through a regex function that checks whether the attribute values are enclosed in quotes and, if they aren't, adds the quotes.

For example, I want

<img src=http://www.domain.com/image.gif border=0>

to turn into

<img src='http://www.domain.com/image.gif' border='0'>

Can anyone help me?

A:

How about using Tidy? Regular expressions really aren't the right tool for HTML.

Jakub Hampl
I really only need to enclose the SRC='' attribute, so I believe a regex will be good enough here. It doesn't have to work perfectly on a large scale, just for `<img src=...>` situations.
atwellpub
Well (at your own peril): `preg_replace("/=([^'" ]+)/", "=\"$1\"", $html)`. It's pretty simple but should do the job in simple and standard cases.
Jakub Hampl
Thanks for your help. In my test case that yields `<IMG SRC="06032102.jpg WIDTH=250 HEIGHT=200 BORDER=0 ALIGN=left>"`. Can you help me refine the code?
atwellpub
I forgot to escape the quote and to exclude the closing angle bracket: `preg_replace("/=([^'\"> ]+)/", "=\"$1\"", $html)`. This works on my PHP.
Jakub Hampl
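The final regex from this comment thread can be wrapped in a small function and checked against both test cases above. This is a sketch, not a robust fix: `quote_attributes()` and `quote_img_src()` are hypothetical helper names, the pattern uses `\s` instead of a literal space so tabs and newlines also end a value, and it still misfires on `=` signs inside already-quoted values (e.g. `href="page.php?a=1"`) or in plain text. It adds double quotes rather than the single quotes shown in the question; either is valid HTML. The second function narrows the match to `src` inside `<img>` tags, which is all the asker said they need.

```php
<?php
// Sketch built on the regex from the comments above.
// quote_attributes() and quote_img_src() are hypothetical helper names.

// Wrap every unquoted attribute value in double quotes.
// A value is a run of characters that are not quotes, '>' or whitespace,
// so values that already start with a quote are left alone.
// Caveat: an '=' inside a quoted value or in text content is also matched.
function quote_attributes(string $html): string
{
    // preg_replace() returns null on a PCRE error; fall back to the input.
    return preg_replace('/=([^\'">\s]+)/', '="$1"', $html) ?? $html;
}

// Narrower variant: quote only an unquoted src value inside <img> tags
// (case-insensitive, so <IMG SRC=...> is handled too).
function quote_img_src(string $html): string
{
    return preg_replace('/(<img\b[^>]*?\bsrc=)([^\'">\s]+)/i', '$1"$2"', $html) ?? $html;
}

echo quote_attributes('<img src=http://www.domain.com/image.gif border=0>'), PHP_EOL;
// <img src="http://www.domain.com/image.gif" border="0">
echo quote_img_src('<IMG SRC=06032102.jpg WIDTH=250 HEIGHT=200 BORDER=0 ALIGN=left>'), PHP_EOL;
// <IMG SRC="06032102.jpg" WIDTH=250 HEIGHT=200 BORDER=0 ALIGN=left>
```

Restricting the match to `<img ... src=` keeps the regex away from other attributes, which lowers (but does not remove) the chance of mangling markup it was never meant to touch.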
A:

Trying to parse or validate HTML is a complex job best not attempted with a regex. There are just too many possible variations for a regex to handle them reliably.

Jakub got there before me, but I agree: use a tool that exists for the job, such as HTML Tidy - http://tidy.sourceforge.net/

It can fix invalid HTML; see a nice overview at http://www.w3.org/People/Raggett/tidy/

PHP integration is available through the tidy extension: http://uk3.php.net/tidy
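A minimal sketch of the Tidy route from PHP, assuming the tidy extension (ext-tidy) is installed; it is not enabled in every PHP build. `repair_html()` is a hypothetical helper name, while `tidy_repair_string()` and the `show-body-only` and `wrap` options come from the extension and Tidy itself. The exact markup Tidy emits can differ between Tidy versions, so treat the output as illustrative, but attribute values come back quoted.

```php
<?php
// Sketch only: assumes PHP was built with the tidy extension (ext-tidy).
// repair_html() is a hypothetical helper name for illustration.
function repair_html(string $html): string
{
    if (!function_exists('tidy_repair_string')) {
        throw new RuntimeException('The tidy extension is not installed.');
    }
    // show-body-only: return just the repaired fragment, not a full document.
    // wrap => 0: do not insert line breaks into long lines.
    $repaired = tidy_repair_string($html, ['show-body-only' => true, 'wrap' => 0], 'utf8');
    if ($repaired === false) {
        throw new RuntimeException('Tidy could not repair the input.');
    }
    return $repaired;
}

echo repair_html('<img src=http://www.domain.com/image.gif border=0>');
```

Unlike the regex approach, Tidy parses the markup first, so an `=` inside a quoted value or in text content is not mistaken for an unquoted attribute.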

simonrjones