I have raw HTML and I need to set the src attribute of every IMG tag to "http://foo".

This is the regex I have so far, and it seems to work. In my environment, it is safe to assume that tag names are uppercase and attribute names are lowercase. I am doing this in .NET, but I don't think the platform really matters here. \s matches any whitespace character in the .NET regex engine.

Can anybody improve on it?

htmlText = Regex.Replace(htmlText, "(<IMG[^>]*\\ssrc=\")([^\"]*)(\"[^>]*>)", "$1http://foo$3");
A: 

Maybe allow for multiple spaces with \s+
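A minimal sketch of that suggestion, in the same C# as the question. The class and method names are hypothetical, and the replacement uses a MatchEvaluator instead of the "$1...$3" string so that a "$" in the new URL is not read as a substitution token. Note that the [^>]* in front of \s can already absorb extra whitespace, so \s and \s+ should match the same tags here.

```csharp
using System.Text.RegularExpressions;

public static class ImgSrcRewriter
{
    // The pattern from the question, with \s+ in place of \s before "src".
    // Group 1: everything up to and including src="
    // Group 2: the old src value
    // Group 3: the closing quote and the rest of the tag
    private const string Pattern = "(<IMG[^>]*\\s+src=\")([^\"]*)(\"[^>]*>)";

    // Hypothetical helper: returns htmlText with every matched IMG src set to newSrc.
    public static string SetAllSrc(string htmlText, string newSrc)
    {
        return Regex.Replace(
            htmlText,
            Pattern,
            m => m.Groups[1].Value + newSrc + m.Groups[3].Value);
    }
}
```

Calling ImgSrcRewriter.SetAllSrc(htmlText, "http://foo") leaves every other attribute of the tag untouched, including any extra whitespace between attributes.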

Sarah Vessels
A: 

I'm not a regex expert by any means, but try txt2re.com.

Maybe this will get you started: http://txt2re.com/index-ruby.php3?s=%3CIMG%20src=%22http://foo.bar/baz.jpg%22%20/%3E&1

+2  A: 

Match the entire IMG tag first, then match the src="([^\"]*)" attribute within that match and replace its value. That gives you an updated copy of the tag in which only the src="..." part has changed.

You can then search the document for the original tag text and replace it with the updated tag.
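The two passes could look roughly like this in C#. This is only a sketch: the class and method names are hypothetical, RegexOptions.IgnoreCase and the \b after "img" are my own choices rather than part of the answer, and the separate "search for the original tag and replace it" step is folded into a MatchEvaluator, which substitutes each rewritten tag in place.

```csharp
using System.Text.RegularExpressions;

public static class TwoPassImgSrc
{
    // Pass 1: find each whole IMG tag, in any letter case.
    private static readonly Regex ImgTag =
        new Regex("<img\\b[^>]*>", RegexOptions.IgnoreCase);

    // Pass 2: inside a single tag, find the src="..." attribute.
    // Group 1 is the part up to the opening quote, group 2 is the closing quote.
    private static readonly Regex SrcAttr =
        new Regex("(\\ssrc=\")[^\"]*(\")", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns htmlText with the src of every IMG tag set to newSrc.
    public static string SetAllSrc(string htmlText, string newSrc)
    {
        return ImgTag.Replace(htmlText, tag =>
            // Rewrite at most one src attribute within this tag.
            SrcAttr.Replace(
                tag.Value,
                attr => attr.Groups[1].Value + newSrc + attr.Groups[2].Value,
                1));
    }
}
```

Because the second pattern only ever sees the text of one tag, it does not need the [^>]* guards that the single-pass pattern relies on.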

maxwellb
So the search for the image tag would be "<[Ii][Mm][Gg][^>]*>", and you could match the src attribute as lowercase, or case-insensitively in a similar way.
maxwellb