A web page contains lots of image elements:

<img src="myImage.gif" width="180" height="18" />

However, they may not be well-formed: the width or height attribute may be missing, and the tag may not be properly closed with />. The src attribute is always present.

I need a regular expression that wraps each image in a hyperlink whose href is set to the src of the img:

<a href="myImage.gif" target="_blank"><img src="myImage.gif" width="180" height="18" /></a>

I can successfully locate the images using this regexp in the RegExr editor (http://gskinner.com/RegExr/):

<img src="([^<]*)"[^<]*>

But what is the next step?

+1  A: 

In JavaScript, use string.replace(), where $1 in the replacement stands for the captured src:

// [^"]* stops the capture at the closing quote of src; the g flag replaces every image
str.replace(/<img src="([^"]*)"[^<]*>/g,
    '<a href="$1" target="_blank"><img src="$1" width="180" height="18" /></a>')

Or, better still, capture the whole image tag (the src is now $2, since it is in the second capture group):

str.replace(/(<img src="([^"]*)"[^<]*>)/g, '<a href="$2" target="_blank">$1</a>')
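
To see how this behaves on a page with several, partly malformed images, here is a small self-contained sketch. The wrapImages helper and the sample markup are made up for illustration, and it assumes src is the first attribute and is double-quoted, as in the question.

```javascript
// Hypothetical helper: wraps every <img src="..."> in a link to its src.
// Assumes src is the first attribute and uses double quotes.
function wrapImages(html) {
  // Group 1 is the whole tag, group 2 the src value; the g flag replaces all matches.
  return html.replace(/(<img src="([^"]*)"[^<]*>)/g,
    '<a href="$2" target="_blank">$1</a>');
}

// Sample input: one complete tag, and one without height or the closing slash.
const sample = '<p><img src="a.gif" width="180" height="18" /> and <img src="b.gif" width="20"></p>';
const wrapped = wrapImages(sample);
console.log(wrapped);
```

Both images end up wrapped, and the attributes of each original tag are left untouched.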
Motti
I am not working in JavaScript, but this should be useful. Thank you.
bobo
+4  A: 

A DOM-based method is best. But if that regex does match the desired <img> elements (not easy to guarantee for general HTML input) and captures the value of the src attribute in \1, then just replace the whole match (\0) with:

<a href="\1" target="_blank">\0</a>

In Java, the backreferences in the replacement string are written $0 and $1; I'm not sure what language you're using, so adjust accordingly.

In Java, something like this would work:

// [^\"]* keeps group 1 from running past the closing quote of src
String imgHrefed = str.replaceAll(
   "<img src=\"([^\"]*)\"[^<]*>",
   "<a href=\"$1\" target=\"_blank\">$0</a>"
);
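
For comparison, the same whole-match replacement can be sketched in JavaScript, where the whole match is written $& rather than $0. The wrapWholeMatch name and the sample string are illustrative only, and the pattern uses [^"]* for the src value so that the capture stops at the closing quote.

```javascript
// Hypothetical helper: in JavaScript replacement strings, $& is the whole match
// (the counterpart of $0 in Java) and $1 is the first capture group.
function wrapWholeMatch(html) {
  return html.replace(/<img src="([^"]*)"[^<]*>/g,
    '<a href="$1" target="_blank">$&</a>');
}

const linked = wrapWholeMatch('<img src="myImage.gif" width="180" height="18" />');
console.log(linked);
```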

It wasn't clear from your question what to do with any other attributes that the <img> may have. The above replacement keeps them as they are. If you also want to rewrite them (i.e. you're not just wrapping <img> in an <a> anymore), then perhaps you want to rewrite to this:

<a href="\1" target="_blank"><img src="\1" width="180" height="18" /></a>
polygenelubricants
All the answers are quite similar; since this one has the most votes, I'll go with the majority view that it is the best one.
bobo
+1  A: 

In .NET the regex syntax is basically the same as in JavaScript in most cases, but the surrounding code looks slightly different.

    // requires: using System.Text.RegularExpressions;
    string imageHtmlSnippet = @"<img src=""myImage.gif"" width=""180"" height=""18"" />";
    string imageHtmlReplacement = @"<a href=""$1"" target=""_blank""><img src=""$1"" width=""180"" height=""18"" /></a>";

    // [^""]* stops the capture at the closing quote of src
    Regex findImages = new Regex(@"<img src=""([^""]*)""[^<]*>");

    string fixedHtmlSnippet = findImages.Replace(imageHtmlSnippet, imageHtmlReplacement);

However, this regex will fail if src isn't the first attribute on the tag. I don't have time to fix it because I should already be out the door :)

In truth, you should be looking at an HTML parsing library such as HtmlAgilityPack to parse it (if you are working in .NET).
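
One possible way around that limitation, sketched here in JavaScript, is to let the pattern skip over any attributes that come before src. This is only a sketch under stated assumptions, not a general HTML parser: it assumes src is double-quoted and that no attribute value contains a > character. The wrapImagesAnyOrder name and the sample markup are hypothetical.

```javascript
// Hypothetical helper: matches <img> tags with src at any attribute position.
// Assumes a double-quoted src and no ">" inside attribute values.
function wrapImagesAnyOrder(html) {
  // [^>]*? lazily skips earlier attributes; \s before src avoids matching
  // names that merely end in "src" (for example data-src).
  return html.replace(/<img\b[^>]*?\ssrc="([^"]*)"[^>]*>/gi,
    '<a href="$1" target="_blank">$&</a>');
}

const mixed = '<img width="180" src="b.gif" height="18"> <img src="a.gif" />';
const mixedWrapped = wrapImagesAnyOrder(mixed);
console.log(mixedWrapped);
```

Here $& reinserts the matched tag unchanged, so the original attributes and their order are preserved.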

rtpHarry
I am not working in any particular language; I am just using that editor and trying to wrap the images in hyperlinks in an HTML document. But your code snippets should be useful when I work in .NET. Thank you.
bobo