So say I have some HTML with an image tag like this:

<p> (1) some image is below:
<img src="/somewhere/filename_(1).jpg">
</p>

I want a regex that will just get rid of the parentheses in the filename so my HTML will look like this:

<p> (1) some image is below:
<img src="/somewhere/filename_1.jpg">
</p>

Does anyone know how to do this? My programming language is C#, if that makes a difference...

I will be eternally grateful and send some very nice karma your way. :)

A: 

In this simple case, you could just use string.Replace, for example:

string imgFilename = "/somewhere/image_(1).jpg";
imgFilename = imgFilename.Replace("(", "").Replace(")", "");

Or do you need a regex for replacing the complete tag inside a HTML string?

AndiDog
I need to avoid replacing parentheses in the HTML body (other tags, text, etc.) and ONLY remove the parentheses when they are inside the <img> tag's src attribute.
fregas
Regex cannot perform that task. You would have to use an HTML parser.
bobince
+1  A: 

This (rather dense) regex should do it:

string s = Regex.Replace(input, @"(<img\s+[^>]*src=""[^""]*)\((\d+)\)([^""]*""[^>]*>)", "$1$2$3");
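
Since the pattern is dense, here is a rough sketch that spreads the same regex over several lines with RegexOptions.IgnorePatternWhitespace and comments each group. The class and method names (NickRegexDemo, RemoveNumberedParens) are made up for illustration, and the sample input is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class NickRegexDemo
{
    // Same pattern as the one-liner above; whitespace and # comments are
    // ignored by the engine because of RegexOptions.IgnorePatternWhitespace.
    private const string Pattern = @"
        (<img\s+[^>]*src=""[^""]*)   # $1: from <img up to the opening parenthesis inside src
        \((\d+)\)                    # $2: the digits between the parentheses
        ([^""]*""[^>]*>)             # $3: the rest of the src value and of the tag
    ";

    public static string RemoveNumberedParens(string input)
    {
        // Re-emit the three groups without the parentheses around $2.
        return Regex.Replace(input, Pattern, "$1$2$3",
                             RegexOptions.IgnorePatternWhitespace);
    }
}

class Program
{
    static void Main()
    {
        string input =
            "<p> (1) some image is below:\n" +
            "<img src=\"/somewhere/filename_(1).jpg\">\n" +
            "</p>";
        Console.WriteLine(NickRegexDemo.RemoveNumberedParens(input));
    }
}
```

On that input the "(1)" in the paragraph text is left alone and only the src becomes /somewhere/filename_1.jpg. Note that it only handles one digits-in-parentheses group per src and expects double quotes.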
Nick Higgs
A: 
Regex.Replace(some_input, @"(?<=<\s*img\s*src\s*=\s*""[^""]*?)(?:\(|\))(?=[^""]*?""\s*\/?\s*?>)", "");

Finds ( or ) preceded by <img src =" and, optionally, text (with any whitespace combination, though I didn't include newline), and followed by optional text and "> or "/>, again with any whitespace combination, and replaces them with nothingness.

Jay
+1  A: 

Judging from the answers, I suspect your job would be much easier if you used the HTML Agility Pack instead of regexes. It will make parsing the HTML a lot easier for what you are trying to do.
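
For what it's worth, a minimal sketch of the parser approach might look like the following. It assumes the HtmlAgilityPack NuGet package is referenced; the class and method names (ImgSrcFixer, StripParenthesesFromImgSrc) are hypothetical, and this is not necessarily how the asker ended up doing it.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

static class ImgSrcFixer
{
    public static string StripParenthesesFromImgSrc(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var images = doc.DocumentNode.SelectNodes("//img[@src]");
        if (images != null)
        {
            foreach (HtmlNode img in images)
            {
                // Only the src attribute value is touched; body text is left as-is.
                string src = img.GetAttributeValue("src", "");
                img.SetAttributeValue("src", src.Replace("(", "").Replace(")", ""));
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Because the parser works on attributes rather than raw text, parentheses elsewhere in the document are never candidates for replacement. The serialized output may differ slightly from the input in formatting, so compare results on real data before relying on it.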

Hope this helps, Best regards, Tom.

tommieb75
This is what I ended up doing. The regex just wasn't working, and part of this might be because I had to do it through a 3rd party library. Instead I just grabbed all the records that had the HTML, pumped it into HtmlAgility, stripped out the junk from image as well as anchor tags, and it was all good. Thanks everybody.
fregas
+1  A: 

Nick's solution is fine if the file names always match that format, but this one matches any parenthesis, anywhere in the attribute:

s = Regex.Replace(s, @"(?i)(?<=<img\s+[^>]*\bsrc\s*=\s*""[^""]*)[()]", "");

The lookbehind ensures that the match occurs inside the src attribute of an img tag. It assumes the attribute is enclosed in double-quotes (quotation marks); if you need to allow for single-quotes (apostrophes) or no quotes at all, the regex gets much more complicated. I'll post that if you need it.
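
In the meantime, one way to cope with single quotes or no quotes without a much bigger lookbehind is to capture the whole attribute value and clean it in a MatchEvaluator. This is only a sketch under assumptions, not the more complicated regex mentioned above: the names (ImgSrcCleaner, StripParentheses) are invented, it only handles the first src in each img tag, and it will be fooled by a ">" inside an earlier attribute value.

```csharp
using System;
using System.Text.RegularExpressions;

static class ImgSrcCleaner
{
    // Group 1: the tag up to and including "src =".
    // Group 2: the value, either double-quoted, single-quoted, or unquoted.
    private static readonly Regex SrcAttribute = new Regex(
        @"(<img\b[^>]*?\ssrc\s*=\s*)(""[^""]*""|'[^']*'|[^\s>]+)",
        RegexOptions.IgnoreCase);

    public static string StripParentheses(string html)
    {
        // Keep group 1 untouched and remove ( and ) from the value only.
        return SrcAttribute.Replace(html, m =>
            m.Groups[1].Value +
            m.Groups[2].Value.Replace("(", "").Replace(")", ""));
    }
}

class Program
{
    static void Main()
    {
        string html = "<p>(1)</p><img src='/a_(1).jpg'><IMG alt=x src=/b_(2).jpg>";
        Console.WriteLine(ImgSrcCleaner.StripParentheses(html));
        // prints: <p>(1)</p><img src='/a_1.jpg'><IMG alt=x src=/b_2.jpg>
    }
}
```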

Alan Moore