
This regular expression is only returning one match. I'm looking to retrieve all image sources/locations (such as 'folder/image.png') contained in the src attribute of the img HTML tags.

Sample input string:

string input = @"<p>here is an image</p><img attr=""ahwer"" src=""~/Images/logo.png"" st=""abc""/><p>some more text here</p>";
input += @"<p>test</p><img src=""a.jpg"" /><img src=""folder/image.png"" />";

Pattern:

pattern = @"<img.*src=""([^""]*)"".*/>";

The MatchCollection count is always 1, and oddly, the captured group holds only the last source, in this case 'folder/image.png'. Whenever I change the pattern to simply 'img', it finds all three image tags, so my regex pattern is most likely incorrect. I'm no regex guru and would appreciate any help.

+3  A: 

Do not parse HTML using regular expressions.

Instead, you should use the HTML Agility Pack, like this:

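using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.Load(path);          // load from a file...
// or
doc.LoadHtml(source);    // ...or from a string of HTML

// Collect the src attribute of every <img>, skipping tags that have none
var paths = doc.DocumentNode.Descendants("img")
                            .Select(img => img.GetAttributeValue("src", null))
                            .Where(src => src != null);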
SLaks
Thanks for the link. However, I won't be doing major HTML manipulation so I'd rather not have to use a third party library.
Gabe
+1  A: 

Try pattern = @"<img.*?src=""([^""]*)"".*?/>";

Using .*? makes the quantifiers lazy (non-greedy): each one consumes as few characters as possible before the next part of the pattern can match, rather than as many as it can.
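A minimal sketch of the lazy pattern applied to the sample input from the question, assuming C# 9 or later (top-level statements). The variable name sources is illustrative only; group 1 of each match is read to get the src value.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question
string input = @"<p>here is an image</p><img attr=""ahwer"" src=""~/Images/logo.png"" st=""abc""/><p>some more text here</p>";
input += @"<p>test</p><img src=""a.jpg"" /><img src=""folder/image.png"" />";

// Lazy (.*?) quantifiers stop at the first src="..." and the first "/>"
// after each "<img", so every tag yields its own match.
string pattern = @"<img.*?src=""([^""]*)"".*?/>";

// Group 1 of each match holds the value of the src attribute.
var sources = Regex.Matches(input, pattern)
                   .Cast<Match>()
                   .Select(m => m.Groups[1].Value)
                   .ToList();

foreach (string src in sources)
    Console.WriteLine(src);
```

With the sample input this prints ~/Images/logo.png, a.jpg and folder/image.png, one per line.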

Will A
This is it. Getting all three matches now. I thought of this just seconds before you posted it. Thanks!
Gabe
A: 

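The interior of your regex is too permissive. The greedy .* after <img runs to the end of the input and then backtracks only as far as the last src="..." and the last />, so a single match swallows all of the image tags in one go. That is also why the one group you get back holds only the last source.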
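To make the greedy behaviour concrete, here is a small sketch (assuming C# 9+ top-level statements) that runs the question's original pattern against the question's sample input and inspects the single match it produces.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question
string input = @"<p>here is an image</p><img attr=""ahwer"" src=""~/Images/logo.png"" st=""abc""/><p>some more text here</p>";
input += @"<p>test</p><img src=""a.jpg"" /><img src=""folder/image.png"" />";

// Original greedy pattern: each .* first grabs the rest of the input,
// then gives characters back only until the remaining pattern fits.
string greedy = @"<img.*src=""([^""]*)"".*/>";
MatchCollection matches = Regex.Matches(input, greedy);

Console.WriteLine(matches.Count);               // 1
Console.WriteLine(matches[0].Groups[1].Value);  // folder/image.png
```

The single match starts at the first <img and ends at the final />, which is exactly the "count is always 1, only the last source" symptom described in the question.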

But really, you shouldn't try to use a regex to parse HTML. Madness lies that way...

JSBangs
A: 

Try the pattern:

pattern = @"(?<=.src="")[\w\/\.~]+";
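A sketch of how this lookbehind pattern might be used, assuming C# 9+ top-level statements and the sample input from the question. Because the lookbehind is zero-width, the whole match is the path and no capture group is needed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question
string input = @"<p>here is an image</p><img attr=""ahwer"" src=""~/Images/logo.png"" st=""abc""/><p>some more text here</p>";
input += @"<p>test</p><img src=""a.jpg"" /><img src=""folder/image.png"" />";

// Match a run of word characters, '/', '.' or '~' that is immediately
// preceded by any character followed by src=" (checked, not consumed).
string pattern = @"(?<=.src="")[\w\/\.~]+";

var paths = Regex.Matches(input, pattern)
                 .Cast<Match>()
                 .Select(m => m.Value)   // the match itself is the path
                 .ToList();

foreach (string path in paths)
    Console.WriteLine(path);
```

Two caveats: the pattern does not check that src belongs to an img tag, so it would also pick up, say, a script tag's src; and the character class stops at anything other than word characters, '/', '.' and '~', so a hypothetical path like my-logo.png would be cut off at the hyphen.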
Millionbonus