Can anyone explain how to extract image URLs from an HTML file in C#?

A: 

I think I need more information about your problem, but try searching for a regular expression that matches "img src= ..>".

Maybe something like '#<\s*img [^>]*src\s*=\s*(["\'])(.*?)\1#im' (the URL ends up in the second capture group).

You can read something about this problem in this article

http://zytzagoo.net/blog/2008/01/23/extracting-images-from-html-using-regular-expressions/
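
For a quick C# illustration of the regex approach, here is a minimal sketch using System.Text.RegularExpressions. The pattern is a variant of the one above adapted to .NET syntax (no PHP-style # delimiters), and the names ImgSrcExtractor and Extract are made up for this example. It only handles quoted src values and, like any regex over HTML, will miss or mis-match unusual markup, so treat it as a rough starting point rather than a robust parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class ImgSrcExtractor
{
    // Group 1 captures the opening quote, group 2 the URL, \1 requires the same closing quote.
    private static readonly Regex ImgSrc = new Regex(
        @"<\s*img\b[^>]*?\bsrc\s*=\s*([""'])(.*?)\1",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> Extract(string html)
    {
        var urls = new List<string>();
        foreach (Match m in ImgSrc.Matches(html))
        {
            urls.Add(m.Groups[2].Value);
        }
        return urls;
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample markup assumed for the demo.
        string html = "<p><img src=\"a.png\"><IMG alt='x' src='b.jpg'></p>";
        foreach (string url in ImgSrcExtractor.Extract(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

Unquoted attribute values (src=a.png) are not matched by this pattern, which is one reason an HTML parser is usually the safer choice.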

Jonathan
Regex is notoriously painful for handling the full set of xml/html possibilities.
Marc Gravell
+12  A: 

The HTML Agility Pack can do this - just use a query like //img and access the src - like so:

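using System;
using System.Net;
using HtmlAgilityPack;

string html;
using (WebClient client = new WebClient()) {
    html = client.DownloadString("http://www.google.com");
}
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// SelectNodes returns null (not an empty collection) when nothing matches
HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img");
if (images != null) {
    foreach (HtmlNode img in images) {
        // the second argument is the default returned when src is missing
        Console.WriteLine(img.GetAttributeValue("src", null));
    }
}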
Marc Gravell
+1 I painfully wrote a SO vote counter/tag using Regex yesterday. This would have helped a lot.
Mehrdad Afshari
Wouldn't it be easier to use a regex?
Peter Wone
Will this only extract img elements that are children of the topmost node?
mirezus
@mirezus - no, the // means img elements at any level of the document
Marc Gravell
+1  A: 

You have to parse the HTML and check the img tags. The following link includes a C# library for parsing HTML tags. I faced the same problem before, and this library worked well for me: Parsing HTML tags

Ahmy