Can anyone explain how to extract image URLs from an HTML file in C#?
A:
I think I need more information about your problem, but try searching for a regular expression that matches "img src=...>".
Maybe something like '#<\s*img [^>]*src\s*=\s*(["\'])(.*?)\1#im'
You can read more about this problem in this article:
http://zytzagoo.net/blog/2008/01/23/extracting-images-from-html-using-regular-expressions/
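The pattern above uses PHP-style delimiters and flags, so it needs a little adapting for .NET. Here is a minimal sketch of how it might look in C#, assuming a C# 9+ project with top-level statements; the helper name ExtractImageUrls and the sample markup are made up for illustration. It only handles quoted src values and can be fooled by a ">" inside an earlier attribute, so treat it as a rough approach rather than a robust parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the src value of every <img> tag the pattern can find.
static List<string> ExtractImageUrls(string html)
{
    // Group 1 captures the opening quote; \1 requires the same quote to close the value.
    // \s before "src" avoids matching attributes such as data-src.
    var pattern = new Regex(
        @"<\s*img\b[^>]*?\ssrc\s*=\s*([""'])(?<url>.*?)\1",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    var urls = new List<string>();
    foreach (Match m in pattern.Matches(html))
        urls.Add(m.Groups["url"].Value);
    return urls;
}

// Hypothetical sample input.
string sample = "<p><img src=\"a.png\" alt=\"A\"><IMG class='x' SRC='http://example.com/b.jpg'></p>";
foreach (string url in ExtractImageUrls(sample))
    Console.WriteLine(url);
```

Unquoted attribute values (src=a.png) are not matched by this sketch.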
Jonathan
2009-04-26 09:40:41
Regex is notoriously painful for handling the full set of xml/html possibilities.
Marc Gravell
2009-04-26 09:44:32
+12
A:
The HTML Agility Pack can do this - just use a query like //img and access the src - like so:
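using System;
using System.Net;
using HtmlAgilityPack;

string html;
using (WebClient client = new WebClient()) {
    html = client.DownloadString("http://www.google.com");
}
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// SelectNodes returns null when nothing matches, so check before iterating
HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img");
if (imgs != null) {
    foreach (HtmlNode img in imgs) {
        // Prints the src attribute; the second argument is the value returned if src is missing
        Console.WriteLine(img.GetAttributeValue("src", null));
    }
}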
Marc Gravell
2009-04-26 09:43:28
+1 I painfully wrote a SO vote counter/tag using Regex yesterday. This would have helped a lot.
Mehrdad Afshari
2009-04-26 09:51:34
Will this only extract img elements that are children of the topmost node?
mirezus
2010-10-23 16:13:33
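On the question above: in XPath, a leading // selects matching elements at any depth, not just direct children of the top node. Here is a minimal sketch of the difference. It uses System.Xml in place of the HTML Agility Pack, purely so that it runs without an external dependency, and the sample markup is made up and deliberately well-formed XML. It assumes C# 9+ top-level statements.

```csharp
using System;
using System.Xml;

// Hypothetical, well-formed sample markup; real-world HTML is often not valid XML.
var xml = new XmlDocument();
xml.LoadXml("<html><body><img src='top.png'/><div><p><img src='nested.png'/></p></div></body></html>");

// "//img" matches img elements at any depth below the document root.
var all = xml.SelectNodes("//img");
// "/html/body/img" matches only img elements that are direct children of body.
var direct = xml.SelectNodes("/html/body/img");

Console.WriteLine(all.Count);    // 2
Console.WriteLine(direct.Count); // 1
```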
+1
A:
You have to parse the HTML and check the img tags. The following link includes a C# library for parsing HTML tags. I faced the same problem before, used this library, and it worked well for me: Parsing HTML tags
Ahmy
2009-04-26 09:45:04