ansaurus

Question

How can I use HTML Agility Pack to retrieve all the images from a website?

Answer 1

+1 A:

You can do this using LINQ, like this:

var document = new HtmlWeb().Load(url);
var urls = document.DocumentNode.Descendants("img")
                                .Select(e => e.GetAttributeValue("src", null))
                                .Where(s => !String.IsNullOrEmpty(s));

EDIT: This code now actually works; I had forgotten to write document.DocumentNode.

SLaks 2010-01-21 23:56:47

What object type is document in your example? I can't use the .Descendants method. Please check my edit.

Sergio Tapia 2010-01-22 00:01:00

I forgot to include `.DocumentNode`.

SLaks 2010-01-22 00:09:21

Also check that you are using the latest beta, as the LINQ functionality is new.

rtpHarry 2010-04-06 23:06:02

Answer 2

A:

Based on the library's one example, but with a modified XPath:

 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");
 List<string> imageLinks = new List<string>();
 // SelectNodes returns null when nothing matches, so check before looping.
 HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img");
 if (images != null)
 {
     foreach (HtmlNode link in images)
     {
         imageLinks.Add(link.GetAttributeValue("src", ""));
     }
 }

I don't know this library well, so I'm not sure how you want to write the list out somewhere else, but this will at least get you your data.
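One possible way to write the collected links out somewhere else is to dump them to a text file. This is only a minimal sketch: the input file name file.htm comes from the example above, while the class name ImageLinkDump and the output file name image_links.txt are made up for illustration. It assumes a version of HTML Agility Pack that exposes DocumentNode.

```csharp
using System.Collections.Generic;
using System.IO;
using HtmlAgilityPack;

class ImageLinkDump
{
    static void Main()
    {
        // Parse a local HTML file (same file name as in the example above).
        HtmlDocument doc = new HtmlDocument();
        doc.Load("file.htm");

        List<string> imageLinks = new List<string>();

        // Only match <img> elements that actually carry a src attribute.
        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img[@src]");
        if (images != null)
        {
            foreach (HtmlNode img in images)
            {
                imageLinks.Add(img.GetAttributeValue("src", ""));
            }
        }

        // Hypothetical output file: one image URL per line.
        File.WriteAllLines("image_links.txt", imageLinks.ToArray());
    }
}
```

The src values are written exactly as they appear in the markup, so relative paths stay relative.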

Edit

Using your example:

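// Requires: using System.Collections.Generic; using System.Net; using HtmlAgilityPack;
public List<string> GetAllImages()
{
    // Download the raw HTML of the page.
    WebClient x = new WebClient();
    string source = x.DownloadString(@"http://www.google.com");

    // source holds markup, not a file path, so use LoadHtml rather than Load.
    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    document.LoadHtml(source);

    List<string> imageLinks = new List<string>();

    // SelectNodes returns null when no <img> element is found.
    HtmlNodeCollection images = document.DocumentNode.SelectNodes("//img");
    if (images != null)
    {
        foreach (HtmlNode link in images)
        {
            imageLinks.Add(link.GetAttributeValue("src", ""));
        }
    }

    return imageLinks;
}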

Anthony 2010-01-22 00:04:44
