I need to load specific webpages, each containing multiple images, and extract those images. I can't do this manually because the image names follow no pattern and there will be hundreds of sites. I have a Silverlight application that loads the webpage in an iframe. My plan was to extract the HTML for the page, retrieve the source of each image from it, and populate a ListBox with the results.

I can load the web page in the iframe with no problem, but I don't know how to retrieve the HTML for that page.

    public Page()
    {
        InitializeComponent();

        // Look up the iframe declared in the hosting HTML page.
        System.Windows.Browser.HtmlElement myFrame = System.Windows.Browser.HtmlPage.Document.GetElementById("ifHtmlContent");
        if (myFrame != null)
        {
            // CSS lengths need a unit, so sizes and offsets are given in pixels.
            myFrame.SetStyleAttribute("width", "1024px");
            myFrame.SetStyleAttribute("height", "768px");
            myFrame.SetAttribute("src", txtURI.Text);
            myFrame.SetStyleAttribute("left", "0px");
            myFrame.SetStyleAttribute("top", "50px");
            myFrame.SetStyleAttribute("visibility", "visible");
        }
    }

    private void UserControl_Loaded(object sender, RoutedEventArgs e)
    {
        // Load the initial URL as soon as the control is ready.
        this.Button_Click(sender, e);
    }

    private void Button_Click(object sender, RoutedEventArgs e)
    {
        // Point the iframe at whatever URL is currently in the text box.
        System.Windows.Browser.HtmlElement myFrame = System.Windows.Browser.HtmlPage.Document.GetElementById("ifHtmlContent");
        if (myFrame != null) myFrame.SetAttribute("src", txtURI.Text);
    }

    private void txtURI_KeyDown(object sender, KeyEventArgs e)
    {
        // Pressing Enter in the text box behaves like clicking the button.
        if (e.Key == Key.Enter)
            this.Button_Click(sender, e);
    }
A: 

The following article may offer some help: http://jesseliberty.com/2010/05/03/screen-scraping-when-all-you-have-is-a-hammer/

Klinger
Thanks, I had looked at that article, but it turns out screen scraping is difficult in Silverlight because of cross-domain restrictions: you need to have the HTML file before you can parse it (i.e. you need to run out of browser), which is not appropriate for my project.
Jeremy
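
For the parsing step described in the question (pulling each image source out of HTML that has already been obtained), a minimal sketch might look like the following. It assumes the page's HTML is already available as a string and does nothing about the cross-domain restriction mentioned above. The names `ImageSourceExtractor` and `ExtractImageSources` are hypothetical, and a regular expression is only a rough stand-in for a real HTML parser: it handles quoted `src` attributes and will miss unquoted ones.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question's code.
public static class ImageSourceExtractor
{
    // Matches an <img> tag and captures the value of a quoted src attribute.
    // Requiring whitespace before "src" avoids matching attributes such as data-src.
    private static readonly Regex ImgSrcPattern = new Regex(
        @"<img\b[^>]*?\ssrc\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)')",
        RegexOptions.IgnoreCase);

    // Returns the src value of every <img> tag found, in document order.
    public static List<string> ExtractImageSources(string html)
    {
        var sources = new List<string>();
        if (string.IsNullOrEmpty(html))
        {
            return sources;
        }

        foreach (Match match in ImgSrcPattern.Matches(html))
        {
            sources.Add(match.Groups["url"].Value);
        }
        return sources;
    }
}
```

The returned list could then be assigned to the ListBox's `ItemsSource`. Relative URLs are returned as written in the page, so they would still need to be resolved against the page address before the images can be loaded.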