I'm writing a specialized crawler and parser for internal use, and I need to take a screenshot of a web page in order to check which colours are used throughout it. The program will take in around ten web addresses and save each page as a bitmap image. From there I plan to use LockBits to build a list of the five most used colours in the image. To my knowledge this is the easiest way to get the colours used within a web page, but if there is an easier way, please chime in with your suggestions.

Anyway, I was going to use this program until I saw the price tag. I'm also fairly new to C#, having only used it for a few months. Can anyone provide me with a solution to my problem of taking a screenshot of a web page in order to extract the colour scheme?
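
For reference, the LockBits step described in the question could look roughly like the sketch below. It assumes a reference to System.Drawing; the names ColourCounter and MostUsedColours are made up for illustration. It counts exact ARGB values only, so anti-aliased text and gradients will spread across many near-identical colours unless the values are rounded first.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.Linq;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration; not taken from any answer in this thread.
static class ColourCounter
{
    // Returns up to `count` colours, most frequent first.
    public static List<Color> MostUsedColours(Bitmap bitmap, int count)
    {
        Rectangle bounds = new Rectangle(0, 0, bitmap.Width, bitmap.Height);

        // Ask GDI+ for 32-bit ARGB data regardless of the bitmap's own format
        BitmapData data = bitmap.LockBits(
            bounds, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
        try
        {
            byte[] row = new byte[bitmap.Width * 4];
            Dictionary<int, int> histogram = new Dictionary<int, int>();

            for (int y = 0; y < bitmap.Height; y++)
            {
                // Copy one scan line at a time; Stride may include padding
                // and can be negative for bottom-up bitmaps
                Marshal.Copy(IntPtr.Add(data.Scan0, y * data.Stride), row, 0, row.Length);

                for (int x = 0; x < row.Length; x += 4)
                {
                    // Bytes are stored B, G, R, A, which on a little-endian
                    // machine reads back as a single ARGB integer
                    int argb = BitConverter.ToInt32(row, x);
                    int seen;
                    histogram.TryGetValue(argb, out seen);
                    histogram[argb] = seen + 1;
                }
            }

            return histogram
                .OrderByDescending(pair => pair.Value)
                .Take(count)
                .Select(pair => Color.FromArgb(pair.Key))
                .ToList();
        }
        finally
        {
            bitmap.UnlockBits(data);
        }
    }
}
```

Calling `ColourCounter.MostUsedColours(bitmap, 5)` on a saved screenshot would then give the five most frequent colours.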

A: 

Here's an article that explains how to do it with a tool called IECapt:

Screenshot of Webpage with ASP.NET

Andy West
IECapt works, but it requires spawning a process, which is slow.
John JJ Curtis
+6  A: 

A quick and dirty way would be to use the WinForms WebBrowser control and draw it to a bitmap. Doing this in a standalone console app is slightly tricky because you have to be aware of the implications of hosting a STAThread control while using a fundamentally asynchronous programming pattern. But here is a working proof of concept which captures a web page to an 800x600 BMP file:

namespace WebBrowserScreenshotSample
{
    using System;
    using System.Drawing;
    using System.Drawing.Imaging;
    using System.Threading;
    using System.Windows.Forms;

    class Program
    {
        [STAThread]
        static void Main()
        {
            int width = 800;
            int height = 600;

            using (WebBrowser browser = new WebBrowser())
            {
                browser.Width = width;
                browser.Height = height;
                browser.ScrollBarsEnabled = true;

                // This will be called when the page finishes loading
                browser.DocumentCompleted += Program.OnDocumentCompleted;

                browser.Navigate("http://stackoverflow.com/");

                // This prevents the application from exiting until
                // Application.Exit is called
                Application.Run();
            }
        }

        static void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            // Now that the page is loaded, save it to a bitmap
            WebBrowser browser = (WebBrowser)sender;

            using (Graphics graphics = browser.CreateGraphics())
            using (Bitmap bitmap = new Bitmap(browser.Width, browser.Height, graphics))
            {
                Rectangle bounds = new Rectangle(0, 0, bitmap.Width, bitmap.Height);
                browser.DrawToBitmap(bitmap, bounds);
                bitmap.Save("screenshot.bmp", ImageFormat.Bmp);
            }

            // Instruct the application to exit
            Application.Exit();
        }
    }
}

To compile this, create a new console application and make sure to add assembly references for System.Drawing and System.Windows.Forms.

UPDATE: I rewrote the code to avoid having to use the hacky polling WaitOne/DoEvents pattern. This code should be closer to following best practices.

UPDATE 2: You indicate that you want to use this in a Windows Forms application. In that case, forget about dynamically creating the WebBrowser control. What you want is to create a hidden (Visible=false) instance of a WebBrowser on your form and use it the same way I show above. Here is another sample which shows the user code portion of a form with a text box (webAddressTextBox), a button (generateScreenshotButton), and a hidden browser (webBrowser). While I was working on this, I discovered a peculiarity which I didn't handle before -- the DocumentCompleted event can actually be raised multiple times depending on the nature of the page. This sample should work in general, and you can extend it to do whatever you want:

namespace WebBrowserScreenshotFormsSample
{
    using System;
    using System.Drawing;
    using System.Drawing.Imaging;
    using System.IO;
    using System.Windows.Forms;

    public partial class MainForm : Form
    {
        public MainForm()
        {
            this.InitializeComponent();

            // Register for this event; we'll save the screenshot when it fires
            this.webBrowser.DocumentCompleted += 
                new WebBrowserDocumentCompletedEventHandler(this.OnDocumentCompleted);
        }

        private void OnClickGenerateScreenshot(object sender, EventArgs e)
        {
            // Disable button to prevent multiple concurrent operations
            this.generateScreenshotButton.Enabled = false;

            string webAddressString = this.webAddressTextBox.Text;

            Uri webAddress;
            if (Uri.TryCreate(webAddressString, UriKind.Absolute, out webAddress))
            {
                this.webBrowser.Navigate(webAddress);
            }
            else
            {
                MessageBox.Show(
                    "Please enter a valid URI.",
                    "WebBrowser Screenshot Forms Sample",
                    MessageBoxButtons.OK,
                    MessageBoxIcon.Exclamation);

                // Re-enable button on error before returning
                this.generateScreenshotButton.Enabled = true;
            }
        }

        private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            // This event can be raised multiple times depending on how much of the
            // document has loaded, if there are multiple frames, etc.
            // We only want the final page result, so we do the following check:
            if (this.webBrowser.ReadyState == WebBrowserReadyState.Complete &&
                e.Url == this.webBrowser.Url)
            {
                // Generate the file name here
                string screenshotFileName = Path.GetFullPath(
                    "screenshot_" + DateTime.Now.Ticks + ".png");

                this.SaveScreenshot(screenshotFileName);
                MessageBox.Show(
                    "Screenshot saved to '" + screenshotFileName + "'.",
                    "WebBrowser Screenshot Forms Sample",
                    MessageBoxButtons.OK,
                    MessageBoxIcon.Information);

                // Re-enable button before returning
                this.generateScreenshotButton.Enabled = true;
            }
        }

        private void SaveScreenshot(string fileName)
        {
            int width = this.webBrowser.Width;
            int height = this.webBrowser.Height;
            using (Graphics graphics = this.webBrowser.CreateGraphics())
            using (Bitmap bitmap = new Bitmap(width, height, graphics))
            {
                Rectangle bounds = new Rectangle(0, 0, width, height);
                this.webBrowser.DrawToBitmap(bitmap, bounds);
                bitmap.Save(fileName, ImageFormat.Png);
            }
        }
    }
}
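
Since the question mentions around ten addresses, the same idea could be extended to work through a queue of addresses one at a time. The following is only a rough sketch under the same assumptions as the console sample (references to System.Drawing and System.Windows.Forms, caller running on an STA thread). BatchCapture, Run, and the file naming are invented for illustration, and a page that never finishes loading would stall the queue, so a real version would want a timeout.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.Windows.Forms;

// Hypothetical helper for illustration; not part of the samples above.
static class BatchCapture
{
    static Queue<string> pending;
    static int savedCount;

    // Call from an [STAThread] Main. Returns the number of screenshots saved.
    public static int Run(IEnumerable<string> addresses)
    {
        pending = new Queue<string>(addresses);
        savedCount = 0;
        if (pending.Count == 0)
        {
            return 0;
        }

        using (WebBrowser browser = new WebBrowser())
        {
            browser.Width = 800;
            browser.Height = 600;
            // Keep script error dialogs from blocking an unattended run
            browser.ScriptErrorsSuppressed = true;
            browser.DocumentCompleted += OnDocumentCompleted;
            browser.Navigate(pending.Dequeue());

            // Pumps messages until Application.ExitThread is called
            Application.Run();
        }

        return savedCount;
    }

    static void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        WebBrowser browser = (WebBrowser)sender;

        // Same guard as the forms sample: skip frame and partial-load events
        if (browser.ReadyState != WebBrowserReadyState.Complete || e.Url != browser.Url)
        {
            return;
        }

        using (Bitmap bitmap = new Bitmap(browser.Width, browser.Height))
        {
            Rectangle bounds = new Rectangle(0, 0, bitmap.Width, bitmap.Height);
            browser.DrawToBitmap(bitmap, bounds);
            bitmap.Save("screenshot_" + savedCount + ".png", ImageFormat.Png);
        }
        savedCount++;

        if (pending.Count > 0)
        {
            // Start the next page only after the current one has been saved
            browser.Navigate(pending.Dequeue());
        }
        else
        {
            Application.ExitThread();
        }
    }
}
```

Loading the pages strictly one after another keeps a single browser instance in use and avoids overlapping DocumentCompleted events from different pages.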
bobbymcr
Sorry for the huge delay, the code seems to work well, but I am struggling with using it within a form I have. I'm probably doing something stupid, but if you could give me a hand with it it'd be very appreciated.
EnderMB
DrawToBitmap is not supported by the WebBrowser control and will sometimes fail, leaving a blank black or white bitmap.
John JJ Curtis
A: 

Just use http://www.websnapr.com. You can do 100,000 images per month. There is a tiny watermark that should not influence your colors (if it does, just don't take the bottom right corner into account). The added benefit is that they cache the most popular URLs, so you will get very fast response times for those.

You'll need to use HttpWebRequest to download the binary of the image. Here's an example:

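    // Requires using directives for System.Drawing, System.IO and System.Net.
    HttpWebRequest request = HttpWebRequest.Create("http://images.websnapr.com/?size=s&url=http%3A%2F%2Fwww.google.com") as HttpWebRequest;
    Bitmap bitmap;
    using (WebResponse response = request.GetResponse())
    using (Stream stream = response.GetResponseStream())
    using (Image downloaded = Image.FromStream(stream))
    {
        // GDI+ expects the source stream to stay open for the lifetime of an
        // image created from it, so copy the pixels into an independent Bitmap.
        bitmap = new Bitmap(downloaded);
    }
    // now that you have a bitmap, you can do what you need to do...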

* I am not affiliated with www.websnapr.com *

John JJ Curtis
I'll give this method a try and see how it affects the data extraction aspect.
EnderMB
A: 

Check this out. This seems to do what you want, and technically it approaches the problem in a very similar way, through the web browser control. It seems to cater for a range of parameters to be passed in, and it also has good error handling built in. The only downside is that it is an external process (exe) that you spawn, and it creates a physical file that you read later. From your description, you are even considering web services, so I don't think that is a problem.

As for your latest comment about how to process several pages simultaneously, this will be perfect. You can spawn, say, 3, 4, 5 or more processes in parallel at any one time, or run the colour analysis on a thread while another capture process is running.
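
A minimal sketch of running several capture processes at once, assuming .NET 4 or later for Parallel.For. CaptureRunner and its methods are invented names, and the executable path and argument format are left to the caller because they depend on the capture tool being used.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; the capture tool itself is not specified here.
static class CaptureRunner
{
    // Starts one process, waits for it to finish, and returns its exit code.
    public static int RunCapture(string exePath, string arguments)
    {
        ProcessStartInfo startInfo = new ProcessStartInfo(exePath, arguments)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }

    // Runs at most maxParallel captures at a time and returns how many failed.
    // buildArguments receives the address and its index and returns the
    // command line for the chosen tool.
    public static int CaptureAll(
        string exePath,
        IList<string> addresses,
        int maxParallel,
        Func<string, int, string> buildArguments)
    {
        int failures = 0;
        ParallelOptions options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxParallel
        };

        Parallel.For(0, addresses.Count, options, i =>
        {
            int exitCode = RunCapture(exePath, buildArguments(addresses[i], i));
            if (exitCode != 0)
            {
                // Several workers may fail at once, so count atomically
                Interlocked.Increment(ref failures);
            }
        });

        return failures;
    }
}
```

With a placeholder tool named capture.exe that accepted hypothetical `--url` and `--out` switches, a call might look like `CaptureRunner.CaptureAll("capture.exe", addresses, 4, (url, i) => "--url=\"" + url + "\" --out=\"shot_" + i + ".png\"")`. The switches of the real tool should be taken from its own documentation.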

For image processing, I recently came across Emgu. I haven't used it myself, but it seems fascinating. It claims to be fast and to have a lot of support for graphic analysis, including reading pixel colours. If I had a graphics processing project on hand right now, I would give it a try.

Fadrian Sudaman
A: 

You may also have a look at Qt Jambi: http://qt.nokia.com/doc/qtjambi-4.4/html/com/trolltech/qt/qtjambi-index.html

They have a nice WebKit-based Java implementation of a browser, where you can take a screenshot simply by doing something like:

    QPixmap pixmap;
    pixmap = QPixmap.grabWidget(browser);

    pixmap.save(writeTo, "png");

Have a look at the samples - they have a nice webbrowser demo.

Marc