views: 320
answers: 4

I have researched spidering and think it is a little too complex for the fairly simple app I am trying to make. Some of the data on a web page is not visible in the source, because it is only rendered by the browser.

If I wanted to get a value from a specific web page that I display in a WebBrowser control, is there any method to read values from the contents of that control?

If not, does anyone have any suggestions on how they might approach this?

A: 

Check out this example: http://www.example-code.com/csharp/spider.asp (It was the first hit on Google).

I think writing such an application is quite useful for getting more familiar with C# (as it seems you want to write the application for learning purposes).

0xA3
+2  A: 

You’re not looking for spidering, you’re looking for screen scraping.

Bombe
A: 

Because the browser simply renders the underlying content, the most flexible approach would be to parse that underlying content (HTML/CSS/JS/whatever) yourself.

I would create a parsing engine that looks for the things your spider application needs.

This could be a basic string-searching algorithm that looks for href="", for example, and reads the values in order to produce new requests and continue spidering. Your engine could be written to look only for the things it is interested in and be extended that way for more functionality.
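As a rough sketch of that idea (the ExtractLinks helper and surrounding class are hypothetical, just for illustration, not any library API):

using System;
using System.Collections.Generic;

class LinkExtractor
{
    // Scan raw HTML for href="..." attributes with basic string
    // searching, as described above, and collect the URL values.
    static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        const string marker = "href=\"";
        int pos = 0;
        while ((pos = html.IndexOf(marker, pos, StringComparison.OrdinalIgnoreCase)) != -1)
        {
            int start = pos + marker.Length;
            int end = html.IndexOf('"', start);
            if (end == -1) break;                      // unterminated attribute; stop scanning
            links.Add(html.Substring(start, end - start));
            pos = end + 1;                             // continue after the closing quote
        }
        return links;
    }
}

Each extracted link could then be fed back in as a new request, which is the "continue spidering" part.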

Martin
+2  A: 

I'd have to agree with Bombe; it sounds more like you want HTML screen scraping. It requires a lot of parsing, and if the page you're scraping ever changes, your app will break. However, here's a small example of how to do it:

using System.Net;
using System.Text;

// Download the page's raw bytes, then decode them as UTF-8.
WebClient webClient = new WebClient();
const string strUrl = "http://www.yahoo.com/";
byte[] reqHTML = webClient.DownloadData(strUrl);
string html = Encoding.UTF8.GetString(reqHTML);

Now the html variable has the entire HTML in it, and you can start parsing away.
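For instance, a minimal sketch of one way to "parse away" (a regular expression is assumed here purely for illustration; like any screen scraping, it will break on unusual markup):

using System;
using System.Text.RegularExpressions;

// Pull the page title out of the downloaded HTML.
Match m = Regex.Match(html, @"<title>\s*(.*?)\s*</title>",
                      RegexOptions.IgnoreCase | RegexOptions.Singleline);
if (m.Success)
{
    Console.WriteLine(m.Groups[1].Value);
}

The same pattern applies to whatever value you actually need off the page.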

BFree