WebCrawling Dynamic Links

Hi Everyone,

Does anybody have any idea how to crawl websites that have dynamic pages/queries? I mean, if I click a certain link, it has different values every time I reload it in a web browser. My web crawler cannot download the contents of these pages. Please advise.

You might want to look at this question, which details how to write a crawler, or look at the source code for http://searcharoo.net/, which contains a good crawler (see here).

Kane 2010-05-04 08:35:16

Hi Kane, thanks for your reply. Searcharoo is interesting; however, if anyone out there can pinpoint how this (downloading pages from dynamic links) can be done, that would be a big help. Looking at Searcharoo's code, it might take me some time to understand its architecture.

Jojo 2010-05-04 08:49:09

+1 A:

It works the same way whether the page is dynamic or not. A crawler is really only a matter of three things:

The URL
The data sent to the server, if the request uses the POST method
The cookie, if authentication is required

That's all.
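The three pieces above can be sketched in Python with only the standard library. This is a minimal sketch, not a full crawler: the helper names build_request and fetch are made up for illustration, and the URL, form field, and cookie value in the usage note are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(url, form_data=None, cookie=None):
    """Build a GET request, or a POST request when form_data is given."""
    # urlencode percent-encodes the fields, so the result is plain ASCII.
    body = urlencode(form_data).encode("ascii") if form_data else None
    request = Request(url, data=body)  # urllib uses POST whenever data is set
    if cookie:
        # Cookie string as the server issued it, e.g. after a login request.
        request.add_header("Cookie", cookie)
    return request


def fetch(request, timeout=10):
    """Send the request and return the response body as text."""
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage would look something like fetch(build_request("http://example.com/search", {"q": "crawler"}, "session=abc")). A dynamic link is requested exactly like a static one; only the query string or the posted fields differ.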

Common problems when writing a crawler:

Guessing the default page wrongly [index.html, index.php, default.aspx, etc.]. The request actually works without it for either method [POST/GET].
One of the form field names is not written exactly as the page expects.
The ASP.NET form view state field (I forgot the name), but it can be handled easily.
Pages generated dynamically by JavaScript. This is the hardest part, and in most cases even Google still has problems with it.
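For the field name and view state problems, one approach is to read the hidden inputs out of the fetched page and post them back unchanged. The sketch below assumes the page is an ASP.NET Web Forms page, where the view state normally lives in a hidden input named __VIEWSTATE; the class and function names are made up for illustration. It does nothing for pages built by JavaScript, since a plain HTTP fetch never runs the scripts.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html):
    """Return a dict of the hidden form fields found in an HTML string."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

The returned dict can be merged with your own form values before posting, so the field names are copied exactly as the server wrote them instead of being typed by hand.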

Hope that helps.

ktutnik 2010-08-08 13:08:24
