A friend of mine wants to collect data from a website. I recommended spidering as a fast way of automating the process. But when I looked at the website, I found it wasn't so simple after all.

First, a login protected by a CAPTCHA thwarts most spidering software. Is there a way I can log in manually and then reuse the session cookie to fetch all the other pages?
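In case it helps frame the question, here is the kind of thing I had in mind: copy the session cookie out of the browser after logging in by hand, and attach it to every automated request. This is only a sketch; the cookie name/value and URL are made-up placeholders:

```python
import urllib.request

# Placeholder: paste the real cookie string from the browser's dev tools
# after logging in manually (name and value below are invented).
SESSION_COOKIE = "ASPSESSIONID=ABC123DEF456"

def make_request(url: str) -> urllib.request.Request:
    """Build a request that presents the manually obtained login cookie."""
    return urllib.request.Request(url, headers={"Cookie": SESSION_COOKIE})

def fetch(url: str) -> bytes:
    """Fetch a page as the logged-in user."""
    with urllib.request.urlopen(make_request(url)) as resp:
        return resp.read()
```

Whether this works presumably depends on the site not tying the session to anything else (IP address, user agent, per-request tokens).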
Secondly, all pages are linked using `<div onclick="window.open('/blahblah.asp?id=123')">`, where the ids are not consecutively incremented. Since the links live in JavaScript handlers rather than `<a href>` tags, this thwarts wget.
Finally, all the data sits in pages that use hidden/shown `<div>`s for navigation.
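If I understand correctly, the show/hide is purely client-side, so the hidden content should still be present in the raw HTML and could be scraped wholesale, ignoring visibility. Something like this with the standard-library parser (the sample markup is invented):

```python
from html.parser import HTMLParser

class DivTextCollector(HTMLParser):
    """Collect the text of every <div>, visible or hidden alike.
    Because the hiding is done client-side (e.g. display:none),
    the data is all present in the downloaded HTML."""
    def __init__(self):
        super().__init__()
        self.depth = 0       # how many <div>s we are currently inside
        self.texts = []      # text fragments found inside divs

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.texts.append(data.strip())

sample = '<div style="display:none">hidden data</div><div>shown data</div>'
parser = DivTextCollector()
parser.feed(sample)
print(parser.texts)
```

That is, the navigation divs would not need to be "clicked" at all.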

Does anyone have an idea for a quick (and dirty) solution to this?