Hello,

I need to read data from an online database that the UN publishes through an .aspx page. I've parsed HTML before, but always by manipulating query-string values. This site uses ASP.NET postbacks instead: you click a value in box one, then box two appears; you click a value in box two and then click a button to get your results.

Does anybody know how I could automate that process?

Thanks,

Mike

+1  A: 

You may still only need to send one request, but that one request can be rather complicated. ASP.NET is notoriously difficult (though not impossible) to screen scrape. Between event validation and the ViewState, it's tricky to get your requests just right. The simplest way is often to use a sniffer tool like Fiddler to see exactly what the HTTP request looks like, and then just mimic that request.
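As a rough illustration of what mimicking the request involves, here is a minimal Python sketch using only the standard library. It assumes the page is a typical Web Forms page whose state travels in hidden inputs such as `__VIEWSTATE` and `__EVENTVALIDATION`, and that a postback is triggered by setting `__EVENTTARGET`. The helper names and the control name `ddlCountry` are made up for the example; the real field names should be taken from what the sniffer shows.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return the hidden form fields (e.g. __VIEWSTATE) found in the page."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def build_postback_body(html, event_target, form_values):
    """Build a form-encoded POST body that echoes the page's hidden state.

    event_target is the control that "caused" the postback; form_values
    holds the visible fields (hypothetical names, taken from the sniffer).
    """
    fields = extract_hidden_fields(html)
    fields["__EVENTTARGET"] = event_target
    fields.setdefault("__EVENTARGUMENT", "")
    fields.update(form_values)
    return urlencode(fields).encode("utf-8")
```

For example, `build_postback_body(page_html, "ddlCountry", {"ddlCountry": "42"})` would produce the bytes to POST back to the same URL. Whether the server accepts it still depends on matching the captured request closely.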

If you do still need to send two requests, it's because the first request also places some state in a session somewhere, and that means whatever you use to send those requests needs to be able to send them with the same session. This often means supporting cookies.
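To show what "the same session" can look like in practice, here is a small Python sketch using the standard library's cookie handling. It assumes the site tracks the session with a cookie (ASP.NET's default is `ASP.NET_SessionId`); the function names and the URL in the usage note are placeholders, not anything from the actual site.

```python
import urllib.request
from http.cookiejar import CookieJar


def make_session():
    """Return (opener, jar); every request made through opener shares jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(opener, url, body=None):
    """GET the url, or POST to it when a form-encoded body (bytes) is given."""
    request = urllib.request.Request(url, data=body)
    with opener.open(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

A two-step scrape would then call `fetch(opener, url)` to load the page, `fetch(opener, url, first_body)` to trigger the first postback, and `fetch(opener, url, second_body)` for the results, all with the same `opener`, where `url` might be something like `https://example.org/data.aspx`. Cookies set by the first response are sent automatically on the later requests.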

Joel Coehoorn
A: 

WatiN would be my first choice. You would code the selecting and clicking, then parse the resulting HTML afterwards.

consultutah
+1  A: 

I'd look at HtmlAgilityPack with the FormProcessor add-on.

Jon Galloway