I am trying to retrieve some information from a website. The information I need is on the missouri.edu site, so it is publicly available. Here is the process I need to automate:

- Navigate to https://webapps.missouri.edu/ODDSearchEngine/oddsearch
- Search for a department name, like "business".
- Click any of the department names, like "Business College, Advancement".
- Programmatically view the source of the page that is output after clicking "Business College, Advancement".

I would like to be able to get the source of each page for each department under business (or whatever department I put in, like "Accounting").

Is this possible with a Windows program? It looks like the "ODDSearchEngine" that runs this is a Java applet. I'm not sure how to interface with it to get the pages.

For reference, if I put the address that the ODDSearchEngine outputs into my existing program, it returns the source code of the Search page with two "java.lang.NullPointerException" errors.

Is there an easy way to get this information through .NET?

+1  A: 

I recently used WatiN for a similar task (though mine required logging in and tracking a cookie). WatiN basically simulates a user visiting a website. It's probably overkill (and slow) for what you need.

Another alternative I played around with was HttpWebRequest/Response. This seems like it should satisfy your needs. You can also use HTML Agility Pack to work with the HTML you receive.
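As a rough illustration of that second approach, here is a minimal sketch that downloads a page with HttpWebRequest and lists its links with HTML Agility Pack. The class and method names (`DirectoryScraper`, `GetSource`, `ExtractLinks`) are made up for this example, and the shared `CookieContainer` is only there on the assumption that the site keeps some session state between requests, which I have not verified. Submitting the search form itself is not shown, since the form's field names are not known here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack"

static class DirectoryScraper // hypothetical name, for illustration only
{
    // One cookie container shared by every request, so that any session
    // cookie set by the first page is sent back on later requests.
    // (Assumption: the site relies on cookies for session state.)
    static readonly CookieContainer Cookies = new CookieContainer();

    // Downloads the HTML source of a page with a plain GET request.
    static string GetSource(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = Cookies;
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Returns the absolute URL of every <a href="..."> found in the HTML.
    static List<string> ExtractLinks(string html, Uri baseUri)
    {
        var links = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty list) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null) return links;

        foreach (var anchor in anchors)
        {
            // Decode entities such as &amp; inside the href value.
            string href = HtmlEntity.DeEntitize(anchor.GetAttributeValue("href", ""));
            Uri absolute;
            if (Uri.TryCreate(baseUri, href, out absolute))
                links.Add(absolute.ToString());
        }
        return links;
    }

    static void Main()
    {
        var searchPage = new Uri("https://webapps.missouri.edu/ODDSearchEngine/oddsearch");
        string html = GetSource(searchPage.ToString());
        foreach (string link in ExtractLinks(html, searchPage))
            Console.WriteLine(link);
    }
}
```

Calling `GetSource` again on one of the printed links reuses the same cookies, so the second request looks to the server like a continuation of the first visit. Whether that is enough for this particular site is something you would have to try.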

llamaoo7
Those all look like they might do the trick, thanks very much. I'm leaning toward HttpWebRequest/Response.
Pselus
Actually, none of them worked. I got the HttpWebRequest/Response code to the point of clicking the "Go" button for me, but from there, if I try to get the sub-page ("Business College, Advancement"), it still gives the same java.lang.NullPointerException errors. WatiN might work, except I can't see how to get the source once I've gone through the "click" process. Not to mention that the page is so poorly formatted that the links to click have no defining characteristic other than that the address they point to has a different "ggid" at the end.
Pselus