Hi, I was wondering whether it's possible to automate the task of typing entries into search forms and extracting matches from the results. For instance, I have a list of journal articles for which I'd like to get DOIs (digital object identifiers). Doing this manually, I would go to the journal's article search page (e.g., http://pubs.acs.org/search/advanced), type in the authors/title/volume (etc.), find the article in the list of returned results, then pick out the DOI and paste it into my reference list.
I use R and Python for data analysis regularly (I was inspired by a post on RCurl) but don't know much about web protocols. Is such a thing possible (for instance, using something like Python's BeautifulSoup)? Are there any good references for doing anything remotely similar to this task? I'm just as interested in learning about web scraping and web-scraping tools in general as in getting this particular task done. Thanks for your time!
A:
WebRequest req = WebRequest.Create("http://www.URLacceptingPOSTparams.com");
req.Proxy = null;
req.Method = "POST";
req.ContentType = "application/x-www-form-urlencoded";
//
// add POST data
string reqString = "searchtextbox=webclient&searchmode=simple&OtherParam=???";
byte[] reqData = Encoding.UTF8.GetBytes (reqString);
req.ContentLength = reqData.Length;
//
// send request
using (Stream reqStream = req.GetRequestStream())
reqStream.Write (reqData, 0, reqData.Length);
string response;
//
// retrieve response
using (WebResponse res = req.GetResponse())
using (Stream resStream = res.GetResponseStream())
using (StreamReader sr = new StreamReader (resStream))
response = sr.ReadToEnd();
// use a regular expression to break apart response
// OR you could load the HTML response page as a DOM
(Adapted from Joseph Albahari's "C# in a Nutshell")
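Since the question mentions Python, the same kind of POST can be sketched with the standard library's urllib alone; the URL and field names below are placeholders matching the C# example, not a real endpoint:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Build the form-encoded POST body (field names are placeholders --
# substitute whatever the real search form expects).
params = {
    "searchtextbox": "webclient",
    "searchmode": "simple",
}
data = urlencode(params).encode("utf-8")

# Supplying a data payload makes urllib issue a POST instead of a GET.
req = Request(
    "http://www.example.com/search",  # placeholder URL
    data=data,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

# To actually send it:
#   from urllib.request import urlopen
#   with urlopen(req) as resp:
#       html = resp.read().decode("utf-8")
print(req.get_method())  # POST
```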
Mitch Wheat
2009-07-23 07:24:44
Thank you - good to know it is possible! ...I am guessing. (not too familiar with .NET, though I hear it is all the rage...)
Stephen
2009-07-23 19:42:08
+2
A:
Hey Stephen,
Beautiful Soup is great for parsing webpages; that's half of what you want to do. Python, Perl, and Ruby all have a version of Mechanize, and that's the other half:
http://wwwsearch.sourceforge.net/mechanize/
Mechanize lets you drive a browser from a script (setup lines added for context; `link_node` stands in for a link you've already located):
import mechanize

browser = mechanize.Browser()
browser.open("http://pubs.acs.org/search/advanced")
# Follow a link
browser.follow_link(link_node)
# Submit a form
browser.select_form(name="search")
browser["authors"] = ["author #1", "author #2"]
browser["volume"] = "any"
search_response = browser.submit()
With Mechanize and Beautiful Soup you have a great start. One extra tool I'd consider is Firebug, as used in this quick Ruby scraping guide:
http://www.igvita.com/2007/02/04/ruby-screen-scraper-in-60-seconds/
Firebug can speed up building the XPaths you use to parse documents, saving you some serious time.
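For the parsing half, here's a toy Beautiful Soup sketch of pulling a DOI out of a results page. The HTML, class names, and DOI below are invented; you'd inspect the real results page (e.g., with Firebug) to find its actual structure:

```python
from bs4 import BeautifulSoup

# Invented stand-in for a search-results page.
html = """
<div class="searchResult">
  <a class="titleLink" href="http://dx.doi.org/10.1021/xx000000x">Article title</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# Find the first anchor whose href looks like a DOI link.
link = soup.find("a", href=lambda h: h and "dx.doi.org/" in h)
doi = link["href"].split("dx.doi.org/")[1]
print(doi)  # 10.1021/xx000000x
```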
Good luck!
mixonic
2009-07-23 12:26:47
I'm trying! I just got an OpenID but it tells me I have to have 15 reputation to vote up?? Sorry, first time on stackoverflow... is it this complicated?
Stephen
2009-07-24 05:42:52
Heh, Thanks Stephen. You can always pick an answer, but you need 10 points to vote things up.
mixonic
2009-07-24 11:00:06