I’m working on web scraping using C# HttpWebRequest/HttpWebResponse. For the most part this has gone smoothly. But after POSTing my way through several pages, I am stuck on what seems to be an inconsistency between what the web browser does and what the HttpWebRequest/HttpWebResponse calls do.

The problem occurs when I land on a page containing an input element that has a name similar to this: “RidiculouslyLongInputName.RidiculouslyLongInputName.RidiculouslyLongInputName.@RidiculouslyLong”

POSTing a value for this input element causes a 500 error when using HttpWebRequest but works fine when POSTing through the browser. If I remove this input value from the post data, the HttpWebRequest does not get the 500 error, but then I'm stuck with a data validation error from the website.

Any idea on why HttpWebRequest is failing?

+1  A: 

It's times like these when packet sniffers come in extremely useful for seeing exactly what kind of data is flowing through and what the difference is.

Wireshark (http://www.wireshark.org/) is a great tool for things like this.

Filter down to only the domains you're interested in, then send the request with HttpWebRequest and save the captured packet data somewhere. Repeat, this time making the request through the browser. Then compare the two captures.

If it is indeed an issue with POST variables, it should be evident in the HTTP payload.
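One thing worth comparing in the two captures is how the long input name is encoded in the body. The following is a minimal sketch, not taken from the question, of building an application/x-www-form-urlencoded body by hand so the output can be checked against the captured browser payload. The `FormBody` class and `Encode` method are hypothetical names. Note that `Uri.EscapeDataString` encodes a space as `%20`, whereas browsers typically send `+` in form bodies, so small differences of that kind may show up too.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
static class FormBody
{
    // Joins name/value pairs as name=value&name=value, percent-encoding
    // both the names and the values. Reserved characters such as '@'
    // are escaped; '.' is left as-is.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var sb = new StringBuilder();
        foreach (var field in fields)
        {
            if (sb.Length > 0)
            {
                sb.Append('&');
            }
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(field.Value));
        }
        return sb.ToString();
    }
}
```

Passing the real field names and values to `FormBody.Encode` and placing the result next to the body shown in the capture should make any encoding difference in the `@` or `.` characters easy to spot.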

Jamie Wong
A: 

Not sure why you are running into the problem, but I would recommend grabbing a copy of Fiddler and taking a look at what the browser is sending in the POST request. It is possible there is something less than obvious going on.

ckramer
A: 

You can also use the Firebug extension with Firefox. With this extension installed and enabled, go through the entire scenario in Firefox. Firebug will show you the exact request and response exchanged by the browser. You can then duplicate that request as closely as possible using HttpWebRequest.
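As a rough sketch of what "duplicating the request" might look like, the code below sets the headers that most often differ between a browser and a default HttpWebRequest. The `BrowserLikePost` class, its method names, and all parameter values are hypothetical; the actual header values should be copied from whatever the browser capture shows. Turning off `Expect100Continue` is an assumption worth testing, since HttpWebRequest sends an `Expect: 100-continue` header on POSTs by default and browsers do not.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper, for illustration only.
static class BrowserLikePost
{
    // Configures the request without touching the network.
    public static HttpWebRequest Create(string url, string userAgent,
                                        string referer, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.UserAgent = userAgent;      // copy from the browser capture
        request.Referer = referer;          // page that hosted the form
        request.CookieContainer = cookies;  // reuse across all pages in the session
        request.ServicePoint.Expect100Continue = false; // browsers do not send this header
        return request;
    }

    // Writes the already-encoded form body and returns the response text.
    public static string Send(HttpWebRequest request, string encodedBody)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(encodedBody);
        request.ContentLength = bytes.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(bytes, 0, bytes.Length);
        }
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Keeping `Create` separate from `Send` makes it possible to inspect the configured headers before anything goes over the wire and compare them against the browser's request one by one.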

feroze