I am trying to send a batch of queries to this demographics prediction page and collect the results: http://adlab.microsoft.com/Demographics-Prediction/DPUI.aspx

The form's POST action targets the same page (_self) and is probably posting some event data. I read in another post here on Stack Overflow that aspx pages typically need some viewstate and validation data. Do I simply save these from one request and re-send them in a POST request?

Or is there a cleaner way to do this? One of those aspx viewstate parameters is about 1000 characters long, and the sheer ugliness of pasting that into my code makes me think there HAS to be a better way. Any references I can read up on would be helpful, thanks!

A: 

Perhaps mechanize would be of use.

Ignacio Vazquez-Abrams
Thanks for the suggestion. I tried mechanize and got some HTML parsing errors. I'm looking into whether I can run the page through lxml or BeautifulSoup to clean it up and push it back into Browser().
Cygorger
A: 

Use urllib2. Your POST data is a simple Python dictionary. Very easy to edit and maintain.
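A minimal sketch of the dictionary approach in Python 3, where urllib2 was split into `urllib.request` and `urllib.error` and the form encoder lives in `urllib.parse.urlencode`. The field names `txtQuery` and `btnSubmit` are hypothetical placeholders; the real names have to be read from the page's HTML. Nothing is sent over the network here, the code only builds the request.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical field names: take the real ones from the form's HTML source.
post_data = {
    "txtQuery": "python programming",
    "btnSubmit": "Submit",
}

# urlencode produces an application/x-www-form-urlencoded body;
# Request expects bytes, so encode it.
body = urlencode(post_data).encode("ascii")

# Passing data= makes urllib send a POST instead of a GET.
req = Request("http://adlab.microsoft.com/Demographics-Prediction/DPUI.aspx", data=body)

print(req.get_method())
print(body)
```

Changing what gets submitted is then just a matter of editing the dictionary.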

If your form contains hidden fields -- some of which are encoded -- then you need to do a GET to get the form and the various hidden field seed values.
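Rather than pasting a 1000-character `__VIEWSTATE` into the script, the hidden values can be scraped from the GET response each time. This sketch uses only the standard library's `html.parser`. The sample markup and its values are made up, though `__VIEWSTATE` and `__EVENTVALIDATION` are the names ASP.NET WebForms pages commonly use. The parser collects every hidden input, so other hidden fields a page may carry are picked up too.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "input":
            return
        a = dict(attrs)
        if (a.get("type") or "").lower() == "hidden" and a.get("name"):
            # A hidden input with no value attribute is submitted as "".
            self.fields[a["name"]] = a.get("value") or ""


def hidden_fields(html):
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Made-up markup in the shape ASP.NET WebForms pages commonly emit.
sample = """
<form method="post" action="DPUI.aspx" id="form1">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtMTI3OTMzNDM4NDs7Pg==" />
  <input type="hidden" name="__EVENTVALIDATION" value="/wEWAgKH0pqJBg==" />
  <input type="text" name="txtQuery" />
</form>
"""
fields = hidden_fields(sample)
print(fields)
```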

Once you GET the form, you can add the necessary input values to the given hidden values and POST the completed form back.
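Merging the two sets of values is a plain dictionary update. In this sketch the helper name `build_postback` and all field values are hypothetical. The hidden fields go in first, so anything set explicitly wins. `urlencode` also takes care of escaping the `/`, `+` and `=` characters that base64-style view-state strings contain.

```python
from urllib.parse import parse_qs, urlencode


def build_postback(hidden, user_inputs):
    """Merge server-issued hidden fields with our own values into a POST body.

    Hidden values go in first, so any key we set ourselves overrides them.
    The caller's dictionaries are not modified.
    """
    data = dict(hidden)
    data.update(user_inputs)
    return urlencode(data).encode("ascii")


# Made-up values; real ones come from the GET response.
hidden = {"__VIEWSTATE": "/wEPDwUK+Lz==", "__EVENTVALIDATION": "/wEWAgK="}
body = build_postback(hidden, {"txtQuery": "cars", "btnSubmit": "Go"})
print(body)
print(parse_qs(body.decode("ascii")))
```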

You'll also have to make sure you handle any cookies; urllib2 (together with cookielib) will help with that too.
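A sketch of the cookie handling in Python 3 terms: `http.cookiejar.CookieJar` and `urllib.request.HTTPCookieProcessor` correspond to Python 2's `cookielib.CookieJar` and `urllib2.HTTPCookieProcessor`. The helper name `make_cookie_opener` is hypothetical.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return an opener that remembers cookies, plus the jar it uses.

    Every request sent through this opener shares one CookieJar, so a session
    cookie set by the initial GET is sent back automatically with the POST.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_cookie_opener()
print(len(jar))  # 0, nothing has been fetched yet
```

Send both the GET and the POST through `opener.open(...)`. Plain `urllib.request.urlopen(...)` goes through the global default opener, which has no cookie jar unless you register this one with `urllib.request.install_opener(opener)`.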

After all, that's all a browser does, and it works in a browser. Browsers don't know ASPX from CGI from WSGI, so there's no magic just because it's ASPX. You sometimes have to do a GET before a POST to get values and cookies set up properly.
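Putting the GET and the POST together, the way a browser would. This is a sketch under stated assumptions: the form posts back to the URL it was loaded from (as the question describes), the pages are UTF-8, and every hidden input should be echoed back unchanged. `get_then_post` is a hypothetical helper; the `FakeOpener` class and the `example.test` URL exist only so the flow can be exercised offline.

```python
import io
from html.parser import HTMLParser
from urllib.parse import urlencode


class _HiddenInputs(HTMLParser):
    """Collect <input type="hidden"> name/value pairs."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


def get_then_post(opener, url, user_inputs, encoding="utf-8"):
    """GET the form page, then POST it back to the same URL with our inputs.

    `opener` is anything with an urllib-style open(url, data=None) method,
    such as an OpenerDirector built with HTTPCookieProcessor.
    """
    # 1. GET: picks up fresh hidden values (and, via the opener, any cookies).
    with opener.open(url) as resp:
        parser = _HiddenInputs()
        parser.feed(resp.read().decode(encoding))
        parser.close()
    # 2. POST: hidden values first, then our own inputs on top of them.
    data = dict(parser.fields)
    data.update(user_inputs)
    with opener.open(url, urlencode(data).encode("ascii")) as resp:
        return resp.read().decode(encoding)


class FakeOpener:
    """Offline stand-in for a real opener, so the flow runs without a network."""

    def __init__(self):
        self.calls = []

    def open(self, url, data=None):
        self.calls.append((url, data))
        if data is None:
            return io.BytesIO(b'<form><input type="hidden" name="__VIEWSTATE" value="abc"></form>')
        return io.BytesIO(b"<p>results</p>")


fake = FakeOpener()
result = get_then_post(fake, "http://example.test/page.aspx", {"txtQuery": "cars"})
print(result)         # <p>results</p>
print(fake.calls[1])  # ('http://example.test/page.aspx', b'__VIEWSTATE=abc&txtQuery=cars')
```

Against a live site you would pass an opener built with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))` instead of `FakeOpener`.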

S.Lott
Hi, I am currently using urllib2, but it isn't entirely clear to me which variables I should include in the POST data. Maybe a better way to phrase it is: what do .aspx pages usually need?
Cygorger
@Cygorger: There's no "usually". You have to go to the page, view source to see what the form is, and then work out what's required. If it works through Javascript, you have to read the Javascript.
S.Lott
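One way to automate the "view source" step is to list every form with its method and action, each named control inside it, and anything wired to JavaScript. This is a hypothetical helper built on `html.parser`, run against invented sample markup. It only reports what is in the HTML; fields that a script fills in at submit time still have to be worked out by reading the JavaScript.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """List each form's action, method and named controls, plus JavaScript hooks."""

    CONTROLS = ("input", "select", "textarea", "button")

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form>
        self.js_hooks = []    # (tag, attribute, value) for on* handlers and javascript: URLs
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name.startswith("on") or (value or "").lower().startswith("javascript:"):
                self.js_hooks.append((tag, name, value))
        a = dict(attrs)
        if tag == "form":
            self._in_form = True
            self.forms.append({
                "action": a.get("action"),
                "method": (a.get("method") or "get").lower(),
                "fields": [],
            })
        elif tag in self.CONTROLS and a.get("name") and self._in_form:
            # For <input> the type matters (hidden, text, submit, ...);
            # a missing type attribute means "text".
            kind = (a.get("type") or "text").lower() if tag == "input" else tag
            self.forms[-1]["fields"].append((a["name"], kind, a.get("value")))

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


# Invented markup for illustration only.
sample = """
<form name="form1" method="post" action="DPUI.aspx" id="form1">
  <input type="hidden" name="__VIEWSTATE" value="dDwx" />
  <input name="txtQuery" />
  <select name="ddlMode"><option value="1">One</option></select>
  <input type="submit" name="btnGo" value="Go" />
  <a href="javascript:__doPostBack('lnkNext','')">Next</a>
</form>
"""
inspector = FormInspector()
inspector.feed(sample)
inspector.close()
for form in inspector.forms:
    print(form["method"].upper(), form["action"])
    for field in form["fields"]:
        print("  ", field)
print(inspector.js_hooks)
```

In ASP.NET pages, a `javascript:__doPostBack(...)` link typically means the script sets hidden fields such as `__EVENTTARGET` before submitting, so those values would have to be filled in by hand in the POST data.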
S.Lott: Thanks for your reply. I should clarify what I mean. Is there anything different I would need to do with an aspx page compared to, say, an HTML page with JavaScript? I am not sure about this, but the POST query here seems to rely on sending some hidden values back to the server, and I wonder whether I can simply store these from an earlier call and reuse them (this doesn't seem to work). This post seemed relevant: http://stackoverflow.com/questions/1480356/how-to-submit-query-to-aspx-page-in-python
Cygorger
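On why replaying stored hidden values fails: one plausible explanation (an assumption, not confirmed for this site) is that the server ties them to a particular page instance or session, so stale copies are rejected. A browser never reuses old values; each postback carries the hidden fields from the page it has just received. This sketch does the same across a batch of queries. `run_batch`, `make_inputs`, the `txtQuery` field and the fake opener are all hypothetical; the fake returns a new `__VIEWSTATE` on every response to show that each POST uses the most recent one.

```python
import io
import time
from html.parser import HTMLParser
from urllib.parse import urlencode


class _Hidden(HTMLParser):
    """Collect <input type="hidden"> name/value pairs."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


def hidden_from(page):
    parser = _Hidden()
    parser.feed(page)
    parser.close()
    return parser.fields


def run_batch(opener, url, queries, make_inputs, delay=1.0, encoding="utf-8"):
    """Submit queries one after another, the way a person would in a browser.

    Hidden values are never taken from a saved copy: each POST carries the
    hidden fields from the page returned by the previous request.
    """
    with opener.open(url) as resp:  # initial GET
        page = resp.read().decode(encoding)
    results = []
    for query in queries:
        data = hidden_from(page)
        data.update(make_inputs(query))
        with opener.open(url, urlencode(data).encode("ascii")) as resp:
            page = resp.read().decode(encoding)
        results.append(page)
        time.sleep(delay)  # be polite to the server
    return results


class FakeOpener:
    """Offline stand-in: every response carries a fresh __VIEWSTATE."""

    def __init__(self):
        self.posts = []
        self.n = 0

    def open(self, url, data=None):
        if data is not None:
            self.posts.append(data.decode("ascii"))
        self.n += 1
        html = '<input type="hidden" name="__VIEWSTATE" value="v%d">' % self.n
        return io.BytesIO(html.encode("ascii"))


fake = FakeOpener()
pages = run_batch(fake, "http://example.test/form.aspx", ["a", "b"],
                  lambda q: {"txtQuery": q}, delay=0)
print(fake.posts)
```

If a server does not accept values chained from one response to the next, a fallback is to repeat the initial GET before every query, at the cost of an extra request each time.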