I want to programmatically download the contents of a web page, but that page is generated as the result of a POST and I can't seem to get it working.

This is the page: http://jp.translink.com.au/mobile/Input.aspx

You can enter the following values to see how it works:

From: Coorparoo Railway Station

To: Central Railway Station

I have monitored the traffic with tcpdump and have recreated it using code to the best of my ability. Here is the test code:

require 'net/http'

http = Net::HTTP.new("jp.translink.com.au", 80)
path = "/mobile/Input.aspx"

# GET request -> so the host can set its cookies
resp, data = http.get(path, nil)
cookie = resp.response['set-cookie']

viewstate = data.match(/"__VIEWSTATE" value="([^"]+)"/)[1]

# POST request -> submitting the journey planner form
data = "__VIEWSTATE=#{viewstate}&FromTextBox=mitchelton+railway+station&FromModeList=stopLandmark&ToTextBox=morayfield+railway+station&ToModeList=stopLandmark&VehicleList%3A1=on&HourList=11&MinuteList=40&NoonList=PM&DateList=0&goButton=Go%21"
headers = {
  'Cookie' => cookie,
  'Referer' => 'http://jp.translink.com.au/mobile/Input.aspx',
  'origin' => 'http://jp.translink.com.au',
  'Content-Type' => 'application/x-www-form-urlencoded',
  'User-Agent' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-us) AppleWebKit/530.19.2 (KHTML, like Gecko) Version/4.0.2 Safari/530.19',
  'Accept' => 'application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5',
  'Accept-Language' => 'en-us',
  'Accept-Encoding' => 'gzip, deflate'
}

resp, data = http.post(path, data, headers)

# Output on the screen -> status code, message, headers and body of the POST response
puts 'Code = ' + resp.code
puts 'Message = ' + resp.message
resp.each {|key, val| puts key + ' = ' + val}
puts data

I get a response that redirects me to an error page. Does anyone know how to do this successfully?

EDIT: Thank you to the few who responded to my question. Below is the solution to my problem :)

require 'mechanize'
agent = Mechanize.new  # this class was named WWW::Mechanize in mechanize releases before 1.0
initial_page = agent.get('http://jp.translink.com.au/mobile/Input.aspx')
initial_form = initial_page.form('InputForm')
initial_form.FromTextBox = 'Mitchelton Railway Station'
initial_form.radiobuttons_with(:name => 'FromModeList')[1].check
initial_form.ToTextBox = 'Morayfield Railway Station'
initial_form.radiobuttons_with(:name => 'ToModeList')[1].check
initial_form.checkbox_with(:name => 'VehicleList:0').uncheck
initial_form.checkbox_with(:name => 'VehicleList:2').uncheck
go_button = initial_form.buttons[0]
result_page = agent.submit(initial_form, go_button)
puts result_page.body
+1  A: 

Without looking into the details, your cookie is surely wrong. You're getting

Set-Cookie: ASP.NET_SessionId=2wo3lv455p2mbfimbmyyqoua; path=/

...and then should send that back without the "; path=/" part, like:

Cookie: ASP.NET_SessionId=2wo3lv455p2mbfimbmyyqoua
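
A minimal sketch of that trimming in Ruby, assuming the server sends a single Set-Cookie header. The helper name cookie_pair is made up for illustration and is not part of Net::HTTP:

```ruby
# Hypothetical helper: keep only the "name=value" part of a Set-Cookie
# header value and drop attributes such as "; path=/".
def cookie_pair(set_cookie)
  set_cookie.to_s.split(';').first.to_s.strip
end

# The header value quoted above, as it would come back from the GET.
puts cookie_pair('ASP.NET_SessionId=2wo3lv455p2mbfimbmyyqoua; path=/')
```

In the question's code this would be applied to the value of resp.response['set-cookie'] before putting it into the 'Cookie' request header.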

EDIT: Also, where's your Content-Length?

EDIT 2: Where's your Host? You cannot do without the host header, unless the web site also works based on IP addresses only. Which, in this case, it in fact does... Nevertheless: compare the browser's headers to your own...

EDIT 3: You need to URL-encode the value for __VIEWSTATE.
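
A rough sketch of one way to do that encoding with Ruby's standard library, building the POST body with URI.encode_www_form instead of string interpolation. The viewstate value here is a made-up stand-in rather than one taken from the real page, and only a few of the form fields are shown:

```ruby
require 'uri'

# Made-up example value: real __VIEWSTATE strings are Base64 and can
# contain '+', '/' and '=', which are special in a form-encoded body.
viewstate = 'dDwtMTIz+abc/def=='

# URI.encode_www_form percent-encodes every key and value, so the
# viewstate arrives at the server unchanged.
body = URI.encode_www_form(
  '__VIEWSTATE' => viewstate,
  'FromTextBox' => 'mitchelton railway station',
  'goButton'    => 'Go!'
)

puts body
```

The resulting string can be passed as the second argument to http.post in place of the hand-built data string.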

(Note that, for example, the Firefox extension LiveHTTPHeaders may make life easier than tcpdump. Using its Replay option I can see that the cookie is not required, but __VIEWSTATE indeed is. You'll also see that it's encoded, and differs from the value received with the GET.)

Arjan
What's wrong with my 'Cookie' => cookie in the headers collection?
Nippysaurus
Your `cookie = resp.response['set-cookie']` includes `; path=/` -- right?
Arjan
Ah, sorry, I didn't understand what you meant before, but I tried without the path bit and that didn't help anything. I have the Content-Length in there too. The response I get is the following: <html><head><title>Object moved</title></head><body> <h2>Object moved to <a href='/mobile/FunctionFailed.aspx?aspxerrorpath=/mobile/Input.aspx'>here</a>.</h2> </body></html>
Nippysaurus
+5  A: 

I wouldn't start from scratch with the Net library. There are plenty of gems out there that are custom built to do what you're looking to do; have a look at something like mechanize, or maybe Webrat or nokogiri.

Anything these days is scrapable. If you run into more serious problems (like Ajax page content generation) you may have to resort to driving an instance of a browser programmatically: Webrat integrates with Selenium, a testing tool that allows you to drive a browser from code and inspect the live browser DOM. This approach is slow though, so try mechanize first; it should be able to do what you want.

David Burrows
A: 

Go with mechanize. If that fails, then you can use FireWatir to automate Firefox from Ruby.

Geo