ansaurus

Question

Answer 1

A:

Ruby should throw an exception if it can't handle HTTPS. At least, it should. Maybe the website is compressing the XML and maybe you need to uncompress before parsing it? See what headers are returned when you try to access the XML. If you are using Firefox, try using HttpLiveHeaders.

sri 2009-10-21 04:57:00

Answer 2

A:

Interesting idea to check headers. The successful browser sequence shows this from HttpLiveHeaders:

https://secure.somesite.com/my/account/download_transactions.php?&amp;type=xml

GET /my/account/download_transactions.php?type=xml HTTP/1.1
Host: secure.somesite.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: <obscured>

HTTP/1.x 200 OK
Date: Wed, 21 Oct 2009 13:13:08 GMT
Server: Apache/2.2
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: must-revalidate, post-check=0,pre-check=0
Pragma: public
Content-Disposition: attachment; filename=stuff.xml
Connection: close
Transfer-Encoding: chunked
Content-Type: application/xml

I've tried to match all the HTTP header bits by literally cutting and pasting the "accepts" from the above them into my request, but the XML file returned is still screwed up.

A hexdump of the returned response from my code shows a lot of 00x and FFx, and the words "root" and "entry" near each other. A WireShark dump of the unsuccessful ruby sequence is less helpful since it shows the SSL-encoded Application Data. But clearly a chunk of data is being returned.

START DUMP
00000000: d0 cf 11 e0 a1 b1 1a e1 - 00 00 00 00 00 00 00 00  ................
00000010: 00 00 00 00 00 00 00 00 - 3b 00 03 00 fe ff 09 00  ........;.......
00000020: 06 00 00 00 00 00 00 00 - 00 00 00 00 01 00 00 00  ................
00000030: 04 00 00 00 00 00 00 00 - 00 10 00 00 00 00 00 00  ................
00000040: 01 00 00 00 fe ff ff ff - 00 00 00 00 05 00 00 00  ................
00000050: ff ff ff ff ff ff ff ff - ff ff ff ff ff ff ff ff  ................
00000060: ff ff ff ff ff ff ff ff - ff ff ff ff ff ff ff ff  ................
00000070: ff ff ff ff ff ff ff ff - ff ff ff ff ff ff ff ff  ................
... and so on... non 00 and FF's appear much further down.

I'm not sure what to try next. Any suggestions?

RubyNube 2009-10-22 05:15:26

Answer 3

A:

Fixed the problem myself. Turns out this particular site does not seem to use "basic authentication", and I was required to execute a specific login screen to produce a usable cookie. I also simplified the solution by using "Mechanize", a gem that handles much of the leg-work of HTTP activity.

require 'rubygems'
require 'mechanize'

login_username = "theusername"
login_password = "thepassword"

# get login page
agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac Safari'
page = agent.get('https://somesite.com/login.php')

# fill out login form and submit
form = page.forms[0] # use first form on page
form['form[username]'] = login_username
form['form[password]'] = login_password
page = agent.submit(form)

# process returned page 
if page.uri.to_s.include?("login") 
  puts '---- LOGIN FAILED ----'
else
  puts '---- LOGIN SUCCESSFUL ----'
  xml_data = agent.get('https://secure.somesite.com:443/download_transactions.php?type=xml')
  puts xml_data.body
end

The thing that threw me was the way to set the form fields, which for some reason were different than the examples I've seen doing this.

RubyNube 2009-10-26 05:39:48

ansaurus

tags:

views:

answers:

Can't get XML data via HTTPS with Ruby

related questions