Alright, so here's the dealio: I'm working on a Ruby app that'll take data from a website, and aggregate that data into an XML file.

The website I need to take data from does not have any APIs I can make use of, so the only thing I can think of is to log in to the website, sequentially load the pages that have the data I need (in this case, PMs; I want to archive them), and then parse the returned HTML.

The problem, though, is that I don't know of any way to programmatically simulate a login session.

Would anyone have any advice, or know of any proven methods that I could use to successfully log in to an HTTPS page, and then programmatically load pages from the site using a temporary cookie session from the login? It doesn't have to be a Ruby-only solution -- I just wanna know how I can actually do this. And if it helps, the website in question is one that uses Microsoft's .NET Passport service as its login/session mechanism.

Any input on the matter is welcome. Thanks.

A: 

You can try using wget to fetch the pages. You can analyse the login process with this app: www.portswigger.net/proxy/.
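
Whatever tool you pick, a cookie-based login session comes down to three steps: post the login form, keep the cookies from the response, and send them back on every later request. Here is a rough sketch of that idea using only Ruby's standard library. The SimpleCookieJar class, the fetch_with_login helper, and the URLs and form field names you would pass in are all made up for illustration, and it assumes the login form and the pages live on the same host. A login that bounces through redirects across several hosts (as Passport does) would need more work than this.

```ruby
require 'net/http'
require 'uri'

# Tiny cookie jar: keeps only the name=value part of each Set-Cookie header.
# It ignores attributes such as Path, Domain and Expires, so it is only
# suitable for a short session against a single host.
class SimpleCookieJar
  def initialize
    @cookies = {}
  end

  # Accepts the array returned by response.get_fields('Set-Cookie'), or nil.
  def store(set_cookie_headers)
    Array(set_cookie_headers).each do |header|
      name, value = header.split(';', 2).first.to_s.strip.split('=', 2)
      next if name.nil? || name.empty? || value.nil?
      @cookies[name] = value
    end
  end

  # Value to send in the Cookie request header.
  def to_header
    @cookies.map { |name, value| "#{name}=#{value}" }.join('; ')
  end
end

# Hypothetical helper: posts the login form, then fetches one page with the
# cookies from the login response. Assumes both URLs are on the same host.
def fetch_with_login(login_url, page_url, fields)
  jar = SimpleCookieJar.new
  login_uri = URI.parse(login_url)
  Net::HTTP.start(login_uri.host, login_uri.port,
                  :use_ssl => login_uri.scheme == 'https') do |http|
    post = Net::HTTP::Post.new(login_uri.request_uri)
    post.set_form_data(fields)
    jar.store(http.request(post).get_fields('Set-Cookie'))

    get = Net::HTTP::Get.new(URI.parse(page_url).request_uri)
    get['Cookie'] = jar.to_header
    http.request(get).body
  end
end
```

A proxy like the one linked above is how you would find out the real form action and field names to pass in as fields.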

Maciek Sawicki
A: 

For what it's worth, you could check out Webrat. It is meant to be used as a tool for automated acceptance tests, but I think you could use it to simulate filling out the login fields, then click through links by their names, and grab the needed HTML as a string. Haven't tried doing anything like it, tho.

Toms Mikoss
+5  A: 

Mechanize

Mechanize is a Ruby library that imitates the behaviour of a web browser. You can click links, fill out forms and submit them. It even keeps a history and remembers cookies. It seems your problem could be easily solved with the help of Mechanize.

The following example is taken from http://mechanize.rubyforge.org/mechanize/:

require 'rubygems'
require 'mechanize'

a = Mechanize.new
a.get('http://rubyforge.org/') do |page|
  # Click the login link
  login_page = a.click(page.link_with(:text => /Log In/))

  # Submit the login form, taking the username and password from the command line
  my_page = login_page.form_with(:action => '/account/login.php') do |f|
    f.form_loginname  = ARGV[0]
    f.form_pw         = ARGV[1]
  end.click_button

  # Print the text of every non-empty link on the page shown after login
  my_page.links.each do |link|
    text = link.text.strip
    next unless text.length > 0
    puts text
  end
end
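
Once the pages are fetched, the archiving step from the question could look roughly like this. It is only a sketch using REXML, which ships with Ruby. The messages_to_xml helper, the element names and the hash keys (:from, :subject, :body) are made up for illustration, and pulling those values out of the HTML is left out because it depends on the site's markup.

```ruby
require 'rexml/document'

# Builds an XML document from an array of message hashes.
# The keys :from, :subject and :body are assumptions for illustration.
def messages_to_xml(messages)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  root = doc.add_element('messages')
  messages.each do |msg|
    node = root.add_element('message', 'from' => msg[:from].to_s)
    # Assigning through text= lets REXML escape characters such as & and <
    node.add_element('subject').text = msg[:subject].to_s
    node.add_element('body').text = msg[:body].to_s
  end
  doc
end
```

Calling messages_to_xml(list).to_s gives you the XML as a string, which you can then write to a file with File.open.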
johannes