How would one go about creating a site that logs you into other sites and gathers your data? For instance, Mint.com lets you enter all your online banking details and then gathers your data for viewing within Mint.

If someone could point me in the right direction with some keywords or scripts, it would be much appreciated.

+2  A: 

This really depends on what you want to do. For example, Mint.com leverages, or did at one point, an SDK from a company called Yodlee. This SDK/library uses screen-scraping technology to acquire the data on behalf of Mint.com's customers.

Jordan S. Jones
+2  A: 

In general you need to automate site access and parsing, a.k.a. scraping. There are usually two tricky areas to watch out for: 1) authentication, and 2) parsing, since whatever you're scraping will typically require you to inspect its HTML closely while you work out how to pull out the data you're after.
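As a rough illustration of the authentication half, here is a minimal sketch using only Ruby's standard library. The URL, the `username`/`password` field names, the cookie values, and both helper methods are hypothetical; a real site's login form has to be inspected to find the actual field names. It only builds the login request and shows how session cookies would be carried forward, without sending anything over the network. A library such as Mechanize takes care of this bookkeeping for you.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a form-encoded login POST.
# The field names 'username' and 'password' are assumptions; check the
# real login form's HTML for the names it actually uses.
def build_login_request(login_url, username, password)
  request = Net::HTTP::Post.new(URI(login_url))
  request.set_form_data('username' => username, 'password' => password)
  request
end

# Turn the Set-Cookie values from a login response into a single Cookie
# header value for later requests, dropping attributes such as Path.
def cookie_header(set_cookie_values)
  set_cookie_values.map { |value| value.split(';').first.strip }.join('; ')
end

# Hypothetical URL and credentials, for illustration only.
request = build_login_request('https://bank.example/login', 'alice', 's3cret')
puts request.body

# Hypothetical Set-Cookie values, as a server might return after login.
puts cookie_header(['JSESSIONID=abc123; Path=/; HttpOnly', 'lang=en; Path=/'])
```

In a real script you would send the request with `Net::HTTP`, read the cookies with `response.get_fields('set-cookie')`, and set the resulting string as the `Cookie` header on every subsequent request.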

I wrote a simple Ruby app a while back that scrapes and searches Apple's refurbished store. You can check it out here as an example (keep in mind it could certainly use improvement, but it may get you going):

http://grapple.xorcyst.com

I've written similar stuff using Mechanize and Hpricot to grab data from my bank accounts (I'm not too keen on giving Mint my credentials), as well as from job sites, used car dealerships, etc., so it's flexible if you want to put in the effort.

It's a useful thing to do, but you need to be careful not to violate any use policies and the like.

Here's another quick example that grabs job postings, to show you how simple it can be:

#!/usr/bin/ruby

require 'rubygems'
require 'mechanize'
require 'hpricot'

url = "http://tbe.taleo.net/NA2/ats/careers/jobSearch.jsp?org=DIGITALGLOBE&cws=1"

# Current Mechanize releases expose the class as Mechanize;
# older releases used WWW::Mechanize.
site = Mechanize.new { |agent| agent.user_agent_alias = 'Mac Safari' }
page = site.get(url)

# Fill in the search form the same way a browser would.
search_form = page.form("TBE_theForm")
search_form.org = "DIGITALGLOBE"
search_form.cws = "1"
search_form.act = "search"
search_form.WebPage = "JSRCH"
search_form.WebVersion = "0"
# These fields are not present in the fetched form, so add them explicitly.
search_form.add_field!('location','1') #5
search_form.add_field!('updatedWithin','2')

# Submit the search and parse the results page.
search_results = site.submit(search_form)
doc = Hpricot(search_results.body)

puts "<b>DigitalGlobe (Longmont)</b>"

# Job posting links are the anchors whose markup contains 'rid='.
doc.search("//a").each do |a|
  next unless a.to_s.include?('rid=')
  puts a.to_s.gsub('"','')
end
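The final loop of the script above keeps only the anchors that contain `rid=`. That filtering step can be tried offline with a sketch like the following, which runs against a hypothetical HTML sample and uses only the standard library. The sample markup and the `extract_links` helper are assumptions for illustration, and a regular expression is only adequate for a tiny, well-formed sample like this one; real pages are better handled by an HTML parser.

```ruby
# Hypothetical sample of the markup a results page might return.
sample_html = <<~HTML
  <table>
    <tr><td><a href="/jobs/view.jsp?rid=101">Software Engineer</a></td></tr>
    <tr><td><a href="/jobs/view.jsp?rid=102">Systems Analyst</a></td></tr>
    <tr><td><a href="/about.jsp">About us</a></td></tr>
  </table>
HTML

# Return [href, text] pairs for every anchor whose href contains `marker`.
def extract_links(html, marker)
  html.scan(/<a\s+[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/m)
      .select { |href, _text| href.include?(marker) }
end

job_links = extract_links(sample_html, 'rid=')
job_links.each { |href, text| puts "#{text}: #{href}" }
```

Working against a saved copy of the page like this makes it easier to get the selection logic right before pointing the script at the live site.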
Peter Elespuru