ansaurus

Question

HTML Scraping with Hpricot (Using Ruby on Rails)

Answer 1


Hi Erika,

It looks like the site serves different content depending on the User-Agent header. If I set that header to the string my version of Firefox sends, I get the full response body; when I left it at the default ('ruby'), the response was incomplete. I'm not sure what the root cause is, but this seemed to alleviate the symptoms.

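require 'rubygems'
require 'hpricot'
require 'open-uri'

# Send a Firefox User-Agent; with the default ('ruby') the site returned an incomplete page.
# (On Ruby 3.0+, open-uri no longer hooks Kernel#open, so call URI.open instead of open.)
doc = open("http://yellowpages.com.mt/Malta-Search/Radio-In-Malta-Gozo.aspx",
           'User-Agent' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2') { |f| Hpricot(f) }
# Print every <h6> element found in the parsed page
puts doc.search('h6')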
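If you would rather not depend on open-uri's open (which Ruby 3.0 no longer patches into Kernel#open for URLs), the same header can be set with the standard library's Net::HTTP. This is a minimal sketch, not part of the original answer: build_request, fetch_body and FIREFOX_UA are hypothetical names, and only the request-building step runs without network access.

```ruby
require 'net/http'
require 'uri'

# Browser string copied from the answer above (Firefox 3.5.2 on Mac OS X 10.6).
FIREFOX_UA = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2'

# Hypothetical helper: builds a GET request, overriding the User-Agent header
# only when one is given (otherwise Net::HTTP keeps its own default).
def build_request(url, user_agent = nil)
  uri = URI.parse(url)
  headers = user_agent ? { 'User-Agent' => user_agent } : {}
  Net::HTTP::Get.new(uri, headers)
end

# Hypothetical helper: sends the request and returns the raw HTML body.
# This part needs network access.
def fetch_body(url, user_agent = nil)
  uri = URI.parse(url)
  request = build_request(url, user_agent)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request).body
  end
end
```

To reproduce the symptom, compare fetch_body(url).length with fetch_body(url, FIREFOX_UA).length; the body fetched with the browser string can then be handed to Hpricot as before.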

Hope this helps!

Eric 2009-11-10 00:54:51

Worked like a charm! Thank you so much!! You are a life saver <3

Erika 2009-11-10 01:41:42
