I'm trying to parse a webpage using open-uri + Hpricot, but there seems to be a problem in the parsing process, because the gems don't return what I want.
Specifically, I want to get the div whose id is 'pasajes' from http://www.despegar.com.ar/.
I wrote this code:
require 'open-uri'
require 'hpricot'
require 'nokogiri'

# Fetch the page once and feed the same HTML to both parsers.
# (On Ruby 3.0+, open-uri no longer patches Kernel#open for URLs, so use URI.open.)
html = URI.open('http://www.despegar.com.ar/').read

document = Hpricot(html)            # WITH HPRICOT
document2 = Nokogiri::HTML(html)    # WITH NOKOGIRI

pasajes = document.search("//div[@id='pasajes']")
pasajes2 = document2.xpath("//div[@id='pasajes']")
But both searches return NOTHING! I've tried a lot of things with both Hpricot and Nokogiri:
- I tried giving the absolute path to that div
- I tried a CSS path with selectors
- I tried Hpricot's search shortcut (doc/"div#pasajes")
- I tried almost every possible relative path to reach the 'pasajes' div
Finally, I found an ugly workaround: I used the Watir library to open a web browser and then passed the page's HTML to Hpricot. That way, Hpricot DOES recognize the 'pasajes' div. But I don't want to open a web browser just for parsing purposes...
What am I doing wrong? Is open-uri misbehaving, or is it Hpricot?