ansaurus

Question

Html / Script Scraping Google Map using Hpricot (Ruby On Rails)

Answer 1

+1 A:

This type of screen scraping won't work because you're trying to grab elements that are added to the page dynamically, after the page's HTML has been sent to the browser. In this case the "browser" is Hpricot, and all it sees is the content as sent from the server, not the content after the page's JavaScript has run.

The reason that Firebug can see the elements you're trying to grab is that Firebug analyzes the current state of a page in the browser, which includes the dynamic scripty goodness from Google Maps.
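To make the difference concrete, here is a minimal sketch. It uses plain string matching from Ruby's standard library instead of Hpricot, and the page markup, the `marker` id, and the `element_in_markup?` helper are made up for illustration. It only shows that an element created by a script is absent from the markup a scraper receives.

```ruby
# Hypothetical page as the server sends it: the map container is empty and
# the marker is only created later, when a browser runs the script.
STATIC_HTML = <<~HTML
  <html>
    <body>
      <div id="map"></div>
      <script type="text/javascript">
        var marker = document.createElement('div');
        marker.id = 'marker';
        document.getElementById('map').appendChild(marker);
      </script>
    </body>
  </html>
HTML

# Hypothetical helper: true if the raw markup contains a tag carrying the
# given id attribute. No JavaScript is executed, just like in a scraper.
def element_in_markup?(html, id)
  !!(html =~ /<\w+[^>]*\bid=["']#{Regexp.escape(id)}["']/)
end

puts element_in_markup?(STATIC_HTML, 'map')     # container is part of the served markup
puts element_in_markup?(STATIC_HTML, 'marker')  # marker would exist only after the script runs
```

A browser with Firebug would show both elements, because it inspects the page after the script has run.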

Tim S. Van Haren 2009-11-10 18:05:44

Answer 2

+4 A:

This was a fun one. It can be done, but it's going to take more than Hpricot. I noticed while sniffing the traffic that a web service is being called to populate the latitude and longitude. Here's what you can do to get to that information:

Scrape the site like you're normally doing, but look for a call to the LoadMapByDetail JavaScript function. The line will look something like:

<script type='text/javascript'>LoadMapByDetail(1668154, 0, 1)</script>

Parse the id out and call the web service. This will end up looking something like:

require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'soap/wsdlDriver'

WSDL_URL = "http://yellowpages.com.mt/Web_Service/SearchMap.asmx?WSDL"

# Build a SOAP client from the service's WSDL.
soap = SOAP::WSDLDriverFactory.new(WSDL_URL).create_rpc_driver

# Pass the id parsed from the LoadMapByDetail call.
response = soap.GetCoordByDetail(:mainDetailID => '1668154', :type => '1')
soap.reset_stream

# Print each returned coordinate.
response.getCoordByDetailResult.anyType.each { |x| puts x.anyType }

You see the latitude and longitude in the output:

35.88805
14.46627
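The "parse the id out" step is not shown in the snippet. A minimal sketch of it, assuming the script tag always looks like the sample line and that the first argument is the detail id, could use a regular expression. The `extract_detail_id` helper is a made-up name, not part of Hpricot or the site's API.

```ruby
# Sample line copied from the page source shown earlier; in practice this
# would be the HTML fetched during the normal scraping step.
html = "<script type='text/javascript'>LoadMapByDetail(1668154, 0, 1)</script>"

# Hypothetical helper: returns the first argument of LoadMapByDetail(...) as
# a string, or nil when the markup contains no such call.
def extract_detail_id(html)
  match = html.match(/LoadMapByDetail\(\s*(\d+)/)
  match && match[1]
end

detail_id = extract_detail_id(html)
puts detail_id
```

The resulting string can then be passed as `:mainDetailID` in the web service call.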

Hope this helps. Good luck!

Eric 2009-11-10 19:48:35

You are seriously a genius, Eric! Thank you so much, I wouldn't have arrived at a solution without your help. Thanks once again.

Erika 2009-11-10 23:41:42
