I've successfully used Ruby (1.8) and Nokogiri's CSS parsing to pull visible, front-facing data out of web pages.

However, I now need to pull some data from a series of pages where the data is in the "meta" tags in the page source.

One of the lines I need is the following:

<meta name="geo.position" content="35.667459;139.706256" />

I've tried using XPath but haven't been able to get it right.

Any help as to what syntax is needed would be much appreciated.

Thanks

+1  A: 

This is a good case for a CSS attribute selector. For example:

doc.css('meta[name="geo.position"]').each do |meta_tag|
  puts meta_tag['content'] # => 35.667459;139.706256
end

The equivalent XPath expression is almost identical:

doc.xpath('//meta[@name = "geo.position"]').each do |meta_tag|
  puts meta_tag['content'] # => 35.667459;139.706256
end
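
Either loop prints the raw content string. If the latitude and longitude are needed as numbers, a minimal sketch could look like the following. It assumes the value always has the form "lat;long", and parse_geo_position is a hypothetical helper name, not part of Nokogiri:

```ruby
# Hypothetical helper (not from Nokogiri): split a geo.position value
# such as "35.667459;139.706256" into [latitude, longitude] floats.
# Returns nil when the value is missing or malformed.
def parse_geo_position(content)
  parts = content.to_s.split(';').map { |part| part.strip }
  return nil unless parts.length == 2

  # Float() raises ArgumentError on non-numeric input, unlike String#to_f.
  [Float(parts[0]), Float(parts[1])]
rescue ArgumentError
  nil
end

# In the loops above this would be parse_geo_position(meta_tag['content']).
puts parse_geo_position('35.667459;139.706256').inspect # => [35.667459, 139.706256]
```

Using Float() rather than to_f means a malformed value yields nil instead of a silent 0.0.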
Jordan
Wow, thanks. I had no idea you could use the CSS selector for meta tags too. If I wanted to get the lat/long from the JS, does the same apply?

<script type="text/javascript">
//<![CDATA[
function onLoad() {
  if (GBrowserIsCompatible()) {
    var map = new GMap2(document.getElementById("map"));
    map.addControl(new GSmallMapControl());
    var point1 = new GLatLng(35.667459, 139.706256);
    map.setCenter(point1, 15, G_NORMAL_MAP);
    var marker = new GMarker(point1,{clickable:false});
    map.addOverlay(marker);
  }
}
//]]>
</script>
rollbahn
No, Nokogiri doesn't do JavaScript. You could extract the JavaScript from the HTML using Nokogiri, then use a regex to get the lat/long, for instance: `doc.at('script').content[/GLatLng\(([^)]+)\)/, 1] # => "35.667459, 139.706256"`
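
To get the two coordinates as separate numbers rather than one string, the same idea can be sketched with two capture groups. This is only an illustration: extract_lat_lng is a hypothetical helper, the string passed in stands in for doc.at('script').content, and the pattern assumes plain decimal literals inside the GLatLng call:

```ruby
# Hypothetical helper: pull the two numeric arguments out of a
# "new GLatLng(lat, lng)" call found in a chunk of JavaScript source.
# Returns [lat, lng] as floats, or nil if no such call is present.
def extract_lat_lng(script_source)
  # \( and \) match literal parentheses; each (-?\d+(?:\.\d+)?) captures one number.
  pattern = /GLatLng\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)/
  match = pattern.match(script_source.to_s)
  return nil unless match

  [match[1].to_f, match[2].to_f]
end

# Stand-in for doc.at('script').content, copied from the page in the question.
js = 'var point1 = new GLatLng(35.667459, 139.706256);'
puts extract_lat_lng(js).inspect # => [35.667459, 139.706256]
```

If the page has several script elements, doc.at('script') only returns the first one, so the right element may need to be selected before applying the pattern.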
Greg
Aha okay awesome thanks very much for your help - that has really made things much clearer.
rollbahn
+1  A: 
require 'nokogiri'

doc = Nokogiri::HTML('<meta name="geo.position" content="35.667459;139.706256" />')
doc.at('//meta[@name="geo.position"]')['content'] # => "35.667459;139.706256"
Greg