I'd like to slurp the following data about historical inventions into a convenient Ruby data structure:

http://yootles.com/outbox/inventions.xml

Note that all the data is in the XML attributes.

It seems like there should be a quick solution in a couple of lines of code. With Rails there'd be Hash.from_xml, though I'm not sure it would handle the attributes properly. In any case, I need this as a standalone Ruby script. Nokogiri seems overly complicated for this simple task, based on this code that someone posted for a similar problem: http://gist.github.com/335286. I found a purportedly simple solution using Hpricot, but it doesn't seem to handle the XML attributes. Maybe that's a simple extension? Finally, there's ROXML, but that looks even more heavyweight than Nokogiri.

To make the question concrete (and with obvious ulterior motives), let's say that an answer should be a complete Ruby script that slurps the XML from the above URL and spits out CSV like this:

id, invention, year, inventor, country
RslCn, "aerosol can", 1926, "Erik Rotheim", "Norway"
RCndtnng, "air conditioning", 1902, "Willis Haviland Carrier", "US"
RbgTmtv, "airbag, automotive", 1952, "John Hetrick", "US"
RplnNgnpwrd, "airplane, engine-powered", 1903, "Wilbur and Orville Wright", "US"

I'll work on my own answer and post it too unless someone beats me to the punch with something clearly superior. Thanks!

+1  A: 

Using REXML and open-uri:

require "rexml/document"
require "open-uri"

# URI.open is required on Ruby 3.0+, where plain open() no longer fetches URLs.
doc = REXML::Document.new(URI.open("http://yootles.com/outbox/inventions.xml").read)

# Join with ", " to match the output format requested in the question.
puts ['id', 'invention', 'year', 'inventor', 'country'].join(', ')
doc.root.elements.each do |invention|
  inventor = invention.elements.first
  data = []
  data << invention.attributes['id']
  # String fields are double-quoted because names can contain commas.
  data << '"' + invention.attributes['name'] + '"'
  data << invention.attributes['year']
  data << '"' + inventor.attributes['name'] + '"'
  data << '"' + inventor.attributes['country'] + '"'
  puts data.join(', ')
end
John Drummond
A: 

It turned out to be simpler than I thought with Nokogiri:

require 'rubygems'
require 'nokogiri' # needs sudo port install libxslt and stuff; see nokogiri.org
require 'open-uri'

url = 'http://yootles.com/outbox/inventions.xml'

# URI.open is required on Ruby 3.0+, where plain open() no longer fetches URLs.
doc = Nokogiri::XML(URI.open(url))
puts("id, invention, year, inventor, country")
doc.xpath("//invention").each do |i|
  inventor = i.xpath("inventor").first
  # String fields are double-quoted because names can contain commas.
  print i['id'], ", \"", i['name'], "\", ", i['year'], ", \"",
        inventor['name'], "\", \"", inventor['country'], "\"\n"
end
dreeves
+1 for Nokogiri
Greg
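
Both answers print CSV directly, but the question also asks for a convenient Ruby data structure. Here is a minimal sketch of that half using REXML. It builds an array of hashes first and formats the rows afterwards. The inline SAMPLE_XML, the root element name, and the helper names parse_inventions and to_csv_line are assumptions made for illustration: the structure is inferred from the attribute lookups in the answers above, not copied from the real inventions.xml.

```ruby
require "rexml/document"

# Hypothetical sample: the element and attribute layout is inferred from the
# answers above, and the root element name is a guess.
SAMPLE_XML = <<~XML
  <inventions>
    <invention id="RslCn" name="aerosol can" year="1926">
      <inventor name="Erik Rotheim" country="Norway"/>
    </invention>
    <invention id="RbgTmtv" name="airbag, automotive" year="1952">
      <inventor name="John Hetrick" country="US"/>
    </invention>
  </inventions>
XML

# Turn each <invention> element into a plain Hash built from its attributes
# and those of its first <inventor> child.
def parse_inventions(xml)
  doc = REXML::Document.new(xml)
  doc.root.get_elements("invention").map do |inv|
    inventor = inv.get_elements("inventor").first
    {
      id:        inv.attributes["id"],
      invention: inv.attributes["name"],
      year:      inv.attributes["year"].to_i,
      inventor:  inventor.attributes["name"],
      country:   inventor.attributes["country"]
    }
  end
end

# Format one record in the layout the question asks for
# (string fields quoted, fields separated by ", ").
def to_csv_line(rec)
  %(#{rec[:id]}, "#{rec[:invention]}", #{rec[:year]}, "#{rec[:inventor]}", "#{rec[:country]}")
end

inventions = parse_inventions(SAMPLE_XML)
puts "id, invention, year, inventor, country"
inventions.each { |rec| puts to_csv_line(rec) }
```

To run it against the real file, the same function should accept the downloaded text, e.g. parse_inventions(URI.open(url).read) after require "open-uri", assuming the live XML matches the layout guessed here.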