I have a collection of stories in an XML format. I would like to parse the file and return each story as either a hash or a Ruby object, so that I can further manipulate the data within a Ruby script.

Does Nokogiri support this, or is there a better tool/library to use?

The XML document has the following structure, returned via Pivotal Tracker's web API:

<?xml version="1.0" encoding="UTF-8"?>
<stories type="array" count="145" total="145">
  <story>
    <id type="integer">16376</id>
    <story_type>feature</story_type>
    <url>http://www.pivotaltracker.com/story/show/16376</url>
    <estimate type="integer">2</estimate>
    <current_state>accepted</current_state>
    <description>A description</description>
    <name>Receivable index listing will allow selection viewing</name>
    <requested_by>Tony Superman</requested_by>
    <owned_by>Tony Superman</owned_by>
    <created_at type="datetime">2009/11/04 15:49:43 WST</created_at>
    <accepted_at type="datetime">2009/11/10 11:06:16 WST</accepted_at>
    <labels>index ui,receivables</labels>
  </story>
  <story>
    <id type="integer">17427</id>
    <story_type>feature</story_type>
    <url>http://www.pivotaltracker.com/story/show/17427</url>
    <estimate type="integer">3</estimate>
    <current_state>unscheduled</current_state>
    <description></description>
    <name>Validations in wizards based on direction</name>
    <requested_by>Matthew McBoggle</requested_by>
    <created_at type="datetime">2009/11/17 15:52:06 WST</created_at>
  </story>
  <story>
    <id type="integer">17426</id>
    <story_type>feature</story_type>
    <url>http://www.pivotaltracker.com/story/show/17426</url>
    <estimate type="integer">2</estimate>
    <current_state>unscheduled</current_state>
    <description>Manual payment needs a description field.</description>
    <name>Add description to manual payment</name>
    <requested_by>Tony Superman</requested_by>
    <created_at type="datetime">2009/11/17 15:10:41 WST</created_at>
    <labels>payment process</labels>
  </story>
  <story>
    <id type="integer">17636</id>
    <story_type>feature</story_type>
    <url>http://www.pivotaltracker.com/story/show/17636</url>
    <estimate type="integer">3</estimate>
    <current_state>unscheduled</current_state>
    <description>The SMS and email templates needs to be editable by merchants.</description>
    <name>Notifications are editable by the merchant</name>
    <requested_by>Matthew McBoggle</requested_by>
    <created_at type="datetime">2009/11/19 16:44:08 WST</created_at>
  </story>
</stories>
+1  A: 

I think you can stick to this answer.

A simpler one can be found here.

khelll
+1  A: 

This XML is generated by Rails' ActiveRecord to_xml method. If you are using Rails, you should be able to use Hash.from_xml to parse it.

Paul McMahon
I'm not using Rails in this instance.
mlambie
+2  A: 

A one-liner solution would be something like this:

require 'nokogiri'

# str_xml contains your XML
xml = Nokogiri::XML.parse(str_xml)
# Build one hash per <story>, mapping each child element's name to its text
xml.search('//story').map { |node| node.element_children.each_with_object({}) { |c, h| h[c.name] = c.text } }

which returns an array of hashes:

>> xml.search('//story').map { |node| node.element_children.each_with_object({}) { |c, h| h[c.name] = c.text } }
=> [{"id"=>"16376", "story_type"=>"feature", "url"=>"http://www.pivotaltracker.com/story/show/16376", "estimate"=>"2", "current_state"=>"accepted", "description"=>"A description", "name"=>"Receivable index listing will allow selection viewing", "requested_by"=>"Tony Superman", "owned_by"=>"Tony Superman", "created_at"=>"2009/11/04 15:49:43 WST", "accepted_at"=>"2009/11/10 11:06:16 WST", "labels"=>"index ui,receivables"}, {"id"=>"17427", "story_type"=>"feature", "url"=>"http://www.pivotaltracker.com/story/show/17427", "estimate"=>"3", "current_state"=>"unscheduled", "description"=>"", "name"=>"Validations in wizards based on direction", "requested_by"=>"Matthew McBoggle", "created_at"=>"2009/11/17 15:52:06 WST"}, {"id"=>"17426", "story_type"=>"feature", "url"=>"http://www.pivotaltracker.com/story/show/17426", "estimate"=>"2", "current_state"=>"unscheduled", "description"=>"Manual payment needs a description field.", "name"=>"Add description to manual payment", "requested_by"=>"Tony Superman", "created_at"=>"2009/11/17 15:10:41 WST", "labels"=>"payment process"}, {"id"=>"17636", "story_type"=>"feature", "url"=>"http://www.pivotaltracker.com/story/show/17636", "estimate"=>"3", "current_state"=>"unscheduled", "description"=>"The SMS and email templates needs to be editable by merchants.", "name"=>"Notifications are editable by the merchant", "requested_by"=>"Matthew McBoggle", "created_at"=>"2009/11/19 16:44:08 WST"}]

Note that this ignores all XML attributes, but you haven't said what to do with them anyway... ;)
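If the type attributes do matter, one way to honour them is to cast each value while building the hash. The following is only a minimal sketch, not a definitive implementation: it uses REXML (which ships with Ruby) so that it runs without extra gems, the helper names cast_value and stories_from_xml are made up for illustration, and only type="integer" is converted. Everything else, including the datetime values, is left as a string. With Nokogiri the same idea should work using element_children, c['type'] and c.text.

```ruby
require 'rexml/document'

# Hypothetical helper: convert text according to the element's type attribute.
# Only "integer" is handled; anything else (including "datetime") stays a String.
def cast_value(text, type)
  type == 'integer' ? Integer(text, 10) : text
end

# Hypothetical helper: turn a <stories> document into an array of hashes,
# one hash per <story>, keyed by child element name.
def stories_from_xml(str_xml)
  doc = REXML::Document.new(str_xml)
  REXML::XPath.match(doc, '//story').map do |story|
    story.elements.each_with_object({}) do |el, hash|
      # el.text is nil for an empty element, so normalise it to ""
      hash[el.name] = cast_value(el.text.to_s, el.attributes['type'])
    end
  end
end

# Cut-down sample based on the XML in the question
sample = <<~XML
  <stories type="array">
    <story>
      <id type="integer">16376</id>
      <name>Receivable index listing will allow selection viewing</name>
      <description></description>
      <created_at type="datetime">2009/11/04 15:49:43 WST</created_at>
    </story>
  </stories>
XML

stories = stories_from_xml(sample)
p stories.first['id']    # an Integer rather than the string "16376"
```

Note that an element marked type="integer" with empty text would raise an ArgumentError here, so you may want to guard against that for your data.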

Mladen Jablanović
+2  A: 

You can leverage the Hash extensions in ActiveSupport: parse the document with Nokogiri, then convert each node in the resulting node set into a hash. This approach preserves the typing declared in the attributes (e.g. integers, dates, arrays). (Of course, if you're using Rails you don't need to require ActiveSupport yourself, or Nokogiri if it's already in your environment. I'm assuming a pure Ruby implementation here.)

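require 'rubygems'
require 'nokogiri'
# ActiveSupport 2.x. On ActiveSupport 3 and later, require 'active_support' and
# 'active_support/core_ext/hash/conversions' instead, and drop the include below.
require 'activesupport'

include ActiveSupport::CoreExtensions::Hash

doc = Nokogiri::XML.parse(File.read('yourdoc.xml'))
# One hash per <story>; Hash.from_xml casts values using their type attributes
my_hash = doc.search('//story').map{ |e| Hash.from_xml(e.to_xml)['story'] }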

This will produce an array of hashes (one for each story node), and preserve the typing based on the attributes, as demonstrated below:

my_hash.first['name']
=> "Receivable index listing will allow selection viewing"

my_hash.first['id']
=> 16376

my_hash.first['id'].class
=> Fixnum  # Integer on Ruby 2.4 and later

my_hash.first['created_at'].class
=> Time
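
Since the question also mentions Ruby objects, here is one possible way to wrap hashes like these in a Struct. Treat it as a rough sketch under assumptions: the Story class, the STORY_FIELDS list, the accepted? method and story_from_hash are all hypothetical names, and the field list is simply copied from the sample XML in the question.

```ruby
# Field names copied from the sample XML in the question (an assumption:
# other stories may carry fields that are not listed here).
STORY_FIELDS = %w[id story_type url estimate current_state description name
                  requested_by owned_by created_at accepted_at labels].freeze

# Hypothetical Story class built from those fields.
Story = Struct.new(*STORY_FIELDS.map(&:to_sym)) do
  # Example of behaviour you could hang off the object
  def accepted?
    current_state == 'accepted'
  end
end

# Hypothetical helper: build a Story from a string-keyed hash.
# Keys missing from the hash simply become nil on the Story.
def story_from_hash(hash)
  Story.new(*hash.values_at(*STORY_FIELDS))
end

story = story_from_hash(
  'id' => 16376,
  'current_state' => 'accepted',
  'name' => 'Receivable index listing will allow selection viewing'
)
puts story.name
puts story.accepted?
```

A Struct keeps things lightweight while still giving you method-style access (story.name instead of hash['name']); any hash keys that are not in STORY_FIELDS are silently ignored by this sketch.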
Nicholas C