views:

2482

answers:

4

Is there an easy way to convert a Nokogiri XML document to a Hash?

Something like Rails' Hash.from_xml.

+2  A: 

I use this code with libxml-ruby (1.1.3). I have not used nokogiri myself, but I understand that it uses libxml-ruby anyway. I would also encourage you to look at ROXML (http://github.com/Empact/roxml/tree) which maps xml elements to ruby objects; it is built atop libxml.

# USAGE: Hash.from_libxml(YOUR_XML_STRING)
require 'xml/libxml'
# adapted from 
# http://movesonrails.com/articles/2008/02/25/libxml-for-active-resource-2-0

class Hash 
  class << self
        def from_libxml(xml, strict=true) 
          begin
            XML.default_load_external_dtd = false
            XML.default_pedantic_parser = strict
            result = XML::Parser.string(xml).parse 
            return { result.root.name.to_s => xml_node_to_hash(result.root)} 
          rescue Exception => e
      # raise your custom exception here
          end
        end 

        def xml_node_to_hash(node) 
          # If we are at the root of the document, start the hash 
          if node.element? 
           if node.children? 
              result_hash = {} 

              node.each_child do |child| 
                result = xml_node_to_hash(child) 

                if child.name == "text"
                  if !child.next? and !child.prev?
                    return result
                  end
                elsif result_hash[child.name.to_sym]
                    if result_hash[child.name.to_sym].is_a?(Object::Array)
                      result_hash[child.name.to_sym] << result
                    else
                      result_hash[child.name.to_sym] = [result_hash[child.name.to_sym]] << result
                    end
                  else 
                    result_hash[child.name.to_sym] = result
                  end
                end

              return result_hash 
            else 
              return nil 
           end 
           else 
            return node.content.to_s 
          end 
        end          
    end
end
Ahsan Ali
Awesome! I just needed to change `= strict` to `= false`. Thanks!
Ivan
Hmmm, actually that code doesn't handle attributes...
Ivan
Ah ... Sorry about that, the files I've been working with don't have any attributes (legacy xml !).
Ahsan Ali
+6  A: 

I posted a modified version of the Ashan Ali's code which works with attributes and uses Nokogiri

dimus
+1  A: 

Have a look at the simple mix-in I made for the Nokogiri XML Node.

http://github.com/kuroir/Nokogiri-to-Hash

Here's a usage example:

require 'rubygems'
require 'nokogiri'
require 'nokogiri_to_hash'
html = '
  <div id="hello" class="container">
    <p>Hello! visit my site <a href="http://kuroir.com"&gt;Kuroir.com&lt;/a&gt;&lt;/p&gt;
  </div>
'
p Nokogiri.HTML(html).to_hash
=> [{:div=>{:class=>["container"], :children=>[{:p=>{:children=>[{:a=>{:href=>["http://kuroir.com"], :children=>[]}}]}}], :id=>["hello"]}}]
kuroir