I've been looking around for libraries that will allow me to get a multidimensional hash of a given XHTML string.

XHTML:

<div class="class-1 class-2" id="my-id">
    <div class="classy">
    </div>
</div>

Expected Hash:

hash = {
:div => {
  :class => ['class-1', 'class-2'],
  :id => ['my-id'],
  :children => {
    :div => {
      :class => ['classy']
    }
  }
}
}
A:

Your example does not really define what should be returned. Are text nodes ignored? What happens if an element has multiple <div> child elements? What happens if the outer <div> element has an attribute named children?
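To illustrate the question about multiple <div> children: a Ruby hash holds one value per key, so keying children by tag name silently keeps only the last of several siblings. A minimal sketch (the variable names are made up for illustration):

```ruby
# Children keyed by tag name: the second <div> overwrites the first.
children_by_tag = {}
children_by_tag[:div] = { :class => ["first"] }
children_by_tag[:div] = { :class => ["second"] }

# Children kept as an array of single-key hashes: both siblings survive.
children_list = [
  { :div => { :class => ["first"] } },
  { :div => { :class => ["second"] } }
]

puts children_by_tag.size # one entry left
puts children_list.size   # two entries
```

This is one reason to store children as an array rather than as a hash keyed by element name.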

Beyond that, you probably shouldn't build a structure like this at all if you can work with the document tree your XML/HTML parsing library already provides and use XPath queries to reach the nodes you want.
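For instance, assuming all you need is the class list of each <div>, an XPath query gets you there without an intermediate hash. This sketch uses REXML only because it ships with Ruby and runs without extra gems; Nokogiri offers a similar `xpath` method on documents and nodes. The variable names are illustrative.

```ruby
require "rexml/document"

xhtml = <<~XHTML
  <div class="class-1 class-2" id="my-id">
    <div class="classy">
    </div>
  </div>
XHTML

doc = REXML::Document.new(xhtml)

# Select every <div> that carries a class attribute, at any depth.
divs = REXML::XPath.match(doc, "//div[@class]")

# Split each class attribute into an array of class names.
class_lists = divs.map { |div| div.attributes["class"].split(/\s+/) }

# Look up a single element by its id attribute.
by_id = REXML::XPath.first(doc, "//div[@id='my-id']")

p class_lists
p by_id.attributes["class"]
```

The parsed document stays the single source of truth, so text nodes, repeated elements, and oddly named attributes need no special handling.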

Disregarding all of the above, here is a simple start that may come close to what you have in mind.

require "nokogiri"

class Nokogiri::XML::Node
  def to_hash
    # Build hash of attributes. Attribute values are split into arrays.
    contents = Hash[attributes.collect { |name, value|
      [name.to_sym, value.to_s.split(/\s+/)] }]

    # Add array of child hashes recursively.
    if element_children.any?
      contents[:children] = element_children.collect { |child| child.to_hash }
    end

    # Return new hash with the element name as single key.
    { name.to_sym => contents }
  end
end

Use as follows:

doc = Nokogiri::XML('<div class="class-1 class-2" id="my-id">
    <div class="classy">
    </div>
</div>')

doc.root.to_hash
#=> { :div =>
#     { :class => ["class-1", "class-2"],
#       :children =>
#         [ { :div =>
#             { :class => ["classy"] }
#           } ],
#       :id => ["my-id"]
#     }
#   }
molf
Thanks for pointing me in the right direction. Here's the result: http://github.com/kuroir/Nokogiri-to-Hash
kuroir