Your example does not really define what should be returned. Are text nodes ignored? What happens if an element has multiple <div> child elements? What happens if the outer <div> element has an attribute named children?
In addition, you probably shouldn't build a structure like this if you can use the built-in data structures of your XML/HTML parsing library and XPath queries to reach the data nodes you want.
Disregarding all of the above, here is a simple start that may come close to what you have in mind.
    require "nokogiri"

    class Nokogiri::XML::Node
      def to_hash
        # Build hash of attributes. Attribute values are split into arrays.
        contents = Hash[attributes.collect { |name, value|
          [name.to_sym, value.to_s.split(/\s+/)] }]
        # Add array of child hashes recursively.
        if element_children.any?
          contents[:children] = element_children.collect { |child| child.to_hash }
        end
        # Return new hash with the element name as single key.
        { name.to_sym => contents }
      end
    end
Use as follows:
doc = Nokogiri::XML('<div class="class-1 class-2" id="my-id">
<div class="classy">
</div>
</div>')
    doc.root.to_hash
    #=> { :div =>
    #     { :class => ["class-1", "class-2"],
    #       :id => ["my-id"],
    #       :children =>
    #         [ { :div =>
    #             { :class => ["classy"] }
    #         } ]
    #     }
    #   }