What is the shortest (ideally one-line) and fastest way to get an Array of "strings/that/are/paths" from an XML file, preferably using Nokogiri? I'd like to build the array from an arbitrary attribute ('id' in this case), but it would also be helpful to know how to do it with the element name.

So this:


<root id="top">
    <nodeA id="almost_top">
     <nodeB id="a_parent">
      <nodeC id="im_a_node"/>
      <nodeD id="child_node"/>
     </nodeB>
     <nodeB id="child"/>
    </nodeA>
</root>

to this:


[
  "top",
  "top/almost_top",
  "top/almost_top/a_parent",
  "top/almost_top/a_parent/im_a_node",
  "top/almost_top/a_parent/child_node",
  "top/almost_top/child"
]

Thanks so much.

+2  A: 

It's not exactly a one-liner, and I'm not sure how fast it is, but this should work:

require 'nokogiri'

s = '<root id="top">
    <nodeA id="almost_top">
        <nodeB id="a_parent">
                <nodeC id="im_a_node"/>
                <nodeD id="child_node"/>
        </nodeB>
        <nodeB id="child"/>
    </nodeA>
</root>'

xml = Nokogiri::XML.parse s

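# Recursively build a "/"-joined path for elem and every element below it.
# The block decides what each path segment is (an attribute, the name, ...).
def node_list(elem, &block)
  return [] unless elem.element?  # skip text nodes, comments, etc.
  str = block.call(elem)
  [str] + elem.children.flat_map { |c| node_list(c, &block) }.map { |e| "#{str}/#{e}" }
end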

puts node_list(xml.root){|e| e['id']}.inspect
puts node_list(xml.root){|e| e.name}.inspect

which outputs:

jablan@jablan-hp:~/dev$ ruby traverse_xml.rb 
["top", "top/almost_top", "top/almost_top/a_parent", "top/almost_top/a_parent/im_a_node", "top/almost_top/a_parent/child_node", "top/almost_top/child"]
["root", "root/nodeA", "root/nodeA/nodeB", "root/nodeA/nodeB/nodeC", "root/nodeA/nodeB/nodeD", "root/nodeA/nodeB"]
Mladen Jablanović