What is the fastest/shortest (ideally one-liner, probably not possible :p) way to build a tree of unique elements from a tree in which many elements are duplicated in some nodes and missing in others? The tree has a defined set of nodes, and we'd use this algorithm to figure that set out so we don't have to do it manually.

It could be XML/JSON(hash), or whatever. So something like this:


root {
    nodes {
     nodeA {}
     nodeB {
      subNodeA {}
     }
    }
    nodes {
     nodeA {
      subNodeA {}
     }
     nodeB {
      subNodeX {}
     }
    }
}

...converted to this:


root {
    nodes {
     nodeA {
      subNodeA {}
     }
     nodeB {
      subNodeA {}
      subNodeX {}
     }
    }
}

Same with xml:


<root>
    <nodes>
     <nodeA/>
     <nodeB>
      <subNodeA/>
     </nodeB>
    </nodes>
    <nodes>
     <nodeA>
      <subNodeA/>
     </nodeA>
     <nodeB>
      <subNodeX/>
     </nodeB>
    </nodes>
</root>

...converted to this:


<root>
    <nodes>
     <nodeA>
      <subNodeA/>
     </nodeA>
     <nodeB>
      <subNodeA/>
      <subNodeX/>
     </nodeB>
    </nodes>
</root>

The xml/json files could be decently large (1MB+), so having to iterate over every element depth-first or something seems like it would take a while. It could also be as small as the example above.

A: 

I believe it's solved here.

khelll
Nope. Similar, but different problems. He wants to do a merge between the first `<nodes>` element and the second `<nodes>` element.
Bob Aman
+3  A: 

This'll get you a set of unique paths:

require 'nokogiri'
require 'set'

xml = Nokogiri::XML.parse(your_data)
paths = Set.new
xml.traverse {|node| next if node.text?; paths << node.path.gsub(/\[\d+\]/,"").sub(/\/$/,"")}

Does that get you started?
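
One possible way to go from that set of paths to the merged tree in the question is sketched below. It is only an illustration: `paths_to_tree` is a hypothetical helper (not part of Nokogiri), the paths are hard-coded to match the example instead of being read from a document, and the tree is represented as a plain nested hash.

```ruby
require 'set'

# Hypothetical helper (not a Nokogiri method): turn a set of slash-separated
# element paths into a nested hash with one key per unique element name.
def paths_to_tree(paths)
  tree = {}
  paths.each do |path|
    # Drop empty segments, including the empty path left by the document node.
    segments = path.split('/').reject(&:empty?)
    # Walk down the tree, creating a child hash the first time a name is seen.
    segments.inject(tree) { |node, name| node[name] ||= {} }
  end
  tree
end

# Paths as the one-liner above would collect them for the example document.
paths = Set[
  '',
  '/root',
  '/root/nodes',
  '/root/nodes/nodeA',
  '/root/nodes/nodeA/subNodeA',
  '/root/nodes/nodeB',
  '/root/nodes/nodeB/subNodeA',
  '/root/nodes/nodeB/subNodeX'
]

tree = paths_to_tree(paths)
```

Because every path is inserted with `||=`, duplicates collapse on their own and the order in which the paths arrive does not change the shape of the result.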

[response to question in comment]

Adding attribute paths is also easy, but let's go at least a little bit multi-line:

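xml.traverse do |node|
  # Only elements: text, comment, and document nodes carry no attributes
  next unless node.element?
  paths << (npath = node.path.gsub(/\[\d+\]/,"").sub(/\/$/,""))
  # "/@name" gives the requested form, e.g. /root/nodes/nodeA/@customAttribute
  node.attributes.each_key {|k| paths << "#{npath}/@#{k}"}
end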
glenn mcdonald
Good answer, I love Set. Not used enough.
dalyons
no way, hard-frickin-core. that's it, I can't believe it, one line! thanks so much glenn. how would you add to it to include unique paths to all the attributes :)? /root/nodes/nodeA/@customAttribute.
viatropos
Answer added above, for formatting's sake.
glenn mcdonald