views:

374

answers:

3

What is the best way to build an XML tree in Ruby if you have an Array of string paths?


paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]
xml = 
<nodeA1>
    <nodeB1>
     <nodeC1>
      <nodeD1>
       <nodeE1/>
      </nodeD1>
     </nodeC1>
     <nodeC2/>
    </nodeB1>
    <nodeB2>
     <nodeC2/>
     <nodeC3/>
    </nodeB2>
</nodeA1>

My first thought is to to split the path string to an array, and compare its depth and content to the previous array, but then if I get to path "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1", when I go back down to "nodeA1/nodeB1/nodeC2", the [1] node is the common ancestor, but keeping track of that is messy, the way I've been doing it at least.

I would like to make it recursive also, so I could process each nest level in its own function, but haven't come to any semi-universal solution yet.

Any ideas or things you guys commonly do when you run into this problem?

Thanks! Lance

A: 

Looks like another version of this question.

So you could just define a tree structure and create the nodes for each string in the list. Then write an output method which prints out the tree as xml.

If you want to go without defining a tree structure, you have to make sure that the list is sorted as in your example. Then loop over the list and compare each line with the previous one:

  • For all nodes in the previous line that are not part of the current one, write a closing tag (in reverse order)
  • For all nodes in the current line that are not part of the previous line, write an opening tag.

This solution cannot produce self-closing tags ("<nodeE1/>") since this requires comparing with the previous and the next line.

And this solution is not recursive, but I think that the problem isn't a recursive one neither... (or I just didn't understand exactly, why you wanted a recursive function)

MartinStettner
+3  A: 

REXML is your friend! You're getting XPaths, so use 'em!

require 'rexml/document'

paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]

x = REXML::Document.new
x.elements << "xml"

paths.each do |p|
  steps = p.split(/\//)
  steps.each_index do |i|
    unless REXML::XPath.first(x,"/xml/" + steps[0..i]*"/")
      REXML::XPath.first(x,"/xml/" + steps[0...i]*"/").elements << steps[i]
    end
  end
end
puts x.to_s

Note that your example data has both nodeA1 and nodeA3 at the top level, so I started with a root called "xml" here. If the "3" was a typo, and nodeA1 was really your root (as your sample XML output suggests), you can delete the 'x.elements << "xml"' line and change all the "/xml/"s to just "/".

glenn mcdonald
Very cool! I've been using nokogiri lately because there's so many negative things about REXML, but this makes me want to give it a shot. Why do you choose REXML over hpricot or nokogiri, or could you use em all together?
viatropos
Rexml ships with Ruby installation otherwise I believe you can use either hpricot or nokogiri
khelll
Right, I used REXML because you have it already, and it works fine for the task at hand. No judgment vs hpricot or nokogiri implied.
glenn mcdonald
+3  A: 

This is very similar to this question. Here's a modified version based upon sris's answer:

paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]

tree = {}

paths.each do |path|
  current  = tree
  path.split("/").inject("") do |sub_path,dir|
    sub_path = File.join(sub_path, dir)
    current[sub_path] ||= {}
    current  = current[sub_path]
    sub_path
  end
end

def make_tree(prefix, node)
  tree = ""
  node.each_pair do |path, subtree| 
    tree += "#{prefix}<#{File.basename(path)}"
    if subtree.empty?
      tree += "/>\n"
    else
      tree += ">\n"
      tree += make_tree(prefix + "\t", subtree) unless subtree.empty?
      tree += "#{prefix}</#{File.basename(path)}>\n"
    end
  end
  tree
end

xml = make_tree "", tree
print xml


Edit:

Here is a modified version that builds an actual XML document using Nokogiri. I think it's actually easier to follow than the string version. I also removed the use of File, because you don't actually need it to meet your needs:

require 'nokogiri'

paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]

tree = {}

paths.each do |path|
  current  = tree
  path.split("/").each do |name|
    current[name] ||= {}
    current  = current[name]
  end
end

def make_tree(node, curr = nil, doc = Nokogiri::XML::Document.new)
  #You need a root node for the XML.  Feel free to rename it.
  curr ||= doc.root = Nokogiri::XML::Node.new('root', doc)
  node.each_pair do |name, subtree|
      child = curr << Nokogiri::XML::Node.new(name, doc)
      make_tree(subtree, child, doc) unless subtree.empty?
  end
  doc
end

xml = make_tree tree
print xml


Edit 2:

Yes, it is true that in Ruby 1.8 hashes aren't guaranteed to maintain insertion order. If that's an issue, there are ways to work around it. Here's a solution that retains order but doesn't bother with recursion and is much simpler for it:

require 'nokogiri'

paths = [
  "nodeA1",
  "nodeA1/nodeB1/nodeC1",
  "nodeA1/nodeB1/nodeC1/nodeD1/nodeE1",
  "nodeA1/nodeB1/nodeC2",
  "nodeA1/nodeB2/nodeC2",
  "nodeA3/nodeB2/nodeC3"
]

doc = Nokogiri::XML::Document.new
doc.root = Nokogiri::XML::Node.new('root', doc)

paths.each do |path|
  curr = doc.root
  path.split("/").each do |name|
    curr = curr.xpath(name).first || curr << Nokogiri::XML::Node.new(name, doc)
  end
end

print doc
Pesto
this looks interesting, I'll try that out. The only thing is that I would like to use the xml libraries instead of using strings.
viatropos
I have modified it to create an actual Nokogiri XML document. Hope that helps.
Pesto
If you're doing this in Ruby 1.8x, creating the intermediate hash loses the order of the children below any given parent. So if order matters, this solution only works in 1.9.
glenn mcdonald
That's not a requirement of the question, but it's a fair point.
Pesto
That Edit 2 version is nice. Much faster than mine, even ignoring the REXML/Nokogiri difference.
glenn mcdonald
Edit 2 is super sick, man. I'm gonna try that out immediately.
viatropos