Suppose I have this structure:

<one>
   <two>
     <three>3</three>
   </two>

   <two>
     <three>4</three>
   </two>

   <two>
     <three>3</three>
   </two>
</one>

Is there any way of getting to this:

<one>
  <two>
    <three>3</three>
  </two>

  <two>
    <three>4</three>
  </two>

</one>

using Ruby's libraries? I managed to get this using Nokogiri. From my tests, it appears to work, but maybe there's another approach, a better one.

A: 

This page explains a little about XML parsing in Ruby: http://developer.yahoo.com/ruby/ruby-xml.html

This page explains some of the reasons why you want to use a proper parser over something like regular expressions: http://htmlparsing.icenine.ca

At a glance, the approach you're using doesn't seem horrible.

genio
I'm using an XML parser here. Nokogiri.
Geo
Yeah, I've never used Nokogiri. I meant this answer more as a way to suggest other parsers I've actually heard of. I included the htmlparsing web site out of force of habit; I answer way too many markup parsing questions every day on IRC. Sorry. :)
genio
+3  A: 

How about one that does the whole thing in two lines?

seen = Hash.new(0)
node.traverse {|n| n.unlink if (seen[n.to_xml] += 1) > 1}

If there's a possibility of the same node appearing under two different parents, and you don't want those to be considered duplicates, you can change that second line to:

node.traverse {|n| n.unlink if (seen[(n.parent.path rescue "") + n.to_xml] += 1) > 1}
glenn mcdonald
Would you please paste the whole suggested solution?
khelll
Great solution! Seems I overthought mine :D
Geo
That is the whole solution, other than requiring Nokogiri and setting node = Nokogiri::XML(data), as in his example code.
glenn mcdonald
Initially, I thought that what you posted was some kind of pseudocode. Then I ran it, and it worked great. Knowing your libs definitely pays off in the long run. Thanks again!
Geo