Q:

XML diff in Ruby?

What is the best/fastest way to merge two XML documents with Ruby?

I have two XML files. One is formatted to be visually appealing; the other isn't (its comments and whitespace have been stripped), has a few changes to some of the nodes throughout, and gets changed often. I'm trying to find a simple, efficient way to check what has changed (not all nodes have IDs) and merge the old document with the formatted document.

A:

Are the changes only in the stripped file? In other words, is the visually appealing file a master that changes only when you propagate the stripped file's changes into it, or do both files get edited independently? If only the stripped file is edited, can you just diff it against its previous version and then apply those changes?
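A minimal sketch of the one-way case, assuming only the stripped file is ever edited and that both files contain exactly the same elements in the same order (only attribute values and leaf text change). It uses REXML, which ships with Ruby; `merge_changes` is a hypothetical helper, not a library method. When elements have been added or removed it raises instead of guessing how they line up.

```ruby
require "rexml/document"

# Hypothetical helper: copy attribute values and leaf text from `source`
# (the stripped, frequently edited element) onto `target` (the
# pretty-printed element), keeping target's indentation and comments.
# Assumes both trees contain the same elements in the same order and
# no mixed content (text and child elements side by side).
def merge_changes(source, target)
  # Add or overwrite attributes that exist in the source...
  source.attributes.each_attribute do |attr|
    target.add_attribute(attr.expanded_name, attr.value)
  end
  # ...then drop any the source no longer has.
  stale = []
  target.attributes.each_attribute do |attr|
    stale << attr.expanded_name if source.attributes[attr.expanded_name].nil?
  end
  stale.each { |name| target.attributes.delete(name) }

  src_children = source.elements.to_a
  tgt_children = target.elements.to_a
  if src_children.empty?
    # Leaf element: swap only the non-whitespace part of the text so any
    # indentation around it survives.
    old_text = target.text.to_s
    new_text = source.text.to_s.strip
    target.text = old_text.sub(old_text.strip) { new_text } unless old_text.strip == new_text
  elsif src_children.size == tgt_children.size
    src_children.zip(tgt_children) { |s, t| merge_changes(s, t) }
  else
    # Structural change: refuse to guess how elements line up.
    raise ArgumentError, "element structure differs at #{source.xpath}"
  end
end

formatted = REXML::Document.new(<<~XML)
  <cheese>
    <!-- keep this comment -->
    <name>Stilton</name>
    <weight unit="g">250</weight>
  </cheese>
XML
stripped = REXML::Document.new('<cheese><name>Stilton</name><weight unit="kg">0.25</weight></cheese>')
merge_changes(stripped.root, formatted.root)
puts formatted
```

Because only attributes and leaf text are rewritten, the formatted file keeps its layout and comments; structural edits would need one of the matching strategies from the other answers.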

naven87
A: 

I've wanted similar functionality myself in the past (mainly for unit-testing XML generation), but I've never found a good solution. I'd assume that at some point you're going to want to compare two DOMs and look for differences.
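One way to compare two DOMs, sketched with REXML: walk both trees in parallel and record each difference against the element's XPath. `dom_diff`, `attributes_of` and `own_text` are hypothetical helpers. Children are matched purely by position, which is the simplest possible matching and also why a single inserted element makes everything after it look changed.

```ruby
require "rexml/document"

# Hypothetical helper: attributes as a plain Hash of name => value.
def attributes_of(el)
  attrs = {}
  el.attributes.each_attribute { |attr| attrs[attr.expanded_name] = attr.value }
  attrs
end

# Hypothetical helper: the element's direct text, with indentation trimmed.
def own_text(el)
  el.texts.map(&:value).join.strip
end

# Hypothetical helper: walk two element trees in parallel and collect one
# message per difference. Comments and whitespace-only text are ignored.
def dom_diff(a, b, diffs = [])
  path = a.xpath
  if a.expanded_name != b.expanded_name
    diffs << "#{path}: element <#{a.expanded_name}> became <#{b.expanded_name}>"
    return diffs
  end

  attrs_a = attributes_of(a)
  attrs_b = attributes_of(b)
  (attrs_a.keys | attrs_b.keys).sort.each do |name|
    next if attrs_a[name] == attrs_b[name]
    diffs << "#{path}/@#{name}: #{attrs_a[name].inspect} -> #{attrs_b[name].inspect}"
  end

  text_a = own_text(a)
  text_b = own_text(b)
  diffs << "#{path}/text(): #{text_a.inspect} -> #{text_b.inspect}" if text_a != text_b

  # Match child elements by position; extra children on either side are
  # reported as added or removed.
  kids_a = a.elements.to_a
  kids_b = b.elements.to_a
  [kids_a.size, kids_b.size].max.times do |i|
    ka, kb = kids_a[i], kids_b[i]
    if ka.nil?
      diffs << "#{path}: added <#{kb.expanded_name}>"
    elsif kb.nil?
      diffs << "#{ka.xpath}: removed"
    else
      dom_diff(ka, kb, diffs)
    end
  end
  diffs
end

old_doc = REXML::Document.new("<cheese>\n  <name>Stilton</name>\n  <weight>250</weight>\n</cheese>")
new_doc = REXML::Document.new('<cheese><name>Stilton</name><weight unit="g">300</weight></cheese>')
puts dom_diff(old_doc.root, new_doc.root)
```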

For inspiration, you could take a look at how this 'XML subset matcher' tool does things.

Pete Hodgson
A: 

Checking out the answers to this question may well help.

Pete Hodgson
A: 

You are probably going to need to implement your own diffing logic. None of the XML parsing libraries for Ruby support document diffing. As a starting point for that logic, look at the == operator on LibXML::XML::Node, which compares two Node objects based on their XML representation.

LibXML API Docs
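A sketch of a representation-based comparison built by hand. It swaps in REXML (bundled with Ruby) rather than libxml-ruby: each element becomes a nested array of its name, sorted attributes and children, with comments and indentation-only text dropped, and two documents count as equal when those arrays are equal. `canonical` and `same_xml?` are hypothetical names, and ignoring attribute order and comments is a design choice for this use case, not what libxml-ruby's `==` does.

```ruby
require "rexml/document"

# Hypothetical helper: reduce an element to [name, sorted attributes, children]
# so that indentation, comments and attribute order don't affect equality.
def canonical(el)
  attrs = []
  el.attributes.each_attribute { |a| attrs << [a.expanded_name, a.value] }
  children = el.children.map do |child|
    case child
    when REXML::Element then canonical(child)
    when REXML::Text
      text = child.value.strip
      text unless text.empty? # indentation-only text becomes nil
    end                       # comments, PIs etc. also become nil
  end.compact
  [el.expanded_name, attrs.sort, children]
end

# Hypothetical helper: compare two XML strings structurally.
def same_xml?(xml_a, xml_b)
  canonical(REXML::Document.new(xml_a).root) == canonical(REXML::Document.new(xml_b).root)
end

puts same_xml?("<a x='1' y='2'>\n  <!-- note -->\n  <b>hi</b>\n</a>", "<a y='2' x='1'><b>hi</b></a>")
```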

James Thompson
A: 

Ara Howard posted a snippet a few months ago for comparing XML documents: Comparing XML.

Andy Stewart
A: 

Would formatting the XML be an option?

require "rexml/document"
formatter = REXML::Formatters::Pretty.new( 2 )
xml = REXML::Document.new '<cheese><name>Stilton</name><weight>250</weight><expire_date>2009-12-25</expire_date></cheese>'
formatter.write( xml, $stdout )

# Outputs:
#<cheese>
#  <name>
#    Stilton
#  </name>
#  <weight>
#    250
#  </weight>
#  <expire_date>
#    2009-12-25
#  </expire_date>
#</cheese>

You could also use the Chilkat Ruby XML component, which is freeware.

require 'chilkat'
xml = Chilkat::CkXml.new()
xml.LoadXml("<cheese><name>Stilton</name><weight>250</weight><expire_date>2009-12-25</expire_date></cheese>")
puts xml.getXml()

# Outputs:
# 
# <?xml version="1.0" encoding="utf-8" ?>
# <cheese>
#     <name>Stilton</name>
#     <weight>250</weight>
#     <expire_date>2009-12-25</expire_date>
# </cheese>
Jonas Elfström
A: 

I'm afraid the only way to do this is to hand code it. I've written my own XML diff algorithms and it's much easier if you keep IDs hanging around. Generic XML diff utilities will act in unpredictable ways.

If you've ever let your SCM try to automerge different versions of the same XML file, you'll see just how hard this is to do right, even in expensive commercial tools.
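To illustrate why IDs help, here is a minimal REXML sketch that pairs elements by an `id` attribute instead of by position, then reports added, removed and changed IDs. The attribute name `id` and the helpers `index_by_id`, `own_content` and `diff_by_id` are assumptions for illustration. Elements without an ID are not compared at all, and if an ID occurs twice the later element wins.

```ruby
require "rexml/document"

# Hypothetical helper: map each ID value to its element (document order;
# a duplicate ID overwrites the earlier entry).
def index_by_id(doc, id_attr = "id")
  index = {}
  REXML::XPath.each(doc, "//*[@#{id_attr}]") { |el| index[el.attributes[id_attr]] = el }
  index
end

# Hypothetical helper: an element's own attributes plus its direct text,
# ignoring indentation. Child elements are compared under their own IDs.
def own_content(el)
  attrs = []
  el.attributes.each_attribute { |a| attrs << [a.expanded_name, a.value] }
  [attrs.sort, el.texts.map(&:value).join.strip]
end

# Hypothetical helper: diff two documents by ID rather than by position.
def diff_by_id(old_doc, new_doc, id_attr = "id")
  old_idx = index_by_id(old_doc, id_attr)
  new_idx = index_by_id(new_doc, id_attr)
  {
    added:   new_idx.keys - old_idx.keys,
    removed: old_idx.keys - new_idx.keys,
    changed: (old_idx.keys & new_idx.keys).reject { |id| own_content(old_idx[id]) == own_content(new_idx[id]) }
  }
end

old_doc = REXML::Document.new("<list>\n  <item id='a'>one</item>\n  <item id='b'>two</item>\n</list>")
new_doc = REXML::Document.new("<list><item id='b'>TWO</item><item id='c'>three</item></list>")
result = diff_by_id(old_doc, new_doc)
p result
```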

David Ortiz
A: 

If you are creating the XML through Ruby, I would suggest parsing the XML into Ruby objects, comparing those, and then re-outputting the differences.

Another option would be to pretty-print both files using tidy, then text-diff the two and parse the result.
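A minimal sketch of the pretty-print-then-diff approach. It swaps REXML's `Pretty` formatter in for tidy, so both inputs are normalized by the same code, and because Ruby's standard library doesn't include a general line-diff routine it carries a small LCS-based one. `pretty_lines` and `line_diff` are hypothetical helpers; comments are removed before printing so they don't show up as differences.

```ruby
require "rexml/document"

# Hypothetical helper: parse, drop comments, and pretty-print with REXML
# (standing in for tidy). compact = true keeps short text on the tag's line.
def pretty_lines(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//comment()").each(&:remove)
  formatter = REXML::Formatters::Pretty.new(2)
  formatter.compact = true
  out = +""
  formatter.write(doc.root, out)
  out.lines.map(&:rstrip)
end

# Hypothetical helper: minimal LCS line diff. Lines are prefixed with
# "- " (only in a), "+ " (only in b) or "  " (in both).
# Uses O(n*m) memory, which is fine for modest files.
def line_diff(a, b)
  n, m = a.size, b.size
  lcs = Array.new(n + 1) { Array.new(m + 1, 0) }
  (n - 1).downto(0) do |i|
    (m - 1).downto(0) do |j|
      lcs[i][j] = a[i] == b[j] ? lcs[i + 1][j + 1] + 1 : [lcs[i + 1][j], lcs[i][j + 1]].max
    end
  end

  out = []
  i = j = 0
  while i < n && j < m
    if a[i] == b[j]
      out << "  #{a[i]}"
      i += 1
      j += 1
    elsif lcs[i + 1][j] >= lcs[i][j + 1]
      out << "- #{a[i]}"
      i += 1
    else
      out << "+ #{b[j]}"
      j += 1
    end
  end
  a[i...n].each { |line| out << "- #{line}" }
  b[j...m].each { |line| out << "+ #{line}" }
  out
end

old_xml = "<cheese>\n  <!-- comment -->\n  <name>Stilton</name>\n  <weight>250</weight>\n</cheese>"
new_xml = "<cheese><name>Stilton</name><weight>300</weight></cheese>"
puts line_diff(pretty_lines(old_xml), pretty_lines(new_xml))
```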

Chuck Vose