I've been racking my brain trying to solve this problem. This is my first time using a scripting language for this kind of work, and I may have picked a hard job to start with. Essentially, I need to transform some basic XML into a heavier XML structure.

Example:

Translate the following:

<xml>
<test this="stuff">13141</test>
<another xml="tag">do more stuff</another>
</xml>

Into this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Package>
<Package version="1.0">
  <tests>
    <test name="stuff">
      <information>13141</information>
    </test>
  </tests>
  <anothers>
    <another name="tag">
      <information>do more stuff</information>
    </another>
  </anothers>
</Package>

I've tried doing it manually with regexes, but that is a lot of work. I've also tried storing, for example, multiple test tags in an array so I can write them into the tests tag in the second example, but I can't keep track of everything. I've looked into REXML and Hpricot, but can't figure out how to use them to do this properly.

So, basically, what I'm asking is: does anyone have any ideas on how I could manage this more efficiently?

+2  A: 

Look into XSLT. I only have a passing familiarity with the technology, but its use is to transform XML documents from one form to another, which sounds like what you need.

Nick Lewis
Thanks, I'll look into this.
Dexodro
This kind of work is EXACTLY what XSLT is for.
dacracot
Look at Oxygen (http://www.oxygenxml.com/) for a nicely implemented XSLT IDE. It has all the debugging capabilities you would expect from a professional IDE and can really jump-start the learning process.
dacracot
A: 

Hpricot and Builder in combination may provide what you're looking for. The steps would be:

  1. Read in XML with Hpricot
  2. Pick out what elements you want
  3. Spit out your new XML (through Builder) by iterating over elements from Hpricot
Benjamin Oakes
I'll look into this, as well. Thank you.
Dexodro
A: 
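require 'rubygems'
require 'hpricot'
# String#pluralize comes from ActiveSupport. Newer releases no longer
# accept the old "require 'activesupport'" spelling.
require 'active_support'
require 'active_support/core_ext/string/inflections'

source = <<-XML
<xml>
<test this="stuff">13141</test>
<another xml="tag">do more stuff</another>
</xml>
XML

# Yields each element child of the root <xml> node, skipping text nodes.
def each_source_child(source)
  doc = Hpricot.XML(source)

  doc.at('xml').children.each do |child|
    yield child if child.is_a?(Hpricot::Elem)
  end
end

output = Hpricot.build do |doc|
  doc << '<?xml version="1.0" encoding="UTF-8"?>'
  doc << '<!DOCTYPE Package>'
  doc.tag! :Package, :version => '1.0' do |package|
    # Each source child gets its own pluralized wrapper element.
    each_source_child(source) do |child|
      package.tag! child.name.pluralize do |outer|
        # to_hash works whether attributes is a plain Hash or an
        # Hpricot::Attributes object.
        outer.tag! child.name, :name => child.attributes.to_hash.values.first do |inner|
          inner.tag! :information do |information|
            information.text! child.innerText
          end
        end
      end
    end
  end
end

puts output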

Note that the output will contain no whitespace between tags.
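
If indented output matters, one possible approach is to re-parse the generated string and write it back out through a pretty-printing formatter. The sketch below is an assumption on top of the answer, not part of it: it uses REXML's `REXML::Formatters::Pretty` from the Ruby standard distribution rather than Hpricot, and the `compact` string stands in for whatever the code above produces.

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'

# Stand-in for the unindented output of the code above (assumption).
compact = '<Package version="1.0"><tests><test name="stuff">' \
          '<information>13141</information></test></tests></Package>'

doc = REXML::Document.new(compact)

formatter = REXML::Formatters::Pretty.new(2) # indent by two spaces
formatter.compact = true                     # keep short text inline with its tags

pretty = String.new
formatter.write(doc, pretty)
puts pretty
```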

tig