I'm attempting to parse XML in the following format (from the European Central Bank data feed) using libxml-ruby:

<?xml version="1.0" encoding="UTF-8"?>
<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" 
                 xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
  <gesmes:subject>Reference rates</gesmes:subject>
  <gesmes:Sender>
    <gesmes:name>European Central Bank</gesmes:name>
  </gesmes:Sender>
  <Cube>
    <Cube time="2009-11-03">
      <Cube currency="USD" rate="1.4658"/>
      <Cube currency="JPY" rate="132.25"/>
      <Cube currency="BGN" rate="1.9558"/>
    </Cube>
  </Cube>
</gesmes:Envelope>

I'm loading the document as follows:

require 'rubygems'
require 'xml/libxml'
doc = XML::Document.file('eurofxref-hist.xml')

But I'm struggling to come up with the correct namespace configuration to allow XPath queries on the data.

I can extract all the Cube nodes using the following code:

doc.find("//*[local-name()='Cube']")

But since the parent and child nodes are both called Cube, this doesn't help me iterate over just the parent nodes. Perhaps I could modify this XPath to find only those nodes with a time attribute?

My aim is to be able to extract all the Cube nodes which have a time attribute (i.e. <Cube time="2009-11-03">) so I can then extract the date and iterate over the exchange rates in the child Cube nodes.

Can anyone help?

+2  A: 

either of these will work:

/gesmes:Envelope/Cube/Cube - direct path from root
//Cube[@time] - all cube nodes (at any level) with a time attribute


Ok, this is tested and working

arrNS = ["xmlns:http://www.ecb.int/vocabulary/2002-08-01/eurofxref", "gesmes:http://www.gesmes.org/xml/2002-08-01"]
doc.find("//xmlns:Cube[@time]", arrNS)
Zack
Neither of these actually works, they return no nodes. I tried the first one myself initially to no avail. Interestingly, if I remove all the namespacing and use a root tag of 'test' then '/test/Cube/Cube' does indeed work as expected. Any ideas?
Olly
See edit above for working code. Took a fair amount of trial and error to get
Zack
Aha! Thanks for this. I actually figured out a solution which I've just posted, but your solution saves me a line of code :)
Olly
A: 

So I figured this out. The root node defines two namespaces, one with a prefix, one without:

xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01"
xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref"

When a prefix is defined, you can reference names in that namespace directly through the prefix. Using the XML from the original question, this XPath:

/gesmes:Envelope/gesmes:subject

Will return "Reference rates".

Because the Cube nodes are not prefixed, they belong to the default namespace, and XPath can't refer to a namespace without a prefix. So we first need to register a prefix for the default namespace. This is how I achieved this:

doc = XML::Document.file('eurofxref-hist-test.xml')
context = XML::XPath::Context.new(doc)
context.register_namespace('euro', 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref')

Once this is defined, finding the Cube nodes with time attributes is trivial:

context.find("//euro:Cube[@time]").each {|node| .... }
Olly