First, the XML: http://api.chartlyrics.com/apiv1.asmx//GetLyric?lyricId=90&lyricCheckSum=9600c891e35f602eb6e1605fb7b5229e

doc = Nokogiri::XML(open("http://api.chartlyrics.com/apiv1.asmx//GetLyric?lyricId=90&lyricCheckSum=9600c891e35f602eb6e1605fb7b5229e"))

This successfully grabs the document content.

After this point I am unable to get inside the document and grab data, and I am not sure why.

For example, I would expect:

doc.xpath("//LyricArtist")

to return the artist, but it does not.

I have tried the same thing with other feeds, such as the default RSS feed that any WordPress installation provides, and if I do something like:

doc.xpath("//link")

I get a list of all the "links".

I am definitely missing something and would love your input. Thank you!

A: 

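It doesn't like the namespace declared on the root element: //LyricArtist only matches elements that are in no namespace. One workaround is to drop the first tag (presumably the XML declaration) and replace the root start tag with a bare <GetLyricResult> before parsing: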

require 'nokogiri'
require 'open-uri'

uri = "http://api.chartlyrics.com/apiv1.asmx//GetLyric?LyricId=90&lyricCheckSum=9600c891e35f602eb6e1605fb7b5229e"
x = URI.open(uri).read
# Drop the first tag, then swap the namespaced root start tag for a bare one.
x = x.sub(/<.*?>/, '').sub(/<.*?>/, '<GetLyricResult>')
doc = Nokogiri::XML(x)
puts doc.xpath('//LyricArtist').text
jeem
A: 

The XML elements are namespace qualified and bound to http://api.chartlyrics.com/.

If you view the XML, you will notice the document element has a namespace declared:

<GetLyricResult xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://api.chartlyrics.com/">

To match an element that is bound to a namespace, you either need to declare a namespace prefix bound to that URI and use that prefix in your XPath expression, or use an XPath expression that ignores namespaces or tests them explicitly.

You can match on elements and then use local-name() to match the element name, regardless of the declared namespace.

//*[local-name()='LyricArtist']

If you want to be more exact, you can use local-name() to match the element name and namespace-uri() to match the declared namespace.

//*[local-name()='LyricArtist' and namespace-uri()='http://api.chartlyrics.com/']

The second example avoids matching elements that share the same local-name() but are bound to a different namespace. That might not be a problem in this specific instance, but it is something you should be aware of. Namespaces are used to uniquely qualify nodes and allow different vocabularies to use the same "name" for something without worrying about a conflict.

Mads Hansen
Mads, thank you so much for taking the time to explain this to me. I have homework now and need to read up on XML namespaces, because I feel this should have been obvious to me and it was not. I do have an additional question, though: is Nokogiri good for what I'm trying to do here? Based on my research, I found that Nokogiri was faster than most other libraries (specific to XML parsing), and I enjoy the syntax for the most part. Any other suggestions?
mzz
Mads, I found this works too: doc.xpath('//xmlns:LyricArtist'). The reason is here: http://tenderlovemaking.com/2009/04/23/namespaces-in-xml/ (check the "bonus round"). Mads, again, thank you so much.
mzz
If that "bonus round" syntax works, go with it. It's shorter and easier to write. As long as you understand what namespaces are and how namespace prefixes work, you are prepared for the cases where it matters and know how to handle them.
Mads Hansen