Hello,

Quick question:

I have XML with the following code:

<ExperimentCollection>
<Experiment>
<mzData version="1.05" accessionNumber="1635">
<description>
<admin>
<sampleName>Fas-induced and control Jurkat T-lymphocytes</sampleName> 
<sampleDescription>
<cvParam cvLabel="MeSH" accession="D017209" name="apoptosis" /> 
<cvParam cvLabel="UNITY" accession="D2135" name="Jurkat cells" /> 
<cvParam cvLabel="MeSH" accession="D019014" name="Antigens, CD95" /> 
</sampleDescription>
</admin>
</description>
</mzData>
</Experiment>
</ExperimentCollection>

I also have the following code:

require 'rubygems'
require 'nokogiri'

doc = Nokogiri::XML(File.open("my.xml"))

sampleName = doc.xpath( "/ExperimentCollection/Experiment/mzData/description/admin/sampleName" ).text
sampleDescription = doc.xpath( "/ExperimentCollection/Experiment/mzData/description/admin/sampleDescription/MeSH/@accession" ).text
puts sampleName + " " + sampleDescription

foo = sampleName + " " + sampleDescription 
f = File.new("my.txt","w")
f.write(foo) 
f.close()

The code grabs the sampleName just fine, but not the accession letters/numbers. I only want to grab all the letters/numbers after MeSH -> accession (D017209 and D019014). What do I have to change in the doc.xpath command to make this work?

Bob

+2  A: 
doc.xpath( "/ExperimentCollection/Experiment/mzData/description/admin/sampleDescription/MeSH/@accession" )

This returns nothing because there is no MeSH tag. You need to replace MeSH with cvParam[@cvLabel=\"MeSH\"] (read: a cvParam tag that has a cvLabel attribute with the value MeSH; the quotes are escaped only because the expression sits inside a double-quoted Ruby string).

Once you have fixed that, xpath will return a collection of Nokogiri::XML::Attr objects. Calling text on that collection gives you a single string with the values of all its elements joined together, with no separator (D017209D019014 for the example XML file). Since you want each value separately, you should instead use map(&:text) (or map {|n| n.text} in Ruby 1.8.6), which returns an array containing the string value of each accession attribute (i.e. ["D017209", "D019014"] for the example XML file).

Since you seem to be confused, here's a clarification:

@Bobby: When I said "xpath will return a collection of Nokogiri::XML::Attr objects", I meant just that. You call xpath and then xpath creates and returns a collection of Attr objects. In no way did I mean that you should manually create any Attr objects yourself.

And when I said you should use map, I just meant you should call map on the collection returned by xpath (though instead of using map you can just call puts with the collection as an argument).

So what you need to do is:

  1. Fix your xpath as I described.
  2. Call xpath with the fixed expression to get a collection.
  3. Use puts to print it.

In other words:

require 'rubygems'
require 'nokogiri'

doc = Nokogiri::XML(File.open("my.xml"))

common_prefix = "/ExperimentCollection/Experiment/mzData/description/admin"
sample_name = doc.xpath( common_prefix+"/sampleName" ).text
accessions = doc.xpath( common_prefix+
               "/sampleDescription/cvParam[@cvLabel=\"MeSH\"]/@accession" )

puts sample_name
puts accessions
sepp2k
Thanks very much for your help. What commands can I use to call upon Nokogiri::XML::Attr objects?
Bobby
@Bobby: To get their values just use `value` or `text`. There are of course many other things you can do with Attr objects (basically all the things you can do with any Node object - get its parent for example). They're all documented on the API page for [XML::Node](http://nokogiri.org/Nokogiri/XML/Node.html).
sepp2k
Say I wanted to just save the text into a variable, then display it. Shouldn't something like this work then? `m = Nokogiri::XML::Attr` followed by `puts m`
Bobby
@Bobby: Well, no because you're setting `m` to the class `Nokogiri::XML::Attr`, not to an instance of it. But yes, if `m` points to an `Attr`, `puts m` will work.
sepp2k
A: 

Here is a simple way to do it, although it may be too clever, since you'll probably want to do other things as well:

File.open("my.txt","w") do |f|
  doc.xpath('//cvParam[@cvLabel="MeSH"]').each {|n| f << "#{n['name']} #{n['accession']}\n"}
end

Note that // matches cvParam elements anywhere in the document, so you may need a more selective xpath expression.

Eric W.