I am new to programming so bear with me. I have an XML document that looks like this:

File name: PRIDE1542.xml

<ExperimentCollection version="2.1">
<Experiment>
    <ExperimentAccession>1015</ExperimentAccession>
    <Title>**Protein complexes in Saccharomyces cerevisiae (GPM06600002310)**</Title>
    <ShortLabel>GPM06600002310</ShortLabel>
    <Protocol>
        <ProtocolName>**None**</ProtocolName>
    </Protocol>
    <mzData version="1.05" accessionNumber="1015">
        <cvLookup cvLabel="RESID" fullName="RESID Database of Protein Modifications" version="0.0" address="http://www.ebi.ac.uk/RESID/" />
        <cvLookup cvLabel="UNIMOD" fullName="UNIMOD Protein Modifications for Mass Spectrometry" version="0.0" address="http://www.unimod.org/" />
        <description>
            <admin>
                <sampleName>**GPM06600002310**</sampleName>
                <sampleDescription comment="Ho, Y., et al., Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002 Jan 10;415(6868):180-3.">
                    <cvParam cvLabel="NEWT" accession="4932" name="Saccharomyces cerevisiae (Baker's yeast)" value="Saccharomyces cerevisiae" />
                </sampleDescription>
            </admin>
        </description>
        <spectrumList count="0" />
    </mzData>
</Experiment>
</ExperimentCollection>

I want to extract the text between <Title>, <ProtocolName>, and <sampleName> and write it to a text file (I tried bolding those values to make them easier to see). I have the following code so far (based on posts I saw on this site), but it does not seem to work:

>> require 'rubygems'
>> require 'nokogiri'
>> doc = Nokogiri::XML(File.open("PRIDE_Exp_Complete_Ac_10094.xml"))
>> @ExperimentCollection = doc.css("ExperimentCollection Title").map {|node| node.children.text }

Can someone help me?

A: 

Try accessing them with XPath expressions. An XPath expression spells out the path through the parse tree, with the element names separated by slashes.

puts doc.xpath( "/ExperimentCollection/Experiment/Title" ).text
puts doc.xpath( "/ExperimentCollection/Experiment/Protocol/ProtocolName" ).text
puts doc.xpath( "/ExperimentCollection/Experiment/mzData/description/admin/sampleName" ).text
Nikolaus Gradwohl
That worked beautifully, thank you. Now I have two questions. Given this fragment: '<sampleName>GPM06600002310</sampleName> <sampleDescription comment="Ho, Y., et al., Ident. of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002 Jan 10;415(6868):180-3."><cvParam cvLabel="NEWT" accession="4932" name="Saccharo cerevisiae (Yeast)" value="Saccharo cerevisiae" /> </sampleDescription>' I tried to use the same doc.xpath( method to get the text that appears after <sampleDescription comment=....> and <cvParam name=...>, but it didn't work. Also, how can I write the output to a text file?
Bobby
To access an XML attribute, add an @ before the last part of the XPath expression (e.g. "/element1/element2/@attributename"). To write the values to a file, open the file for writing with f = File.open("filename.txt", "w"), write to it with f.write("whatever"), then close it with f.close().
Nikolaus Gradwohl