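Edit: I figured out the answer. The problem was namespaces: File 2 declares a default namespace (xmlns="http://regis-web.systemsbiology.net/pepXML") on its root element, so an unprefixed element name in the XPath no longer matches it. The following code got it to work: @doc.xpath("xmlns:msms_pipeline_analysis/@date").to_s. I'd post this as the answer, but this question has been closed.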

I'm using Nokogiri to parse pepXML files from different peptide search engines. I have two pepXML files, both of which appear, as far as I can tell, to be correctly formatted, and puts Nokogiri::XML(IO.read(file)) outputs the whole XML document for both files.

The problem is that doc.xpath("any valid xpath") returns the matching nodes for one of the files but nothing for the other. No errors are raised, so I have no idea why nothing matches. Does anyone know why Nokogiri would fail to match anything?

File 1 that works:

<?xml version="1.0"?>
<!DOCTYPE msms_pipeline_analysis PUBLIC "-//NCBI//pepXML/EN" "pepXML.dtd">
<msms_pipeline_analysis date="2010-05-24T13:54:07" summary_xml="/home/jashi/pipeline/pipeline0.01/data/test-forward_omssa_output_1.pep.xml">
  <msms_run_summary base_name="/home/jashi/pipeline/pipeline0.01/data/test-forward_omssa_output_1.pep.xml" raw_data_type="raw" raw_data=".mzXML">
    <sample_enzyme name="Trypsin">
      <specificity sense="C" cut="KR" no_cut="P"/>
    </sample_enzyme>
  </msms_run_summary>
</msms_pipeline_analysis>

File 2 that doesn't:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="pepXML_std.xsl"?>
<msms_pipeline_analysis date="2010:05:24:13:51:39" summary_xml="/usr/local/src/tpp-4.3.1/build/linux//home/jashi/pipeline/pipeline0.01/data/fast-forward_tandem_output_1.pep.xml" xmlns="http://regis-web.systemsbiology.net/pepXML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://sashimi.sourceforge.net/schema_revision/pepXML/pepXML_v114.xsd">
  <msms_run_summary base_name="/home/jashi/pipeline/pipeline0.01/data/fast-forward_tandem_output_1.xml" search_engine="X! Tandem" raw_data_type="raw" raw_data=".?">
    <sample_enzyme name="trypsin">
      <specificity cut="KR" no_cut="P" sense="C"/>
    </sample_enzyme>
  </msms_run_summary>
</msms_pipeline_analysis>

And I'm using this: @doc.xpath("msms_pipeline_analysis/@date").to_s