Given a search term, how can I search the attributes of nodes in an XML document and return XML that contains only the nodes matching the term, along with their ancestors all the way up to the root node?

Here is an example of the input XML:

<root>
  <node name = "Amaths"> 
    <node name = "Bangles"/> 
  </node>
  <node name = "C">
    <node name = "Dangles">
      <node name = "E"> 
        <node name = "Fangles"/> 
      </node>
    </node>
    <node name = "Gdecimals" />
  </node>
  <node name = "Hnumbers"/> 
  <node name = "Iangles"/> 
</root>

The output I'm looking for, given the search term "angles":

<root>
  <node name = "Amaths"> 
    <node name = "Bangles"/> 
  </node>
  <node name = "C">
    <node name = "Dangles">
      <node name = "E"> 
        <node name = "Fangles"/> 
      </node>
    </node>
  </node>
  <node name = "Iangles"/> 
</root>

The XPath that I use to search the xml is "//*[contains(@name,'angles')]"

I'm using Nokogiri in Ruby to search the XML, which gives me a NodeSet of all nodes that match the term. I cannot figure out how to reconstruct the XML from that set of nodes.

Thanks!

EDIT: Fixed the example should have been . Thanks Dimitre.

EDIT 2: Fixed the xml again for well-formedness.

+2  A: 

First, note that the wanted output as presented is not well-formed: the following element has no end tag later in the document:

<node name = "C">

The result of evaluating an XPath expression can be a set of nodes from the XML document, but these nodes can't be altered by XPath.

This XPath expression selects the "nodes that match the term along with their parents all the way tracing to the root node":

//*[contains(@name,'angles') and not(node())]/ancestor-or-self::*

However, the selected nodes are not changed and still contain all their children, meaning that the complete subtree rooted at root is still the subtree of root in the returned result.

In case you want to obtain a new document (set of nodes) with different structure than the original XML document, you have to use another language that is hosting XPath. There are many such languages, such as XSLT, XQuery and any language with an XML DOM implementation.

Here is an XSLT transformation, producing the wanted result:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="*[not(descendant-or-self::*[contains(@name, 'angles')])]"/>
</xsl:stylesheet>

When this transformation is applied to the provided XML document (corrected to be well-formed):

<root>
  <node name = "Amaths">
    <node name = "Bangles"/>
  </node>
  <node name = "C">
    <node name = "Dangles">
      <node name = "E">
        <node name = "Fangles"/>
      </node>
      <node name = "Gdecimals" />
    </node>
  </node>
  <node name = "Hnumbers"/>
  <node name = "Iangles"/>
</root>

the wanted (correct) result is produced:

<root>
   <node name="Amaths">
      <node name="Bangles"/>
   </node>
   <node name="C">
      <node name="Dangles">
         <node name="E">
            <node name="Fangles"/>
         </node>
      </node>
   </node>
   <node name="Iangles"/>
</root>
Dimitre Novatchev
@Dimitre: Thanks a ton! About the mistake in the output, I have fixed the question. Will try out your solution and let you know. Thanks once again.
Vijay Dev
@Vijay-Dev: The XML document is still not well-formed. I have changed my answer to include your latest XML document (corrected to be well-formed) and the new result.
Dimitre Novatchev
@Dimitre: Thanks for pointing out the error. Fixed it!
Vijay Dev
@Dimitre: Would you mind answering a variant of this? I need to include all child nodes of a node which matched the search term in the output xml, no matter what their "name" attribute is. How to go about modifying the XPath to achieve this? Thanks!
Vijay Dev
@Vijay-Dev: If I understand you well, this is the first XPath expression in my answer.
Dimitre Novatchev
I wrote another xsl:template to match "*[(ancestor-or-self::*[contains(@name, 'angles')])]". It seems to work fine. Please let me know if there is a better way.
Vijay Dev