Hi there,

I have three XML files of similar structure, and I would like to use an XPath expression to extract all matching nodes from these files and write them to a third one.

Do you know a good tool to handle this?

I am thinking of something like

$supermagicxpathtool -x "//whoopdee" file1.xml file2.xml file3.xml > resultfile.xml
A: 

xmlstarlet can extract the nodes, but I'm not certain that it can join the results like that.

Ignacio Vazquez-Abrams
"Dear XMLStarlet users, you may have noticed that the development of xmlstarlet has somewhat stalled" Sorry, but that is not an option.
er4z0r
Looks like xgrep comes close to what I am looking for
er4z0r
It's not a word processor that needs to keep adding features to keep up with the Joneses. It does the job that it's supposed to. This is not "stagnation", this is "maintenance mode".
Ignacio Vazquez-Abrams
+2  A: 

XPath can only select nodes; it cannot write to a file.

In XPath 1.0 there is no standard way for a single expression to reference nodes that belong to more than one XML document. If the language hosting XPath is XSLT, the document nodes of the three XML documents can be held in three separate xsl:variables, $doc1, $doc2 and $doc3, and the matching nodes selected with a union:

$doc1//whoopdee | $doc2//whoopdee | $doc3//whoopdee

Alternatively, the XSLT document() function can be used directly:

    document('file1.xml')//whoopdee 
  | document('file2.xml')//whoopdee 
  | document('file3.xml')//whoopdee

To output the result of either of the XPath expressions above using XSLT, one would simply write:

<xsl:copy-of select="$doc1//whoopdee | $doc2//whoopdee | $doc3//whoopdee"/>

or

<xsl:copy-of select=
   "document('file1.xml')//whoopdee 
  | document('file2.xml')//whoopdee 
  | document('file3.xml')//whoopdee
"/>

In XPath 2.0 one can use the standard doc() function instead, which does not depend on the language hosting XPath.

Command-line:

Any XSLT processor that can be run from the command line will do, and most can. They also accept simple parameters on the command line, usually in the format name=value, and most let the destination file for the result be specified as an option. Here is a link to the Saxon documentation of its command-line usage:

http://www.saxonica.com/documentation/using-xsl/commandline.html
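
If no XSLT processor is at hand, the same idea (select the matching nodes from each file and copy them into one result document) can be sketched in a few lines of Python using only the standard library. This is a rough sketch under some assumptions, not a drop-in tool: the function names and the "results" wrapper element are made up for illustration, and ElementTree only understands a subset of XPath, so the query is written as ".//whoopdee" rather than "//whoopdee".

```python
import xml.etree.ElementTree as ET


def collect_matches(paths, query, wrapper_tag="results"):
    """Gather every node matched by `query` in each file under one new root.

    `wrapper_tag` is a hypothetical name; a wrapper is needed because a
    well-formed XML document must have exactly one root element.
    """
    root = ET.Element(wrapper_tag)
    for path in paths:
        source_root = ET.parse(path).getroot()
        # ElementTree supports a limited XPath subset; ".//tag" selects
        # all descendants of the root element with that tag name.
        for node in source_root.findall(query):
            root.append(node)
    return root


def write_matches(paths, query, out_path):
    """Write the collected nodes to `out_path` as a UTF-8 XML document."""
    tree = ET.ElementTree(collect_matches(paths, query))
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
```

Calling write_matches(["file1.xml", "file2.xml", "file3.xml"], ".//whoopdee", "resultfile.xml") would then correspond to the command line from the question. Note that ".//whoopdee" does not match the root element itself, and that full XPath 1.0 queries would need a different library.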

Dimitre Novatchev
A: 

xmlstarlet can copy a node to another document (so this seems like a first step to a solution):

# code example from:
# "How to copy a node to another document",
# http://sourceforge.net/projects/xmlstar/forums/forum/226076/topic/3558346

xml sel -R -t -c / -c "document('f2.xml')" f1.xml | \
       xml ed -m /xml-select/Module_0 /xml-select/cnpsXML/Destinations/Module_0/Filter_1 | \
       xml sel -t -c "/xml-select/*" - | xml fo

# In pseudo code:
# 1. Combine both documents into one (using -R to keep the combo a valid XML file - genius!)
# 2. Move the element from f2.xml to its final destination

To extract all matching nodes as plain (tagless) text, or to generate the equivalent XSLT and run it, we can do the following:

xmlstarlet sel -t -m "//whoopdee" -v '@*' -v '.' -n file1.xml > resultfile

xmlstarlet sel -C -t -m "//whoopdee" -v '@*' -v '.' -n file1.xml > resultfile.xsl
xml tr resultfile.xsl file1.xml
cigit
A: 

Building on my previous post, xmlstarlet seems to get the job done like so:

xmlstarlet sel -R -t -c / -c "document('file2.xml')" -c "document('file3.xml')" file1.xml | \
       xmlstarlet sel -R -t -c "/xml-select/*/whoopdee" - | xmlstarlet fo > resultfile.xml

xmlstarlet val resultfile.xml
cigit
A: 

Using xml-cat from the xml-coreutils package gives this a more Unix-like look and feel:

xml-cat file1.xml file2.xml file3.xml | \
   xmlstarlet sel -R -t -c /root/whoopdee - | \
   xmlstarlet fo > resultfile.xml 
yabt
A: 

You seem to be looking for the xpath tool, which ships in the libxml-xpath-perl package on Ubuntu and most likely on Debian and other Debian-based distros.

xpath [-s suffix] [-p prefix] [-q] -e query [-e query] ... [file] ...
Zash