When doing shell scripting, data typically comes in files of single-line records, like CSV. It's really simple to handle this data with grep and sed. But I often have to deal with XML, so I'd really like a way to script access to that XML data from the command line. What are the best tools?
At the moment, the best solution I've found is hpricot, which provides XPath and CSS selectors and a DOM. But it's only available in Ruby, so I can't easily use it in a shell script.
EDIT: I've found some more promising tools:
fxgrep: Uses its own XPath-like syntax to query documents. Written in SML, so installation may be difficult.
LT XML: XML toolkit derived from SGML tools, including sggrep, sgsort, xmlnorm and others. Uses its own query syntax. The documentation is very formal. Written in C. LT XML 2 claims support for XPath, XInclude and other W3C standards.
xmlgrep2: simple and powerful searching with XPath. Written in Perl using XML::LibXML and libxml2.
XQSharp: Supports XQuery, the extension to XPath. Written for the .NET Framework.
xml-coreutils: Laird Breyer's toolkit equivalent to GNU coreutils. Discussed in an interesting essay on what the ideal toolkit should include.
xmldiff: Simple tool for comparing two XML files.
I haven't had a chance to try any of these, but xml-coreutils seems the best documented and the most Unix-oriented.
FURTHER EDIT
I've removed xmltk from this list. It doesn't seem to have a package in Debian, Ubuntu, Fedora, or MacPorts. It also hasn't had a release since 2007, and it uses non-portable build automation. I can't recommend it unless it becomes more portable.
jEdit has a plugin called "XQuery" which provides querying functionality for XML documents.
Not quite the command line, but it works!
Decide what operations you want to perform on your XML files, then write a script (in Python or Perl, perhaps) that exposes them as command-line arguments your shell scripts can call.
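A minimal sketch of that approach, assuming python3 is installed: a shell function wrapping a short Python program that uses the standard-library xml.etree.ElementTree module. The name xmlq and its text/count subcommands are hypothetical, made up for illustration. ElementTree understands only a subset of XPath, and paths here are relative to the root element.

```shell
# xmlq: hypothetical helper (name and interface are illustrative only).
# Assumes python3 is on PATH; uses the standard-library ElementTree, which
# supports a limited XPath subset with paths relative to the root element,
# e.g. "textnode", ".//textnode", "textnode[@a='bv']".
#   xmlq text PATH FILE    print the text of each matching element
#   xmlq count PATH FILE   print how many elements match
xmlq() {
    python3 - "$@" <<'EOF'
import sys
import xml.etree.ElementTree as ET

# sys.argv[0] is "-" (program read from stdin); the rest are xmlq's arguments.
if len(sys.argv) != 4 or sys.argv[1] not in ("text", "count"):
    sys.exit("usage: xmlq text|count PATH FILE")
op, path, filename = sys.argv[1:]
matches = ET.parse(filename).getroot().findall(path)
if op == "text":
    for el in matches:
        print(el.text or "")
else:
    print(len(matches))
EOF
}
```

With the q.xml example further down, xmlq text textnode q.xml should print ddd and dsss on separate lines, and n=$(xmlq count textnode q.xml) captures the count in a variable. Keeping the parsing in one small program and the interface in plain arguments lets the rest of the shell script stay ordinary grep/sed/variable code.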
I've found xmlstarlet to be pretty good at this sort of thing.
http://xmlstar.sourceforge.net/
It should be available in most distributions' repositories, too. An introductory tutorial is here:
It depends on exactly what you want to do.
XSLT may be the way to go, but there is a learning curve. Try xsltproc, and note that you can pass parameters into the stylesheet (--param and --stringparam).
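A minimal sketch of passing a parameter into a stylesheet with xsltproc. The file names titles.xsl and books.xml and the lang parameter are made up for illustration. --stringparam passes a plain string, while --param takes an XPath expression, so a string literal given to --param needs its own inner quotes.

```shell
# Hypothetical sample input.
cat > books.xml <<'EOF'
<?xml version="1.0"?>
<library>
  <book id="b1" lang="en"><title>First</title></book>
  <book id="b2" lang="fr"><title>Second</title></book>
</library>
EOF

# Stylesheet with a top-level parameter that defaults to 'en'.
cat > titles.xsl <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:param name="lang" select="'en'"/>
  <xsl:template match="/">
    <!-- One title per line for books in the requested language. -->
    <xsl:for-each select="//book[@lang = $lang]">
      <xsl:value-of select="title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF

xsltproc titles.xsl books.xml                        # uses the default, 'en'
xsltproc --stringparam lang fr titles.xsl books.xml  # plain string value
xsltproc --param lang "'fr'" titles.xsl books.xml    # XPath string literal
```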
XQuery might be a good solution. It is (relatively) easy to learn and is a W3C standard.
I would recommend XQSharp as a command-line processor.
To Joseph Holsten's excellent list, I'd add the xpath command-line script that comes with the Perl library XML::XPath. It's a great way to extract information from XML files:
xpath -q -e '/entry[@xml:lang="fr"]' *xml
There is also the xml2 / 2xml pair, which lets the usual line-oriented text tools (grep, sed, etc.) process XML.
Example. q.xml:
<?xml version="1.0"?>
<foo>
text
more text
<textnode>ddd</textnode><textnode a="bv">dsss</textnode>
<![CDATA[ asfdasdsa <foo> sdfsdfdsf <bar> ]]>
</foo>
xml2 < q.xml
/foo=
/foo= text
/foo= more text
/foo=
/foo/textnode=ddd
/foo/textnode
/foo/textnode/@a=bv
/foo/textnode=dsss
/foo=
/foo= asfdasdsa <foo> sdfsdfdsf <bar>
/foo=
xml2 < q.xml | grep textnode | sed 's!/foo!/bar/baz!' | 2xml
<bar><baz><textnode>ddd</textnode><textnode a="bv">dsss</textnode></baz></bar>
P.S. There are also html2 / 2html.