I need to parse some basic XML (one root element, 3-4 subelements, 1-3 attributes each) from a ksh script. Ideally I'd stick to ksh, since the script already exists and only needs to read some extra configuration that another program writes as XML.

I know I can use sed and do pattern matching, but that's not foolproof: the input XML could change, and the same attribute names could appear on several subelements (or on new subelements).

So far, I'm thinking of running an XSLT against the XML to extract the few attributes (for specific elements) that the ksh script cares about as individual fields. I can use Oracle for this, given that we are a DB-driven product and Oracle would always be installed on our systems, but that seems a bit heavy-handed.

Is there any other safe approach to extract specific attributes from the input XML in a cross-platform manner that doesn't require a third-party parser/transformer?

A: 

Can't do it entirely in ksh, but have you tried Python's xml package?

If you want lightweight, you might try libxml2 and a small C program.
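For the Python route, here is a minimal sketch using only the standard library's xml.etree.ElementTree. The sample document, its element and attribute names, and the helper name flatten_attributes are all made up for illustration, since the question doesn't show the real XML. It flattens the attributes of the root's direct children into element.attribute=value lines that a shell script could read.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real element and attribute names are not given
# in the question.
SAMPLE = """<config version="1">
  <database host="db01" port="1521"/>
  <logging level="INFO" host="loghost"/>
</config>"""


def flatten_attributes(xml_text):
    """Return 'element.attribute=value' lines for the root's direct children."""
    root = ET.fromstring(xml_text)
    lines = []
    for child in root:
        # Prefixing with the element name keeps same-named attributes on
        # different subelements apart (e.g. database.host vs logging.host).
        for name, value in child.attrib.items():
            lines.append(f"{child.tag}.{name}={value}")
    return lines


if __name__ == "__main__":
    for line in flatten_attributes(SAMPLE):
        print(line)
```

The ksh side could then consume the output with something like a while IFS='=' read -r key value loop. Note that this sketch assumes each subelement name occurs only once; repeated elements would produce repeated keys.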

Anarchofascist
+1  A: 

You might want to take a look at this pure bash implementation, if keeping it all in shell script is that important.

That said, other scripting languages such as Python and Perl are also highly portable, and will make your life a lot easier. Perl's XML::Twig module, for instance, comes with an end-user script called "xml_grep", which can already be passed the --text_only option to extract just the text of a node found from a complex search. It shouldn't be that much harder to modify it to return a specified attribute as well.
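To illustrate the "return a specified attribute" idea in Python rather than Perl (this is not xml_grep, just a rough stand-in), here is a sketch built on the limited XPath subset that the standard library's ElementTree supports. The function name attr_values and the sample XML are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with the same attribute name on several subelements.
SAMPLE = """<config>
  <server name="primary" host="db01"/>
  <server name="standby" host="db02"/>
  <logging host="loghost"/>
</config>"""


def attr_values(xml_text, path, attr):
    """Return the value of `attr` for every element that `path` matches.

    `path` is evaluated relative to the root element and may use the
    ElementTree XPath subset, e.g. "server[@name='standby']".
    """
    root = ET.fromstring(xml_text)
    values = []
    for element in root.findall(path):
        value = element.get(attr)
        if value is not None:  # skip matches that lack the attribute
            values.append(value)
    return values


if __name__ == "__main__":
    # Select one specific subelement by an attribute predicate.
    for value in attr_values(SAMPLE, "server[@name='standby']", "host"):
        print(value)
```

Selecting by path and predicate, instead of by pattern matching on the raw text, is what keeps the lookup stable when the same attribute shows up on other subelements.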

Zed
+1  A: 

Depending on what you mean by "parsing", XMLStarlet may be a good option. It's completely command-line driven and supports selection and editing of XML files, as well as XSLT.

jonathan-stafford
A: 

I'd rather use CSV for parsing. It will not only simplify the logic, but the conversion from xls to csv is also easily achieved.

Sachin Chourasiya