I want to add an incrementing attribute to every tag in my XML, using awk, sed, Perl, or plain shell commands.

For example:

<tag1 key="123">
  <tag2 abc="xf d"/>
  <tag3 def="d2 32">
   </tag3>
</tag1>

I am expecting the following output:

<tag1 key="123" order="1">
  <tag2 abc="xf d" order="2"/>
  <tag3 def="d2 32" order="3">
   </tag3>
</tag1>

If possible, I would like to avoid any dependencies (Twig, LibXML) and use pure string manipulation.

A: 

Normally you should use a proper parser to process XML, but in awk:

awk 'match($0, /<[^\/>]+/) { \
     $0 = substr($0, 1, RSTART+RLENGTH-1) " order=\"" ++i "\"" \
          substr($0, RSTART+RLENGTH) \
     }; 1'

This looks for an opening tag (without the closing > or /> part) on every line. If one is found, it inserts the string order="i" after it, incrementing i each time. The lone 1 on the last line is an always-true pattern, so awk runs its default action, { print $0 }, for every line.

I updated the regular expression to work on your revised input. It fails as soon as you have multiple opening tags on a single line, etc.
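
To get past the one-tag-per-line limit, the same match() idea can be applied in a loop. The following is only a sketch under stated assumptions, not a replacement for a parser: it assumes every tag starts and ends on the same line and that no attribute value contains a literal >. The function name add_order is made up for illustration. It skips closing tags, comments, and processing instructions by requiring that the character after < is not /, ? or !.

```shell
# add_order: hypothetical helper; reads XML on stdin, writes to stdout.
add_order() {
  awk '{
    out = ""; rest = $0
    # Find the next opening tag: "<", then a character other than
    # "/", "?", "!" or ">", then everything up to the next ">".
    while (match(rest, /<[^\/?!>][^>]*>/)) {
      tag = substr(rest, RSTART, RLENGTH)
      # Insert before "/>" for self-closing tags, before ">" otherwise.
      cut = (tag ~ /\/>$/) ? RLENGTH - 2 : RLENGTH - 1
      out = out substr(rest, 1, RSTART - 1) substr(tag, 1, cut) \
            " order=\"" (++n) "\"" substr(tag, cut + 1)
      # Continue scanning after the tag that was just handled.
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }'
}
```

It would be used as `add_order < input.xml`. Because the whole tag up to > is matched, a / inside an attribute value no longer stops the match, but CDATA sections, tags split across lines, and > inside quoted values will still break it.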

schot
The key property is not mandatory; I will modify the example accordingly.
aeh
Why the downvotes? I know (and mention) that regexes are not a replacement for proper XML parsing, but that's what the OP wants/needs.
schot
@schot : The downvotes are due to the strong sentiments that the SO community has against using regular expressions for manipulating XML/HTML documents. It's probably got nothing to do with your answer.
Zaid
@Zaid: OK, so be it. @aeh: I'm curious, did my updated version satisfy your requirements?
schot
@schot: It would fail if I have "/" in the value
aeh
@aeh: Yes, of course... You could change the regex to `/<[^>]*[^\/>]/`, but this is starting to demonstrate that your data is too complex for simple regex tricks.
schot
@schot: Thanks. I don't know much about scripting; I wanted a lead. The following regex works for me: /<[^/?!][^ |>]+/
aeh
+4  A: 

I like Perl's XML::Twig for this sort of thing. You'll have to adjust it for whatever you are doing so you visit all the elements you want to affect. To handle parents before children, a queue is probably what you want:

use strict;
use warnings;
use XML::Twig;

my $xml = <<'XML';
<tag1 key="123">
  <tag2 key="1234"/>
  <tag3 key="12345">
   </tag3>
</tag1>
XML

my $twig = XML::Twig->new(
    pretty_print => 'indented',
    );
$twig->parse( $xml );
my @queue = ( $twig->root );

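my $n = 1;
# Breadth-first walk: each parent is numbered before its children.
while( my $elem = shift @queue ) {
    # Only touch elements named tag1, tag2 or tag3; adjust for your data.
    next unless $elem->tag =~ /\Atag[123]\z/;
    $elem->set_att( order => $n++ );
    # Queue the child elements whose names start with "tag".
    push @queue, $elem->children( qr/\Atag/ );
    }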

$twig->print;

The output from this script is:

<tag1 key="123" order="1">
  <tag2 key="1234" order="2"/>
  <tag3 key="12345" order="3"></tag3>
</tag1>
brian d foy
If possible, I would like to avoid any dependencies (Twig) and use pure string manipulation. Also, the string "123" might not always be present. I will edit the example.
aeh
@aeh, trying to manipulate XML without using a proper XML parser is always risky. You may get away with it if your XML is "normal" enough and the change you're making is simple, but there are no guarantees. Also, the `[123]` has nothing to do with `key="123"`. It's a character class; that line is looking for tags named tag1 or tag2 or tag3.
cjm
@cjm: Sorry for the misinterpretation (I don't know Perl). However, the tags need not even be tag1, tag2, tag3. I understand that manipulating XML without a proper parser is not appropriate. I have a simple requirement and I am just checking whether somebody has a clean solution without any dependency on a parser.
aeh
You don't have a simple requirement though. It's a very tough subject that only looks simple to you because you haven't fallen off the cliff enough.
brian d foy
+2  A: 

It's pretty simple with XML::LibXML and a drop of XPath.

#!/usr/bin/perl

use strict;
use warnings;

use XML::LibXML;

my $counter = 1;

my $xp = XML::LibXML->new->parse_file('test.xml');

foreach($xp->findnodes('//*')) { # '//*' matches every element node, in document order
  $_->setAttribute('order', $counter++);
}

print $xp->toString;
davorg
If possible, no dependencies (LibXML). Sorry for not mentioning this earlier.
aeh
Parsing XML without using an XML parser is a really bad idea. I strongly recommend removing whatever restriction is preventing you from using CPAN modules. Without CPAN you're using a crippled version of Perl.
davorg
@davorg: Thanks. I understand that manipulating XML without an XML parser is not appropriate. But my requirement is simple, relates only to the structure, and does not depend on strict semantics. I am giving it a try. Also, I cannot have further dependencies.
aeh
But it's the structure of XML that is so hard to parse correctly with regular expressions. Every time you think you've got it right, there's another corner case that breaks your code. "Also I cannot have further dependencies." You keep saying that, but you haven't explained why. This is the heart of your problems. This is the issue that you need to address.
davorg
@davorg: This is something I want to do as part of a project's build. If I add a dependency, the whole chain of systems (developer machines, CI) has to be updated, which is not under my control.
aeh
@aeh: read the link I added to your question above. Work to reform your institution, rather than living within its unreasonable and unjustified restrictions.
Ether