ansaurus

Question

Adding an incrementing value attribute to every tag in xml using script

Answer 1

A:

Normally you should use a proper parser to process XML, but in awk:

awk 'match($0, /<[^\/>]+/) { \
     $0 = substr($0, 1, RSTART+RLENGTH-1) " order=\"" ++i "\"" \
          substr($0, RSTART+RLENGTH) \
     }; 1'

I look for an opening tag (without the > or /> part) on every line. If one is found, I put the string order="i" after it, incrementing i each time. The lone 1 on the last line is an always-true pattern, so awk runs its default action, { print $0 }, for every line.

I updated the regular expression to work on your revised input. It still fails in other cases: for example, when a line holds multiple opening tags, only the first one on that line gets an order attribute.
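The multiple-tags-per-line limitation can be worked around by looping over the matches on each line. The following is only a sketch under stated assumptions, not part of the original answer: the function name `add_order` is hypothetical, and the attribute is inserted directly after the tag name instead of after the existing attributes, so a "/" inside an attribute value no longer cuts the match short. It is still a regex trick: comments, CDATA sections, and a literal "<" followed by a letter inside text or attribute values are not handled.

```shell
# Sketch: number every opening tag, even when several share one line.
# add_order is a hypothetical helper name; it reads XML on stdin.
add_order() {
    awk '{
        out = ""
        rest = $0
        # Match "<name" where name starts with a letter or underscore, so
        # closing tags (</x>), comments (<!--) and declarations (<?xml) are skipped.
        while (match(rest, /<[A-Za-z_][^ \t\/>]*/)) {
            # Copy everything up to the end of the tag name, then add the attribute.
            out = out substr(rest, 1, RSTART + RLENGTH - 1) " order=\"" (++i) "\""
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}

printf '%s\n' '<tag1 key="123"><tag2 key="a/b"/>' '  <tag3>' '  </tag3></tag1>' | add_order
```

On this sample, the first line should come out as `<tag1 order="1" key="123"><tag2 order="2" key="a/b"/>`, the second as `  <tag3 order="3">`, and the closing-tag line is left unchanged.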

schot 2010-08-31 07:34:46

key property is not mandatory, I will modify the example accordingly

aeh 2010-08-31 07:46:01

Why the downvotes? I know (and mention) that regexes are not a replacement for proper XML parsing, but this is what the OP wants/needs.

schot 2010-08-31 08:35:26

@schot : The downvotes are due to the strong sentiments that the SO community has against using regular expressions for manipulating XML/HTML documents. It's probably got nothing to do with your answer.

Zaid 2010-08-31 09:32:23

@Zaid: OK, so be it. @aeh: I'm curious, did my updated version satisfy your requirements?

schot 2010-08-31 12:23:35

@schot: It would fail if I have "/" in the value

aeh 2010-08-31 13:04:56

@aeh: Yes, of course... You could change the regex to `/<[^>]*[^\/>]/`, but this is starting to demonstrate that your data is too complex for simple regex tricks.

schot 2010-08-31 13:10:49

@schot: Thanks, I don't know much about scripting; I wanted a lead. The following regex works for me: /<[^/?!][^ |>]+/

aeh 2010-08-31 13:43:32
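To see how the three patterns from this thread differ, it can help to print what each one actually matches on a few awkward lines. This is a comparison sketch only: the helper name `show_match` and the three sample lines are made up for illustration, and the patterns are passed as strings, so the "/" is written without a backslash.

```shell
# Sketch: show what a pattern matches on three tricky sample lines.
# show_match is a hypothetical helper; $1 is an awk (ERE) pattern.
show_match() {
    printf '%s\n' '<tag2 key="a/b"/>' '</tag3>' '<?xml version="1.0"?>' |
    awk -v re="$1" '{
        if (match($0, re)) print substr($0, RSTART, RLENGTH)
        else print "(no match)"
    }'
}

show_match '<[^/>]+'          # pattern from the answer
show_match '<[^>]*[^/>]'      # pattern suggested in the comments
show_match '<[^/?!][^ |>]+'   # pattern the asker settled on
```

The first pattern stops at the "/" inside the attribute value. The second gets past it but also matches the closing tag `</tag3`, so closing tags would be numbered too. The third matches only `<tag2` and skips both the closing tag and the XML declaration. Because it requires at least two characters after the "<", it would miss one-letter tag names, and on an attribute-less self-closing tag such as `<br/>` it would include the "/" in the match.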

Answer 2

+4 A:

I like Perl's XML::Twig for this sort of thing. You'll have to adjust it for whatever you are doing so you visit all the elements you want to affect. To handle parents before children, a queue is probably what you want:

use XML::Twig;

my $xml = <<'XML';
<tag1 key="123">
  <tag2 key="1234"/>
  <tag3 key="12345">
   </tag3>
</tag1>
XML

my $twig = XML::Twig->new(
    pretty_print => 'indented',
    );
$twig->parse( $xml );
my @queue = ( $twig->root );

my $n = 1;  
while( my $elem = shift @queue ) {
    next unless $elem->tag =~ /\Atag[123]\z/;
    $elem->set_att( order => $n++ );
    push @queue, $elem->children( qr/\Atag/ );
    }

$twig->print;

The output from this script is:

<tag1 key="123" order="1">
  <tag2 key="1234" order="2"/>
  <tag3 key="12345" order="3"></tag3>
</tag1>

brian d foy 2010-08-31 08:19:39

If possible, I'd like to avoid any dependencies (such as Twig) and use pure string manipulation. Also, the string "123" might not always be present. I will edit the example.

aeh 2010-08-31 08:26:15

@aeh, trying to manipulate XML without using a proper XML parser is always risky. You may get away with it if your XML is "normal" enough and the change you're making is simple, but there are no guarantees. Also, the `[123]` has nothing to do with `key="123"`. It's a character class; that line is looking for tags named tag1 or tag2 or tag3.

cjm 2010-08-31 08:32:46

@cjm: Sorry for the misinterpretation (I don't know Perl). However, the tags need not be named tag1, tag2, tag3 either. I understand that manipulating XML without a proper parser is not appropriate. I have a simple requirement, and I am just checking whether somebody has a clean solution that doesn't depend on a parser.

aeh 2010-08-31 08:41:36

You don't have a simple requirement though. It's a very tough subject that only looks simple to you because you haven't fallen off the cliff enough.

brian d foy 2010-08-31 10:38:11

Answer 3

+2 A:

It's pretty simple with XML::LibXML and a drop of XPath.

#!/usr/bin/perl

use strict;
use warnings;

use XML::LibXML;

my $counter = 1;

my $xp = XML::LibXML->new->parse_file('test.xml');

foreach($xp->findnodes('//*')) { # '//*' selects every element node in the document
  $_->setAttribute('order', $counter++);
}

print $xp->toString;

davorg 2010-08-31 08:28:00

If possible, no dependencies (LibXML). Sorry for not mentioning this earlier.

aeh 2010-08-31 08:33:40

Parsing XML without using an XML parser is a really bad idea. I strongly recommend removing whatever restriction is preventing you from using CPAN modules. Without CPAN you're using a crippled version of Perl.

davorg 2010-08-31 08:45:20

@davorg: Thanks. I understand that manipulating XML without an XML parser is not appropriate. But my requirement is simple: it relates only to the structure and does not depend on strict semantics. I am giving it a try. Also, I cannot add further dependencies.

aeh 2010-08-31 08:50:41

But it's the structure of XML that is so hard to parse correctly with regular expressions. Every time you think you've got it right, there's another corner case that breaks your code. "Also I cannot have further dependencies." You keep saying that, but you haven't explained why. This is the heart of your problems. This is the issue that you need to address.

davorg 2010-08-31 10:55:15

@davorg: This is something I want to do at build time for a project. If I add a dependency, the whole chain of systems (developer machines, CI) has to be updated, which is not under my control.

aeh 2010-08-31 11:14:01

@aeh: read the link I added to your question above. Work to reform your institution, rather than living within its unreasonable and unjustified restrictions.

Ether 2010-08-31 16:22:23
