I am using XML::Twig to parse my input XML in Perl. I need to extract a particular node from this XML, check whether it contains multiple <p> tags, and then count the words in those <p> tags. For example:

<XML> 
<name>
</name>
<address>
<p id="1">a b c d </p>
<p id="2">y y y </p>
</address>
</XML>

Output:

Address has 2 paragraph tags with 7 words.

Any suggestions?

A: 

Here is one way to do it:

use strict;
use warnings;
use XML::Twig;

my $xfile = q(
<XML>  
<name> 
</name> 
<address> 
<p id="1">a b c d </p> 
<p id="2">y y y </p> 
</address> 
</XML> 
);

my $t = XML::Twig->new(
    twig_handlers => { 'address/p' => \&addr}
);
my $pcnt = 0;
my $wcnt = 0;
$t->parse($xfile);
print "Address has $pcnt paragraph tags with $wcnt words.\n";

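# Handler called once for every <p> that is a child of <address>
sub addr {
    my ($twig, $add) = @_;
    # split ' ' ignores leading whitespace, so no empty first field is counted
    my @words = split ' ', $add->text();
    $wcnt += scalar @words;
    $pcnt++;
}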

__END__

Address has 2 paragraph tags with 7 words.

XML::Twig has a dedicated website with documentation and a tutorial that describes the handler technique used above.
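
The handler above counts while the document is being parsed. For comparison, here is a minimal sketch that parses the whole document first and then walks the tree. The helper name count_address_paragraphs is made up for illustration, and the sketch assumes <address> is a direct child of the root element, as in the sample input.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns (paragraph count, word count) for <address>.
sub count_address_paragraphs {
    my ($xml) = @_;

    my $twig = XML::Twig->new;
    $twig->parse($xml);    # build the whole tree in memory

    my $address = $twig->root->first_child('address')
        or return (0, 0);  # no <address> element under the root

    my @paras = $address->children('p');    # direct <p> children only
    my $words = 0;
    for my $p (@paras) {
        my @w = split ' ', $p->text;    # split ' ' skips leading whitespace
        $words += @w;
    }
    return (scalar @paras, $words);
}

my ($pcnt, $wcnt) = count_address_paragraphs(
    '<XML><name></name><address><p id="1">a b c d </p><p id="2">y y y </p></address></XML>'
);
print "Address has $pcnt paragraph tags with $wcnt words.\n";
```

For large files the handler version is usually the better fit, since it does not need the whole tree to be kept around; the tree-walking version is simply easier to read for small inputs.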

toolic
Thanks so much for the answer, it works great. Thanks also for the link to the tutorials!
Desai
I have another twist to the problem: I need to exclude entities inside the p tags so that they are not counted. What I tried in the addr sub was: sub addr { my ($twig, $add) = @_; my $local = $add->text();$_ = $local;s/\]+\;//sg; $local = $_; my @words = split /\s+/, $local; $wcnt += scalar @words; $pcnt++; } but the above does not replace the entities. What am I missing? Please help!
Desai
Ah, I got it. I was missing this: $local = encode_entities($local);
Desai
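
For readers following the comments, the fix can be sketched roughly as below. The text returned by the parser has presumably already had its entities decoded, so a substitution looking for "&...;" finds nothing until the text is re-encoded with encode_entities from HTML::Entities. The helper name count_words_without_entities and the entity pattern are assumptions for illustration, not the commenter's exact code.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical helper: counts words in already-decoded text,
# ignoring anything that would be written as an entity.
sub count_words_without_entities {
    my ($text) = @_;

    my $encoded = encode_entities($text);   # e.g. "&" becomes "&amp;"
    $encoded =~ s/&#?\w+;//g;               # assumed pattern: drop named and numeric entities

    my @words = split ' ', $encoded;        # split ' ' ignores leading whitespace
    return scalar @words;
}

print count_words_without_entities("a & b"), "\n";
```

Inside the addr handler this would replace the split line, for example $wcnt += count_words_without_entities($add->text());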