My question is: how do I pass some arguments to an XML::Twig handler, and how do I return the result from the handler?

Here is my code, which hardcodes:

counter name="music", report type="month", stringSet index="4".

How can I implement this using the arguments $counter_name, $type and $id instead? And how do I return the resulting string_list? Thanks. (Sorry I did not post the XML file here; I had trouble doing that, because anything within < and > was stripped.)

use strict;
use warnings;
use XML::Twig;

sub parse_a_counter {

     my ($twig, $counter) = @_;
     my @report = $counter->children('report[@type="month"]');

     for my $report (@report){

         my @stringSet = $report->children('stringSet[@index="4"]');
         for my $stringSet (@stringSet){

             my @string_list = $stringSet->children_text('string');
             print @string_list;  #  in fact I want to return this string_list,
                                  #  not just print it.
         }
     }

     $counter->flush; # free the memory of $counter
}

my $roots = { 'counter[@name="music"]' => 1 };

my $handlers = { counter => \&parse_a_counter };

my $twig = new XML::Twig(TwigRoots => $roots,
                         TwigHandlers => $handlers);

$twig->parsefile('counter_test.xml');
A: 

DISCLAIMER: I have not used Twig myself, so this answer might not be idiomatic - it is a generic "how do I keep state in a callback handler" answer.

Three ways of passing information in and out of the handlers are:

ONE. State held in a static location

package TwigState;

my %state = ();
# Pass in a state attribute to get
sub getState { $state{$_[0]} }
 # Pass in a state attribute to set and a value 
sub setState { $state{$_[0]} = $_[1]; }

package main;

sub parse_a_counter { # Better yet, declare all handlers in TwigState
     my ($twig, $element) = @_;
     my $counter = TwigState::getState('counter');
     $counter++;
     TwigState::setState('counter', $counter);
}

TWO. State held in the $twig (XML::Twig object) itself, in some "state" member

# Ideally, XML::Twig or XML::Parser would have a "context" member
# to store context, plus methods to get/set that context.
# Barring that, simply make one, using a VERY VERY bad design decision:
# treating the object as a hash and just adding a key to that hash.
# I'd STRONGLY recommend against doing that and choosing #1 or #3 instead,
# unless there's a ready-made context data area in the class.
sub parse_a_counter {
     my ($twig, $element) = @_;
     # getContext/setContext are hypothetical: XML::Twig has no such methods
     my $counter = $twig->getContext('counter');
     # BAD: my $counter = $twig->{'_my_context'}->{'counter'};
     $counter++;
     $twig->setContext('counter', $counter);
     # BAD: $twig->{'_my_context'}->{'counter'} = $counter;
}

# for using DIY context, better pass it in with constructor:
my $twig = new XML::Twig(TwigRoots    => $roots,
                         TwigHandlers => $handlers,
                         _my_context  => {});

THREE. Make the handler a closure and have it keep state that way
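A minimal sketch of option THREE, written in plain Perl so it runs without XML::Twig. `make_counting_handler` is a hypothetical name, and the three calls near the bottom stand in for XML::Twig invoking the handler once per matching element (plain strings replace the real twig and element objects). The handler and the getter close over the same lexical `$count`, so the state survives between calls without a package variable or a key stuffed into the twig object.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical factory: returns a handler closure plus a getter.
# Both subs close over the same lexical $count, so the count
# persists between calls with no global or package variable.
sub make_counting_handler {
    my $count = 0;
    my $handler = sub {
        my ($twig, $element) = @_;   # same arguments XML::Twig passes to handlers
        $count++;
        return;
    };
    my $get_count = sub { return $count };
    return ($handler, $get_count);
}

my ($handler, $get_count) = make_counting_handler();

# Stand-in for XML::Twig calling the handler once per matching element;
# the strings are placeholders for the real twig and element objects.
$handler->('twig', "element $_") for 1 .. 3;

print $get_count->(), "\n";   # prints 3
```

With XML::Twig the handler would be registered like any other, e.g. `twig_handlers => { counter => $handler }`, and `$get_count->()` read once parsing is done.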

DVK
I was planning to add a closure example later, but it looks like draegtun beat me to it in a separate answer
DVK
A: 

The simplest way is to make parse_a_counter return a sub (i.e. a closure) and store the results in a global variable. For example:

use strict;
use warnings;
use XML::Twig;

our @results;      # <= put results in here

sub parse_a_counter {
    my ($type, $index) = @_;

    # return closure over type & index
    return sub {
        my ($twig, $counter) = @_;
        my @report = $counter->children( qq{report[\@type="$type"]} );

        for my $report (@report) {
            my @stringSet = $report->children( qq{stringSet[\@index="$index"]} );

            for my $stringSet (@stringSet) {
                my @string_list = $stringSet->children_text( 'string' );
                push @results, \@string_list; 
            }
        }
    };
}

my $roots    = { 'counter[@name="music"]' => 1 };
my $handlers = { counter => parse_a_counter( "month", 4 ) };

my $twig = XML::Twig->new(
    TwigRoots    => $roots,                     
    TwigHandlers => $handlers,
)->parsefile('counter_test.xml');

I tested this with the following XML (which is what I could work out from your example XML & code):

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <counter name="music">
        <report type="week">
            <stringSet index="4">
                <string>music week 4</string>
            </stringSet>
        </report> 
    </counter>
    <counter name="xmusic">
        <report type="month">
            <stringSet index="4">
                <string>xmusic month 4</string>
            </stringSet>
        </report> 
    </counter>
    <counter name="music">
        <report type="month"> 
            <stringSet index="4">
                <string>music month 4 zz</string>
                <string>music month 4 xx</string>
            </stringSet>
        </report>
    </counter>
</root>

And I got back this:

[
    [
        'music month 4 zz',
        'music month 4 xx'
    ]
];

Which is what I was expecting!

Hope that helps.

/I3az/

draegtun
A: 

Hi, I am lilili07, the author who asked the question. I do not know why my username changed or why I cannot add a comment directly. Anyway, here is my response:

Thanks DVK and draegtun for the answers. draegtun's code works for me. I like the Internet :-)

The reason I want to use Twig is that I do not want to load and parse the whole XML file at once: the file I need to parse is huge (it contains a very large number of elements), and loading it takes a long time and a lot of memory. So I want to use Twig to parse one part of the XML, free the memory for that part, then parse another part, free its memory, and so on. I expect to save time and memory this way. (BTW, I have not had a chance to verify Twig's efficiency yet, so I am not sure how much time it will save. Please let me know if it does not work the way I expect.)

Here are some additional questions about draegtun's code:

  1. How do $twig and $counter get passed into “return sub { my ($twig, $counter) = @_; ...”? Could you please explain more? My Perl book does not have examples like this.

  2. I want to use Twig to parse different parts of the XML file according to the counter name. Each time it finishes processing the result for one counter, it should free the memory assigned to processing that counter (especially the variables that store elements), and then continue parsing the next counter, and so on. How do I implement that? I have read that the memory of variables declared with “my” is not freed automatically, so I think maybe I should free them explicitly with pop, undef and delete, as shown in my code below?

  3. Must I create a new $twig, as well as define new $roots and $handlers, to parse each different part of the XML file? Can I change the values of $roots and $handlers and then reuse the twig instead of creating a new one?
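The mechanism behind question 1 can be shown in plain Perl. In this sketch the strings `'TWIG'` and `'ELT'` stand in for the objects XML::Twig would pass. `parse_a_counter('month')` runs once and returns a code reference; the inner sub's own `@_` is filled only later, by whoever calls that reference (in the real program, XML::Twig calling it with the twig and the element). `$type` is not in that `@_` at all: the inner sub captured it when it was created.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same shape as draegtun's parse_a_counter: the outer sub runs once,
# when the handlers hash is built, and only captures $type.
sub parse_a_counter {
    my ($type) = @_;
    return sub {
        # This @_ belongs to the inner sub; it is filled by whoever
        # calls the code reference later (XML::Twig passes $twig, $elt).
        my ($twig, $counter) = @_;
        return "type=$type twig=$twig counter=$counter";
    };
}

my $handler = parse_a_counter('month');   # $handler is just a CODE reference
print ref($handler), "\n";                # prints CODE

# Simulate XML::Twig invoking the handler; plain strings stand in for objects.
print $handler->('TWIG', 'ELT'), "\n";    # prints type=month twig=TWIG counter=ELT
```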


Here is my new code (modified from draegtun's).

use strict;
use warnings;
use XML::Twig;

our @results;      # <= put results in here

sub parse_a_counter {
    my ($type) = @_;

    # return closure over type
    return sub {
        my ($twig, $counter) = @_;
        my @report = $counter->children(qq{report[\@type="$type"]});
        for my $report (@report){
            my @stringSet = $report->children(qq{stringSet[\@index]});
            for my $stringSet (@stringSet){
                    my @string_list = $stringSet->children_text('string');
                    push @results, \@string_list;
            }
        }
        $counter->purge; # free the memory of $counter
    }; # end of return sub
}

my @counter_name = qw/music xmusic/;
foreach my $counter_name (@counter_name){
    my $roots = { qq{counter[\@name="$counter_name"]} => 1 };
    my $handlers = { counter => parse_a_counter("month") };
    my $twig = new XML::Twig(TwigRoots => $roots,
             TwigHandlers => $handlers);
    $twig->parsefile('substr_test.xml');
    # got the string list now:
    for my $myresults (@results){
            print @{$myresults}; # will replace print with sth. to process the results
    }
    my $nums = @results;
    while ($nums --){
            pop(@results);
    }
    print qq{\nfinished processing counter name $counter_name\n\n};
    undef $roots;
    undef $handlers;
    undef $twig;
    undef $nums;
}
Note: the element that occupies a huge amount of space is the content of the `<string>` elements.
Note: the XML file is composed of `<root>`, `<counter>`, `<report>`, `<stringSet>` and `<string>` elements, as draegtun figured out, and the huge size of the file is due to the contents of a large number of `<string>` elements.
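Regarding questions 2 and 3: the loop above re-reads the whole file once per counter name. A minimal sketch of an alternative, assuming the element structure described in the notes, is to parse once with a `twig_roots` handler on every `counter`, let the handler check the `name` attribute itself, and call `purge` after each counter so memory stays bounded. `%wanted`, `%results` and the inline sample XML are illustrative assumptions, not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical settings; adjust to the real file.
my %wanted = map { $_ => 1 } qw(music xmusic);
my $type   = 'month';

my %results;   # counter name => [ strings ... ]

my $twig = XML::Twig->new(
    twig_roots => {
        # Called once per fully parsed <counter>, whatever its name.
        counter => sub {
            my ($t, $counter) = @_;
            my $name = $counter->att('name');
            if (defined $name && $wanted{$name}) {
                for my $report ($counter->children(qq{report[\@type="$type"]})) {
                    for my $set ($report->children('stringSet[@index]')) {
                        push @{ $results{$name} }, $set->children_text('string');
                    }
                }
            }
            $t->purge;   # release everything parsed so far
        },
    },
);

$twig->parse(<<'XML');   # use parsefile('substr_test.xml') for the real file
<root>
  <counter name="music"><report type="month"><stringSet index="4"><string>a</string></stringSet></report></counter>
  <counter name="other"><report type="month"><stringSet index="1"><string>b</string></stringSet></report></counter>
  <counter name="xmusic"><report type="month"><stringSet index="4"><string>c</string></stringSet></report></counter>
</root>
XML

print "$_: @{ $results{$_} }\n" for sort keys %results;
```

One pass over the file replaces one pass per counter name, and a single twig object is enough, so there is no need to rebuild `$roots`, `$handlers` and `$twig` in a loop.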
Hi lilili07: Don't know why Stack Overflow is playing up for you! Re: comments... I think you may only be able to comment once you get past a certain number of reputation points (100?). Anyway, I'll do my best to answer your questions in the comments below.
draegtun
`2) ... It is said the memory of variable defined by “my” will not be freed by system automatically` - No need to worry here: Perl handles these safely, so you can remove all those `undef`s.
draegtun
Hope that helps. I've edited your "answer" a bit. On a general note: yes, XML::Twig is a handy tool for dealing with large XML files, because it comes with highly granular controls for releasing parts of the tree that aren't needed or have already been used.
draegtun
A: 

The easiest, and usual, way to pass arguments to handlers is to use closures. That's a big word but a simple concept: you set up the handler like this, `tag => sub { handler( @_, $my_arg) }`, and $my_arg will be passed to the handler. "Achieving Closure" has more detailed explanations of the concept.
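A self-contained sketch of that wrapper pattern, in plain Perl with hypothetical names (`handler`, `$my_arg`). The call at the bottom imitates XML::Twig passing `($twig, $elt)` to the callback, with strings standing in for the real objects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The "real" handler takes the two arguments XML::Twig supplies
# plus the extra one we want to pass in.
sub handler {
    my ($twig, $elt, $my_arg) = @_;
    return "$elt seen with $my_arg";
}

my $my_arg = 'month';

# The anonymous sub forwards whatever it is called with (@_) and
# appends $my_arg, which it captured when it was created.
my $callback = sub { handler( @_, $my_arg ) };

# Stand-in for XML::Twig calling the callback with ($twig, $elt).
print $callback->('TWIG', 'counter'), "\n";   # prints: counter seen with month
```

The handler itself stays an ordinary named sub with a fixed argument list; only the one-line anonymous wrapper knows about the extra arguments.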

Below is how I would write the code. I used Getopt::Long for argument processing, and qq{} instead of quotes around strings that contained an XPath expression, to be able to use the quotes in the expression.

#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

use Getopt::Long;

# set defaults
my $counter_name= 'music';
my $type= 'month';
my $id= 4;

GetOptions ( "name=s" => \$counter_name,
             "type=s" => \$type,
             "id=i"   => \$id,
           ) or die;   

my @results;

my $twig= XML::Twig->new( 
            twig_roots => { qq{counter[\@name="$counter_name"]} 
                             => sub { parse_a_counter( @_, $type, $id, \@results); } } )
                   ->parsefile('counter_test.xml');

print join( "\n", @results), "\n";

sub parse_a_counter {

     my ($twig, $counter, $type, $id, $results) = @_;
     my @report = $counter->children( qq{report[\@type="$type"]});

     for my $report (@report){

         my @stringSet = $report->children( qq{stringSet[\@index="$id"]});
         for my $stringSet (@stringSet){

             my @string_list = $stringSet->children_text('string');
             push @$results, @string_list;
         }
     }

     $counter->purge; # free the memory of $counter
}
mirod