I'm using the following code to parse a rather large XML file (> 50GB):

use XML::Parser;

my $p = XML::Parser->new(
    'Handlers' => {
        'Start' => \&handle_start,
        'End'   => \&handle_end,
        'Char'  => \&handle_char,
    }
);
$p->parsefile( 'source.xml' );

...

sub handle_start {
    ...
}

The problem is that parsing takes a very long time, and I'd like some kind of progress meter.

I'd prefer a way that doesn't require scanning the whole file first just to get a total count. The current position in the input file would be perfect, for example: I could check the total file size at the start, then check the current position in handle_start() and print it.

+7  A: 

You're probably looking for the current_byte method of the parser object, which is documented in XML::Parser::Expat.

So you could save the size of the file in a global before starting the parse:

my $file_size = -s $input_file;

and then calculate your progress in the handler like this:

sub handle_start {
    my($parser, $element) = @_;

    my $pos = $parser->current_byte;
    printf("%-20s %5.1f%%\n", $element, $pos * 100 / $file_size);
}
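
On a 50GB file, printing a line for every element would flood the terminal and slow the parse down. A possible refinement is to report only when the whole-number percentage changes. This is just a sketch: percent_done() and parse_with_progress() are hypothetical helper names, and the only parser calls used are XML::Parser->new, parsefile and current_byte.

```perl
use strict;
use warnings;

my $file_size   = 0;    # total size in bytes, set before parsing starts
my $last_report = -1;   # last percentage printed, so each value prints once

# Hypothetical helper: whole-number percentage, clamped to 0..100.
sub percent_done {
    my ($pos, $total) = @_;
    return 0 unless $total;              # avoid dividing by zero
    my $pct = int($pos * 100 / $total);
    return $pct > 100 ? 100 : $pct;
}

sub handle_start {
    my ($parser, $element) = @_;

    my $pct = percent_done($parser->current_byte, $file_size);
    return if $pct == $last_report;      # nothing new to show
    $last_report = $pct;

    # \r rewrites the same line; STDERR keeps the meter out of STDOUT
    printf STDERR "\r%3d%%", $pct;
}

# Hypothetical wrapper: record the file size, then run the parse.
sub parse_with_progress {
    my ($path) = @_;

    require XML::Parser;                 # loaded only when actually parsing
    $file_size   = -s $path;
    $last_report = -1;

    my $p = XML::Parser->new(
        Handlers => { Start => \&handle_start },
    );
    $p->parsefile($path);
    print STDERR "\n";
}

# Usage: parse_with_progress('source.xml');
```

With this, the meter prints at most about a hundred updates no matter how many elements the file contains.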
Grant McLean
Thanks a lot. That's exactly what I need.
depesz