ansaurus

Question

Can XML::Twig start parsing an XML file from a given line number?

Answer 1

+2 A:

Perhaps this example will give you some ideas for an alternative strategy. In particular, you might be able to combine the idea in index_file with Zoul's suggestion about seeking to a location before passing off the file handle to XML::Twig.

use strict;
use warnings;

# Index the XML file, storing start and end positions
# for each class in the document. You pay this cost only once.
sub index_file {
    local @ARGV = (shift);
    my (%index, $prev);
    while (<>){
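        # Quotes around the attribute value are optional here, so both
        # name=math and name="math" are matched.
        if ( /^<class name=["']?(\w+)["']?>/ ) {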
            my $start = tell() - length();
            $index{$1} = { start => $start, end => undef };

            $index{$prev}{end} = $start - 1 if defined $prev;
            $prev = $1;
        }
        # Close off the last section at end of file, if any class was seen.
        $index{$prev}{end} = tell if defined $prev and eof;
    }
    return \%index;
}

# Use the index to retrieve the XML for a particular class.
# This allows you to jump quickly to any section of interest.
# It assumes that the sections of the XML document are small enough
# to be held in memory.
sub get_section {
    my ($file_name, $class_name, $index) = @_;
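    my $ind = $index->{$class_name}
        or die "No class named '$class_name' in index\n";

    open(my $fh, '<', $file_name) or die "Cannot open $file_name: $!";
    seek($fh, $ind->{start}, 0) or die "Cannot seek in $file_name: $!";
    read( $fh, my $xml_section, $ind->{end} - $ind->{start} );
    close $fh;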

    return $xml_section;
}

# Example usage.
sub main {
    my ($file_name) = @_;
    my $index = index_file($file_name);
    for my $cn (keys %$index){
        # Process only sections of interest.
        next unless $cn eq 'math' or $cn eq 'english';
        my $xml = get_section($file_name, $cn, $index);

        # Pass off to XML::Twig or whatever.
        print $xml;
    }
}

main(@ARGV);
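
As a rough sketch of the "pass off to XML::Twig" step: a string such as the one returned by get_section can be given directly to parse, provided it is a complete element on its own. The sample XML, the student element, and the handler below are made up for illustration and are not part of the original example.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample: the kind of string get_section might return
# for one class. It must be a complete element to parse on its own.
my $xml_section = <<'XML';
<class name="math">
  <student>Ann</student>
  <student>Bob</student>
</class>
XML

my @students;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each <student> element as it is parsed.
        student => sub {
            my ($t, $elt) = @_;
            push @students, $elt->text;
        },
    },
);

# parse accepts a string holding the whole (sub)document.
$twig->parse($xml_section);

my $class_name = $twig->root->att('name');
print "$class_name: @students\n";
```

Parsing each section as its own small document keeps memory use bounded by the size of one section rather than the whole file.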

FM 2010-08-02 13:17:15

Very smart solution. I will test how this approach performs.

2010-08-02 20:14:38

Answer 2

+1 A:

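The parse method of XML::Twig accepts an IO::Handle, so you could probably seek to the right line yourself before handing it over. There is also an input_filter parameter to the XML::Twig constructor, although as far as I can tell from the documentation it filters character data before it is stored in the twig (for example, to convert encodings), so it may not help with skipping the first n unwanted lines.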
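
A minimal sketch of positioning a handle at a given line, using core Perl only. The helper name open_at_line and the sample file contents are hypothetical. The resulting handle could then be given to XML::Twig's parse, but only if what remains from that line onward is a well-formed document by itself; starting in the middle of a larger document would most likely cause a parse error.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open $file_name and leave the handle positioned
# at the start of line $line_no (1-based) by discarding earlier lines.
sub open_at_line {
    my ($file_name, $line_no) = @_;
    open(my $fh, '<', $file_name) or die "Cannot open $file_name: $!";
    while ($line_no > 1) {
        defined(my $skipped = <$fh>)
            or die "File has fewer lines than expected\n";
        $line_no--;
    }
    return $fh;
}

# Sample input: two non-XML header lines, then a complete document.
my ($tmp, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp} "header line 1\n", "header line 2\n",
             "<doc>\n", "  <item>a</item>\n", "</doc>\n";
close $tmp or die "Cannot close temp file: $!";

my $fh = open_at_line($tmp_name, 3);

# Here $fh could be handed to a twig, e.g. $twig->parse($fh).
# For this sketch, just read whatever is left from line 3 onward.
my $rest = do { local $/; <$fh> };
close $fh;
print $rest;
```

If the same file is read repeatedly, recording byte offsets once with tell and jumping to them with seek avoids rereading the skipped lines each time.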

zoul 2010-08-02 16:37:12

Thanks, zoul. I will try to understand what input_filter means.

2010-08-02 21:22:12
