I'm trying to parse a large XML file. I read it with XML::SAX (backed by Expat, not the pure-Perl implementation) and put every node at the second level and below into my "Node" class:
package Node;
use Moose;

has "name" => (
    isa    => "Str",
    reader => 'getName',
);

has "text" => (
    is  => "rw",
    isa => "Str",
);

has "attrs" => (
    is  => "rw",
    isa => "HashRef[Str]",
);

has "subNodes" => (
    is      => "rw",
    isa     => "ArrayRef[Node]",
    default => sub { [] },
);

# Return the single child node with the given name, or undef if there
# are zero matches or more than one.
sub subNode
{
    my ($self, $name) = @_;
    my $subNodeRef = $self->subNodes;
    my @matchingSubnodes = grep { $_->getName eq $name } @$subNodeRef;
    if (scalar(@matchingSubnodes) == 1)
    {
        return $matchingSubnodes[0];
    }
    return undef;
}

1;
In the "end_element" sub, I check if this is a node I care about, and if it is, I do some further processing.
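For context, the handler is shaped roughly like this - a minimal sketch rather than my exact code, assuming a stack of open elements; "record" is a placeholder for the element names I actually care about:

package MyHandler;
use strict;
use warnings;
use parent 'XML::SAX::Base';

sub start_element
{
    my ($self, $el) = @_;
    # Flatten the SAX attribute structures into a plain name => value hash
    my %attrs = map { $_->{Name} => $_->{Value} } values %{ $el->{Attributes} };
    push @{ $self->{stack} },
        Node->new(name => $el->{Name}, text => "", attrs => \%attrs);
}

sub characters
{
    my ($self, $chars) = @_;
    my $top = $self->{stack}[-1] or return;
    $top->text($top->text . $chars->{Data});
}

sub end_element
{
    my ($self, $el) = @_;
    my $node = pop @{ $self->{stack} };
    # Attach the finished node to its parent, if there is one
    push @{ $self->{stack}[-1]->subNodes }, $node if @{ $self->{stack} };
    if ($el->{Name} eq "record")    # placeholder for the nodes I care about
    {
        # ... further processing here ...
    }
}

1;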
This all worked fine on my test files, but the day before yesterday I threw it at my real file, all 13 million lines of it, and it's taking forever. It's been running for over 36 hours. How do I tell if it's Moose or XML::SAX that's the bottleneck? Is Moose always this slow, or am I using it wrong?
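For reference, one way to get that breakdown is to run a small slice of the input under Devel::NYTProf (parse.pl and subset.xml are placeholder names for my script and a cut-down input file):

    perl -d:NYTProf parse.pl subset.xml   # run under the profiler
    nytprofhtml                           # turn nytprof.out into an HTML report in ./nytprof/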
Update: Doing a profile on a 20,000-line subset of the data shows that Moose is the bottleneck - specifically Class::MOP::Class::compute_all_applicable_attributes (13.9%) and other Class::MOP and Moose internals.
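While digging around I've also seen it suggested that a mutable class keeps Moose redoing this metaclass work at runtime, and that the standard fix is to make the class immutable once it's fully defined, i.e. adding this just above the 1; in Node.pm:

    # Standard Moose optimization: inline the constructor and accessors
    # instead of recomputing metaclass data on every instantiation
    __PACKAGE__->meta->make_immutable;

Is that the piece I'm missing?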