A:

XML::Twig has a `simplify` method that you can call on an XML element. According to the docs, it will:

> Return a data structure suspiciously similar to XML::Simple's

Here is an example:

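```perl
use strict;
use warnings;
use feature 'say';    # say() needs this (or "use 5.010;")
use XML::Twig;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order in the dumped output

# rec() is called each time a complete <rec> element has been parsed
my $twig = XML::Twig->new(
    twig_handlers => {
        rec => \&rec,
    }
)->parsefile( 'data.xml' );

sub rec {
    my ($twig, $rec) = @_;
    my $data = $rec->simplify;    # XML::Simple-style hashref for this <rec>
    say Dumper $data;
    $rec->purge;                  # release this record from memory
}
```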

NB: `$rec->purge` removes the record from memory as soon as it has been processed.

Running this against your XML example produces this:

```
$VAR1 = {
          'f1' => 'v1',
          'f2' => 'v2'
        };

$VAR1 = {
          'f1' => 'v1b',
          'f2' => 'v2b'
        };

$VAR1 = {
          'f1' => 'v1c',
          'f2' => 'v2c'
        };
```

Which I hope is suspiciously like what comes out of XML::Simple :)

/I3az/

draegtun
A:

As the author of XML::Simple, I'd just like to correct some misconceptions in your question.

XML::Simple isn't a DOM parser; in fact, it isn't a parser at all. It delegates all parsing duties to either a SAX parser or XML::Parser. Parsing speed will depend on which parser module is the default on your system. When you run 'make test' for the XML::Simple distribution, the output will list the default parser.

If the default parser on your system is XML::SAX::PurePerl, then it will be slow and, more importantly, buggy too. If that's the case, I'd recommend installing either XML::SAX::Expat or XML::SAX::ExpatXS for an immediate speedup. (Whichever SAX parser is installed last will be the default from that point.)
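Besides reading the `make test` output, you can ask XML::SAX directly which parsers are registered and which one its factory hands out. This is a minimal sketch assuming XML::SAX and XML::Simple are installed; the last part forces a back end for a single call through the `$XML::Simple::PREFERRED_PARSER` package variable (the counterpart of the `XML_SIMPLE_PREFERRED_PARSER` environment variable), and `XML::Parser` is only an example value that must itself be installed.

```perl
use strict;
use warnings;
use XML::SAX;
use XML::SAX::ParserFactory;
use XML::Simple qw(XMLin);

# Every SAX parser registered in XML::SAX's ParserDetails.ini
my @registered = map { $_->{Name} } @{ XML::SAX->parsers() };
print "Registered SAX parsers: @registered\n";

# The factory returns an instance of the current default parser
my $default = ref( XML::SAX::ParserFactory->parser() );
print "Default SAX parser: $default\n";

# Assumption: XML::Parser is installed. Localising the package variable
# overrides the SAX default for this one XMLin() call only.
my $data = do {
    local $XML::Simple::PREFERRED_PARSER = 'XML::Parser';
    XMLin('<recs><rec><f1>v1</f1><f2>v2</f2></rec></recs>');
};
print "f1 = $data->{rec}{f1}\n";
```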

Having said that, your requirements are a bit contradictory: you want something that returns your whole document as a hash, and yet you don't want a parser that slurps the whole document into memory.

I understand your short-term goals, but as a longer-term solution I'd recommend migrating your code to XML::LibXML. It is a DOM parser, but it's very fast because all the grunt work is done in C. Best of all, the built-in XPath support makes it even simpler to use than XML::Simple (see this article).
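As a rough illustration of the XPath style, here is a sketch that turns each record into a hashref with XML::LibXML. The sample document is an assumption: one `<rec>` per record with flat child elements, using the f1/f2 names from the XML::Twig answer above. Note that this still builds the full DOM in memory; it only shows how compact the extraction code becomes.

```perl
use strict;
use warnings;
use XML::LibXML;

# Assumed sample input (the real data.xml is not shown here)
my $xml = <<'XML';
<recs>
  <rec><f1>v1</f1><f2>v2</f2></rec>
  <rec><f1>v1b</f1><f2>v2b</f2></rec>
</recs>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

my @records;
for my $rec ($doc->findnodes('//rec')) {
    # './*' matches element children only, so whitespace text nodes are skipped
    my %fields = map { ($_->nodeName, $_->textContent) } $rec->findnodes('./*');
    push @records, \%fields;
}

# A single value can also be pulled straight out with one XPath expression
my $first_f2 = $doc->findvalue('(//rec)[1]/f2');
print "first f2: $first_f2\n";
```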

Grant McLean
@Grant - (1) We have experimented with changing back-end parsers via `$ENV{XML_SIMPLE_PREFERRED_PARSER}`. It provided a significant speedup, but nothing comparable to the pure C solution (XPath module). So our conclusion was that a large amount of time is spent building the entire data tree in memory, as opposed to just parsing.
DVK
@Grant - (2) We don't want to return the whole document; we want an iterator that returns each first-level tag as an individual hash, without storing the whole thing in memory (at least on the Perl side; I'm not sure what the underlying XS module does).
DVK
@Grant - (3) Unfortunately, the "correct" solution is not feasible. I CAN attempt to change the implementation of one module (whose API is basically "give me the next hashref"), but changing every single user of that module to use XPath looks like too much effort to justify. Hence I'm looking for something that can emulate XML::Simple's output for each individual second-level tag.
DVK
@Grant - (4) ... and thanks for feedback! One of the beauties of SO :)
DVK
A: 

Take a look at XML::LibXML::Reader.

blah
Downvote: ⑴ Link text does not match link target. ⑵ Neither Reader nor SAX fulfil the question's requirement. ⑶ [Grant already recommended LibXML one month ago.](http://stackoverflow.com/questions/2912462#2916863)
daxim
@daxim: XML::LibXML::Reader provides a 'pull' API to the libxml parsing library. This is an entirely different paradigm to the more commonly used DOM/XPath API of XML::LibXML and also seems like a good fit for your stated requirement of an 'iterator'. Don't dismiss it out of hand.
Grant McLean
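To make the pull-API suggestion concrete, here is a rough sketch of a "give me the next hashref" iterator built on XML::LibXML::Reader. It assumes records are `<rec>` elements with flat child elements (names borrowed from the XML::Twig answer); `make_rec_iterator` is a hypothetical helper, and its flattening is much cruder than XML::Simple's: attributes are ignored and a repeated child element overwrites the earlier one instead of becoming an array.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical helper: returns a closure that yields one hashref per
# <rec> element, or an empty return once the document is exhausted.
# Only the current record is copied out as a DOM fragment.
sub make_rec_iterator {
    my (%source) = @_;    # e.g. (location => 'data.xml') or (string => $xml)
    my $reader = XML::LibXML::Reader->new(%source)
        or die "Cannot create XML::LibXML::Reader\n";

    return sub {
        my $found = $reader->nextElement('rec');    # 1 found, 0 end, -1 error
        die "XML parse error\n" if $found < 0;
        return unless $found;

        my $node = $reader->copyCurrentNode(1);     # deep copy of this <rec> only
        my %rec;
        for my $child ($node->childNodes) {
            next unless $child->isa('XML::LibXML::Element');
            # Flat name => text mapping (no attributes, no arrays)
            $rec{ $child->nodeName } = $child->textContent;
        }
        return \%rec;
    };
}

# Usage against an assumed sample document
my $xml = <<'XML';
<recs>
  <rec><f1>v1</f1><f2>v2</f2></rec>
  <rec><f1>v1b</f1><f2>v2b</f2></rec>
  <rec><f1>v1c</f1><f2>v2c</f2></rec>
</recs>
XML

my $next = make_rec_iterator(string => $xml);
my @got;
while (my $rec = $next->()) {
    push @got, $rec;
    print "f1=$rec->{f1} f2=$rec->{f2}\n";
}
```

The closure keeps the reader as private state, so callers only ever see "next hashref or false", which matches the single-module API change DVK describes.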