ansaurus

Question

Which Perl modules are good for data munging?

Answer 1

+5 A:

It's unlikely that there will ever be a second edition of "Data Munging with Perl". I'm afraid that the economics just don't stack up.

But you're right that technology has moved on a long way since 2001, and there are plenty of new and improved modules that cover much of the same area as the modules discussed in the book. For example, I can't remember the last time I used XML::Parser or XML::DOM. I seem to use XML::LibXML for the majority of my XML work these days. Also, of course, my discussion of databases is incomplete because it doesn't mention DBIx::Class.
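To give a rough idea of what XML work with XML::LibXML looks like, here is a minimal sketch. It is not taken from the book; the sample document, its element and attribute names, and the `book_summaries` helper are all invented for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data: the <library>/<book> layout is an assumption
# made up for this sketch, not a format used in the book.
my $xml = <<'XML';
<library>
  <book year="2001"><title>Data Munging with Perl</title></book>
  <book year="2010"><title>Some Other Book</title></book>
</library>
XML

# Return one "year: title" string per <book> element in the XML string.
sub book_summaries {
    my ($xml_string) = @_;

    # Parse the string into a DOM document.
    my $doc = XML::LibXML->load_xml(string => $xml_string);

    my @summaries;
    # findnodes() takes an XPath expression and returns the matching nodes.
    for my $book ($doc->findnodes('/library/book')) {
        my $year  = $book->getAttribute('year');
        my $title = $book->findvalue('./title');
        push @summaries, "$year: $title";
    }
    return @summaries;
}

print "$_\n" for book_summaries($xml);
```

The point of the sketch is that XPath does most of the navigation, so there is no need to write event handlers as with XML::Parser.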

Perhaps it would be an interesting idea to update some of this information through some posts on my Perl blog. I'll give it some thought. Thanks for the idea.

davorg 2010-09-27 07:34:20

Some of the specifics might have changed, but the concepts are the same. :)

brian d foy 2010-09-27 15:30:44

Dave, it would be a pleasure to read these reviews and recipes in your blog some day.

Pablo Marin-Garcia 2010-09-28 17:25:01

Answer 2

+3 A:

re: Parse::RecDescent <=> Regexp::Grammars

Damian Conway has been quoted as saying that Regexp::Grammars is the successor to Parse::RecDescent. Even so, if Parse::RecDescent still gets the job done for you, then continue to use it. The tool you know well is better than the tool you don't know!

However, if performance is a key issue and you are running perl 5.10+, then do consider Regexp::Grammars.

I hope Dave doesn't mind, but here is his first Parse::RecDescent example from "Data Munging with Perl" (section 11.1.1) converted to Regexp::Grammars:

use 5.010;
use warnings;
use Regexp::Grammars;

my $parser = qr{
    <Sentence>

    <rule: Sentence>        <subject> <verb> <object>
    <rule: subject>         <noun_phrase>
    <rule: object>          <noun_phrase>
    <rule: noun_phrase>     <pronoun> | <proper_noun> | <article> <noun>

    <token: verb>           wrote | likes | ate
    <token: article>        a | the | this
    <token: pronoun>        it | he
    <token: proper_noun>    Perl | Dave | Larry
    <token: noun>           book | cat
}xms;

while (<DATA>) {
    chomp;
    print "'$_' is ";
    print 'NOT ' unless $_ =~ $parser;
    say 'a valid sentence';
}

__DATA__
Larry wrote Perl
Larry wrote a book
Dave likes Perl
Dave likes the book
Dave wrote this book
the cat ate the book
Dave got very angry

NB. For those of you who don't have the book: only "Dave got very angry" is an invalid sentence :)
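The example above only tests whether a line matches. As a small sketch of the next step, Regexp::Grammars also fills the result hash `%/` after a successful match, so the pieces of the sentence can be pulled out by rule name. The `parse_verb` helper below is a hypothetical name introduced for this illustration; the grammar is the same one as above.

```perl
use 5.010;
use strict;
use warnings;
use Regexp::Grammars;

my $parser = qr{
    <Sentence>

    <rule: Sentence>        <subject> <verb> <object>
    <rule: subject>         <noun_phrase>
    <rule: object>          <noun_phrase>
    <rule: noun_phrase>     <pronoun> | <proper_noun> | <article> <noun>

    <token: verb>           wrote | likes | ate
    <token: article>        a | the | this
    <token: pronoun>        it | he
    <token: proper_noun>    Perl | Dave | Larry
    <token: noun>           book | cat
}xms;

# Hypothetical helper: return the verb of a valid sentence,
# or undef (in scalar context) if the text does not parse.
sub parse_verb {
    my ($text) = @_;
    return unless $text =~ $parser;

    # After a successful match, %/ holds the parse result keyed by rule name.
    # A token with no subrules, such as <verb>, is stored as the matched string.
    return $/{Sentence}{verb};
}

for my $line ('Larry wrote Perl', 'Dave got very angry') {
    my $verb = parse_verb($line);
    say defined $verb ? "'$line' has verb '$verb'" : "'$line' did not parse";
}
```

Copying what you need out of `%/` straight after the match is a sensible habit, since the next successful match with a grammar overwrites it.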

/I3az/

draegtun 2010-09-27 12:26:17

noun_phase = noun_phrase?

Mike 2010-09-27 13:34:04

@Mike: It's annoying that you can't copy/paste from printed paper to here :) Well spotted. I've applied `s/noun_phase/noun_phrase/`.

draegtun 2010-09-27 13:57:38

Of course "Dave got very angry" isn't a valid sentence. Dave never gets angry :-)

davorg 2010-09-28 07:41:28

@draegtun: +1 Thanks a lot for the example. I haven't used P::RD for 5 years, so I would need to read the POD again. If the concepts of tokens, production rules, etc. are similar, then from your answer I assume it is wiser to write my new parsers with R::G, isn't it?

Pablo Marin-Garcia 2010-09-28 17:34:59

@Pablo Marin-Garcia: If I were to start a new parsing project tomorrow, then I would choose R::G. I would only use P::RD if the project had to run on a perl prior to 5.10.

draegtun 2010-09-30 09:17:43

@draegtun: Thanks for your suggestion. I am using perl 5.12, so I will give R::G a chance. I hope that the existing P::RD tutorials and documentation can be easily adapted to R::G.

Pablo Marin-Garcia 2010-10-03 19:09:16
