What's the best way to profile Perl regexes to determine how expensive they are?

A: 

My preferred way would be to take a large set of input data for the RE, then process that data N times (e.g., 100,000) and see how long it takes.

Then tweak the RE and try again (keep all the old REs as comments in case you need to benchmark them again in future, who knows what wondrous optimizations may appear in Perl 7?).
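A minimal sketch of that timing loop, assuming Time::HiRes for sub-second resolution. The sample data, the pattern, and the helper name `time_regex` are made up for illustration; in practice you would load real input from a file.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical sample data; replace with lines read from real input.
my @lines = map { "user$_\@example.com some trailing text $_" } 1 .. 1000;

# The regex under test (an illustrative pattern, not one from the question).
my $re = qr/^(\w+)\@(\w+(?:\.\w+)+)/;

# Run $re over every element of @$data, $passes times over.
# Returns the elapsed wall-clock seconds and the total number of matches.
sub time_regex {
    my ($re, $data, $passes) = @_;
    my $matches = 0;
    my $start   = Time::HiRes::time();
    for my $pass (1 .. $passes) {
        for my $line (@$data) {
            $matches++ if $line =~ $re;
        }
    }
    return (Time::HiRes::time() - $start, $matches);
}

my $passes = 100;
my ($elapsed, $matches) = time_regex($re, \@lines, $passes);
my $attempts = $passes * scalar @lines;

printf "%d matches in %.3f s (%.2f us per attempt)\n",
    $matches, $elapsed, 1e6 * $elapsed / $attempts;
```

Printing the match count alongside the time is a cheap sanity check: if a tweaked RE is faster but the count changed, it is no longer the same RE.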

There may well be tools that can analyze REs and show you the execution path for a specific input (like the query analysis tools in a DBMS) but, since Perl is the language of the lazy (a commandment handed down by Larry himself), I couldn't be bothered going to find them :-).

paxdiablo
+12  A: 

Perl comes with the Benchmark module, which can take a number of code samples, and answer the question of "which one is faster?". I've got a Perl Tip on Benchmarking Basics, and while that doesn't use regexps per se, it does give a quick and useful introduction to the topic, along with further references.
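For regexes specifically, a rough sketch with `cmpthese` from Benchmark might look like the following. The log-style input and the two candidate patterns are assumptions picked only to have something to compare; they are not from the Perl Tip.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input; substitute data that resembles your real workload.
my @lines = map {
    sprintf("2024-01-%02d INFO request %d done", $_ % 28 + 1, $_)
} 1 .. 500;

# Two illustrative ways of writing the same check.
my %candidates = (
    unanchored => qr/\d{4}-\d{2}-\d{2} INFO/,
    anchored   => qr/^\d{4}-\d{2}-\d{2} INFO/,
);

# Count how many lines a regex matches, so we can confirm the
# candidates agree before comparing their speed.
sub count_matches {
    my ($re) = @_;
    return scalar grep { $_ =~ $re } @lines;
}

# Build one code ref per candidate for Benchmark to time.
my %tests;
for my $name (keys %candidates) {
    my $re = $candidates{$name};
    $tests{$name} = sub { count_matches($re) };
}

# A negative count means "run each test for at least that many CPU seconds".
cmpthese(-1, \%tests);
```

`cmpthese` prints a table of rates and relative percentages. The numbers depend on your perl version and your data, so treat them as relative rather than absolute.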

brian d foy also has an excellent chapter on benchmarking in his Mastering Perl book. He's been kind enough to put the chapter on-line as a draft, which is well worth the read. I really can't recommend it enough.

Paul

pjf
Besides the Benchmarking and Profiling chapters, check out the regex chapter for some tools.
brian d foy
perl -Mre=debug / use re 'debug';
Brad Gilbert
+3  A: 

Just saying "use the Benchmark module" doesn't really answer the question, though. Benchmarking a regex is different from benchmarking a calculation; you need a large amount of realistic data so you can stress the regex the way real data would. If most of your data will match, you want a regex that matches quickly; if most will fail, you want a regex that fails quickly. They could wind up being the same regex, but maybe not.
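One way to act on this is to benchmark the same candidates against two data sets, one that mostly matches and one that mostly fails. This is only a sketch: both data sets and both patterns below are invented, and which regex wins on which set will depend on your perl and your real data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data sets: every line matches in one, none in the other.
my @mostly_match = map { "key$_=value$_" } 1 .. 500;
my @mostly_fail  = map { ("x" x 40) . " no separator here $_" } 1 .. 500;

my %datasets = (
    mostly_match => \@mostly_match,
    mostly_fail  => \@mostly_fail,
);

# Illustrative candidates for the same job.
my %regexes = (
    anchored   => qr/^(\w+)=(\w+)$/,
    unanchored => qr/(\w+)=(\w+)/,
);

# Count the lines in @$data that match $re.
sub count_matches {
    my ($re, $data) = @_;
    return scalar grep { $_ =~ $re } @$data;
}

# Compare all candidates once per data set, so match-heavy and
# fail-heavy behaviour are reported separately.
for my $set (sort keys %datasets) {
    my $data = $datasets{$set};
    my %tests;
    for my $name (keys %regexes) {
        my $re = $regexes{$name};
        $tests{$name} = sub { count_matches($re, $data) };
    }
    print "--- $set ---\n";
    cmpthese(-1, \%tests);
}
```

If the two tables rank the candidates differently, pick based on the mix you actually expect in production.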

Joe Casadonte