I have been told that disabling backreferences in Perl improves performance (provided you're not using them), and that if you don't use any backreferences, Perl does this by itself.

Now I have a Perl script with a large number of regexes in it, only one of which uses a backreference, and I would like to know the following:

  • Given that I have a very large number of regexes (let's assume most of my processing time is spent in regexes), does disabling backreferences give a significant performance improvement? Or are there criteria I can use to tell whether it does?
  • Is there a way to disable backreferences once at the beginning and only re-enable them when I need them? (I know about (?:, but I don't want to have to add it to every grouping.)
  • Would scoping allow Perl to optimize this backreferencing behavior for me (i.e., does a sub or an eval change whether Perl turns off backreferencing for code outside of it)?
A:

The only real way to check is to measure it yourself. Take a look at the Benchmark module (it ships with core Perl, so you won't have to install it). Set up a couple of benchmarks: one with a function that runs (say) ten regexes without any backreferences, and one that runs the same ten except that one of them uses a backreference.
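A minimal sketch of that benchmark using Benchmark's cmpthese. The sample text and the ten patterns are hypothetical placeholders; substitute regexes and input that are representative of your own script. cmpthese(-1, ...) runs each variant for at least one CPU second and prints a comparison table.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample input: replace with text representative of your data.
my $text = join ' ', ('alpha beta gamma delta', 'foo foo bar', 'x=1 y=2') x 200;

# Ten hypothetical patterns, none of which use a backreference.
my @plain = (
    qr/alpha/, qr/beta/, qr/gamma/, qr/delta/,  qr/foo/,
    qr/bar/,   qr/x=\d/, qr/y=\d/,  qr/\bg\w+/, qr/\d+/,
);

# The same set with the last pattern swapped for one that uses a
# backreference (\1 must repeat what (\w+) captured, e.g. "foo foo").
my @with_backref = (@plain[0 .. 8], qr/\b(\w+) \1\b/);

# Run every pattern in the set over the text and count all matches.
sub count_matches {
    my ($patterns) = @_;
    my $n = 0;
    for my $re (@$patterns) {
        $n++ while $text =~ /$re/g;
    }
    return $n;
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    plain        => sub { count_matches(\@plain) },
    with_backref => sub { count_matches(\@with_backref) },
});
```

Because the two sets differ only in their last pattern, a difference in the reported rates mostly reflects that one pattern (plus timing noise), so rerun it a few times before drawing conclusions.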

If you find that the regex with the backreference really does slow down the rest of your regexes, maybe try rewriting that one so it doesn't need a backreference.
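As an illustration of what such a rewrite might look like, here is a hypothetical doubled-word check done once with a backreference and once by splitting the string and comparing neighboring words. This is a sketch only: the two versions are not exactly equivalent (the regex requires exactly one space between the words, and with /g it counts "is is is" once while the split version counts it twice), and whether the non-regex version is actually faster is something to benchmark.

```perl
use strict;
use warnings;

# Hypothetical input containing two doubled words.
my $text = 'this is is a test of the the code';

# With a backreference: \1 must repeat whatever (\w+) captured.
# The "() =" idiom counts how many values the list-context match returns.
my $with_backref = () = $text =~ /\b(\w+) \1\b/g;

# Without one: split into words and compare each word with the next.
my @words = split ' ', $text;
my $without_backref = grep { $words[$_] eq $words[$_ + 1] } 0 .. $#words - 1;

print "$with_backref $without_backref\n";   # prints "2 2"
```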

CanSpice
So benchmarking helps me make this decision for this particular piece of code, but I was hoping for some general information so I can make this decision more easily going forward. The bottom line is that I don't understand much about disabling/enabling backreferencing, and I'm trying to learn more so I can make an informed decision.
tzenes
A:

Using capturing parentheses only penalizes regular expressions that use them, so use them where you need to capture, but use non-capturing parens (?:...) when all you need is grouping.
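A small illustration using a made-up date string: when only the year is needed, (?:...) groups the repeated -\d\d part so the {2} quantifier applies to it, without Perl saving that text into a numbered variable.

```perl
use strict;
use warnings;

my $date = '2011-03-14';   # hypothetical input

# Capturing parentheses: each (...) saves its text in $1, $2, $3.
my ($y, $m, $d) = $date =~ /^(\d{4})-(\d\d)-(\d\d)$/;

# Only the year is needed: (?:...) groups "-\d\d" so {2} applies to it,
# but nothing is saved for that group, so the match returns one value.
my @got = $date =~ /^(\d{4})(?:-\d\d){2}$/;

print "$y $m $d | @got\n";   # prints "2011 03 14 | 2011"
```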

Using any of the global match variables

$& $` $'

imposes a performance penalty on all regular expressions, so avoid using them if at all possible. (But once you do, go nuts! You've already paid the price.) There's no way to turn this on and off. Once Perl detects that they're used anywhere (even in third-party modules you may use), the feature is turned on. (This applies to Perl versions before 5.20.0; the copy-on-write strings enabled by default in 5.20.0 removed this penalty.)
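If you need the text before, at, or after a match without naming those variables, perlvar documents equivalents built from the @- and @+ arrays, which hold the start and end offsets of the last successful match. A minimal sketch with a made-up input string:

```perl
use strict;
use warnings;

my $str = 'key=value; other=thing';   # hypothetical input
my ($pre, $match, $post);

if ($str =~ /value/) {
    # $-[0] / $+[0] are the start / end offsets of the whole match.
    $pre   = substr($str, 0, $-[0]);              # same text as $`
    $match = substr($str, $-[0], $+[0] - $-[0]);  # same text as $&
    $post  = substr($str, $+[0]);                 # same text as $'
}

print "[$pre] [$match] [$post]\n";   # prints "[key=] [value] [; other=thing]"
```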

As of Perl 5.10.0, there are alternatives to the global match variables that only penalize regular expressions that use them. If you add the /p modifier to a particular regular expression, you can then use

${^PREMATCH} ${^MATCH} ${^POSTMATCH}

instead.
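A minimal sketch of the /p form on a made-up input string. It needs Perl 5.10.0 or later, hence the use 5.010 line (which also enables say).

```perl
use strict;
use warnings;
use 5.010;   # /p and ${^PREMATCH} etc. need Perl 5.10.0+; also enables say

my $str = 'key=value; other=thing';   # hypothetical input
my ($pre, $match, $post);

# Only this match is marked with /p, so only it pays for the copies.
if ($str =~ /value/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

say "[$pre] [$match] [$post]";   # prints "[key=] [value] [; other=thing]"
```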

Michael Carman