I'm running a regular expression against a large scalar. Though this match isn't capturing anything, my process grows by 30M after this match:

# A
if (${$c} =~ m/\G<<\s*/cgs)
{
    #B
    ...
}

$c is a reference to a pretty big scalar (around 21M), but I've verified that pos(${$c}) is in the right place and the expression matches at the first character, with pos(${$c}) being updated to the correct place after the match. But as I mentioned, the process has grown by about 30M between #A and #B, even though I'm not capturing anything with this match. Where is my memory going?

Edit: Yes, use of $& was to blame. We are using Perl 5.8.8, and my script was using Getopt::Declare, which uses the built-in Text::Balanced. The 1.95 version of this module was using $&. The 2.0.0 version that ships with Perl 5.10 has removed the reference to $& and alleviates the problem.

+19  A: 

Just a quick sanity check: are you mentioning $&, $` or $' (called $MATCH, $PREMATCH and $POSTMATCH under use English) anywhere in your code? If so, Perl will copy your entire string for every regular expression match, just in case you want to inspect those variables.

"In your code" in this case means indirectly, including using modules that reference these variables, or writing use English rather than use English qw( -no_match_vars ).
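As a rough illustration of the advice above, here is a minimal sketch that loads English without the match variables and recovers the matched text from the @- and @+ offset arrays instead of $&. The helper name match_token and the sample string are hypothetical, not taken from the original script.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );   # long names, but no $&, $` or $' aliases

# Hypothetical helper: match a "<<" token at the current pos() of the
# referenced string and return the matched text without touching $&.
sub match_token {
    my ($c) = @_;
    if (${$c} =~ m/\G<<\s*/cg) {
        # @- and @+ hold the start and end offsets of the last successful match
        return substr(${$c}, $-[0], $+[0] - $-[0]);
    }
    return;
}

my $text    = "<<   rest of input";   # stand-in for the large scalar
my $matched = match_token(\$text);
```

If only a small part of the match is needed, an explicit capture group and $1 would work just as well; the point is only that none of the three match variables appears anywhere in the program.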

If you're not sure, you can use the Devel::SawAmpersand module to determine if they have been used, and Devel::FindAmpersand to figure out where they are used.

There may be other reasons for the increase in memory (which version of Perl are you using?), but the match variables will definitely blow your memory if they're used, and hence are a likely culprit.

Cheerio,

Paul

pjf
Certainly looks like that's it. I can't run FindAmpersand on my perl because I have threads enabled, so I'm recompiling perl just to run this test, but SawAmpersand is reporting yes.
Ryan Olson
grep or ack should also be able to tell you if they are mentioned anywhere in your code.
Leon Timmermans
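For anyone without grep or ack to hand, the same kind of scan can be sketched in Perl itself. This is a crude heuristic under stated assumptions: the helper name find_match_var_lines is made up for illustration, it works line by line on source text, and it will report false positives (for example a $' that is really a dollar sign followed by a closing quote).

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based numbers of the lines in $source
# that appear to mention a match variable or load English without
# -no_match_vars. A text heuristic only, not a real Perl parser.
sub find_match_var_lines {
    my ($source) = @_;
    my @hits;
    my $line_no = 0;
    for my $line (split /\n/, $source) {
        $line_no++;
        my $suspicious =
               $line =~ /\$[&`']/                     # the punctuation variables
            || $line =~ /\$(?:PRE|POST)?MATCH\b/      # the English long names
            || $line =~ /^\s*use\s+English\b(?!.*-no_match_vars)/;
        push @hits, $line_no if $suspicious;
    }
    return @hits;
}
```

To cover modules pulled in indirectly, it would have to be run over every file the script loads, not just the script itself.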
Ryan Olson