I'm chasing a couple of potential memory leaks in a Perl code base and I'd like to know about common pitfalls with regard to memory (mis-)management in Perl.
What are common leak patterns you have observed in Perl code?
Circular references are the canonical cause of leaks.
    sub leak {
        my ($foo, $bar);
        $foo = \$bar;
        $bar = \$foo;
    }
Perl uses reference-counting garbage collection. This means that perl keeps a count of how many references to any variable exist at a given time. If the variable goes out of scope and the count is 0, the variable is cleared.
In the example code above, $foo and $bar are never collected, and a copy will persist after every invocation of leak(), because both variables still have a reference count of 1.
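One way to watch this happen is to give the values a destructor and count how often it runs. The sketch below is only an illustration: the Node class and the $destroyed counter are made up for the demo, and it uses blessed hashes instead of plain scalars so that DESTROY can be observed.

```perl
use strict;
use warnings;

my $destroyed = 0;    # hypothetical counter, bumped by every destructor call

package Node;         # hypothetical class, only here so we can hook DESTROY
sub new     { return bless {}, shift }
sub DESTROY { $destroyed++ }

package main;

sub leak {
    my $foo = Node->new;
    my $bar = Node->new;
    $foo->{other} = $bar;
    $bar->{other} = $foo;    # cycle: each object keeps the other alive
    return;
}

leak() for 1 .. 3;
print "destroyed after three calls: $destroyed\n";    # destroyed after three calls: 0
```

Six objects were created and none of them was destroyed while the program was running. They are only cleaned up when the interpreter exits.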
The easiest way to prevent this issue is to use weak references. Weak references are references that you follow to access data, but do not count for garbage collection.
    use Scalar::Util qw(weaken);
    sub dont_leak {
        my ($foo, $bar);
        $foo = \$bar;
        $bar = \$foo;
        weaken $bar;
    }
In dont_leak(), $foo has a reference count of 0 and $bar has a ref count of 1. When we leave the scope of the subroutine, $foo is returned to the pool, and its reference to $bar is cleared. This drops the ref count on $bar to 0, which means that $bar can also return to the pool.
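To check that weakening one side of the cycle really lets both values go, you can count destructor calls. This is a rough sketch under the same assumptions as any demo of this kind: the Node class and the $destroyed counter are invented for illustration, and blessed hashes stand in for the plain scalars above so that DESTROY fires.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $destroyed = 0;    # hypothetical counter, bumped by every destructor call

package Node;         # hypothetical class, only here so we can hook DESTROY
sub new     { return bless {}, shift }
sub DESTROY { $destroyed++ }

package main;

sub dont_leak {
    my $foo = Node->new;
    my $bar = Node->new;
    $foo->{other} = $bar;
    $bar->{other} = $foo;
    weaken $bar->{other};    # the back-reference no longer counts
    return;
}

dont_leak() for 1 .. 3;
print "destroyed after three calls: $destroyed\n";    # destroyed after three calls: 6
```

Every call creates two objects and both are destroyed as soon as the sub returns.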
Update: brian d foy asked if I have any data to back up my assertion that circular references are common. No, I don't have any statistics to show that circular references are common. They are the most commonly talked about and best documented form of perl memory leaks.
My experience is that they do happen. Here's a quick rundown on the memory leaks I have seen over a decade of working with Perl.
I've had problems with pTk apps developing leaks. Some leaks I was able to prove were due to circular references that cropped up when Tk passes window references around. I've also seen pTk leaks whose cause I could never track down.
I've seen people misunderstand weaken and wind up with circular references by accident.
I've seen unintentional cycles crop up when too many poorly thought out objects get thrown together in a hurry.
On one occasion I found memory leaks that came from an XS module that was creating large, deep data structures. I was never able to get a reproducible test case that was smaller than the whole program. But when I replaced the module with another serializer, the leaks went away. So I know those leaks came from the XS.
So, in my experience cycles are a major source of leaks.
Fortunately, there is a module to help track them down.
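CPAN has cycle finders such as Devel::Cycle. If you just want a quick yes/no answer with core modules only, one possible approach is a weak "probe": keep only a weak reference to a structure and see whether it survives once every intended owner is gone. The leaks() helper and the two builder subs below are hypothetical names for this sketch, not part of any module.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical helper: returns true if the structure built by $build
# is still alive after our only reference to it has been weakened.
sub leaks {
    my ($build) = @_;
    my $probe = $build->();    # strong reference for the moment
    weaken $probe;             # if nothing else holds it, it is freed right here
    return defined $probe;
}

my $cyclic = sub {
    my $parent = { name => 'Dad' };
    my $child  = { name => 'Son', parent => $parent };
    $parent->{child} = $child;               # closes the cycle
    return $parent;
};

my $acyclic = sub {
    my $parent = { name => 'Dad' };
    $parent->{child} = { name => 'Son' };    # child does not point back
    return $parent;
};

print "cyclic:  ", (leaks($cyclic)  ? "leaked" : "freed"), "\n";    # cyclic:  leaked
print "acyclic: ", (leaks($acyclic) ? "leaked" : "freed"), "\n";    # acyclic: freed
```

This only tells you that something is holding on to the structure, not where the cycle is. For that you still need a cycle finder or some manual digging.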
As to whether big global structures that never get cleaned up constitute "leaks", I agree with brian. They quack like leaks (we have ever-growing process memory usage due to a bug), so they are leaks. Even so, I don't recall ever seeing this particular problem in the wild.
Based on what I see on Stonehenge's site, I guess brian sees a lot of sick code from people he is training or performing curative miracles for. So his sample set is easily much bigger and more varied than mine, but it has its own selection bias.
Which cause of leaks is most common? I don't think we'll ever really know. But we can all agree that circular references and global data junkyards are anti-patterns that need to be eliminated where possible, and handled with care and caution in the few cases where they make sense.
I've had problems with XS in the past, both my own hand-rolled stuff and CPAN modules, where memory is leaked from within the C code if it's not properly managed. I never managed to track the leaks down; the project was on a tight deadline and had a fixed operational lifetime, so I papered over the issue with a daily cron reboot. cron is truly wonderful.
If the problem is in the Perl code, you might have a structure that holds a reference to itself, or to a parent node.
Usually it comes in the form of an object that references a parent object.
    {
        package parent;
        sub new { bless { 'name' => $_[1] }, $_[0] }
        sub add_child {
            my ($self, $child_name) = @_;
            my $child = child->new($child_name, $self);
            $self->{$child_name} = $child;  # saves a reference to the child
            return $child;
        }
    }
    {
        package child;
        sub new {
            my ($class, $name, $parent) = @_;
            my $self = bless {
                'name'   => $name,
                'parent' => $parent,  # saves a reference to the parent
            }, $class;
            return $self;
        }
    }
    {
        my $parent = parent->new('Dad');
        my $child  = $parent->add_child('Son');  # called on the object, not the class
        # At this point both of these are true
        # $parent->{Son}{parent} == $parent
        # $child->{parent}{Son} == $child
        # Both of the objects **would** be destroyed upon leaving
        # the current scope, except that each holds a reference to the other
    }
    # Both objects still exist here, but there is no way to access either of them.
The best way to fix this is to use Scalar::Util::weaken.
    {
        package child;
        use Scalar::Util qw'weaken';  # imported into package child, where it is used
        sub new {
            my ($class, $name, $parent) = @_;
            my $self = bless {
                'name'   => $name,
                'parent' => $parent,
            }, $class;
            weaken $self->{parent};  # weaken the stored reference itself
            return $self;
        }
    }
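One thing to watch out for with weaken: only the variable or hash slot you weakened is weak. A plain copy of it is a strong reference again, so stashing such a copy somewhere long-lived can quietly bring the cycle back. A small sketch using plain hashes (the variable names are just for illustration):

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

my $parent = { name => 'Dad' };
my $child  = { name => 'Son', parent => $parent };
$parent->{child} = $child;

weaken $child->{parent};        # the hash slot now holds a weak reference

my $copy = $child->{parent};    # a plain copy is a strong reference again

print "slot is ", (isweak($child->{parent}) ? "weak" : "strong"), "\n";    # slot is weak
print "copy is ", (isweak($copy)            ? "weak" : "strong"), "\n";    # copy is strong
```

If you need to keep a copy around, weaken the copy as well.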
If at all possible, I would recommend dropping the child's reference to the parent object altogether.
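A sketch of what that could look like: the child never stores its parent, and the parent hands itself over only for the duration of a method call. The Family::Parent and Family::Child classes, the describe methods and the $destroyed counter are all hypothetical and only serve the illustration.

```perl
use strict;
use warnings;

my $destroyed = 0;    # hypothetical counter, bumped by every destructor call

package Family::Parent;
sub new {
    my ($class, $name) = @_;
    return bless { name => $name, children => {} }, $class;
}
sub add_child {
    my ($self, $child_name) = @_;
    my $child = Family::Child->new($child_name);    # no parent stored in the child
    $self->{children}{$child_name} = $child;
    return $child;
}
sub describe_child {
    my ($self, $child_name) = @_;
    # The parent passes itself in only for the duration of the call.
    return $self->{children}{$child_name}->describe($self);
}
sub DESTROY { $destroyed++ }

package Family::Child;
sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}
sub describe {
    my ($self, $parent) = @_;
    return "$self->{name}, child of $parent->{name}";
}
sub DESTROY { $destroyed++ }

package main;
{
    my $dad = Family::Parent->new('Dad');
    $dad->add_child('Son');
    print $dad->describe_child('Son'), "\n";    # Son, child of Dad
}
print "destroyed: $destroyed\n";                # destroyed: 2
```

With no back-reference there is nothing to weaken and nothing to forget, at the cost of routing parent-dependent calls through the parent.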
Some modules from CPAN use circular references to do their work, e.g. HTML::TreeBuilder (which represents an HTML tree). They require you to call a cleanup method when you are done with the structure (for HTML::TreeBuilder that is $tree->delete). Just read the docs :)