I'm chasing a couple of potential memory leaks in a Perl code base and I'd like to know about common pitfalls with regard to memory (mis-)management in Perl.

What are common leak patterns you have observed in Perl code?

+12  A: 

Circular references are the canonical cause of leaks.

sub leak {
    my ($foo, $bar);
    $foo = \$bar;    # $foo holds a reference to $bar
    $bar = \$foo;    # $bar holds a reference to $foo: a cycle
}

Perl uses reference-counting garbage collection. This means that perl keeps a count of how many references point to each variable at any given time. When a variable goes out of scope and its count drops to 0, it is freed.
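A quick way to watch reference counting at work is a destructor that records when an object is freed. In this sketch the `Tracker` class is hypothetical, made up for illustration; it shows the object being freed at the exact moment its last reference goes away, not at some later collection pass.

```perl
use strict;
use warnings;

# Hypothetical class: DESTROY records when an object is actually freed.
package Tracker;
our @LOG;
sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
sub DESTROY { push @LOG, "freed $_[0]{name}" }

package main;

my $first  = Tracker->new('obj');   # one reference to the object
my $second = $first;                # two references
undef $first;                       # back to one: the object survives
print "after first undef:  @Tracker::LOG\n";   # nothing logged yet
undef $second;                      # zero references: freed immediately
print "after second undef: @Tracker::LOG\n";   # freed obj
```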

In the example code above, $foo and $bar are never freed, and another pair is left behind after every invocation of leak(): each variable is still referenced by the other, so both keep a reference count of 1 even after the subroutine returns.
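The same kind of cycle can be observed with objects. In this sketch, `Node` and `make_pair` are hypothetical names: each call builds two objects that refer to each other, and because each one keeps the other's count above 0, no destructor runs when the sub returns.

```perl
use strict;
use warnings;

# Hypothetical class that counts how many instances have been freed.
package Node;
our $FREED = 0;
sub new     { return bless {}, shift }
sub DESTROY { $FREED++ }

package main;

sub make_pair {
    my $left  = Node->new;
    my $right = Node->new;
    $left->{peer}  = $right;   # $left -> $right
    $right->{peer} = $left;    # $right -> $left: the cycle is closed
    return;                    # both lexicals go out of scope here
}

make_pair() for 1 .. 3;
print "nodes freed: $Node::FREED\n";   # 0: all six nodes leaked
```

The leaked nodes are only reclaimed during global destruction, when the interpreter exits.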

The easiest way to prevent this issue is to use weak references. A weak reference still lets you follow it to the data, but it does not count toward the data's reference count.

use Scalar::Util qw(weaken);

sub dont_leak {
    my ($foo, $bar);
    $foo = \$bar;
    $bar = \$foo;
    weaken $bar;     # $bar's reference to $foo no longer counts
}

In dont_leak(), the only reference to $foo is the weak one held in $bar, which doesn't count, so $foo effectively has a reference count of 0; $bar has a ref count of 1 (from $foo). When we leave the scope of the subroutine, $foo is returned to the pool, and its reference to $bar is released. This drops the ref count on $bar to 0, which means that $bar can also return to the pool.
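The same fix works for objects. In this sketch (`Node` and `make_pair` are hypothetical names), two objects point at each other, but one of the two links is weakened with Scalar::Util's `weaken`, so both objects are freed when the sub returns.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical class that counts how many instances have been freed.
package Node;
our $FREED = 0;
sub new     { return bless {}, shift }
sub DESTROY { $FREED++ }

package main;

sub make_pair {
    my $left  = Node->new;
    my $right = Node->new;
    $left->{peer}  = $right;   # strong: $left keeps $right alive
    $right->{peer} = $left;
    weaken $right->{peer};     # weak: does not keep $left alive
    return;
}

make_pair() for 1 .. 3;
print "nodes freed: $Node::FREED\n";   # 6: every node was reclaimed
```

Which side to weaken is a design choice: weaken the link that points "up" or "back" (child to parent, observer to subject), so the owning side keeps the structure alive.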

Update: brian d foy asked if I have any data to back up my assertion that circular references are common. No, I don't have any statistics to show that circular references are common. They are the most commonly talked about and best documented form of Perl memory leaks.

My experience is that they do happen. Here's a quick rundown on the memory leaks I have seen over a decade of working with Perl.

I've had problems with pTk apps developing leaks. Some leaks I was able to prove were due to circular references that cropped up when Tk passes window references around. I've also seen pTk leaks whose cause I could never track down.

I've seen people misunderstand weaken and wind up with circular references by accident.
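Two ways of misunderstanding `weaken` are common enough to sketch: weakening the only reference to something frees it immediately, and copying a weak reference produces an ordinary strong one (Scalar::Util's documentation warns about the latter). `isweak`, also from Scalar::Util, tells the two apart. The variable names here are just for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

# Pitfall 1: weakening the only reference frees the data at once.
my $only = { name => 'temp' };
weaken $only;
print defined $only ? "still here\n" : "already freed\n";   # already freed

# Pitfall 2: a copy of a weak reference is a strong reference.
my $strong = { name => 'shared' };
my $weak   = $strong;
weaken $weak;
my $copy = $weak;                       # $copy is strong again
printf "weak: %d, copy: %d\n", isweak($weak) ? 1 : 0, isweak($copy) ? 1 : 0;
# weak: 1, copy: 0
```

The second pitfall is how cycles sneak back in: a weak reference passed to a constructor or stored in a new field silently becomes strong unless it is weakened again.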

I've seen unintentional cycles crop up when too many poorly thought-out objects get thrown together in a hurry.

On one occasion I found memory leaks that came from an XS module that was creating large, deep data structures. I was never able to get a reproducible test case that was smaller than the whole program. But when I replaced the module with another serializer, the leaks went away. So I know those leaks came from the XS.

So, in my experience cycles are a major source of leaks.

Fortunately, there is a module to help track them down.

As to whether big global structures that never get cleaned up constitute "leaks", I agree with brian. They quack like leaks (we have ever-growing process memory usage due to a bug), so they are leaks. Even so, I don't recall ever seeing this particular problem in the wild.

Based on what I see on Stonehenge's site, I guess brian sees a lot of sick code from people he is training or performing curative miracles for. So his sample set is easily much bigger and more varied than mine, but it has its own selection bias.

Which cause of leaks is most common? I don't think we'll ever really know. But we can all agree that circular references and global data junkyards are anti-patterns that need to be eliminated where possible, and handled with care and caution in the few cases where they make sense.

daotoad
Is there anything to support circular references as the most common form? I hardly ever see the problem, but I do see people with global hashes that they never clear out.
brian d foy
@brian d foy: Strictly speaking, that's not a memory leak, just excessive memory use. It's not a memory *leak* until the program *can't* free the memory anymore.
cjm
Most people don't strictly care why their memory constantly grows slowly (or not so slowly). :)
brian d foy
+1. When faced with leaks, an easy first step is to add [Test::Memory::Cycle](http://search.cpan.org/perldoc?Test::Memory::Cycle) tests *everywhere* in one's unit tests.
Ether
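Test::Memory::Cycle looks for cycles inside a data structure. A different, core-only technique in the same spirit is to keep a weak "probe" reference to an object, drop every strong reference the test holds, and check that the probe has become undef. In this sketch `Widget` and `is_freed` are hypothetical names, not part of any module; Test::More ships with Perl.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);
use Test::More;

# Hypothetical class used only for this illustration.
package Widget;
sub new { return bless {}, shift }

package main;

# Build an object, drop every strong reference we hold, and report
# whether it was actually freed (the weak probe becomes undef if so).
sub is_freed {
    my ($build) = @_;
    my $obj   = $build->();
    my $probe = $obj;
    weaken $probe;    # the probe does not keep the object alive
    undef $obj;       # drop our strong reference
    return !defined $probe;
}

ok( is_freed(sub { Widget->new }), 'plain widget is freed' );

ok( !is_freed(sub {
        my $w = Widget->new;
        $w->{self} = $w;    # a one-object cycle
        return $w;
    }), 'self-referencing widget is detected as leaking' );

done_testing();
```

In a real suite you would assert `is_freed(...)` for every constructor, so a newly introduced cycle shows up as a failing test.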
+5  A: 

I've had problems with XS in the past, both my own hand-rolled stuff and CPAN modules, where memory is leaked from within the C code if it's not properly managed. I never managed to track the leaks down; the project was on a tight deadline and had a fixed operational lifetime, so I papered over the issue with a daily cron reboot. cron is truly wonderful.

ire_and_curses
+5  A: 

If the problem is in the Perl code, you might have a reference that points to itself, or a parent node.

Usually it takes the form of an object that references its parent object.

{ package parent;
  sub new{ bless { 'name' => $_[1] }, $_[0] }
  sub add_child{
    my($self,$child_name) = @_;
    my $child = child->new($child_name,$self);
    $self->{$child_name} = $child;   # saves a reference to the child
    return $child;
  }
}
{ package child;
  sub new{
    my($class,$name,$parent) = @_;
    my $self = bless {
      'name' => $name,
      'parent' => $parent # saves a reference to the parent
    }, $class;
    return $self;
  }
}
{
  my $parent = parent->new('Dad');
  my $child  = $parent->add_child('Son');   # call add_child on the instance

  # At this point both of these are true
  # $parent->{Son}{parent} == $parent
  # $child->{parent}{Son}  == $child

  # Both of the objects **would** be destroyed upon leaving
  # the current scope, except that they refer to each other
}

# Both objects still exist here, but there is no way to access either of them.

The best way to fix this is to use Scalar::Util::weaken.

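{ package child;
  use Scalar::Util qw(weaken);   # import into this package, where it is used

  sub new{
    my($class,$name,$parent) = @_;
    my $self = bless {
      'name' => $name,
      'parent' => $parent
    }, $class;

    # weaken the hash entry itself; the child no longer keeps the parent alive
    weaken $self->{parent};

    return $self;
  }
}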

If at all possible, I would recommend dropping the child's reference to its parent object entirely.

Brad Gilbert
It's worth mentioning that with objects, you can also use a destructor to break any circular references. Your Child class could have `sub DESTROY { $_[0]->{parent} = undef; }`, or even `sub DESTROY { $_[0] = undef; }`, and then there is no need for weak references. Weak refs are a better way to handle things; they make correct behavior automatic. It is also worth noting that Moose features automatic reference weakening for attributes: http://search.cpan.org/dist/Moose/lib/Moose/Manual/Attributes.pod#Weak_references It's yet another way Moose makes OO Perl better.
daotoad
@daotoad: You can't use a destructor to break circular references, because the destructor doesn't get called until the reference count drops to zero. If the destructor gets called, you don't have a circular reference problem. If you have circular references, you either need to use weak refs or manually call a destructor method when you're done with an object (which is error prone).
cjm
@cjm, you are right. It's been a while. `weaken` is really the way to go. Without it, you have to create a container object to hold the circular reference and let the container's destructor break the cycle. The other option is to have an explicit method call that clears a widget and breaks any cycles, for example Perl Tk's `Widget::destroy()` method.
daotoad
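A minimal sketch of the container approach, assuming vertices never refer back to their container: `Graph` and `Vertex` are hypothetical classes. The vertices link to each other strongly, but nothing links back to the `Graph`, so its count can reach 0, and its `DESTROY` clears the vertex links so the vertices can be freed too.

```perl
use strict;
use warnings;

# Hypothetical vertex class that counts how many vertices were freed.
package Vertex;
our $FREED = 0;
sub new     { return bless { edges => [] }, shift }
sub DESTROY { $FREED++ }

# Hypothetical container. Vertices never refer back to it, so its
# refcount can reach 0 even while the vertices form a cycle.
package Graph;
sub new { return bless { vertices => [] }, shift }

sub add_vertex {
    my ($self) = @_;
    my $v = Vertex->new;
    push @{ $self->{vertices} }, $v;
    return $v;
}

sub add_edge {
    my ($self, $from, $to) = @_;
    push @{ $from->{edges} }, $to;     # strong link one way...
    push @{ $to->{edges} },   $from;   # ...and back: a cycle
    return;
}

sub DESTROY {
    my ($self) = @_;
    # Break the vertex-to-vertex links so their counts can drop to 0.
    $_->{edges} = [] for @{ $self->{vertices} };
    return;
}

package main;

{
    my $graph = Graph->new;
    my $x = $graph->add_vertex;
    my $y = $graph->add_vertex;
    $graph->add_edge($x, $y);
}   # Graph::DESTROY runs here, so both vertices are freed as well
print "vertices freed: $Vertex::FREED\n";   # 2
```

The design works only as long as the rule holds: if any vertex stored a strong reference to its `Graph`, the container would be part of the cycle and its destructor would never run.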
+2  A: 

Some modules from CPAN use circular references to do their work, e.g. HTML::TreeBuilder (which represents an HTML document as a tree). They require you to call a destroying method when you're done with the structure (for HTML::TreeBuilder, that's `$tree->delete`). Just read the docs :)
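The general shape of such an API can be sketched with a hypothetical `TreeNode` class (this is not HTML::TreeBuilder's actual code): parents and children hold strong links to each other, so the class provides an explicit `delete` method that walks the tree and breaks the links, and it's up to the caller to call it.

```perl
use strict;
use warnings;

# Hypothetical tree class; every child holds a strong link to its parent.
package TreeNode;
our $FREED = 0;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name, parent => undef, children => [] }, $class;
}

sub add_child {
    my ($self, $name) = @_;
    my $child = TreeNode->new($name);
    $child->{parent} = $self;              # child -> parent (strong)
    push @{ $self->{children} }, $child;   # parent -> child (strong)
    return $child;
}

# Explicit teardown: recursively break parent/child links.
sub delete {
    my ($self) = @_;
    $_->delete for @{ $self->{children} };
    $self->{children} = [];
    $self->{parent}   = undef;
    return;
}

sub DESTROY { $FREED++ }

package main;

{
    my $root = TreeNode->new('html');
    my $body = $root->add_child('body');
    $body->add_child('p');
    $root->delete;     # without this call, all three nodes would leak
}
print "nodes freed: $TreeNode::FREED\n";   # 3
```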

eugene y