I have a legacy Perl CGI page running on Apache that processes a large Excel spreadsheet's worth of data, adding to the database as necessary. The data is processed in groups, and each group is sent to the database.

After each call to the database, my system's available memory decreases significantly to the point where there is no memory left. Once I finally get the 'Premature end of script headers' error and HTTP Code 500 is returned to the client, the memory is freed back to the system.

Looking through the (complicated) code, I can't find where the memory leak might be occurring. Is there some trick or tool that I can use to determine where the memory is going?

A: 

If the problem is in the Perl code, you might have a circular reference: a structure that holds a reference to itself, or a child that holds a reference back to its parent node.

Here's a quick sample of code that exhibits this behaviour.

{
  my @a;
  @a = \@a;   # @a now contains a reference to itself,
              # so its reference count can never reach zero
}

Usually it comes in the form of an object that holds a reference to a parent object.

{ package parent;
  sub new{ bless { 'name' => $_[1] }, $_[0] }
  sub add_child{
    my($self,$child_name) = @_;
    my $child = child->new($child_name,$self);
    $self->{$child_name} = $child;   # saves a reference to the child
    return $child;
  }
}
{ package child;
  sub new{
    my($class,$name,$parent) = @_;
    my $self = bless {
      'name' => $name,
      'parent' => $parent # saves a reference to the parent
    }, $class;
    return $self;
  }
}
{
  my $parent = parent->new('Dad');
  my $child  = $parent->add_child('Son');

  # At this point both of these are true
  # $parent->{Son}{parent} == $parent
  # $child->{parent}{Son}  == $child

  # Both of the objects **would** be destroyed upon leaving
  # the current scope, except that they hold references to each other
}

# Both objects still exist here, but there is no way to access either of them.
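One cheap way to make such a leak visible (a minimal sketch of my own, not code from the original answer) is to count DESTROY calls; with a reference cycle, the destructor never fires when the variable goes out of scope:

```perl
use strict;
use warnings;

# a DESTROY counter makes the leak visible: with a reference
# cycle, DESTROY is not called when the variable goes out of scope
my $destroyed = 0;
{ package Leaky;
  sub new     { bless {}, shift }
  sub DESTROY { $destroyed++ }
}
{
  my $obj = Leaky->new;
  $obj->{self} = $obj;        # cycle: refcount can never reach zero
}
print "destroyed: $destroyed\n";   # destroyed: 0 -- the object leaked
```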

The best way to fix this is to use Scalar::Util::weaken.

use Scalar::Util qw'weaken';
{ package child;
  sub new{
    my($class,$name,$parent) = @_;
    my $self = bless {
      'name' => $name,
      'parent' => $parent
    }, $class;

    weaken $self->{parent};   # the parent reference no longer keeps the parent alive

    return $self;
  }
}
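To confirm the fix works, the same DESTROY-counting trick (again a minimal sketch of my own; `Node` is a made-up class) shows that both objects in a weakened parent/child cycle are freed at scope exit:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# sketch: prove that weaken() lets a parent/child cycle
# be freed, by counting DESTROY calls
my $destroyed = 0;
{ package Node;
  sub new     { bless { name => $_[1] }, $_[0] }
  sub DESTROY { $destroyed++ }
}
{
  my $dad = Node->new('Dad');
  my $son = Node->new('Son');
  $dad->{child}  = $son;
  $son->{parent} = $dad;        # cycle between the two objects
  weaken $son->{parent};        # the back-link no longer holds a refcount
}
print "destroyed: $destroyed\n";   # destroyed: 2 -- both freed at scope exit
```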

If at all possible, I would recommend dropping the child's reference to the parent object altogether.

Brad Gilbert
This is not a method of determining memory leaks; it's an example of how Perl code can leak memory. Although the OP didn't say, I think the plugin is written in C/C++, in which case you have to manually increment and decrement ref counts.
Rook
A: 

How are you writing to the database? If you are using any of the DBI packages, or custom wrappers, make sure that you are flushing any cached objects or cached variables that you can. These types of memory bloat issues are relatively common, and usually point to a shared object cache somewhere that continues to persist.

Things to try:

  • Clear object variables when finished with them
  • Deep-cycle your database connection, i.e. disconnect and reconnect (this is extreme, but depending on how you are connecting, it may resolve the issue).
  • Load only one group of data at a time, and flush whatever variables or factory objects load it between groups.
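The last point can be sketched like this (an illustration only; `load_group` and `insert_rows` are placeholders standing in for your own spreadsheet and database code):

```perl
use strict;
use warnings;

# sketch: process one group at a time and drop per-group state
# before moving on, so nothing accumulates across groups
my %cache;   # per-group object cache

for my $group (1 .. 3) {
    my $rows = load_group($group);     # load ONLY this group's rows
    insert_rows($rows);                # send them to the database
    %cache = ();                       # flush any cached objects
    undef $rows;                       # release the group's data now
}

# placeholders for the real spreadsheet/DBI code
sub load_group  { my ($g) = @_; return [ map { "g$g-row$_" } 1 .. 2 ] }
sub insert_rows { my ($rows) = @_; print "inserted ", scalar @$rows, " rows\n" }
```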

Hope that helps some.

macabail
+2  A: 

The short answer is that it sucks to be you. There isn't a nice, ready-to-use program that you can run to get an answer. I'm sorry that I couldn't be of more help, but without seeing any code, etc., there really isn't any better advice that anyone can give.

I can't talk about your particular situation, but here are some things I've done in the past. It's important to find out the general area that's causing the problem. It's not much different than other debugging techniques. Usually I find that there's no elegant solution to these things. You just roll up your sleeves and stick your arms elbow-deep in the muck no matter how bad it smells.

First, run the program outside of the web server. If you still see the problem from the command line, be happy: you've just (mostly) ruled out a problem with the web server. This might take a little bit of work to make a wrapper script that sets up the web environment, but it ends up being much easier since you don't need to mess with restarting the server, etc., to reset the environment.
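Such a wrapper might look like this (a sketch; the variable values and `import.cgi` are made up, so substitute whatever your script actually reads from the CGI environment):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# fake the CGI environment so the script can run from the shell;
# these values are hypothetical -- use whatever your script expects
$ENV{GATEWAY_INTERFACE} = 'CGI/1.1';
$ENV{REQUEST_METHOD}    = 'GET';
$ENV{QUERY_STRING}      = 'file=data.xls';
$ENV{SCRIPT_NAME}       = '/cgi-bin/import.cgi';

# then run the real script inside this environment, e.g.:
# do './import.cgi' or die "import.cgi failed: $@";
print "CGI environment set: $ENV{REQUEST_METHOD} $ENV{SCRIPT_NAME}\n";
```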

If you can't replicate the problem outside the server, you can still do what I recommend next; it's just more annoying. If it's a web server problem and not a problem from the command line, the task becomes discovering the difference between those two environments. I've encountered situations like that.

If it's not a problem with the web server, start bisecting the script like you would for any debugging problem. If you have logging (or can add it), turn it on and watch the program run while recording its real memory use. When does it blow up? It sounds like you have it narrowed down to some database calls. If you are able to run this from the command line or the debugger, I'd find a pair of appropriate breakpoints before and after the memory increase and gradually bring them closer together. You might use modules such as Devel::Size to look at memory sizes for data structures you suspect.
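For recording real memory use around suspect calls, one low-tech option (Linux-specific, and `rss_kb` is a helper of my own invention; Devel::Size from CPAN is the more portable choice for measuring individual structures) is to read the process's resident set size from /proc:

```perl
use strict;
use warnings;

# helper (Linux-only assumption): read this process's resident set
# size from /proc so you can log memory around suspect calls
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return -1;
    while (<$fh>) { return $1 if /^VmRSS:\s+(\d+)\s+kB/ }
    return -1;
}

my $before = rss_kb();
my @big = (1) x 1_000_000;       # stand-in for a suspect database call
my $after  = rss_kb();
printf "RSS before: %d kB, after: %d kB\n", $before, $after;
```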

From there it's just narrowing down the suspects. Once you find the suspect, see if you can replicate it in a short example script. You want to eliminate as many contributing-factor possibilities as possible.

Once you think you've found the offending code, maybe you can ask another question that shows the code if you still don't understand what's going on.

If you wanted to get really fancy, you could write your own Perl debugger. It's not that hard. You get a chance to run some subroutines in the DB namespace at the beginning or end of statements. Have your debugging code print memory profiles for the things you suspect and look for jumps in memory size. I wouldn't try this unless everything else fails.

brian d foy
That's pretty much how I've done all my Perl debugging in the past. I was really hoping there was some really useful tool out there that I'd never heard of. Thanks for the in-depth strategy.
Aaron