SOLVED: see Edit 2

Hello,

I've been writing a Perl program to handle automatic upgrading of local (proprietary) programs (for the company I work for).

Basically, it runs via cron and unfortunately has a memory leak (or something similar). The problem is that the leak only happens when I'm not looking (i.e., when run via cron, not from the command line).

My code does not contain any circular (or other) references, so the commonly cited tools (Devel::Cycle, Devel::Peek) will not help me.

How would I go about figuring out what is using so much memory that the kernel kills it?

Basically, the code SFTPs to a server (using `sftp... `), calls OpenSSL to verify the downloaded file, SFTPs again if more files are needed, and then installs them (untars them).

I have seen delays (~15 sec) before the first SFTP session, but it has never used enough memory to be killed while I was watching.

If I can't sort this out, I'll need to re-write in a different language, and that will take precious time.

Edit: The following message is printed out by the kernel, which led me to believe it was a memory leak:

[100023.123] Out of memory: kill process 9568 (update.pl) score 325406 or a child
[100023.123] Killed Process 9568 (update.pl)

I don't believe it is an issue with cron, because the stalling (for ~15 sec, sometimes) also happens when running it via the command line. Also, there are no environment variables used (at least by what I've written; maybe underlying things use some?)

Edit 2: I found the issue myself, with help from the below comment by mobrule (in response to this question). It turns out that the script was also called from a non-root user's crontab just once a day, and running without root privileges triggered an infinite-recursion situation.
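For anyone hitting something similar, the bug was of this general shape (a reconstruction with invented names, not the real code, which I can't post in full):

```perl
use strict;
use warnings;

# Reconstruction of the failure mode (not the actual code): a step that
# can never succeed without root privileges was retried by calling the
# routine again, so under the non-root crontab it recursed without bound
# until the OOM killer stepped in.
sub install_update {
    my ($attempts) = @_;

    my $ok = 0;    # stand-in for a step that always fails when non-root
    return 'done' if $ok;

    return 'gave up' if $attempts >= 5;    # the fix: bound the retries
    return install_update($attempts + 1);  # unbounded before the fix
}

print install_update(0), "\n";    # prints "gave up"
```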

Sorry guys, I feel kinda stupid for not finding this before, but thanks.

mobrule, if you submit your comment as an answer, I will accept it, as it led me to finding the problem.

End Edits

Thanks, Brian

P.S. I may be able to post small snippets of code, but not the whole thing due to company policy.

+1  A: 

If it is run by cron, shouldn't it die after each iteration? If that is the case, it's hard for me to see how a memory leak would be a big deal...

Are you sure it is the script itself, and not the child processes, that is using the memory? Perhaps it ends up creating a whole lot of SSH sessions, instead of doing everything in one session?
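For example (a sketch with invented file and host names), you could queue every transfer into one sftp batch file, so a single session, i.e. a single child process, does all the work:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sketch (hypothetical names): write all transfers to one sftp batch
# file, so a single session handles every file.
my @files = qw(pkg1.tar.gz pkg2.tar.gz pkg3.tar.gz);

my ($fh, $batch) = tempfile();
print {$fh} "get $_\n" for @files;
print {$fh} "bye\n";
close $fh or die "close: $!";

# One child process in total, instead of one sftp per file:
# system('sftp', '-b', $batch, 'user@updates.example.com') == 0
#     or die "sftp failed: $?";
print "wrote ", scalar(@files) + 1, " commands to $batch\n";
```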

Kyle Brandt
Well, the kernel prints the following message to stderr (so I see it when I come back from whatever I'm doing):

[100023.123] Out of memory: kill process 9568 (update.pl) score 325406 or a child
[100023.123] Killed Process 9568 (update.pl)

So that led me to believe it was the Perl script itself. But either way, is there a way I could find out if it was one of the child programs? And the reason I can't have it leak so much memory is that a video streaming service is running, and it HAS to not have any macro-blocks (by specification); if there is no memory, it gets macro-blocks.
HalfBrian
Looks like that might be to prevent fork bombs. Any chance you have something that could be reduced to something like `fork while fork`?
Kyle Brandt
Nope, the only `fork`s I have are from Perl's backtick operator (if that even forks; I don't know for sure).
HalfBrian
Since it seems to be a memory-limit issue, what are the hard and soft process limits for all parameters reported by the ulimit command? Also, what OS are you using?
David Harris
Perl uses fork and exec to implement the backtick operator, and the process limits in the child's environment may depend on those of the parent. (For instance, on AIX I once had forks fail when creating grandchildren because I had too many child processes!)
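You can see the fork for yourself with a quick sketch: the backticked child reports a different PID than the parent.

```perl
use strict;
use warnings;

# Demonstration that backticks run in a separate process: the child
# prints its own PID, which differs from ours. $^X is the path of the
# perl binary currently running this script.
my $parent_pid = $$;
my $child_pid  = `$^X -e 'print \$\$'`;
chomp $child_pid;

print "parent $parent_pid, child $child_pid\n";
print "backticks forked a child process\n" if $child_pid != $parent_pid;
```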
David Harris
One other thought ... what version of Perl are you running?
David Harris
Thank you so much for the help David, but I found the problem (infinite recursion), see Edit 2 of the main post for info.
HalfBrian
+1  A: 

How do you know that it's a memory leak? I can think of many other reasons why the OS would kill a program.

The first question I would ask is, "Does this program always work correctly from the command line?" If the answer is "No", then I'd fix those issues first.

On the other hand if the answer is "Yes", I would investigate all the differences between having the program executed under cron and from the command line to find out why it is misbehaving.

David Harris
I don't know for sure if it is a memory "leak", but it certainly uses all of my memory. The kernel prints this message: [100023.123] Out of memory: kill process 9568 (update.pl) score 325406 or a child [100023.123] Killed Process 9568 (update.pl). And the program works 100% of the time (in my experience) from the command line, but the cron job runs every minute (for testing), so it has more chances to use all the memory.
HalfBrian
downvote for what? This is a perfectly good answer. In fact, a similar process led the OP to find a solution to his problem...
Leonardo Herrera
+2  A: 

You could try using Devel::Size to profile some of your objects. For example, in the main:: scope (the .pl file itself), do something like this:

use Devel::Size qw(total_size);

# Note: $$varname is a symbolic reference, so this only works on package
# variables (not lexicals declared with "my"), and under "use strict" it
# needs a "no strict 'refs'" in scope.
foreach my $varname (qw(varname1 varname2))
{
    print "size used for variable $varname: " . total_size($$varname) . "\n";
}

Compare the actual size used to what you think is a reasonable value for each object. Something suspicious might pop out immediately (e.g. a cache that is massively bloated beyond anything that sounds reasonable).
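If you're wondering which function to use: size() counts only the top-level structure, while total_size() follows references as well, which is usually what you want when hunting bloat. A small comparison (the %cache data is invented for illustration):

```perl
use strict;
use warnings;

# Devel::Size's size() counts only the hash itself; total_size() also
# follows the references to the arrays inside.
exit 0 unless eval { require Devel::Size; 1 };   # bail out quietly if not installed

my %cache = map { $_ => [ (1) x 100 ] } 1 .. 50;

printf "size:       %d bytes\n", Devel::Size::size(\%cache);
printf "total_size: %d bytes\n", Devel::Size::total_size(\%cache);
```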

Other things to try:

  • Eliminate bits of functionality one at a time to see if suddenly things get a lot better; I'd start with the use of any external libraries
  • Is the bad behaviour localized to just one particular machine, or one particular operating system? Move the program to other systems to see how its behaviour changes.
  • (In a separate installation) try upgrading to the latest Perl (5.10.1), and also upgrade all your CPAN modules
Ether
Why the downvote?
Ether