How can I reclaim memory from my Perl script, and/or prevent Perl from pooling memory in its own allocator?
The most effective method is to have plenty of virtual memory, so that memory that perl has allocated but is not frequently using just gets paged out.
Other than that, it is extremely difficult to keep perl from allocating more memory over time: not because it is leaking, but because perl really likes to keep things allocated in case they are needed again. A small codebase with fairly consistent string sizes will top out after a while, but that is an exceptional case.
Under Apache, the historic technique has been to kill off a worker process when it reaches a certain size, or after a certain number of requests (a rough sketch of that idea follows). This doesn't work so well with threaded MPMs...
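A rough sketch of that recycling policy, assuming a plain preforking worker loop; the 200 MB limit, the request count, and the next_request()/handle_request() stubs are all made up for illustration:

use strict;
use warnings;

# Assumed limits; tune for your workload.
my $MAX_RSS_KB   = 200 * 1024;   # recycle the worker above roughly 200 MB resident
my $MAX_REQUESTS = 1000;         # ...or after this many requests

# Resident set size of this process in KB (the '=' suppresses the ps header).
sub rss_kb {
    my ($rss) = qx(ps -o rss= -p $$) =~ /(\d+)/;
    return $rss || 0;
}

# Stand-ins for a real server's request loop, purely illustrative.
sub next_request   { return time }
sub handle_request { my @junk = (1) x 100_000; return }

my $served = 0;
while (my $req = next_request()) {
    handle_request($req);
    $served++;
    last if $served >= $MAX_REQUESTS || rss_kb() > $MAX_RSS_KB;
}
exit 0;   # the master notices the exit and forks a fresh worker

Under mod_perl, Apache2::SizeLimit and the MaxRequestsPerChild directive implement this same policy for you.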
As answered on a parallel question: in general, you cannot expect perl to release memory to the OS before the script finishes/terminates. Upon termination all the allocated memory is given back to the OS, but that's an OS feature and isn't Perl-specific.
So you have a limited number of options if you have a long-running script:
- You delegate memory-intensive parts to child processes. This way the memory will be freed when each part is finished. The price to pay is the IPC overhead.
- You use your own memory-managed structures, usually based on Tie (see the sketch after this list). The price to pay is handling the load/store to/from the backing store if your structure isn't a simple one (even a standard NDBM-based hash is simple but quite powerful, though).
- You treat your memory as a precious resource, and optimize its usage (by using smaller constructs, enabling memory reuse, etc.).
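For the second option, here is a minimal sketch of a disk-backed hash using the core NDBM_File module via tie; the file name and keys are just placeholders, and nested structures would need something like MLDBM on top:

use strict;
use warnings;
use Fcntl;        # for O_RDWR, O_CREAT
use NDBM_File;

# Keys and values live in the DBM file on disk, not in Perl's heap.
tie my %cache, 'NDBM_File', '/tmp/my_cache', O_RDWR | O_CREAT, 0640
    or die "cannot tie cache: $!";

$cache{'user:42'} = 'some large value';   # written through to disk
print $cache{'user:42'}, "\n";            # read back on demand

untie %cache;                             # flush and detach

If NDBM_File isn't built on your system, SDBM_File (always available in core) or DB_File can be tied the same way.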
Perl "supports" returning memory to the operating system if the operating system is willing to take that memory back. I use the quotes because, IIRC, Perl does not promise when it will give that memory back.
Perl currently does promise when destructors will run and when objects will be deallocated (and, especially, in what order that will happen). But deallocated memory goes into a pool for Perl to use later, and that memory is only eventually released to the operating system, if the operating system supports taking it back.
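You can watch that pool behaviour directly. This is only a sketch (the ps invocation and the array size are arbitrary), but typically the resident size jumps when the array is built, stays high after it is emptied, and barely moves when a second array of the same size is built, because Perl reuses the pooled memory:

use strict;
use warnings;

# Resident set size of this process in KB.
sub rss_kb { my ($r) = qx(ps -o rss= -p $$) =~ /(\d+)/; return $r || 0 }

print 'at start:    ', rss_kb(), " KB\n";

my @big = (1) x 1_000_000;    # allocate a large array
print 'allocated:   ', rss_kb(), " KB\n";

@big = ();                    # freed into Perl's pool, not back to the OS
print 'freed:       ', rss_kb(), " KB\n";

@big = (2) x 1_000_000;       # mostly reuses the pooled memory
print 'reallocated: ', rss_kb(), " KB\n";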
I have a similar problem where I'm reading in a large SOAP message from the server. As far as I can tell, SOAP::Lite can't stream the data, so it has to load it all into memory before I can process it. Even though I only need a small list as the result, the memory footprint of the program ends up in the gigabytes. This is a problem, because the script does a lot of network communication, so the memory stays allocated for a long time.
As mentioned before, the only true solutions are a) redesign everything, or b) fork off the memory-hungry work into a child process. Here is a fork example that illustrates the solution:
#
# In general, perl will not return allocated memory back to the OS.
#
# To get around this, we fork and do the memory-hungry work in a child.
use strict;
use warnings;

# Print this process's RSS/VSZ (in KB), tagged with a source line number.
sub _m {
    my $ln = shift;
    my $s  = qx/ps -o rss,vsz -p $$ | grep -v RSS/;
    chomp($s);
    print STDERR "$$: $s $ln>\n";
}

sub alloc_dealloc {
    # perldoc perlipc for more interesting
    # ways of doing this fork:
    defined(my $pid = open(KID, '-|')) || die "can't fork: $!";
    my $result = -1;
    if ($pid) {
        # Parent: read the child's one-line result and eval it into $result.
        my $s = <KID>;
        eval $s;
        close(KID);    # reap the child
    } else {
        _m(__LINE__);
        my $a = [];
        # Something that allocates a lot of memory...
        for (my $i = 0; $i < 1024 * 1024 * 16; $i++) {
            push(@$a, int(rand(3)));
        }
        _m(__LINE__);
        # Something that processes that huge chunk of memory
        # and returns a very small result.
        my $r = 0;
        for (@$a) { $r += $_; }
        _m(__LINE__);
        @$a = ();
        _m(__LINE__);
        undef($a);
        _m(__LINE__);
        # STDOUT goes to the parent process.
        print('$result = ' . $r . ";\n");
        exit;
    }
    return $result;
}

while (1) {
    _m(__LINE__);
    my $r = alloc_dealloc();
    print "Result: $r\n";
    _m(__LINE__);
}
This will run forever, producing output like this:
9515: 1892 17876 54>
9519: 824 17876 24>
9519: 790004 807040 31> # <-- chunk of memory allocated in child
9519: 790016 807040 38>
9519: 790068 807040 41>
9519: 527924 544892 43> # <-- partially free()d, but mostly not
9515: 1976 17876 57> # <-- parent process retains its small footprint
Result: 16783001