I am having some problems with memory in Perl. When I fill up a big hash, I cannot get the memory to be released back to the OS. When I do the same with a scalar and use undef, the memory is given back to the OS.

Here is a test program I wrote.

#!/usr/bin/perl
###### Memory test
######

## Use Commands
use Number::Bytes::Human qw(format_bytes);
use Data::Dumper;
use Devel::Size qw(size total_size);

## Create Variables
my $share_var;
my %share_hash;
my $type_hash = 1;
my $type_scalar = 1;

## Start Main Loop
while (1) {
    &Memory_Check();
    print "Hit Enter (add to memory): "; <>;
    &Up_Mem(100_000);
    &Memory_Check();

    print "Hit Enter (Set Varable to nothing): "; <>;
    $share_var = "";
    $share_hash = ();
    &Memory_Check();

    print "Hit Enter (clean data): "; <>;
    &Clean_Data();
    &Memory_Check();

    print "Hit Enter (start over): "; <>;
}

exit;


#### Up Memory
sub Up_Mem {
    my $total_loops = shift;
    my $n = 1;
    print "Adding data to shared varable $total_loops times\n";

    until ($n > $total_loops) {
        if ($type_hash) {
            $share_hash{$n} = 'X' x 1111;
        }
        if ($type_scalar) {
            $share_var .= 'X' x 1111;
        }
        $n += 1;
    }
    print "Done Adding Data\n";
}

#### Clean up Data
sub Clean_Data {
    print "Clean Up Data\n";

    if ($type_hash) {
        ## Methods to clear the hash (trying everything I can think of!)
        my $n = 1;
        my $total_loops = 100_000;
        until ($n > $total_loops) {
            undef $share_hash{$n};
            $n += 1;
        }

        %share_hash = ();
        $share_hash = ();
        undef $share_hash;
        undef %share_hash;
    }
    if ($type_scalar) {
        undef $share_var;
    }
}

#### Check Memory Usage
sub Memory_Check {
    ## Get current memory from shell
    my @mem = `ps aux | grep \"$$\"`;
    my($results) = grep !/grep/, @mem;

    ## Parse Data from Shell
    chomp $results;
    $results =~ s/^\w*\s*\d*\s*\d*\.\d*\s*\d*\.\d*\s*//g; $results =~ s/pts.*//g;
    my ($vsz,$rss) = split(/\s+/,$results);

    ## Format Numbers to Human Readable
    my $h = Number::Bytes::Human->new();
    my $virt = $h->format($vsz);
    my $res  = $h->format($rss);

    print "Current Memory Usage: Virt: $virt  RES: $res\n";

    if ($type_hash) {
        my $total_size = total_size(\%share_hash);
        my @arr_c = keys %share_hash;
        print "Length of Hash: " . ($#arr_c + 1) . "  Hash Mem Total Size: $total_size\n";
    }
    if ($type_scalar) {
        my $total_size = total_size($share_var);
        print "Length of Scalar: " . length($share_var) . "  Scalar Mem Total Size: $total_size\n";
    }

}

OUTPUT:

./Memory_Undef_Simple.cgi 
Current Memory Usage: Virt: 6.9K  RES: 2.7K
Length of Hash: 0  Hash Mem Total Size: 92
Length of Scalar: 0  Scalar Mem Total Size: 12
Hit Enter (add to memory): 
Adding data to shared variable 100000 times
Done Adding Data
Current Memory Usage: Virt: 228K  RES: 224K
Length of Hash: 100000  Hash Mem Total Size: 116813243
Length of Scalar: 111100000  Scalar Mem Total Size: 111100028
Hit Enter (Set Variable to nothing): 
Current Memory Usage: Virt: 228K  RES: 224K
Length of Hash: 100000  Hash Mem Total Size: 116813243
Length of Scalar: 0  Scalar Mem Total Size: 111100028
Hit Enter (clean data): 
Clean Up Data
Current Memory Usage: Virt: 139K  RES: 135K
Length of Hash: 0  Hash Mem Total Size: 92
Length of Scalar: 0  Scalar Mem Total Size: 24
Hit Enter (start over): 

So, as you can see, the memory goes down, but only by the size of the scalar. Any ideas on how to free the memory used by the hash?

Also, Devel::Size shows the hash is only taking up 92 bytes, even though the program is still using 139K.

+8  A: 

In general, you cannot expect perl to release memory to the OS.

See the FAQ: How can I free an array or hash so my program shrinks?

It is always a good idea to read the FAQ list, also installed on your computer, before wasting your time.
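To see what the FAQ means by "freed for reuse within the program," here is a rough sketch (untested, Linux-only because it reads /proc, and the rss_kb helper is just for illustration; the exact numbers will vary):

#!/usr/bin/perl
use strict;
use warnings;

## Report this process's resident set size in KB (reads /proc, so Linux only).
sub rss_kb {
    open my $fh, '<', "/proc/$$/statm" or return -1;
    my (undef, $resident) = split ' ', <$fh>;
    return $resident * 4;    # pages are typically 4 KB
}

my %h = map { $_ => 'X' x 1000 } 1 .. 100_000;
printf "after first fill:  %d KB\n", rss_kb();

undef %h;    # the memory is now free for perl to reuse, but not returned to the OS
printf "after undef:       %d KB\n", rss_kb();

%h = map { $_ => 'X' x 1000 } 1 .. 100_000;
printf "after second fill: %d KB\n", rss_kb();    # roughly the same: perl reused it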

Sinan Ünür
Your RTFF link is very good. It points out that this is OS-dependent. If your OS supports it, you can free memory back to the OS. I have written code that does exactly what the OP desires using ActivePerl on WinXP. There is no need for the extra hostility; please consider fixing your first paragraph.
daotoad
I defused this offensiveness bomb a bit. We need people like you having a rep > 10k! Don't take such risks, please.
innaM
@daotoad and Manni: It was a timing issue. When I wrote that, the original post was a badly formatted mess and the only thing I could discern was the very first lines. See the comment by EightyEight above as well. Anyway, thanks for taking care of it.
Sinan Ünür
+16  A: 

Generally, yeah, that's how memory management on UNIX works. If you are using Linux with a recent glibc, and are using that malloc, you can return freed memory to the OS. I am not sure Perl does this, though.

If you want to work with large datasets, don't load the whole thing into memory; use something like BerkeleyDB:

http://search.cpan.org/dist/BerkeleyDB/BerkeleyDB.pod

Example code, stolen verbatim:

  use strict ;
  use BerkeleyDB ;

  my $filename = "fruit" ;
  unlink $filename ;
  tie my %h, "BerkeleyDB::Hash",
              -Filename => $filename,
              -Flags    => DB_CREATE
      or die "Cannot open file $filename: $! $BerkeleyDB::Error\n" ;

  # Add a few key/value pairs to the file
  $h{apple}  = "red" ;
  $h{orange} = "orange" ;
  $h{banana} = "yellow" ;
  $h{tomato} = "red" ;

  # Check for existence of a key
  print "Banana Exists\n\n" if $h{banana} ;

  # Delete a key/value pair.
  delete $h{apple} ;

  # print the contents of the file
  while (my ($k, $v) = each %h)
    { print "$k -> $v\n" }

  untie %h ;

(OK, not verbatim. Their use of use vars is ... legacy ...)

You can store gigabytes of data in a hash this way, and you will only use a tiny bit of memory. (Basically, whatever BDB's pager decides to keep in memory; this is controllable.)
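If you want to bound that cache yourself, the module accepts a cache size when you create the database (the -Cachesize option; check BerkeleyDB.pod for the details). Something like this untested sketch, where the file name and the 16 MB figure are just placeholders:

  use strict ;
  use BerkeleyDB ;

  my $filename = "big_data.db" ;

  # -Cachesize caps how much BDB keeps in RAM for this handle
  # (16 MB here; pick whatever fits your box).
  tie my %h, "BerkeleyDB::Hash",
              -Filename  => $filename,
              -Flags     => DB_CREATE,
              -Cachesize => 16 * 1024 * 1024
      or die "Cannot open file $filename: $! $BerkeleyDB::Error\n" ;

  # Shove in far more data than the cache can hold; the process stays
  # small and BDB pages the rest out to disk.
  $h{$_} = 'X' x 1111 for 1 .. 1_000_000 ;

  untie %h ;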

jrockway
+1 Excellent demonstration of the advice given in the last part of the FAQ answer: http://faq.perl.org/perlfaq3.html#How_can_I_make_my_Pe1
Sinan Ünür
The FAQ is wrong about performance; usually you hit the cache, and this is no more costly (in terms of time) than accessing the in-memory structure. (And once you start swapping, in-memory structures are horrifyingly slow, since hashes do not have good locality of reference. I remember writing some ETL scripts that ran several orders of magnitude faster with tied BDB hashes instead of native hashes.)
jrockway
@jrockway I guess the performance penalty would only matter when you are not worried about memory usage: small data structures that completely fit in memory on a lightly loaded machine.
Sinan Ünür
Yeah. Don't use this for parsing your three-line /etc/passwd or whatever. (Not that you would notice the slowness for three records, of course :)
jrockway
I added this to the end of your code:

$SIG{CHLD} = 'IGNORE';
unless (my $pid = fork) {
    print "$h{banana}\n";
    $h{banana} = 'YELLOW';
    print "$h{banana}\n";
    exit;
}
$SIG{CHLD} = 'IGNORE';
unless (my $pid = fork) {
    sleep 2;
    print "$h{banana}\n";
    exit;
}
sleep 3;

The hash update is not seen in the other forked process. Is there a way to force an update across processes?
clintonm9
At that point, you should probably just use the raw BDB interface. The "tie" interface is just for quick scripts.
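Roughly, with the object interface it looks like this (an untested sketch; for safe sharing between processes you also want a shared BerkeleyDB::Env, see the pod):

  use strict ;
  use BerkeleyDB ;

  my $db = BerkeleyDB::Hash->new(
              -Filename => "fruit",
              -Flags    => DB_CREATE )
      or die "Cannot open file fruit: $! $BerkeleyDB::Error\n" ;

  # explicit put/get instead of hash assignment and lookup
  $db->db_put("banana", "YELLOW") == 0 or die "put failed" ;

  my $colour ;
  $db->db_get("banana", $colour) == 0 or die "get failed" ;
  print "$colour\n" ;

  # flush dirty pages to disk; concurrent access from several
  # processes additionally needs a shared environment (see the pod)
  $db->db_sync() ;

  $db->db_close() ;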
jrockway
+6  A: 

Why do you want Perl to release the memory to the OS? You could just use a larger swap.

If you really must, do your work in a forked process, then exit.
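A bare-bones sketch of that pattern (not tested; how the child hands its results back, via a file, pipe, or DB, is up to you):

use strict;
use warnings;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: build the huge hash and do the heavy work here.
    my %big;
    $big{$_} = 'X' x 1111 for 1 .. 100_000;
    # ... process %big, write the results to a file, pipe, or DB ...
    exit 0;   # all of the child's memory goes back to the OS here
}

waitpid($pid, 0);
# The parent never grew, so there is nothing to give back.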

ysth
This answer does not deserve a downvote. A forked process is a perfectly reasonable way to deal with well-defined temporary spikes in memory usage in long-running programs.
Sinan Ünür
The issue is, the server has 3 GB of RAM: 1 GB for the OS and 1 GB for MySQL. My process will start at 27 MB and get up to about 800 MB. Then the system will start to go into swap and slow everything down. The problem with the fork is that it will copy all 800 MB to the new process.
clintonm9
Also, to add more to this: I am using different threads to do different things asynchronously (use threads; use threads::shared; use Thread::Queue;). So I will pass data to a shared hash, and the other thread will process the data as it comes in. This is passed to many threads doing different things. I am guessing that at some point the hash is getting very big and taking up a lot of memory. Maybe there is a better way of doing this in general? My problem with a separate forked process is that it seems a lot harder to pass data back and forth. Any thoughts?
clintonm9
@clintonm9 Given your description, I am sure you will be better off using an external data store (such as a tied DB, as in jrockway's example, or maybe SQLite, which is my personal favorite).
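For example, a quick SQLite sketch with DBI and DBD::SQLite (untested; the file name, table, and column names are made up):

use strict;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=share.db", "", "",
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do("CREATE TABLE IF NOT EXISTS share (k TEXT PRIMARY KEY, v TEXT)");

# one transaction around the bulk insert keeps it reasonably fast
$dbh->begin_work;
my $ins = $dbh->prepare("INSERT OR REPLACE INTO share (k, v) VALUES (?, ?)");
$ins->execute($_, 'X' x 1111) for 1 .. 100_000;
$dbh->commit;

# readers in other processes just open the same file
my ($v) = $dbh->selectrow_array("SELECT v FROM share WHERE k = ?", undef, "42");
print length($v), "\n";

$dbh->disconnect;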
Sinan Ünür
It seems a lot slower to select from MySQL every second, over and over again. I need it as close to real time as possible. Do you see any problem with doing that?
clintonm9
@clintonm9: ouch. Are you aware that shared data actually has a copy in each thread? Try storing your data in a tied DB or SQLite and see if it's fast enough for your needs. If it isn't, you are going to have to rethink from the ground up.
ysth
@clintonm9: also, compare the cost of your time working on this to the cost of additional memory (or even an additional server)...
ysth
A: 

Try recompiling perl with the -Uusemymalloc option to use the system malloc and free. You might see some different results.

casey