I am using R on some relatively big data and am hitting memory issues. This is on Linux. I have significantly less data than the available memory on the system, so it's an issue of managing transient allocation.

When I run gc(), I get the following listing:

           used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   2147186  114.7    3215540  171.8   2945794  157.4
Vcells 251427223 1918.3  592488509 4520.4 592482377 4520.3

yet R appears to have 4 GB allocated in resident memory and 2 GB in swap. I'm assuming this is OS-allocated memory that R's memory management system will allocate and GC as needed. However, let's say that I don't want to let R OS-allocate more than 4 GB, to prevent swap thrashing. I could always set a ulimit, but then R would just crash instead of working within the reduced space and GCing more often. Is there a way to specify an arbitrary maximum for the gc trigger and make sure that R never OS-allocates more? Or is there something else I could do to manage memory usage?
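
To make it concrete, here is roughly the kind of control I have in mind; the `--max-vsize` startup option is documented in `?Memory`, though I'm not sure it limits what I want to limit, and the 4 GB figure is just an example:

    ## From a shell: start R with a cap on the vector heap (if the build
    ## honours it); contrast with `ulimit -v`, which just makes allocations
    ## fail outright instead of forcing more frequent GC.
    ##   R --max-vsize=4G

    ## Inside R: inspect the collector's view of memory.
    gc(reset = TRUE)                    # reset the "max used" high-water marks

    x <- rnorm(5e7)                     # ~400 MB of doubles, to create pressure
    print(object.size(x), units = "Mb")

    rm(x)
    gc()                                # Vcells drop, but the process RSS may not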

+2  A: 

In short: no. I found that you simply cannot micromanage memory management and gc().

On the other hand, you could try to keep your data in memory, but 'outside' of R. The bigmemory package makes that fairly easy. Of course, using a 64-bit version of R and ample RAM may make the problem go away too.
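
For the in-memory case, something along these lines (a minimal sketch, with arbitrary dimensions) keeps the actual data in a shared-memory segment that R's collector never sees:

    library(bigmemory)

    ## The data live outside R's heap; only a small pointer object is
    ## managed by R, so Vcells (and the gc trigger) stay small.
    X <- big.matrix(nrow = 1e6, ncol = 10, type = "double", init = 0)

    X[1:3, 1] <- rnorm(3)     # indexed much like an ordinary matrix
    gc()                      # Ncells/Vcells barely move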

Dirk Eddelbuettel
`bigmemory` keeps things on disk, doesn't it?
Shane
No, you may be thinking of `ff`, which does.
Dirk Eddelbuettel
OK, good to clear that up. The documentation says that `bigmemory` "may use memory-mapped files"; I'm not sure when that applies or how it relates. I never looked into the `bigmemory` internals, but it looks like it uses Boost.Interprocess.
Shane
If you instruct it to, it can map memory to files (see the sketch after this thread). By default it uses just RAM. And hey, Jay and Mike are locals at your RUG, so how come I have to explain this? ;-)
Dirk Eddelbuettel
I know...and Jay *did* explain it, but I'm dense.
Shane
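
For completeness, a sketch of the file-backed mode mentioned in the comments above; the file names are just placeholders, and the OS pages the mapped file in and out as needed:

    library(bigmemory)

    ## File-backed: the matrix lives in a memory-mapped file on disk.
    Y <- filebacked.big.matrix(nrow = 1e6, ncol = 10, type = "double",
                               backingfile = "Y.bin",
                               descriptorfile = "Y.desc")
    Y[1, 1] <- 42

    ## Another R session (or process) can attach the same data later.
    Z <- attach.big.matrix("Y.desc")
    Z[1, 1]    # 42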