I'm working with a large data frame and have run up against RAM limits. At this point I probably need to work with a serialized version on disk. There are a few packages that support out-of-memory operations, but I'm not sure which one will suit my needs. I'd prefer to keep everything in data frames, so the ff package looks encouraging, but there are compatibility problems that I can't work around.

What's the first tool to reach for when you realize that your data has reached out-of-memory scale?

Thanks.

+7  A: 

You probably want to look at these packages:

  • ff for 'flat-file' storage and very efficient retrieval
  • bigmemory for out-of-R-memory but still in RAM (or file-backed) use
  • biglm for out-of-memory model fitting with lm()- and glm()-style models (see the sketch after this list)

and also see the High-Performance Computing task view.
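To make the biglm suggestion concrete, here is a rough sketch (not from the original answer) of streaming a CSV through biglm so that only one chunk of rows sits in RAM at a time. The file name, column names, and chunk size are hypothetical placeholders.

    ## Minimal sketch: chunked linear-model fitting with biglm.
    ## "big_data.csv", the column names, and chunk_size are assumptions.
    library(biglm)

    chunk_size <- 100000
    cols <- c("y", "x1", "x2")                      # assumed column layout
    con <- file("big_data.csv", open = "r")
    invisible(readLines(con, n = 1))                # skip the header row

    fit <- NULL
    repeat {
      chunk <- tryCatch(
        read.csv(con, header = FALSE, nrows = chunk_size, col.names = cols),
        error = function(e) NULL)                   # read.csv errors once the file is exhausted
      if (is.null(chunk) || nrow(chunk) == 0) break
      fit <- if (is.null(fit)) {
        biglm(y ~ x1 + x2, data = chunk)            # first chunk: build the initial fit
      } else {
        update(fit, chunk)                          # later chunks: fold in more rows
      }
    }
    close(con)
    summary(fit)

The same chunk-and-update pattern works when the rows come from a database query instead of a file; only the model's cross-product matrices are kept in memory between chunks.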

Dirk Eddelbuettel
Oh wow, thanks; I've just been rolling my own lm solution to eat the data from a huge MySQL database.
Carl