What tricks do people use to manage the available memory of an interactive R session? I use the functions below [based on postings by Petr Pikal and David Hinds to the r-help list in 2004] to list (and/or sort) the largest objects and to occasionally rm() some of them. But by far the most effective solution was ... to run under 64-bit Linux with ample memory.

Any other nice tricks folks want to share? One per post, please.

# improved list of objects
.ls.objects <- function (pos = 1, pattern, order.by,
                        decreasing=FALSE, head=FALSE, n=5) {
    napply <- function(names, fn) sapply(names, function(x)
                                         fn(get(x, pos = pos)))
    names <- ls(pos = pos, pattern = pattern)
    obj.class <- napply(names, function(x) as.character(class(x))[1])
    obj.mode <- napply(names, mode)
    obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
    obj.size <- napply(names, object.size)
    obj.dim <- t(napply(names, function(x)
                        as.numeric(dim(x))[1:2]))
    vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
    obj.dim[vec, 1] <- napply(names, length)[vec]
    out <- data.frame(obj.type, obj.size, obj.dim)
    names(out) <- c("Type", "Size", "Rows", "Columns")
    if (!missing(order.by))
        out <- out[order(out[[order.by]], decreasing=decreasing), ]
    if (head)
        out <- head(out, n)
    out
}
# shorthand
lsos <- function(..., n=10) {
    .ls.objects(..., order.by="Size", decreasing=TRUE, head=TRUE, n=n)
}
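The rm() step mentioned in the question can be sketched without the helper functions: get byte counts with object.size(), remove the largest object, then call gc(). This is a minimal illustration only; a throwaway environment stands in for the global workspace and the object names are hypothetical. R collects freed memory on its own, so the explicit gc() mainly forces a collection and reports current usage.

```r
# A throwaway environment stands in for the global workspace (hypothetical names)
work <- new.env()
assign("big_vec",   rnorm(1e6), envir = work)   # one million doubles, roughly 8 MB
assign("small_vec", letters,    envir = work)   # 26 short strings

# Bytes used by each object, largest first
sizes <- sort(vapply(ls(envir = work),
                     function(nm) as.numeric(object.size(get(nm, envir = work))),
                     numeric(1)),
              decreasing = TRUE)
print(sizes)

# Drop the largest object; gc() then triggers a collection and reports memory use
rm(list = names(sizes)[1], envir = work)
print(gc())
```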
A:

Ensure you record your work in a reproducible script. From time to time, restart R and source() your script. That clears out anything you're no longer using and, as an added benefit, tests your code.
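A minimal sketch of the same idea driven from R itself: the script is run with Rscript in a separate, fresh R process, so objects from the interactive session are invisible to it. The temp-file script and the object name leftover_from_session are hypothetical, and the sketch assumes Rscript lives in R.home("bin") (true of standard installs).

```r
# Hypothetical analysis script, written to a temporary file for this demo
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- rnorm(1e5)",
  "writeLines(as.character(exists('leftover_from_session')))"
), script)

# An object that exists only in the current interactive session
leftover_from_session <- 42

# Run the script in a brand-new R process: the equivalent of restarting R and
# calling source(), so nothing from this session leaks into the run
rscript <- file.path(R.home("bin"), "Rscript")
out <- system2(rscript, shQuote(script), stdout = TRUE)
print(out)
```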

hadley
My strategy is to break my scripts up along the lines of load.R and do.R, where load.R may take quite some time to load data from files or a database and does any bare-minimum pre-processing/merging of that data. The last line of load.R saves the workspace state. do.R is then my scratchpad, where I build out my analysis functions. I frequently reload do.R (with or without reloading the workspace state from load.R, as needed).
Josh Reich
That's a good technique. When files are run in a certain order like that, I often prefix them with a number: `1-load.r`, `2-explore.r`, `3-model.r` - that way it's obvious to others that there is some order present.
hadley
@josh: that's good advice, put it in an answer of its own and earn some votes!
pufferfish
I can't back this idea up enough. I've taught R to a few people and this is one of the first things I say. It also applies to any language where development combines a REPL with a file being edited (e.g. Python). rm(list=ls()) followed by source() works too, but restarting is better (loaded packages are cleared too).
Vince
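The load.R / do.R split described in the comments above can be sketched with save() and load(). The file name, the toy data and the subset() step are hypothetical stand-ins; save.image() is the variant that snapshots the entire workspace rather than named objects.

```r
# --- end of a hypothetical load.R: snapshot the prepared objects ---
raw   <- data.frame(id = 1:5, value = c(2.5, 3.1, 4.7, 1.2, 9.9))
clean <- subset(raw, value > 2)               # stand-in for real pre-processing
state_file <- file.path(tempdir(), "load_state.RData")
save(raw, clean, file = state_file)           # save.image(state_file) would save everything

# --- start of a hypothetical do.R: restore the snapshot instead of re-running load.R ---
rm(raw, clean)
restored <- load(state_file)                  # load() returns the restored object names
print(restored)
```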
A:

That's a good trick.

One other suggestion is to use memory-efficient objects wherever possible: for instance, a matrix instead of a data.frame when all columns have the same type.
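object.size() gives a quick way to check this on your own data. The sketch below uses an arbitrary 100 x 1000 block of doubles; the data.frame pays a per-column overhead (a separate vector plus a name for each column), so the gap should widen as the column count grows relative to the row count. Since a matrix holds a single type, this only applies to homogeneous data.

```r
# The same 100 x 1000 block of doubles, stored two ways
m  <- matrix(rnorm(100 * 1000), nrow = 100)
df <- as.data.frame(m)

# object.size() counts bytes; units = "Kb" just makes the printout readable
print(object.size(m),  units = "Kb")
print(object.size(df), units = "Kb")
```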

This doesn't really address memory management, but one important function that isn't widely known is memory.limit(). You can raise the default limit with memory.limit(size=2500), where size is in MB. As Dirk mentioned, you need to be running 64-bit R to take real advantage of this.

Shane
Isn't this only applicable to Windows?
Christopher DuBois
A:

I never save an R workspace. I use import scripts and data scripts and output any especially large data objects that I don't want to recreate often to files. This way I always start with a fresh workspace and don't need to clean out large objects. That is a very nice function though.
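The "write big objects to files instead of keeping them in a saved workspace" approach can be sketched with saveRDS()/readRDS(). The helper name cached(), the file name and the toy table are hypothetical; the idea is only that the expensive build step runs when the cache file is missing.

```r
# Hypothetical helper: rebuild an expensive object only when its cache file is missing
cached <- function(path, build) {
  if (file.exists(path)) {
    readRDS(path)            # reuse the copy written by an earlier session
  } else {
    obj <- build()           # expensive step, e.g. an import script
    saveRDS(obj, path)       # write it out for next time
    obj
  }
}

cache_file <- file.path(tempdir(), "big_table.rds")
big <- cached(cache_file, function() data.frame(id = 1:1e5, value = rnorm(1e5)))
rm(big)                       # drop it from the workspace; the file stays on disk

# Second call reads the file, so the build function is never invoked
big <- cached(cache_file, function() stop("should not rebuild"))
```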

kpierce8
A:

To further illustrate the common strategy of frequent restarts, we can use littler, which lets us run simple expressions directly from the command line. Here is an example I sometimes use to time different BLAS implementations on a simple crossprod.

 r -e'N<-3*10^3; M<-matrix(rnorm(N*N),ncol=N); print(system.time(crossprod(M)))'

Likewise,

 r -lMatrix -e'example(spMatrix)'

loads the Matrix package (via the --packages | -l switch) and runs the examples of the spMatrix function. As r always starts 'fresh', this method is also a good test during package development.

Last but not least, r also works great for automated batch mode in scripts using a '#!/usr/bin/r' shebang header. Rscript is an alternative where littler is unavailable (e.g. on Windows).
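Where littler is not installed, the timing one-liner could be sketched as an Rscript batch file instead. The file name time_crossprod.R and the smaller default matrix size (1000 rather than 3000, to keep a reference-BLAS run short) are assumptions; it would be run as `Rscript time_crossprod.R 3000`, or directly after making it executable, and like r it starts from a fresh session each time.

```r
#!/usr/bin/env Rscript
# Hypothetical batch script (e.g. saved as time_crossprod.R)
args <- commandArgs(trailingOnly = TRUE)

# Matrix dimension: first command-line argument, else a smaller default of 1000
N <- if (length(args) >= 1) as.integer(args[1]) else 1000L

M <- matrix(rnorm(N * N), ncol = N)
timing <- system.time(crossprod(M))
print(timing)
```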

Dirk Eddelbuettel
A:

I love Dirk's .ls.objects() function, but I kept squinting to count digits in the Size column. So I did some ugly hacks to make it display the size with thousands separators:

.ls.objects <- function (pos = 1, pattern, order.by,
                        decreasing=FALSE, head=FALSE, n=5) {
    napply <- function(names, fn) sapply(names, function(x)
                                         fn(get(x, pos = pos)))
    names <- ls(pos = pos, pattern = pattern)
    obj.class <- napply(names, function(x) as.character(class(x))[1])
    obj.mode <- napply(names, mode)
    obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
    obj.size <- napply(names, object.size)
    obj.prettysize <- sapply(obj.size, function(r) prettyNum(r, big.mark = ",") )
    obj.dim <- t(napply(names, function(x)
                        as.numeric(dim(x))[1:2]))
    vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
    obj.dim[vec, 1] <- napply(names, length)[vec]
    out <- data.frame(obj.type, obj.size,obj.prettysize, obj.dim)
    names(out) <- c("Type", "Size", "PrettySize", "Rows", "Columns")
    if (!missing(order.by))
        out <- out[order(out[[order.by]], decreasing=decreasing), ]
    # always runs (not part of the if): swap numeric Size for the formatted column
    out <- out[c("Type", "PrettySize", "Rows", "Columns")]
    names(out) <- c("Type", "Size", "Rows", "Columns")
    if (head)
        out <- head(out, n)
    out
}
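Base R can also do the pretty-printing: format() on an object.size() result accepts units = "auto" and picks b/Kb/Mb/Gb. A minimal sketch of a stripped-down listing built on that; obj_sizes() is a hypothetical name, and it keeps the raw byte count in its own column so sorting stays numeric (the same reason the code above sorts before swapping in PrettySize).

```r
# Hypothetical helper: list objects in an environment with human-readable sizes
obj_sizes <- function(env = globalenv()) {
  nms   <- ls(envir = env)
  sizes <- lapply(nms, function(nm) object.size(get(nm, envir = env)))
  bytes <- vapply(sizes, as.numeric, numeric(1))
  ord   <- order(bytes, decreasing = TRUE)          # sort on raw bytes, not strings
  data.frame(
    Object = nms[ord],
    Bytes  = bytes[ord],
    Size   = vapply(sizes[ord], format, character(1), units = "auto"),
    stringsAsFactors = FALSE
  )
}

# Usage on a throwaway environment
e <- new.env()
assign("big",   rnorm(1e6), envir = e)
assign("small", letters,    envir = e)
res <- obj_sizes(e)
print(res)
```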
JD Long