




I know I can use ls() and rm() to see and remove objects that exist in my environment.

However, when dealing with an "old" .RData file, one sometimes needs to pick an environment apart to find what to keep and what to leave out.

What I would like is a GUI-like interface that lets me see the objects, sort them (for example, by their size), and remove the ones I don't need (for example, through a check-box interface). Since I imagine such a system is not currently implemented in R, what options exist? What do you use for cleaning old .RData files?



+1  A: 

The OS X GUI does have such a thing: it's called the Workspace Browser. Quite handy.

I've also wished for an interface that shows the session dependency between objects, i.e. if I start from a plot() and work backwards to find all the objects that were used to create it. This would require parsing the history.

Ken Williams
Thanks Ken - that sounds like a great idea. Do you have a clue how it might be done ?
Tal Galili
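One way Ken's dependency idea might be approached in base R: parse the history, record for each plain assignment which variables appear on its right-hand side, and then walk those links backwards from the variables used in the plot() call. This is a hypothetical sketch (object_deps and trace_back are invented names). It ignores assignments to non-name targets such as x$a <- ..., objects created by load() or assign(), and side effects. In an interactive session the history lines could come from savehistory() followed by readLines(), where the console supports it.

```r
# Map each assigned name to the variables used on its right-hand side
object_deps <- function(history_lines) {
  deps <- list()
  for (expr in parse(text = history_lines)) {
    is_assign <- is.call(expr) &&
      (identical(expr[[1]], as.name("<-")) || identical(expr[[1]], as.name("=")))
    # only plain `name <- value` assignments are tracked
    if (is_assign && is.name(expr[[2]])) {
      target <- as.character(expr[[2]])
      deps[[target]] <- unique(c(deps[[target]], all.vars(expr[[3]])))
    }
  }
  deps
}

# Walk the dependency links backwards from the starting names
trace_back <- function(deps, start) {
  seen <- character(0)
  todo <- start
  while (length(todo) > 0) {
    v <- todo[1]
    todo <- todo[-1]
    if (v %in% seen) next
    seen <- c(seen, v)
    todo <- c(todo, deps[[v]])
  }
  setdiff(seen, start)
}

history_lines <- c("x <- rnorm(10)", "w <- 5", "y <- x * 2", "z <- y + 1", "plot(z)")
deps <- object_deps(history_lines)
# start from the variables used in the plot() call
upstream <- trace_back(deps, all.vars(parse(text = "plot(z)")[[1]]))
upstream
```

Here w is never reached, because nothing the plot uses depends on it.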
@Ken: You can browse the workspace under any platform using `?browseEnv`. `ls.str` provides a console alternative.
Richie Cotton
Richie - thank you, this is very cool (I knew ls.str, but not browseEnv)!
Tal Galili
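Richie's two suggestions can be pointed at any environment, not only the global one, through their envir argument. A small sketch, assuming a scratch environment named sandbox (a hypothetical name) holding a couple of objects; browseEnv() opens a browser or GUI window, so it is only called in an interactive session:

```r
sandbox <- new.env()
assign("a", 1:10, envir = sandbox)
assign("b", letters, envir = sandbox)

# Console summary: one str()-style line per object
ls.str(envir = sandbox)

# Interactive alternative: an HTML/GUI view of the same environment
if (interactive()) browseEnv(envir = sandbox)
```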
+9  A: 

I never create .RData files. If you are practicing reproducible research (and you should be!) you should be able to source in R files to go from input data files to all outputs.

When you have operations that take a long time, it makes sense to cache them. I often use a construct like:

 if (file.exists("cache.rdata")) { 
     load("cache.rdata")
 } else {
     # do stuff ...
     save(..., file = "cache.rdata")
 }

This allows you to work quickly from cached files, and when you need to recalculate from scratch you can just delete all the rdata files in your working directory.
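The same construct can be wrapped in a small function so that each slow step gets its own cache file. This is a sketch with a hypothetical helper named cached(); it swaps save()/load() for saveRDS()/readRDS(), which store a single object, and relies on R's lazy evaluation so that the expensive expression only runs when no cache file exists:

```r
cached <- function(path, expr) {
  if (file.exists(path)) {
    readRDS(path)            # reuse the stored result
  } else {
    value <- expr            # forcing the lazy argument runs the slow code once
    saveRDS(value, path)
    value
  }
}

cache_file <- file.path(tempdir(), "slow_result.rds")
unlink(cache_file)           # start from a clean slate for this demo

result <- cached(cache_file, {
  Sys.sleep(1)               # stands in for minutes of processing
  sum(1:10)
})
# Second call: the file exists, so the expression is never evaluated
result2 <- cached(cache_file, stop("not evaluated when the cache exists"))
```

Deleting the .rds files forces a full recomputation, just as deleting the .rdata files does above.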

I disagree. Load all the files, merge them, and prepare them every time? I prefer one-time data preparation: save to .RData and do the analysis after `load`.
Hi Hadley - in theory I would take your stance; in practice it doesn't always work. For example, I have projects where getting to the relevant data.frame takes several minutes of R processing. In such cases I would much rather do what Marek wrote.
Tal Galili
Caching is completely orthogonal to this working practice. I've added a note to make that clearer.
+2  A: 

The basic solution is to load your data, remove what you don't want, and save the result as new, clean data.

Another way to handle this situation is to control the loaded RData by loading it into its own environment:

sandbox <- new.env()
load("some_old.RData", sandbox)

Now you can see what is inside

sapply(ls(sandbox), function(x) object.size(get(x,sandbox)))

Then you have several possibilities:

  • write what you want to a new RData file: save(A, B, file="clean.RData", envir=sandbox)
  • remove what you don't want from the environment: rm(x, z, u, envir=sandbox)
  • copy the variables you want into the global workspace and remove sandbox

I usually do something similar to the third option: load my data, do some checks and transformations, copy the final data to the global workspace, and remove the environments.
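Put together, the sandbox workflow might look like the sketch below. It assumes a throwaway file created with tempfile() in place of "some_old.RData", and reuses the object names A, B, x, z and u from the options above purely for illustration:

```r
# Create a throwaway .RData file to stand in for "some_old.RData"
old_file <- tempfile(fileext = ".RData")
local({
  A <- rnorm(1e5); B <- letters; x <- 1:10; z <- matrix(0, 100, 100); u <- "tmp"
  save(A, B, x, z, u, file = old_file)
})

# Load into a separate environment so the global workspace is untouched
sandbox <- new.env()
loaded <- load(old_file, envir = sandbox)

# Inspect sizes, largest first
sizes <- sapply(loaded, function(nm) object.size(get(nm, envir = sandbox)))
sort(sizes, decreasing = TRUE)

# Option 1: write only what you want to a new file
clean_file <- tempfile(fileext = ".RData")
save(A, B, file = clean_file, envir = sandbox)

# Option 2: drop the unwanted objects from the sandbox
rm(x, z, u, envir = sandbox)

# Option 3: copy the keepers into the global workspace, then discard the sandbox
for (nm in ls(sandbox)) assign(nm, get(nm, envir = sandbox), envir = globalenv())
rm(sandbox)
```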

You could also implement what you want yourself:

  1. Load the data
    vars <- load("some_old.RData")
  2. Get sizes
    vars_size <- sapply(vars, function(x) object.size(get(x)))
  3. Order them
    vars <- vars[order(vars_size, decreasing=TRUE)]
    vars_size <- vars_size [order(vars_size, decreasing=TRUE)]
  4. Make a dialog box (its appearance depends on the OS; this is on Windows)
    vars_with_size <- paste(vars,vars_size)
    vars_to_save <- select.list(vars_with_size, multiple=TRUE)
  5. Remove what you don't want
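The five steps can be run end to end as below. This is a sketch that assumes a generated temporary file in place of "some_old.RData" and hypothetical object names (big, small, txt). Because select.list() refuses to run non-interactively, the script falls back to a fixed choice outside an interactive session. Step 5 maps the "name size" labels returned by the dialog back to object names before removing the rest:

```r
# Self-contained stand-in for "some_old.RData"
old_file <- tempfile(fileext = ".RData")
local({ big <- rnorm(1e5); small <- 1:3; txt <- letters; save(big, small, txt, file = old_file) })

# 1. Load the data (into the global environment)
vars <- load(old_file)
# 2. Get sizes
vars_size <- sapply(vars, function(x) object.size(get(x)))
# 3. Order them, largest first
ord <- order(vars_size, decreasing = TRUE)
vars <- vars[ord]
vars_size <- vars_size[ord]
# 4. Pick the objects to keep; the fixed fallback choice is hypothetical
vars_with_size <- paste(vars, vars_size)
vars_to_save <- if (interactive()) {
  select.list(vars_with_size, multiple = TRUE)
} else {
  vars_with_size[vars %in% c("big", "txt")]
}
# 5. Remove what you don't want: map the labels back to object names
keep <- vars[match(vars_to_save, vars_with_size)]
rm(list = setdiff(vars, keep))
```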

To format object sizes nicely, I use a solution based on getAnywhere(print.object_size):

pretty_size <- function(x) {
    # vectorised: pick the largest unit that keeps the value >= 1
    ifelse(x >= 1024^3, paste(round(x/1024^3, 1L), "Gb"),
    ifelse(x >= 1024^2, paste(round(x/1024^2, 1L), "Mb"),
    ifelse(x >= 1024  , paste(round(x/1024, 1L), "Kb"),
                        paste(x, "bytes"))))
}

Then in step 4 one can use paste(vars, pretty_size(vars_size)).
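In current versions of R, the format() method for object_size values also accepts units = "auto", which picks a readable unit in much the same way. A sketch, assuming a hypothetical helper pretty_sizes() that labels every object in an environment (note that sapply() drops the object_size class, so the formatting is done per object here):

```r
pretty_sizes <- function(env = globalenv()) {
  vapply(ls(env),
         function(nm) format(object.size(get(nm, envir = env)), units = "auto"),
         character(1))
}

e <- new.env()
assign("v", numeric(1e6), envir = e)
assign("s", "a", envir = e)
pretty_sizes(e)
```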

Thanks Marek. Your code was helpful, with some interesting functions and strategies. I do hope, though, that something like what Nico suggested could be devised; it seems much easier to work with. Thanks again, Tal.
Tal Galili
+1  A: 

You may want to check out the RGtk2 package. You can very easily create an interface with Glade Interface Designer and then attach whatever R commands you want to it.

If you want a good starting point from which to "steal" ideas on how to use RGtk2, install the rattle package and run rattle(). Then look at the source code and start making your own interface :)

I may have a go at it and see if I can come out with something simple.

EDIT: this is a quick and dirty piece of code that you can play with. The big problem with it is that, for whatever reason, the rm instruction does not get executed, and I'm not sure why... I know it is the central instruction, but at least the interface works! :D


  • Make rm work
  • I put all the variables in the remObjEnv environment. It should not be listed among the current variables, and it should be removed when the window is closed
  • The list will only show objects in the global environment; anything inside other environments won't be shown, but that's easy enough to implement
  • probably there's some other bug I haven't thought of :D


# Our environment
remObjEnv <- new.env()

# Various required libraries
library(RGtk2)

remObjEnv$createModel <- function() {
    # create the array of data and fill it in
    objs <- objects(globalenv())
    remObjEnv$objList <- lapply(objs, function(o) {
        val <- get(o, envir = globalenv())   # look up in the global env explicitly
        list(object = o, type = typeof(val), size = object.size(val))
    })

    # create list store
    model <- gtkListStoreNew("gchararray", "gchararray", "gint")

    # add items 
    for (i in seq_along(remObjEnv$objList)) {
        iter <- model$append()$iter
        model$set(iter,
              0, remObjEnv$objList[[i]]$object,
              1, remObjEnv$objList[[i]]$type,
              2, as.integer(remObjEnv$objList[[i]]$size))
    }

    model
}

remObjEnv$addColumns <- function(treeview) {
    colNames <- c("Name", "Type", "Size (bytes)")

    for (n in seq_along(colNames)) {
        renderer <- gtkCellRendererTextNew()
        renderer$setData("column", n-1)
        treeview$insertColumnWithAttributes(-1, colNames[n], renderer, text=n-1)
    }
}

# Builds the list. 
# I seem to have some problems correctly building treeviews from glade files,
# so we'll just do it by hand :)
remObjEnv$buildTreeView <- function() {
    # create model
    model <- remObjEnv$createModel()
    # create tree view
    remObjEnv$treeview <- gtkTreeViewNewWithModel(model)
    # add the Name/Type/Size columns
    remObjEnv$addColumns(remObjEnv$treeview)

    remObjEnv$vbox$packStart(remObjEnv$treeview, TRUE, TRUE, 0)
}

remObjEnv$delObj <- function(widget, treeview) {
    model <- treeview$getModel()
    selection <- treeview$getSelection()
    selected <- selection$getSelected()
    if (selected[[1]]) {
        iter <- selected$iter
        path <- model$getPath(iter)
        i <- path$getIndices()[[1]]

        obj <- as.character(remObjEnv$objList[[i+1]]$object)
        # rm(obj) would only delete the local variable `obj`;
        # list= and envir= remove the object it names from the global env
        rm(list = obj, envir = globalenv())
        # keep the list store and objList in sync
        model$remove(iter)
        remObjEnv$objList[[i+1]] <- NULL
    }
}

# The list of the current objects
remObjEnv$objList <- list()

# Create the GUI.
remObjEnv$window <- gtkWindowNew("toplevel", show = FALSE)
gtkWindowSetTitle(remObjEnv$window, "R Object Remover")
gtkWindowSetDefaultSize(remObjEnv$window, 500, 300)
remObjEnv$vbox <- gtkVBoxNew(FALSE, 5)
remObjEnv$window$add(remObjEnv$vbox)

# Build the treeview (must exist before the button handler is connected)
remObjEnv$buildTreeView()

remObjEnv$button <- gtkButtonNewWithLabel("Delete selected object")
gSignalConnect(remObjEnv$button, "clicked", remObjEnv$delObj, remObjEnv$treeview)
remObjEnv$vbox$packStart(remObjEnv$button, TRUE, TRUE, 0)

remObjEnv$window$showAll()
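One plausible cause of the rm problem mentioned above, assuming the delete handler called rm(obj): inside a function, rm(obj) removes the local variable obj itself, not the global object whose name it holds. Passing the name through list= and the target environment through envir= does what was intended. A minimal demonstration with hypothetical functions remove_wrong() and remove_right():

```r
target <- 1:5   # an object in the global environment

remove_wrong <- function(name) {
  obj <- name
  rm(obj)                               # deletes the local variable `obj` only
}
remove_right <- function(name) {
  rm(list = name, envir = globalenv())  # deletes the object named by `name`
}

remove_wrong("target")
still_there <- exists("target")   # TRUE: only the local copy of the name was removed
remove_right("target")
gone <- !exists("target")         # TRUE
```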

Thanks Nico. In case you succeed in getting something ready, please let me know. Thanks!
Tal Galili
@Tal Galili: I updated my answer with a piece of code. It's all for you to play with! :)
Hi Nico - wonderful, thank you! I can't play with it yet since I am having issues with Gtk2, but once I fix them I will give your code a run. Best, Tal
Tal Galili
Nico, what do you think about user___'s idea down the thread of using gwidgets instead of RGtk2?
Tal Galili
@Tal Galili: I have never used gwidgets. I guess it is an option; I generally use RGtk2 because it integrates nicely into the GNOME environment, but that's absolutely a personal preference. You could use Tcl/Tk too if you wish.
+1  A: 

It doesn't have checkboxes to delete with; rather, you select the file(s) and then click delete. However, the solution below is pretty easy to implement:


library(gWidgets)   # plus a toolkit package, e.g. gWidgetsRGtk2 or gWidgetstcltk
## make data frame with files
out <- lapply((x <- list.files()), file.info)
out <- do.call("rbind", out)
out <- data.frame(name=x, size=as.integer(out$size), ## more attributes?
                  stringsAsFactors=FALSE)
## set up GUI
w <- gwindow("Browse directory")
g <- ggroup(cont=w, horizontal=FALSE)
tbl <- gtable(out, cont=g, multiple=TRUE)
size(tbl) <- c(400,400)
deleteThem <- gbutton("delete", cont=g)
enabled(deleteThem) <- FALSE
## add handlers
## enable the button only while at least one row is selected
addHandlerClicked(tbl, handler=function(h,...) {
  enabled(deleteThem) <- (length(svalue(h$obj, index=TRUE)) > 0)
})

addHandlerClicked(deleteThem, handler=function(h,...) {
  inds <- svalue(tbl, index=TRUE)
  files <- tbl[inds,1]
  print(files)                          # replace with rm?
})
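Since the question is about workspace objects rather than files, the data frame fed to gtable() could list objects instead. A sketch, assuming a hypothetical helper object_table() that returns one row per object, largest first. The GUI lines at the end are commented out because they need gWidgets and a toolkit, and they reuse g, tbl and inds from the code above:

```r
object_table <- function(env = globalenv()) {
  nms <- ls(env)
  sizes <- vapply(nms, function(nm) as.numeric(object.size(get(nm, envir = env))), numeric(1))
  types <- vapply(nms, function(nm) typeof(get(nm, envir = env)), character(1))
  out <- data.frame(name = nms, type = types, size = sizes,
                    stringsAsFactors = FALSE, row.names = NULL)
  out[order(out$size, decreasing = TRUE), , drop = FALSE]
}

e <- new.env()
assign("big", numeric(1e4), envir = e)
assign("tiny", 1L, envir = e)
object_table(e)

# With gWidgets loaded, the table and the delete handler could then use:
# tbl <- gtable(object_table(), cont = g, multiple = TRUE)
# rm(list = tbl[inds, 1], envir = globalenv())
```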
Thanks user.___, I will try this as well once my RGtk2 starts working again. Best, Tal
Tal Galili
Hi again, I tried your code with options(guiToolkit="tcltk") and it partially worked. I can see the table of files (I actually wanted the ls() objects, but that's fine), but I don't see the delete button (there isn't enough space for it). Any ideas?
Tal Galili
Yeah, that is a bug in the table widget that I haven't figured out how to iron out. A couple of quick hacks: manually resize the window, move the delete button above the gtable instance, or use a horizontal layout (skip horizontal=FALSE in ggroup). --John