Organizing R Source Code

This question is very closely related to: "How to organize large R programs?"

You should consider creating an R package. You can use the package.skeleton function to generate a starting skeleton from a set of R files. I also strongly recommend documenting the package with roxygen from the beginning, because it is much harder to add documentation after the fact.
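
As a rough sketch of that first step, the snippet below builds a skeleton from a single R file. The package name "mypkg", the function add_one and the temporary directories are made up for illustration; in practice you would point code_files at your own files and path at your project directory.

```r
# Write a hypothetical source file into a temporary directory
src_dir <- tempfile("src")
dir.create(src_dir)
code_file <- file.path(src_dir, "add_one.R")
writeLines("add_one <- function(x) x + 1", code_file)

# Directory under which the package skeleton will be created
pkg_parent <- tempfile("pkgs")
dir.create(pkg_parent)

# Generate the skeleton: DESCRIPTION, NAMESPACE, R/ and man/ stubs
utils::package.skeleton(name = "mypkg",
                        code_files = code_file,
                        path = pkg_parent)

# Inspect what was generated
list.files(file.path(pkg_parent, "mypkg"), recursive = TRUE)
```

The generated DESCRIPTION and man/ stubs are only placeholders and still need editing by hand before the package is fit to share.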

Read "Writing R Extensions". The online book "Statistics with R" has a section on this subject. Also take a look at Creating R Packages: A Tutorial by Friedrich Leisch. Lastly, if you're in NY, come to the upcoming NY use-R group meeting on "Authoring R Packages: a gentle introduction with examples".

Just to rehash some suggestions about good practices:

A package allows you to use R CMD check, which is very helpful for catching bugs; separately, you can look at using the codetools package.
A package also forces you to do a minimal amount of documentation, which leads to better practices in the long run.
You should also consider doing unit testing (e.g. with RUnit) if you want your code to be robust/maintainable.
You should consider using a style guide (e.g. Google Style Guide).
Use a version control system from the beginning, and if you're going to make your code open source, consider hosting it on GitHub, Google Code, or R-Forge.
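
To give a feel for the kind of problem codetools can flag, here is a small sketch using codetools::checkUsage. The function scale_values and its misspelled variable are invented for the example, and collecting the messages through a custom report function is just one way to capture them.

```r
library(codetools)

# A hypothetical function with a typo: scale_factr is never defined anywhere
scale_values <- function(x) {
  x * scale_factr
}

# Collect the messages instead of printing them directly
findings <- character()
checkUsage(scale_values,
           name = "scale_values",
           report = function(msg) findings <<- c(findings, msg))

# Each entry describes one suspicious usage found in the function body
print(findings)
```

This is static analysis only: the function is never called, so the problem is reported without having to hit the faulty line at run time.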

Edit:

Regarding how to make incremental changes without rebuilding and installing the full package: I find the easiest thing to do is to make changes in the relevant R file and then use the source command to load those changes. Once you load your library into an R session, it always sits after .GlobalEnv on the search path (and so has lower priority), so anything you source or load directly will be found first (use the search command to see this). That way the installed package stays loaded underneath while your edited definitions mask its versions as you test them.
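
The masking behaviour can be seen in a small sketch. To keep it self-contained, an attached environment named "mypkg_demo" stands in for an installed package, and the function greet is made up; with a real package you would call library() instead of attach().

```r
# Stand-in for an installed package: an environment attached to the search path
pkg_env <- new.env()
assign("greet", function() "packaged version", envir = pkg_env)
attach(pkg_env, name = "mypkg_demo")

first <- greet()    # found in the attached "package"
print(first)

# Simulate editing the function's source file and sourcing it
edited_file <- tempfile(fileext = ".R")
writeLines('greet <- function() "edited version"', edited_file)
source(edited_file) # defines greet in .GlobalEnv

second <- greet()   # the .GlobalEnv copy is found first
print(second)

# .GlobalEnv comes first on the search path, ahead of "mypkg_demo"
print(search())

# Clean up: drop the override and detach the stand-in package
rm("greet", envir = .GlobalEnv)
detach("mypkg_demo", character.only = TRUE)
```

One caveat with a real package: functions inside the package that call greet internally resolve it through the package namespace, so they keep using the installed version until you also source them or reinstall.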

Alternatively, you can use an IDE like StatET or ESS. They make loading individual lines or functions out of an R package very easy. StatET is particularly well designed for managing packages in a directory-like structure.

Thanks Shane. Exactly the guidance I was looking for.

Chris 2010-02-17 22:12:47

So here's another question that came up in the SO discussion you referenced that was unanswered. As you are modifying / adding code to the developing package, how do you reload the package contents without going through an install? Is there a convenient way? Just trying to get a handle on the cycle one is going through as you write code and test.

Chris 2010-02-17 23:03:59

Look at the function "sourceDir", which is in the examples section of the help file for the "source" function (type "?source" at the R command prompt). I have a very similar function in my ~/.Rprofile file, and I sourceDir() the mypackage/R directory of the package I'm building as I make changes to it. Occasionally I'll reinstall the package, but I find this an easier way to make incremental changes w/o blowing away any current work I have going on in the interpreter.

Steve Lianoglou 2010-02-18 03:11:45
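
A helper along those lines might look like the sketch below. It is written in the spirit of the sourceDir example in ?source rather than copied from it; the name source_dir, the temporary directory and the two helper files are made up for illustration.

```r
# Source every .R file in a directory (a sketch in the spirit of the
# sourceDir example from ?source, not a copy of it)
source_dir <- function(path, trace = TRUE, ...) {
  files <- list.files(path, pattern = "[.][Rr]$", full.names = TRUE)
  for (f in files) {
    if (trace) cat("sourcing", basename(f), "\n")
    source(f, ...)
  }
  invisible(files)
}

# Demo: a temporary directory standing in for mypackage/R
r_dir <- tempfile("R")
dir.create(r_dir)
writeLines("helper_a <- function() 1", file.path(r_dir, "a.R"))
writeLines("helper_b <- function() helper_a() + 1", file.path(r_dir, "b.R"))

sourced <- source_dir(r_dir)
result <- helper_b()
print(result)
```

Because source() evaluates in the global environment by default, the reloaded definitions land in .GlobalEnv and take precedence over the installed package's versions.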
