I'm new to R and having a hard time piecing together information from various online sources about what is considered "good" practice when writing R code. I've read basic guides, but I'm struggling to find information that is definitely up to date.

  1. What are some examples of well written/documented S3 classes?
  2. How about corresponding S4 classes?
  3. What conventions do you use when commenting .R classes/functions? Do you put all of your comments in both .Rd files and .R files? Is synchronization of these files tiresome?
+2  A: 

That's half a dozen or more questions bundled into one, which makes it difficult to answer.

So let's try from the inside out: First try to solve your RODBC wrapper problem. A code representation will suggest itself. I would start with simple functions, and then maybe build a package around it. That already gives you some encapsulation.

Much of the rest is style. Some prominent R coders swear by S4, while others swear about it. You can always read the packages of others as well as code in R itself. And you can always re-implement your RODBC wrapper in different ways and then compare your own approaches.

Edit: Reflecting your updated and much shortened question: Pick some packages from CRAN, in particular among those you use. I think you will quickly find some more or less interesting according to your style.

Dirk Eddelbuettel
You're very right. I've removed the second half of my question. Sorry about that and thanks.
Bob Albright
I would argue for changing the question itself to be more focused: rather than "R code examples/best practices" make it something like "R: When to use S3 or S4 classes?"; that would be much more manageable.
Shane
+3  A: 

For 3: use roxygen. It works like javadoc, taking comments in your source files and building Rd files from them.
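As a rough sketch of what that looks like in practice (the function `add_numbers` is a hypothetical example, and the tags shown are the commonly used `@param`, `@return`, `@examples` and `@export`), the documentation lives in `#'` comments directly above the function:

```r
#' Add two numeric vectors
#'
#' A hypothetical function used only to illustrate roxygen-style comments.
#'
#' @param x A numeric vector.
#' @param y A numeric vector.
#' @return The element-wise sum of x and y.
#' @examples
#' add_numbers(1, 2)
#' @export
add_numbers <- function(x, y) {
  # Ordinary comment: stays in the source file and is not copied to the Rd file.
  stopifnot(is.numeric(x), is.numeric(y))
  x + y
}
```

Only the `#'` lines are turned into the Rd file; plain `#` comments are left alone, so there is a single source to keep up to date.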

hadley
Thanks Hadley, this is exactly what I was looking for.
Bob Albright
+2  A: 

Whether to use S3, S4, or a package at all is mostly a style issue (as Dirk says), but I would suggest using one of those if you want a very well structured object (just as you would in any OOP language). For instance, the time series packages all define their own time series classes (I believe they're all S3, with the exception of the its package) because that lets them enforce certain behavior around the construction and usage of those objects. Similarly with the question about creating a package: it's a good idea if you will be re-using your code frequently or if the code will be useful to someone else. It requires a little more effort, but the added organizational structure can easily make up for the cost.

Regarding S3 vs. S4 (discussed on R-Help here and here), the basic guideline is that S3 classes are more "quick and dirty" while S4 classes impose more rigid control over objects and types. If you're working on Bioconductor, you will typically use S4 (see, for instance, "S4 classes and methods").
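To make the contrast concrete, here is a minimal sketch of the same idea in both systems. The class names `account_s3` and `AccountS4`, their fields, and the validity rule are all made up for illustration:

```r
library(methods)

# S3: the class is just an attribute on an ordinary list; nothing is checked.
new_account_s3 <- function(owner, balance) {
  structure(list(owner = owner, balance = balance), class = "account_s3")
}

# An S3 method is a plain function named generic.class.
print.account_s3 <- function(x, ...) {
  cat("<account_s3>", x$owner, format(x$balance), "\n")
  invisible(x)
}

# S4: slots are typed, and an optional validity function runs on creation.
setClass(
  "AccountS4",
  slots = c(owner = "character", balance = "numeric"),
  validity = function(object) {
    b <- object@balance
    if (length(b) != 1L || is.na(b) || b < 0) {
      "balance must be a single non-negative number"
    } else {
      TRUE
    }
  }
)

# S3 happily accepts a balance of the wrong type.
a3 <- new_account_s3("bob", "not a number")

# S4 accepts a well-formed object ...
a4 <- new("AccountS4", owner = "bob", balance = 10)

# ... but rejects a wrong slot type and a value that fails the validity check.
bad_type <- tryCatch(
  new("AccountS4", owner = "bob", balance = "oops"),
  error = function(e) conditionMessage(e)
)
bad_value <- tryCatch(
  new("AccountS4", owner = "bob", balance = -5),
  error = function(e) conditionMessage(e)
)
```

You can get similar checks in S3 by validating inside the constructor, but that is a convention you have to follow yourself, whereas S4 enforces slot types whenever an object is created with `new()`.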

I would recommend reading some of the following:

  1. "A (Not So) Short Introduction to S4" by Christophe Genolini
  2. "Programmers' niche: A simple class, in S3 and S4" by Thomas Lumley
  3. "Brobdingnag: a ''hello world'' package using S4 methods" by Robin K. S. Hankin
  4. "Converting packages to S4" by Douglas Bates
  5. "How S4 Methods Work" by John Chambers

For documentation, Hadley's suggestion is spot on: Roxygen will make life easier and puts the documentation right next to the code. That aside, you may still want comments in your code beyond what Roxygen or the man files require, and it's good practice to write those for other developers. Such comments will not end up in the package documentation; they will only be visible in the source code.

Shane
Thanks a bunch, Shane. A lot of this confirms what I've been reading and the links in one place is very helpful.
Bob Albright
+2  A: 

This is more about style than substance, but the Google R style guide is worth reading.

JD Long