
I would like to improve my C skills in order to be more competent at converting R code to C where this would be useful. What hints do people have that will help me on my way?

Background: I followed an online Intro to C course a few years ago and that plus Writing R Extensions and S Programming (Venables & Ripley) enabled me to convert bottleneck operations to C, e.g. computing the product of submatrices (did I re-invent the wheel there?). However I would like to go a bit beyond this, e.g. converting larger chunks of code, making use of linear algebra routines etc.

No doubt I have more to learn from the resources I used before, but I wondered if there were others that people recommend? Working through examples is obviously one way to learn more: Brian Ripley gave a couple of examples of moving from S prototypes to S+C in this workshop on Efficient Programming in S, and a more recent Bioconductor workshop, Advanced R for Bioinformatics (sorry, can't post a hyperlink), includes a lab on writing an R+C algorithm. More examples like this, or other suggestions, would be appreciated.
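For readers wondering what such a bottleneck routine looks like, here is a minimal sketch (not the asker's actual code) of a .C-style function for the product of two submatrices. It assumes the .C interface, where every argument arrives as a pointer and R matrices arrive as flat column-major arrays; the name `submat_prod` and its argument list are hypothetical. Passing the full row count as a leading dimension lets the C code work on a block in place, the same idea BLAS routines such as dgemm use, so nothing is copied.

```c
/* Hypothetical .C-style routine (all arguments are pointers, as .C requires).
   Computes C (m x n) = A[ar:(ar+m-1), ac:(ac+k-1)] %*% B[br:(br+k-1), bc:(bc+n-1)]
   where A has nra rows, B has nrb rows, offsets are 0-based, and every matrix
   is stored column-major, as R stores them. No bounds checking is done. */
void submat_prod(double *a, int *nra, int *ar, int *ac,
                 double *b, int *nrb, int *br, int *bc,
                 int *m, int *k, int *n, double *c)
{
    /* Point at the top-left element of each block; the full row counts
       (nra, nrb) act as leading dimensions when stepping across columns. */
    const double *ablk = a + *ar + (*ac) * (*nra);
    const double *bblk = b + *br + (*bc) * (*nrb);

    for (int j = 0; j < *n; j++) {
        for (int i = 0; i < *m; i++) {
            double sum = 0.0;
            for (int l = 0; l < *k; l++)
                sum += ablk[i + l * (*nra)] * bblk[l + j * (*nrb)];
            c[i + j * (*m)] = sum;   /* result is a compact m x n matrix */
        }
    }
}
```

Once compiled (in a package, or with R CMD SHLIB), it could be called along the lines of `.C("submat_prod", as.double(A), as.integer(nrow(A)), 1L, 1L, as.double(B), as.integer(nrow(B)), 0L, 0L, as.integer(m), as.integer(k), as.integer(n), c = double(m * n))$c`, with the R wrapper responsible for validating the offsets. For larger problems, the BLAS that ships with R (declared in the R_ext/BLAS.h header) provides dgemm, which is likely to be faster than this triple loop.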

+5  A: 

My primary recommendation is to look at other packages. Not all packages use C code, so you will need to find examples that do. You can download the source code for every package from CRAN, and in some instances you can also browse it on R-Forge. Some R projects are also maintained on Google Code or sites like GitHub (for instance, ggplot2). You will find the C code in the "src" directory.

In general, think about what you're trying to accomplish, and then look at packages that do similar things.

Kernighan & Ritchie's "The C Programming Language" is probably still the most widely used book, so you may want to have it on your bookshelf. The following free book is also a useful resource: http://publications.gbdirect.co.uk/c_book/

Shane
I just randomly clicked the http://github.com/pjotrp/rqtl/blob/master/src/fitqtl_hk.c link you provided to a novice. Do you think that using ***p is a good way to start? Does R have pointers?
ralu
Heather Turner
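On the pointer question: R itself has no user-visible pointers, but with the .C interface each R vector reaches C as a pointer to its first element, and an R array arrives as one flat column-major block. A `double ***p` is just one way of indexing such a block as p[k][j][i]. I haven't checked how fitqtl_hk.c builds its pointers; the sketch below shows one common pattern, with hypothetical helper names and plain malloc() standing in for whatever allocator a package would actually use.

```c
#include <stdlib.h>

/* Hypothetical helper: build a p[k][j][i] view over a flat, column-major
   n1 x n2 x n3 array (the layout R uses), without copying the data.
   Element (i, j, k), 0-based, is flat[i + n1 * (j + n2 * k)].
   Returns NULL on bad dimensions or allocation failure. */
double ***make_3d_view(double *flat, int n1, int n2, int n3)
{
    if (n1 < 1 || n2 < 1 || n3 < 1)
        return NULL;

    double ***p = malloc((size_t)n3 * sizeof *p);            /* one per slice */
    double **cols = malloc((size_t)n2 * n3 * sizeof *cols);  /* one per column */
    if (p == NULL || cols == NULL) {
        free(p);
        free(cols);
        return NULL;
    }
    for (int k = 0; k < n3; k++) {
        p[k] = cols + (size_t)k * n2;   /* this slice's column pointers */
        for (int j = 0; j < n2; j++)    /* start of column j in slice k */
            p[k][j] = flat + (size_t)n1 * ((size_t)j + (size_t)n2 * k);
    }
    return p;
}

/* Frees the two pointer blocks; the data in flat is untouched. */
void free_3d_view(double ***p)
{
    if (p != NULL) {
        free(p[0]);   /* p[0] is the start of the column-pointer block */
        free(p);
    }
}
```

Note the reversed index order: element a[i+1, j+1, k+1] of the R array is p[k][j][i]. For a first attempt, indexing the flat block directly as flat[i + n1 * (j + n2 * k)] is arguably simpler.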
+4  A: 

That is a very interesting question. As it happens, I had learned C and C++ before moving to R, so that may have made it "easier" for me to add C/C++ to R.

But even with that, I would be among the first to say that adding pure C to R is hellishly complicated because of the different macros and R-internals at the C level that you need to learn.

Which leads me to my favorite argument: use an additional abstraction layer such as the Rcpp package. It hides a lot of the nasty details, and I hope that you don't need to know a lot of C++ to make use of it. One example of a package using it is the small earthmovdist package on R-Forge, which uses Rcpp wrapper classes to interface one particular metric.

Edit 1: For example, see the main function of earthmovdist here, which should be easy enough to read, possibly with the (short) manual for the Rcpp wrapper classes at one's side.

Edit 2: Three quick reasons why I consider C++ to be more appropriate and R-alike:

  • using Rcpp wrapper classes means you never have to use PROTECT and UNPROTECT, which are a frequent source of errors and heap corruption if not matched

  • using Rcpp with STL container classes like vector etc. means you never have to explicitly call malloc() / free() or new / delete, which removes another frequent source of error.

  • Rcpp allows you to wrap everything in try / catch blocks at the C++ level and reports the exception back to R, so there are no sudden segfaults and program deaths.
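For contrast, here is the kind of manual bookkeeping the second and third points refer to, as a minimal plain-C sketch (hypothetical function name, NA/NaN handling ignored). Without exceptions, failure has to be reported through an output argument, and the scratch buffer obtained from malloc() must be freed on every path after it succeeds. Inside an actual package, R's R_alloc() (reclaimed automatically at the end of the .C or .Call call) and error() would usually replace malloc() and the status code.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort(); returns -1, 0 or 1. */
static int cmp_double(const void *x, const void *y)
{
    double a = *(const double *)x, b = *(const double *)y;
    return (a > b) - (a < b);
}

/* Hypothetical .C-style routine: median of x[0..n-1] without modifying x.
   Plain C has no exceptions, so failure is reported through *status
   (0 = ok, 1 = bad length, 2 = out of memory). */
void median_c(double *x, int *n, double *result, int *status)
{
    double *tmp;

    *status = 0;
    if (*n < 1) {
        *status = 1;
        return;                      /* nothing allocated yet */
    }
    tmp = malloc((size_t)*n * sizeof *tmp);
    if (tmp == NULL) {
        *status = 2;
        return;                      /* malloc failed, nothing to free */
    }
    memcpy(tmp, x, (size_t)*n * sizeof *tmp);   /* sort a copy, not x */
    qsort(tmp, (size_t)*n, sizeof *tmp, cmp_double);

    if (*n % 2 == 1)
        *result = tmp[*n / 2];
    else
        *result = 0.5 * (tmp[*n / 2 - 1] + tmp[*n / 2]);

    free(tmp);                       /* the one matching free() */
}
```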

That said, choice of language is a very personal decision, and many users are of course perfectly happy with the lower-level interface between C and R.

Dirk Eddelbuettel
This is interesting - not something I'd considered. Your arguments are quite convincing, but I've not looked at C++ at all before, so would need to do a bit of homework first.
Heather Turner
+1  A: 

"What is the best book to learn C?" is a perenial SO question. (The middle link is probably the best.)

As for R-specific ways of learning C, I've found it instructive to download the R source code and take a look at some of the .Internal code.

EDIT: Someone else had just asked "What to read after K&R?"

Richie Cotton
Thanks for the pointers to further reading, that's useful
Heather Turner
+4  A: 

I have struggled with this issue as well.

If the issue is to improve your command of C, there are plenty of book lists on the subject. They all start with K&R. I enjoyed "Expert C Programming" by P. van der Linden and "C Primer Plus" by S. Prata. Any reference on the C standard library works.

If the issue is to interface C to R, other than the aforementioned official R document, you can check out this Harvard course and this quick start guide. I have only passed scalars and arrays to C, and honestly wouldn't know how to interface complex data structures.
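One common way around the complex-data-structure problem with .C is to pass the structure as several parallel vectors and rebuild a struct on the C side. A minimal sketch, assuming a sparse matrix stored in triplet form (row index, column index, value); the type and function names are hypothetical.

```c
/* Hypothetical C-side struct bundling the parallel arrays that R passes. */
typedef struct {
    int nnz;            /* number of stored (nonzero) entries */
    const int *row;     /* 0-based row indices, length nnz */
    const int *col;     /* 0-based column indices, length nnz */
    const double *val;  /* entry values, length nnz */
} triplet;

/* out (length nrow) = M %*% v for the sparse matrix M. */
static void triplet_matvec(const triplet *m, const double *v,
                           double *out, int nrow)
{
    for (int r = 0; r < nrow; r++)
        out[r] = 0.0;
    for (int t = 0; t < m->nnz; t++)
        out[m->row[t]] += m->val[t] * v[m->col[t]];
}

/* .C-style entry point: scalars arrive as pointers to length-1 vectors,
   arrays as pointers to their first element. */
void sparse_matvec_c(int *row, int *col, double *val, int *nnz,
                     double *v, int *nrow, double *out)
{
    triplet m = { *nnz, row, col, val };   /* rebuild the structure in C */
    triplet_matvec(&m, v, out, *nrow);
}
```

Since R indices start at 1, the R side would pass as.integer(i - 1L) and as.integer(j - 1L); the C code does no bounds checking. For genuinely nested data such as lists, the .Call interface, which works with R objects (SEXPs) directly, is usually the better fit.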

If the issue is to interface C++ to R, or build C++ skills, I can't really answer as I don't use much C++. A good starting point for me was "C++: The Core Language" (O'Reilly). Very simple, primitive, but useful for people coming from C.

gappy
I'm selecting this answer as it seems most relevant to where I am now. The Harvard course will be useful to revise C, introduce me to C++ (see comment to Dirk's answer) and is focused on using C/C++ in R.
Heather Turner
A: 

If your goal is to use C to get rid of bottlenecks, you'll need a good numerical library in C. There are lots, but I've found GSL (the GNU Scientific Library) pretty useful.

http://www.gnu.org/software/gsl/

There is also the classic book "Numerical Recipes in C", which provides an overview of important numerical techniques (though I don't recommend using their code verbatim).

Ira Cooke
I'm not sure how useful this is in my case. I am writing code for R packages and I think using gsl code would mean that my packages would require users to have gsl on their computer? I'd prefer only to depend on code distributed with R (or available via CRAN). Correct me if I'm getting this wrong!
Heather Turner