tags:

views:

519

answers:

1

What is the best way to make use of a C++ library in R, hopefully preserving the C++ data structures. I'm not at all a C++ user, so I'm not clear on the relative merits of the available approaches. The R-ext manual seems to suggest wrapping every C++ function in C. However, at least four or five other means of incorporating C++ exist.

Two ways are packages w/ similar lineage, the Rcpp (maintained by the prolific overflower Dirk Eddelbuettel) and RcppTemplate packages (both on CRAN), what are the differences between the two?

Another package, rcppbind available, on R forge that claims to take a different approach to binding C++ and R (I'm not knowledgeable to tell).

The package inline available on CRAN, claims to allow inline C/C++ I'm not sure this differs from the built in functionality, aside for allowing the code to be inline w/R.

And, finally RSwig which appears to be in the wild but it is unclear how supported it is, as the author's page hasn't been updated for years.

My question is, what are the relative merits of these different approaches. Which are the most portable and robust, which are the easiest to implement. If you were planning to distribute a package on CRAN which of the methods would you use?

+6  A: 

First off, a disclaimer: I use Rcpp all the time. In fact, when (having been renamed by the time from Rcpp) RcppTemplate had already been orphaned and without updates for two years, I started to maintain it under its initial name of Rcpp (under which it had been contributed to RQuantLib). That was about a year ago, and I have made a couple of incremental changes that you can find documented in the ChangeLog.

Now RcppTemplate has very recently come back after a full thirty-five months without any update or fix. It contains interesting new code, but it appears that it is not backwards compatible so I won't use it where I already used Rcpp.

Rcppbind was not very actively maintained whenever I checked. Whit Armstrong also has a templated interface package called rabstraction.

Inline is something completely different: it eases the compile / link cycle by 'embedding' your program as an R character string that then gets compiled, linked, and loaded. I have talked to Oleg about having inline support Rcpp which would be nice.

Swig is interesting too. Joe Wang did great work there and wrapped all of QuantLib for R. But when I last tried it, it no longer worked due to some changes in R internals. According to someone from the Swig team, Joe may still work on it. The goal of Swig is larger libraries anyway. This project could probably do with a revival but it is not without technical challenges.

Another mention should go to RInside which works with Rcpp and lets you embed R inside of C++ applications.

So to sum it up: Rcpp works well for me, especially for small exploratory projects where you just want to add a function or two. It's focus is ease of use, and it allows you to 'hide' some of the R internals that are not always fun to work with. I know of a number of other users whom I have helped on and and off via email. So I would say go for this one.

My 'Intro to HPC with R' tutorials have some examples of Rcpp, RInside and inline.

Edit: So let's look at a concrete example (taken from the 'HPC with R Intro' slides and borrowed from Stephen Milborrow who took it from Venables and Ripley). The task is to enumerate all possible combinations of the determinant of a 2x2 matrix containing only single digits in each position. This can be done in clever vectorised ways (as we discuss in the tutorial slides) or by brute force as follows:

#include <Rcpp.h>

RcppExport SEXP dd_rcpp(SEXP v) {
  SEXP  rl = R_NilValue;        // Use this when there is nothing to be returned.
  char* exceptionMesg = NULL;   // msg var in case of error

  try {
    RcppVector<int> vec(v);     // vec parameter viewed as vector of ints
    int n = vec.size(), i = 0;
    if (n != 10000) 
       throw std::length_error("Wrong vector size");
    for (int a = 0; a < 9; a++)
      for (int b = 0; b < 9; b++)
        for (int c = 0; c < 9; c++)
          for (int d = 0; d < 9; d++)
            vec(i++) = a*b - c*d;

    RcppResultSet rs;           // Build result set to be returned as list to R
    rs.add("vec", vec);         // vec as named element with name 'vec'
    rl = rs.getReturnList();    // Get the list to be returned to R.
  } catch(std::exception& ex) {
    exceptionMesg = copyMessageToR(ex.what());
  } catch(...) {
    exceptionMesg = copyMessageToR("unknown reason");
  }

  if (exceptionMesg != NULL) 
     Rf_error(exceptionMesg);

  return rl;
}

If you save this as, say, dd.rcpp.cpp and have Rcpp installed, then simply use

PKG_CPPFLAGS=`Rscript -e 'Rcpp:::CxxFlags()'`  \
    PKG_LIBS=`Rscript -e 'Rcpp:::LdFlags()'`  \
    R CMD SHLIB dd.rcpp.cpp

to build a shared library. We use Rscript (or r) to ask Rcpp about its header and library locations. Once built, we can load and use this from R as follows:

dyn.load("dd.rcpp.so")

dd.rcpp <- function() {
    x <- integer(10000)
    res <- .Call("dd_rcpp", x)
    tabulate(res$vec)
}

In the same way, you can send vectors, matrics, ... of various R and C++ data types back end forth with ease. Hope this helps somewhat.

Dirk Eddelbuettel
Thanks for the answer! Could you clarify what you mean by "interesting new code" in RcppTemplate. How does the package differ from Rcpp? I know almost nothing about C++, how do these package work?
Peter
Dirk, maybe it is time for the talked-about quick vignette on getting C++ to work with R...:( !
knguyen
It ain't easy to explain how they differ as it is ... in the C++ that they differ.
Dirk Eddelbuettel
Fair enough, I am just trying to figure out why one would choose one or the other. I'm trying to take a look at the "Intro to HPC with R" to see if it has more detail, but your site appears to be down.
Peter
Try now -- server didn't automatically server after a power-outage. As for the 'which one to chose', why don't you start a simple test project and try both?
Dirk Eddelbuettel