There are at least two sparse matrix packages for R. I'm looking into these because I'm working with datasets that are too big and sparse to fit in memory with a dense representation. I want basic linear algebra routines, plus the ability to easily write C code to operate on them. Which library is the most mature and best to use?

So far I've found

  • Matrix, which has many reverse dependencies, implying it's the most widely used one.
  • SparseM, which has fewer reverse dependencies.
  • Various graph libraries probably have their own (implicit) versions of this; e.g. igraph and network (the latter is part of statnet). These are too specialized for my needs.

Anyone have experience with this?

From searching around RSeek.org a little bit, the Matrix package seems the most commonly mentioned one. I often think of CRAN Task Views as fairly authoritative, and the Multivariate Task View mentions Matrix and SparseM.

+1  A: 

In my experience, Matrix is the best supported and most mature of the packages you mention. Its C architecture should also be fairly well-exposed and relatively straightforward to work with.
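For a rough feel of the basic linear algebra side, here is a minimal sketch using Matrix. The dimensions and values are made up for illustration, and it only touches a few routines (`sparseMatrix`, `%*%`, `crossprod`, `nnzero`), so treat it as a starting point rather than a tour of the package.

```r
library(Matrix)

# A 1000 x 1000 matrix with three nonzero entries,
# built from (row, column, value) triplets. Values are illustrative only.
m <- sparseMatrix(i = c(1, 500, 1000),
                  j = c(1, 250, 1000),
                  x = c(2, 3, 4),
                  dims = c(1000, 1000))

v  <- rep(1, 1000)
mv <- m %*% v        # sparse matrix times dense vector
g  <- crossprod(m)   # t(m) %*% m, computed without densifying m

nnzero(m)            # number of nonzero entries stored in m
```

Only the three nonzero entries are stored, so the memory footprint depends on the number of nonzeros rather than on the full 1000 x 1000 shape.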

AWB
+3  A: 

Matrix is the most common and has also just been accepted into the standard R installation (as of 2.9.0), so it should be broadly available.

Matrix in base: https://stat.ethz.ch/pipermail/r-announce/2009/000499.html

David Lawrence Miller
A: 

Brendan, I have used Matrix, and it just works. Several packages use it as well.

A: 

Follow-up question. If I do log(x) I get an out-of-memory error, because it seems to be trying to convert my sparse matrix into a dense one. Am I not allowed to do extremely basic manipulations of the data like this? This seems pretty crappy.

Brendan OConnor
this is a somewhat dumb question, i take it back.
Brendan OConnor
+1  A: 

log(x) on a sparse matrix is a bad idea: log(0) is -Inf in R, and since most elements of a sparse matrix are zero, the result would have to be dense.

If you would just like to get the log of the non-zero elements, try converting to a triplet sparse representation and taking the log of those values.
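A minimal sketch of that approach, assuming the Matrix package and a small made-up matrix: coerce to the triplet class, then transform only the stored values in the `x` slot. The zeros are never touched, so nothing gets densified.

```r
library(Matrix)

# Small illustrative sparse matrix (values are hypothetical).
x <- sparseMatrix(i = c(1, 3, 4),
                  j = c(2, 1, 4),
                  x = c(2, 10, 100),
                  dims = c(4, 4))

# Triplet form: slots i and j hold 0-based row/column indices,
# slot x holds the corresponding nonzero values.
xt <- as(x, "TsparseMatrix")

# Take the log of the stored (nonzero) values only.
xt@x <- log(xt@x)
```

The same pattern works for any function you only want applied to the stored entries, for example `log1p(xt@x)` for log(1+x), which maps 0 to 0 and so leaves the implicit zeros correct.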

Ted Dunning
oops. i meant log(1+x) actually. i guess this doesn't make any sense. yeah, i do it with the triplet representation, which makes much more sense.
Brendan OConnor