i'd like to do large-scale regression (linear/logistic) in R with many (e.g. 100k) features, where each example is relatively sparse in the feature space---e.g., ~1k non-zero features per example.

it seems like slm from the SparseM package should do this, but i'm having difficulty converting from the sparseMatrix format (Matrix package) to an slm-friendly format.

i have a numeric vector of labels y and a sparseMatrix of features X with entries in {0,1}. when i try

> model <- slm(y ~ X)

i get the following error:

Error in model.frame.default(formula = y ~ X) : 
invalid type (S4) for variable 'X'

presumably because slm wants a SparseM object instead of a sparseMatrix.

is there an easy way to either a) populate a SparseM object directly or b) convert a sparseMatrix to a SparseM object? or perhaps there's a better/simpler way to do this?
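
For option (b), one possible route is sketched below. It assumes the Matrix and SparseM packages are both installed, goes through Matrix's row-compressed form, and copies the slots into a SparseM matrix.csr object, shifting Matrix's 0-based indices to SparseM's 1-based ones. The toy data and the names Xr, X_csr and fit are made up for illustration. slm.fit is called instead of slm because it accepts the sparse matrix directly, without a formula.

```r
library(Matrix)
library(SparseM)

# Toy data (illustrative only): 200 examples, 20 binary features, 600 non-zeros.
set.seed(1)
n <- 200; p <- 20
idx <- sample(n * p, 600)                      # distinct cell positions
X <- sparseMatrix(i = (idx - 1) %% n + 1,
                  j = (idx - 1) %/% n + 1,
                  x = 1, dims = c(n, p))       # dgCMatrix from Matrix
beta <- rnorm(p)
y <- drop(as.matrix(X %*% beta)) + rnorm(n, sd = 0.1)

# Matrix stores compressed-row data in slots p (row pointers), j (column
# indices) and x (values), all 0-based; matrix.csr expects 1-based ia/ja.
Xr <- as(X, "RsparseMatrix")
X_csr <- new("matrix.csr",
             ra = Xr@x,
             ja = Xr@j + 1L,
             ia = Xr@p + 1L,
             dimension = Xr@Dim)

# Least-squares fit on the sparse design matrix (no intercept column here).
fit <- slm.fit(X_csr, y)
beta_csr <- as.numeric(fit$coefficients)
```

If an intercept is wanted, a column of ones would have to be added to X before the conversion, since slm.fit works on the design matrix as given.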

(i suppose i could explicitly code the solutions for linear regression using X and y, but it would be nice to have slm working.)
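
A minimal sketch of that explicit route, assuming only the Matrix package and a design matrix of full column rank: solve the normal equations X'X b = X'y with sparse cross-products. The toy data and variable names are hypothetical.

```r
library(Matrix)

# Toy data (illustrative only): 200 examples, 20 binary features, 600 non-zeros.
set.seed(1)
n <- 200; p <- 20
idx <- sample(n * p, 600)                      # distinct cell positions
X <- sparseMatrix(i = (idx - 1) %% n + 1,
                  j = (idx - 1) %/% n + 1,
                  x = 1, dims = c(n, p))
beta <- rnorm(p)
y <- drop(as.matrix(X %*% beta)) + rnorm(n, sd = 0.1)

# Normal equations: (X'X) b = X'y. crossprod() keeps X'X sparse and symmetric.
XtX <- crossprod(X)
Xty <- crossprod(X, y)
beta_hat <- drop(as.matrix(solve(XtX, Xty)))
```

This only covers the linear case, and X'X can be much denser than X itself, so whether it scales to ~100k features depends on how the non-zeros overlap across examples.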

any help is greatly appreciated.

+3  A: 

Don't know about SparseM but the Matrix package has an unexported lm.fit.sparse function that you can use. See vignette("sparseModels",package="Matrix"). Here is an example:

Create the data:

> y<-rnorm(30)
> x<-factor(sample(letters,30,replace=TRUE))
> X<-as(x,"sparseMatrix")
> class(X)
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"
> dim(X)
[1] 18 30

Run the regression (X has one row per factor level and one column per observation, hence the transpose):

> Matrix:::lm.fit.sparse(t(X),y)
 [1] -0.17499968 -0.89293312 -0.43585172  0.17233007 -0.11899582  0.56610302
 [7]  1.19654666 -1.66783581 -0.28511569 -0.11859264 -0.04037503  0.04826549
[13] -0.06039113 -0.46127034 -1.22106064 -0.48729092 -0.28524498  1.81681527

For comparison:

> lm(y~x-1)

Call:
lm(formula = y ~ x - 1)

Coefficients:
      xa        xb        xd        xe        xf        xg        xh        xj  
-0.17500  -0.89293  -0.43585   0.17233  -0.11900   0.56610   1.19655  -1.66784  
      xm        xq        xr        xt        xu        xv        xw        xx  
-0.28512  -0.11859  -0.04038   0.04827  -0.06039  -0.46127  -1.22106  -0.48729  
      xy        xz  
-0.28524   1.81682  
Jyotirmoy Bhattacharya
+1  A: 

You might also get some mileage by looking here:

Steve Lianoglou