Hi,

I want to do sparse, high-dimensional (a few thousand features) least-squares regression with a few hundred thousand examples. I'm happy to use non-fancy optimisation; stochastic gradient descent is fine.

Does anyone know of any existing software that does this, so I don't have to write my own?

Kind regards.

+5  A: 

While I don't know for sure, this strikes me as the kind of thing that LAPACK (Linear Algebra PACKage) would be able to provide support for. They are typically interested in large matrix math, including sparse matrices and out-of-core sizes. The basic version is FORTRAN, but there are ports of the libraries for C and other languages.

As LAPACK uses BLAS (basic linear algebra subprograms) for many of its underlying calls, you will probably also want to check out Sparse BLAS.

jvasak
+1  A: 

I'd suggest taking a look at LAPACK. It's a pretty mature linear algebra library, although interfacing with it can be a little tricky, since it's written in Fortran. That's fine, though, since Fortran is ABI compatible with C, if you get your function prototypes right.

[Edit] Upon further review, it appears that LAPACK does not support sparse matrices. It can handle banded matrices for some purposes, but for the linear least-squares problem, it only supports general matrices.
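Given that, the plain stochastic gradient descent the question mentions is short enough to sketch directly. The following is only a minimal pure-Python illustration, not a tuned implementation: the function names `sgd_sparse_least_squares` and `make_toy_problem`, the row format (a list of `(index, value)` pairs per example), and the fixed step size are all assumptions made for this example.

```python
import random


def sgd_sparse_least_squares(rows, targets, n_features,
                             epochs=20, lr=0.1, seed=0):
    """Fit w to minimise sum_i (w . x_i - y_i)^2 with plain SGD.

    Each row is a list of (feature_index, value) pairs holding only the
    non-zero entries (an assumed format for this sketch).
    """
    rng = random.Random(seed)
    w = [0.0] * n_features
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a fresh random order
        for i in order:
            row = rows[i]
            # The prediction only touches the non-zero features.
            err = sum(w[j] * v for j, v in row) - targets[i]
            # Gradient of 0.5 * err^2 w.r.t. w[j] is err * x_ij,
            # which is zero for every feature absent from the row.
            for j, v in row:
                w[j] -= lr * err * v
    return w


def make_toy_problem(n_examples=2000, n_features=20, nnz=3, seed=1):
    """Build a small noiseless sparse problem with known weights."""
    rng = random.Random(seed)
    true_w = [rng.uniform(-1, 1) for _ in range(n_features)]
    rows, targets = [], []
    for _ in range(n_examples):
        idx = rng.sample(range(n_features), nnz)
        row = [(j, rng.uniform(-1, 1)) for j in idx]
        rows.append(row)
        targets.append(sum(true_w[j] * v for j, v in row))
    return rows, targets, true_w


if __name__ == "__main__":
    rows, targets, true_w = make_toy_problem()
    w = sgd_sparse_least_squares(rows, targets, n_features=20)
    print(max(abs(a - b) for a, b in zip(w, true_w)))
```

Each update costs time proportional to the number of non-zeros in one example, which is why this approach scales to a few hundred thousand sparse rows. On real, noisy data you would probably want a decaying step size and some regularisation, neither of which is shown here.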

Adam Rosenfield
+3  A: 

I'm pretty sure that R (the statistical computing environment) can be used for problems like this. It's incredibly powerful and flexible. Lots of online resources are linked from its project page.

Argalatyr