How would one go about implementing least squares regression for factor analysis in C/C++?

+4  A: 

The gold standard for this is LAPACK. You want, in particular, xGELS (where the leading x is S, D, C, or Z depending on the precision and element type).
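For example, a minimal straight-line fit through DGELS via the LAPACKE C interface (this sketch and its data are illustrative, not from the original answer; it assumes LAPACKE is installed and you link with -llapacke):

    #include <cstdio>
    #include <lapacke.h>

    int main() {
        // Fit y = c0 + c1*x to four points. Design matrix: a column of
        // ones (intercept) and a column of x values, stored row-major.
        double A[4 * 2] = { 1, 0,
                            1, 1,
                            1, 2,
                            1, 3 };
        double b[4] = { 1.0, 2.9, 5.1, 7.0 };  // made-up observations

        // DGELS solves min ||A*c - b|| via QR; on success the first
        // n entries of b are overwritten with the solution.
        lapack_int info = LAPACKE_dgels(LAPACK_ROW_MAJOR, 'N',
                                        4, 2, 1, A, 2, b, 1);
        if (info != 0) {
            std::fprintf(stderr, "dgels failed: %d\n", (int)info);
            return 1;
        }
        std::printf("intercept = %g, slope = %g\n", b[0], b[1]);
        return 0;
    }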

Peter
This is a FORTRAN solution, albeit a good one. The point stands, though: there are existing libraries and statistical packages in C that are much easier to use than rolling your own.
CaptnCraig
@Captn: There's a C/C++ port of LAPACK.
KennyTM
What KennyTM said. Most platforms that provide LAPACK also provide C interfaces.
Stephen Canon
+1  A: 

Get ROOT and use TGraph::Fit() (or TGraphErrors::Fit())?

Big, heavy piece of software to install just for the fitter, though. Works for me because I already have it installed.

Or use GSL.
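For ordinary linear least squares, GSL's multifit interface is fairly direct. A minimal sketch with made-up data, as an illustration (link with -lgsl -lgslcblas):

    #include <cstdio>
    #include <gsl/gsl_multifit.h>

    int main() {
        const size_t n = 4, p = 2;                   // 4 points, 2 coefficients
        const double xs[] = { 0, 1, 2, 3 };
        const double ys[] = { 1.0, 2.9, 5.1, 7.0 };  // made-up observations

        gsl_matrix *X   = gsl_matrix_alloc(n, p);
        gsl_vector *y   = gsl_vector_alloc(n);
        gsl_vector *c   = gsl_vector_alloc(p);
        gsl_matrix *cov = gsl_matrix_alloc(p, p);
        double chisq;

        for (size_t i = 0; i < n; i++) {
            gsl_matrix_set(X, i, 0, 1.0);    // intercept column
            gsl_matrix_set(X, i, 1, xs[i]);  // slope column
            gsl_vector_set(y, i, ys[i]);
        }

        gsl_multifit_linear_workspace *w = gsl_multifit_linear_alloc(n, p);
        gsl_multifit_linear(X, y, c, cov, &chisq, w);
        std::printf("intercept = %g, slope = %g, chisq = %g\n",
                    gsl_vector_get(c, 0), gsl_vector_get(c, 1), chisq);

        gsl_multifit_linear_free(w);
        gsl_matrix_free(X); gsl_vector_free(y);
        gsl_vector_free(c); gsl_matrix_free(cov);
        return 0;
    }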

dmckee
+2  A: 

When I've had to deal with large datasets and large parameter sets for non-linear parameter fitting, I used a combination of RANSAC and Levenberg-Marquardt. I'm talking thousands of parameters with tens of thousands of data points.

RANSAC is a robust algorithm for minimizing the influence of outliers: it repeatedly fits to small random subsets of the data and keeps the model with the largest consensus set. It's not strictly least squares, but it can be wrapped around many fitting methods.

Levenberg-Marquardt is an efficient way to solve non-linear least-squares problems numerically. The convergence rate in most cases is between that of steepest descent and Newton's method, without requiring the calculation of second derivatives. I've found it to be faster than conjugate gradient in the cases I've examined.

The way I did this was to set up RANSAC as an outer loop around the LM method. This is very robust but slow. If you don't need the additional robustness, you can just use LM.
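For illustration, here is a bare-bones version of that outer-loop structure. This is not the original code: the data is made up, and a closed-form line fit stands in for the LM inner solver since the toy model is linear; in the non-linear case you would call LM at the two fit points instead.

    #include <cmath>
    #include <cstdio>
    #include <random>
    #include <vector>

    struct Point { double x, y; };

    // Closed-form least-squares line fit y = a + b*x. Stand-in for the
    // LM inner solver, which this sketch does not implement.
    static void fitLine(const std::vector<Point>& pts, double& a, double& b) {
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        const double n = (double)pts.size();
        for (const Point& p : pts) {
            sx += p.x; sy += p.y;
            sxx += p.x * p.x; sxy += p.x * p.y;
        }
        b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        a = (sy - b * sx) / n;
    }

    int main() {
        // Made-up data: y = 1 + 2x plus one gross outlier at the end.
        std::vector<Point> data = {
            {0, 1.0}, {1, 3.1}, {2, 4.9}, {3, 7.0}, {4, 9.1}, {5, 30.0}
        };
        std::mt19937 rng(42);
        std::uniform_int_distribution<size_t> pick(0, data.size() - 1);

        const double tol = 0.5;  // inlier threshold on |residual|
        double bestA = 0, bestB = 0;
        size_t bestInliers = 0;

        for (int iter = 0; iter < 100; ++iter) {
            // 1. Fit a candidate model to a minimal random sample.
            size_t i = pick(rng), j = pick(rng);
            if (i == j) continue;
            double a, b;
            fitLine({data[i], data[j]}, a, b);

            // 2. Collect the consensus set under the candidate model.
            std::vector<Point> inliers;
            for (const Point& p : data)
                if (std::fabs(p.y - (a + b * p.x)) < tol)
                    inliers.push_back(p);

            // 3. Refit on the consensus set; keep the largest one seen.
            if (inliers.size() > bestInliers) {
                fitLine(inliers, bestA, bestB);
                bestInliers = inliers.size();
            }
        }
        std::printf("a = %g, b = %g (%zu inliers)\n",
                    bestA, bestB, bestInliers);
        return 0;
    }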

Michael Anderson
Looks like the GSL library mentioned by dmckee supports Levenberg-Marquardt. This would be a good starting point if you want to go down this route. I think GSL may have been unusable for us due to its GPL license.
Michael Anderson
+1 for mentioning RANSAC, because it's a nifty algorithm that doesn't get the exposure it deserves
Kena
A: 

Have a look at http://www.alglib.net/optimization/

They have C++ implementations for L-BFGS and Levenberg-Marquardt.

You only need to work out the first derivative of your objective function to use these two algorithms.

Yin Zhu
A: 

If you want to implement an optimization algorithm yourself, Levenberg-Marquardt seems quite difficult to implement. If really fast convergence is not needed, take a look at the Nelder-Mead simplex optimization algorithm. It can be implemented from scratch in a few hours.

http://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method
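For illustration, a bare-bones 2-D version with the standard reflection, expansion, contraction, and shrink steps, applied directly to a least-squares objective (the objective and data below are made up; a real implementation would add a convergence test rather than a fixed iteration count):

    #include <algorithm>
    #include <array>
    #include <cstdio>
    #include <functional>

    using Vec2 = std::array<double, 2>;

    // Point a + t * (b - a); used for all simplex moves below.
    static Vec2 lerp(const Vec2& a, const Vec2& b, double t) {
        return { a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]) };
    }

    static Vec2 nelderMead(const std::function<double(const Vec2&)>& f,
                           Vec2 start, int iters) {
        // Initial simplex: the start point plus small axis offsets.
        std::array<Vec2, 3> s = { start,
                                  Vec2{ start[0] + 0.1, start[1] },
                                  Vec2{ start[0], start[1] + 0.1 } };
        auto byF = [&](const Vec2& a, const Vec2& b) { return f(a) < f(b); };
        for (int it = 0; it < iters; ++it) {
            std::sort(s.begin(), s.end(), byF);      // s[0] best, s[2] worst
            Vec2 c = lerp(s[0], s[1], 0.5);          // centroid of best two
            Vec2 r = lerp(s[2], c, 2.0);             // reflect worst through c
            if (f(r) < f(s[0])) {
                Vec2 e = lerp(s[2], c, 3.0);         // expansion
                s[2] = f(e) < f(r) ? e : r;
            } else if (f(r) < f(s[1])) {
                s[2] = r;                            // accept reflection
            } else {
                Vec2 k = lerp(s[2], c, 0.5);         // inside contraction
                if (f(k) < f(s[2])) {
                    s[2] = k;
                } else {                             // shrink toward best
                    s[1] = lerp(s[0], s[1], 0.5);
                    s[2] = lerp(s[0], s[2], 0.5);
                }
            }
        }
        std::sort(s.begin(), s.end(), byF);
        return s[0];
    }

    int main() {
        // Sum of squared residuals of y = a + b*x over made-up data.
        const double xs[] = { 0, 1, 2, 3 };
        const double ys[] = { 1.0, 2.9, 5.1, 7.0 };
        auto sse = [&](const Vec2& p) {
            double s = 0;
            for (int i = 0; i < 4; ++i) {
                double r = ys[i] - (p[0] + p[1] * xs[i]);
                s += r * r;
            }
            return s;
        };
        Vec2 best = nelderMead(sse, { 0.0, 0.0 }, 200);
        std::printf("a = %g, b = %g\n", best[0], best[1]);
        return 0;
    }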

midtiby
A: 

I've used TNT/JAMA for linear least-squares estimation. It's not very sophisticated but is fairly quick + easy.

Jason S
A: 

Let's talk first about factor analysis, since most of the discussion above is about regression. Most of my experience is with software like SAS, Minitab, or SPSS that solves the factor analysis equations, so I have limited experience in solving them directly. That said, the most common implementations do not use linear regression to solve the equations. According to this, the most common methods used are principal component analysis and principal factor analysis. In a text on Applied Multivariate Analysis (Dallas Johnson), no fewer than seven methods are documented, each with its own pros and cons. I would strongly recommend finding an implementation that gives you factor scores rather than programming a solution from scratch.
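To make the principal-component route concrete, here is a minimal illustrative sketch (not from the original answer) that extracts loadings from a toy correlation matrix with GSL's symmetric eigensolver. In practice you would compute the correlation matrix from your data and retain only the leading factors:

    #include <cmath>
    #include <cstdio>
    #include <gsl/gsl_eigen.h>
    #include <gsl/gsl_matrix.h>

    int main() {
        // Toy correlation matrix for three observed variables (made up).
        double corr[] = { 1.0, 0.6, 0.5,
                          0.6, 1.0, 0.4,
                          0.5, 0.4, 1.0 };
        gsl_matrix_view R = gsl_matrix_view_array(corr, 3, 3);

        gsl_vector *eval = gsl_vector_alloc(3);
        gsl_matrix *evec = gsl_matrix_alloc(3, 3);
        gsl_eigen_symmv_workspace *w = gsl_eigen_symmv_alloc(3);

        // The eigensolver destroys its input, which is fine here.
        gsl_eigen_symmv(&R.matrix, eval, evec, w);
        gsl_eigen_symmv_sort(eval, evec, GSL_EIGEN_SORT_VAL_DESC);

        // Loading of variable i on component j: evec(i,j) * sqrt(eval(j)).
        for (size_t i = 0; i < 3; i++) {
            for (size_t j = 0; j < 3; j++)
                std::printf("%8.4f ", gsl_matrix_get(evec, i, j)
                                      * std::sqrt(gsl_vector_get(eval, j)));
            std::printf("\n");
        }

        gsl_eigen_symmv_free(w);
        gsl_vector_free(eval);
        gsl_matrix_free(evec);
        return 0;
    }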

The reason there are different methods is that you can choose exactly what you're trying to minimize. There's a pretty comprehensive discussion of the breadth of methods here.

Grembo