Does anyone know of an efficient way to do multiple linear regression in C#, where the number of simultaneous equations may be in the thousands (with 3 or 4 different inputs)? After reading this article on multiple linear regression, I tried implementing it with a matrix equation:

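// Observed responses, one row per sample (n x 1).
Matrix y = new Matrix(
    new double[,]{{745},
                  {895},
                  {442},
                  {440},
                  {1598}});

// Design matrix (n x 3): a leading 1 for the intercept, then the two inputs.
Matrix x = new Matrix(
    new double[,]{{1, 36, 66},
                  {1, 37, 68},
                  {1, 47, 64},
                  {1, 32, 53},
                  {1, 1, 101}});

// Normal equations: b = (X'X)^-1 X'y
Matrix b = (x.Transpose() * x).Inverse() * x.Transpose() * y;

// Print the fitted coefficients, intercept first.
for (int i = 0; i < b.Rows; i++)
{
  Trace.WriteLine("INFO: " + b[i, 0].ToDouble());
}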

However, it does not scale well to thousands of equations because of the matrix inversion operation. I can call R and use that, but I was hoping there would be a pure .NET solution that scales to these large sets.

Any suggestions?

EDIT #1:

I have settled on using R for the time being. Using statconn (downloaded here), I have found this method to be both fast and relatively easy to use. Here is a small code snippet; it really isn't much code at all to use the R statconn library (note: this is not all the code!).

_StatConn.EvaluateNoReturn(string.Format("output <- lm({0})", equation));
object intercept = _StatConn.Evaluate("coefficients(output)['(Intercept)']");
parameters[0] = (double)intercept;
for (int i = 0; i < xColCount; i++)
{
  object parameter = _StatConn.Evaluate(string.Format("coefficients(output)['x{0}']", i));
  parameters[i + 1] = (double)parameter;
}
+1  A: 

Try Meta.Numerics:

Meta.Numerics is a library for advanced scientific computation in the .NET Framework. It can be used from C#, Visual Basic, F#, or any other .NET programming language. The Meta.Numerics library is fully object-oriented and optimized for speed of implementation and execution.

To populate a matrix, see an example of the ColumnVector Constructor (IList<Double>). It can construct a ColumnVector from many ordered collections of reals, including double[] and List<double>.

gimel
Thanks, I hadn't seen that library before. It looks good, but it still suffers from the same issue of solving the equations with matrices. I think I need a different approach.
mrnye
+1  A: 

The size of the matrix being inverted does NOT grow with the number of simultaneous equations (samples). x.Transpose() * x is a square matrix where the dimension is the number of independent variables.
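As a rough illustration of this point, here is a minimal sketch that accumulates x.Transpose() * x and x.Transpose() * y in a single pass over the samples and then solves the small resulting system by Gaussian elimination instead of inverting it. The class and method names (NormalEquations, Fit, Solve) are made up for this example and do not come from any library mentioned in this thread; the sketch also assumes the inputs are not collinear.

```csharp
using System;

public static class NormalEquations
{
    // rows[s] holds the inputs of sample s WITHOUT the leading 1;
    // the intercept column is added here. Returns {b0, b1, ..., bp}.
    public static double[] Fit(double[][] rows, double[] y)
    {
        int p = rows[0].Length + 1;      // coefficients, including the intercept
        var xtx = new double[p, p];      // X'X: p x p, independent of sample count
        var xty = new double[p];         // X'y: length p
        var xi = new double[p];

        for (int s = 0; s < rows.Length; s++)
        {
            xi[0] = 1.0;
            for (int j = 1; j < p; j++) xi[j] = rows[s][j - 1];

            // Add this sample's contribution: X'X += xi * xi', X'y += xi * y[s].
            for (int j = 0; j < p; j++)
            {
                xty[j] += xi[j] * y[s];
                for (int k = 0; k < p; k++) xtx[j, k] += xi[j] * xi[k];
            }
        }
        return Solve(xtx, xty);
    }

    // Solves a * b = c by Gaussian elimination with partial pivoting.
    // Overwrites a and c; no explicit inverse is formed.
    private static double[] Solve(double[,] a, double[] c)
    {
        int n = c.Length;
        for (int col = 0; col < n; col++)
        {
            // Pick the row with the largest entry in this column as the pivot.
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.Abs(a[r, col]) > Math.Abs(a[pivot, col])) pivot = r;
            if (Math.Abs(a[pivot, col]) < 1e-12)
                throw new InvalidOperationException("Singular or near-singular system.");

            if (pivot != col)
            {
                for (int k = 0; k < n; k++)
                {
                    double t = a[col, k]; a[col, k] = a[pivot, k]; a[pivot, k] = t;
                }
                double tc = c[col]; c[col] = c[pivot]; c[pivot] = tc;
            }

            // Eliminate the entries below the pivot.
            for (int r = col + 1; r < n; r++)
            {
                double f = a[r, col] / a[col, col];
                for (int k = col; k < n; k++) a[r, k] -= f * a[col, k];
                c[r] -= f * c[col];
            }
        }

        // Back substitution on the upper-triangular system.
        var b = new double[n];
        for (int r = n - 1; r >= 0; r--)
        {
            double sum = c[r];
            for (int k = r + 1; k < n; k++) sum -= a[r, k] * b[k];
            b[r] = sum / a[r, r];
        }
        return b;
    }

    public static void Main()
    {
        // The five samples from the question.
        var rows = new[]
        {
            new double[] { 36, 66 }, new double[] { 37, 68 }, new double[] { 47, 64 },
            new double[] { 32, 53 }, new double[] { 1, 101 }
        };
        var y = new double[] { 745, 895, 442, 440, 1598 };
        foreach (double coef in Fit(rows, y)) Console.WriteLine(coef);
    }
}
```

With n samples and p coefficients, the accumulation loop costs on the order of n * p * p operations and the solve on the order of p * p * p, so with 3 or 4 inputs the work should grow only linearly with the number of samples. The n-row matrix never needs to be transposed or held as a Matrix object.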

Joe H
Interesting point. I wonder why my performance degrades so much, then? I had about 6000 samples in my set. I will have to investigate this further.
mrnye
I'd guess your performance degrades because x.Transpose() * x takes more time with bigger matrices. I have a library somewhere that works for millions of data points... I'll try to dig it up if you're interested. I faced this problem about twenty years ago (yes I'm old) and found a clever mathematical solution :-)
Joe H
A: 

You might find this blog post useful: There is hardly ever a good reason to invert a matrix.
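Applied to the regression in the question, one common reading of that advice is to solve the normal equations as a linear system rather than form the inverse. The following is a sketch using a Cholesky factorization, which assumes X has full column rank; the symbols L and z are introduced here for illustration only:

```latex
\begin{aligned}
(X^{\top} X)\, b &= X^{\top} y \\
X^{\top} X &= L L^{\top} && \text{(Cholesky factor, } L \text{ lower triangular)} \\
L z &= X^{\top} y && \text{(forward substitution)} \\
L^{\top} b &= z && \text{(back substitution)}
\end{aligned}
```

The coefficient vector b comes out of two triangular solves, and the inverse of X^T X is never computed.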

AakashM
A: 

For the record, I recently found the ALGLIB library which, whilst not having much documentation, has some very useful functions, such as linear regression, which is one of the things I was after.

mrnye