Hey all-

I'm trying to write a simple linear regression function but keep running into this error:

numpy.linalg.linalg.LinAlgError: Singular matrix

Existing function (with debug):

import numpy as np

def makeLLS(inputData, targetData):
    print("In makeLLS:")
    print("    Shape inputData:", inputData.shape)
    print("    Shape targetData:", targetData.shape)
    # Build the normal equations: (X^T X) w = X^T y
    term1 = np.dot(inputData.T, inputData)
    term2 = np.dot(inputData.T, targetData)
    print("    Shape term1:", term1.shape)
    print("    Shape term2:", term2.shape)
    #print(term1)
    #print(term2)
    # Raises LinAlgError if term1 is singular
    result = np.linalg.solve(term1, term2)
    return result

The output to the console with my test data is:

In makeLLS:
    Shape trainInput1: (773, 10)
    Shape trainTargetData: (773, 1)
    Shape term1: (10, 10)
    Shape term2: (10, 1)

Then it errors on the linalg.solve line. This is a textbook linear regression function and I can't seem to figure out why it's failing.

What is the singular matrix error?

Any suggestions on resolution?

Thanks-

Jonathan

+2  A: 

A singular matrix is a square matrix whose determinant is zero. This means its rows aren't linearly independent: at least one row can be constructed as a linear combination of the other rows. I'll use the example from numpy's linalg.solve documentation to demonstrate. Here is the doc's example:

>>> a = np.array([[3,1], [1,2]])
>>> b = np.array([9,8])
>>> x = np.linalg.solve(a, b)
>>> x
array([ 2.,  3.])

Now, I'll change a to make it singular.

>>> a = np.array([[2,4], [1,2]])
>>> x = np.linalg.solve(a, b)
...
LinAlgError: Singular matrix

This is a very obvious example because the first row is just double the second row, but hopefully you get the point.
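
When the dependence is less obvious, one way to check for it is to compare the matrix's rank with its size. The following is only a minimal sketch using the same 2x2 matrix as above; applying it to your own term1 is an assumption about where the problem lies.

```python
import numpy as np

# Same singular matrix as above: the first row is twice the second.
a = np.array([[2.0, 4.0], [1.0, 2.0]])

# matrix_rank counts the linearly independent rows/columns.
rank = np.linalg.matrix_rank(a)

# A square matrix is singular when its rank is below its size.
is_singular = rank < a.shape[0]

print("rank:", rank)
print("singular:", is_singular)
```

Running the same check on np.dot(inputData.T, inputData) from the question should show whether some input columns are redundant.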

Justin Peel
+2  A: 

As explained in the other answer, linalg.solve expects a full-rank matrix. This is because it tries to solve the matrix equation exactly rather than do a linear regression, which should work for matrices of any rank.

There are a few methods for linear regression. The simplest one I would suggest is the standard least squares method: just use numpy.linalg.lstsq instead of linalg.solve. The numpy.linalg.lstsq documentation includes an example.
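
As a rough sketch of what that could look like, here is lstsq applied to a deliberately rank-deficient input. The arrays X and y are made-up illustration data (not the data from the question), and rcond=None assumes a reasonably recent numpy version.

```python
import numpy as np

# Hypothetical data: the third column duplicates the first,
# so np.dot(X.T, X) is singular and linalg.solve would fail on it.
X = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 1.0]])
y = np.array([[1.0], [3.0], [5.0], [7.0]])  # generated as y = 1 + 2*x

# lstsq takes the raw inputs and targets directly (no normal equations)
# and returns the coefficients, residuals, rank of X, and singular values.
coef, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)

fitted = np.dot(X, coef)
print("rank:", rank)
print("fitted:", fitted.ravel())
```

Note that lstsq is called with inputData and targetData themselves, so the term1 and term2 products in makeLLS are no longer needed. The returned rank also tells you whether the input columns are linearly dependent.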

Muhammad Alkarouri
Quite right, I was thinking about lstsq after I posted.
Justin Peel