I'd like to calculate the mathematical rank of a matrix using scipy. The most obvious function, numpy.rank, calculates the dimension of an array (i.e. scalars have dimension 0, vectors 1, matrices 2, etc.). I am aware that the numpy.linalg.lstsq function has this capability, but I was wondering if such a fundamental operation is built into the matrix class somewhere.

Here is an explicit example:

# Note: numpy.rank was later removed from NumPy; numpy.ndim returns the same value.
from numpy import matrix, rank
A = matrix([[1,3,7],[2,8,3],[7,8,1]])
print(rank(A))

This gives 2, the number of dimensions, whereas I'm looking for an answer of 3.

+1  A: 

I don't know about NumPy in particular, but that's unlikely to be a built-in operation on a matrix. It involves fairly intensive numerical computation (with the associated concerns about floating-point roundoff error), it requires a threshold choice that may or may not be appropriate in a given context, and algorithm selection matters for computing it accurately and quickly.

Things that are built into the basic classes tend to be things that can be performed in a unique and straightforward manner, with matrix multiplication being about the most complex of them.

Brooks Moses
This is a good point: a numerically unstable matrix could cause the rank to change due to roundoff errors. However, this is a known problem, and I was wondering if the scipy/numpy libraries directly have a function. If the answer is no, that's fine too; I can always go with an SVD.
Hooked
It's not just numerically-unstable ones. How about {{1.0, 3.0}, {1.0/3.0, 1.0}}? The division can't produce an exact answer, so should this get counted as rank 1, or rank 2?
Brooks Moses
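To make the 2x2 example in the comment above concrete, here is a minimal sketch that prints its singular values and counts them against two cutoffs. The cutoff values 1e-10 and 0.0 are arbitrary choices for illustration, not anything the thread prescribes, and the result for the strict cutoff may depend on how the roundoff falls on a given platform.

```python
import numpy as np

# The matrix from the comment; 1.0/3.0 is rounded when stored as a float.
M = np.array([[1.0, 3.0],
              [1.0 / 3.0, 1.0]])

# Singular values only, returned largest first.
s = np.linalg.svd(M, compute_uv=False)
print(s)

# Count singular values above two different absolute cutoffs (illustrative).
rank_loose = int(np.sum(s > 1e-10))   # treats tiny values as zero
rank_strict = int(np.sum(s > 0.0))    # treats any positive value as nonzero
print(rank_loose, rank_strict)
```

Whether the two counts agree is exactly the judgment call the comment describes: the answer depends on the threshold, not only on the matrix.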
A: 

The linear algebra functions are generally grouped in numpy.linalg. (They're also available from scipy.linalg, which has more functionality.) This allows polymorphism: the functions can accept any of the types that SciPy handles.

So, yes, the numpy.linalg.lstsq function does what you're asking. Why is that insufficient?

bignose
It does what I'm asking, but it does a lot more unnecessarily, and with a large amount of baggage. The same could have been accomplished with an LU decomposition and then a row sort. The intent of the question, if this wasn't clear, was whether a function exists whose sole purpose is to calculate the rank, i.e. take in a matrix, spit out an int.
Hooked
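For reference, a small sketch of what getting the rank out of numpy.linalg.lstsq looks like, which may illustrate the "baggage" mentioned above. The right-hand side `b` is a dummy vector introduced only so the call is valid, and `rcond=None` assumes a reasonably recent NumPy.

```python
import numpy as np

A = np.array([[1, 3, 7], [2, 8, 3], [7, 8, 1]], dtype=float)
b = np.zeros(3)  # dummy right-hand side, needed only to satisfy the call

# lstsq solves a least-squares problem and returns
# (solution, residuals, rank, singular_values); only rank is wanted here.
x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
print(rank)
```

The rank comes back as a by-product of solving a system nobody asked about, which is why a dedicated function is the more natural interface.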
+2  A: 

The answer is no—there is currently no function dedicated to calculating the matrix rank of an array/matrix in scipy. Adding one has been discussed before, but if it's going to happen, I don't believe it has yet.

Mike Graham
Incidentally, http://mail.scipy.org/pipermail/numpy-discussion/2008-February/031214.html is the first post of a short newsgroup discussion about this.
Mike Graham
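For readers arriving later: NumPy releases newer than this thread do provide numpy.linalg.matrix_rank, which estimates the rank from the singular values. A minimal sketch, using the matrix from the question plus a variant `B` (a name introduced here) whose second row is twice the first:

```python
import numpy as np

A = np.array([[1, 3, 7], [2, 8, 3], [7, 8, 1]])
B = np.array([[1, 3, 7], [2, 6, 14], [7, 8, 1]])  # second row = 2 * first row

# matrix_rank counts singular values above a tolerance (default or user-supplied).
rank_A = np.linalg.matrix_rank(A)
rank_B = np.linalg.matrix_rank(B)
print(rank_A, rank_B)
```

The function accepts an optional `tol` argument, so the threshold concerns raised elsewhere in this thread still apply; the default is only a reasonable starting point.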
+2  A: 

Here is a rough code snippet for people who need to get this done in practice. Feel free to improve.

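import numpy as np
# A is the matrix from the question; count singular values above an absolute cutoff.
u, s, v = np.linalg.svd(A)
rank = np.sum(s > 1e-10)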
Stefan van der Walt
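One possible improvement to the fixed 1e-10 cutoff in the snippet above is a tolerance that scales with the matrix. The sketch below is an assumption-laden variant, not the original author's code: the function name `svd_rank` is made up here, and the default tolerance (largest singular value times the larger dimension times machine epsilon) is a common convention rather than the only sensible choice.

```python
import numpy as np

def svd_rank(A, tol=None):
    """Estimate the rank as the number of singular values above tol."""
    A = np.asarray(A, dtype=float)
    s = np.linalg.svd(A, compute_uv=False)  # singular values only
    if tol is None:
        # Relative tolerance: scales with the matrix instead of being fixed.
        tol = s.max() * max(A.shape) * np.finfo(s.dtype).eps
    return int(np.sum(s > tol))

A = np.array([[1, 3, 7], [2, 8, 3], [7, 8, 1]], dtype=float)
tiny = 1e-12 * A  # same matrix, scaled down

# With a fixed absolute cutoff the scaled matrix looks like it has rank 0.
s_tiny = np.linalg.svd(tiny, compute_uv=False)
abs_rank = int(np.sum(s_tiny > 1e-10))

# With the relative tolerance the scaling does not change the answer.
rel_rank = svd_rank(tiny)
print(abs_rank, rel_rank)
```

The design point is that rank should not change when a matrix is multiplied by a nonzero scalar, which an absolute cutoff cannot guarantee.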
+1  A: 

If numpy does not offer a rank facility, why don't you write your own?

An efficient way to compute the rank is via the Singular Value Decomposition - the rank of the matrix is equal to the number of non-zero singular values.

import numpy

def rank(A, eps=1e-12):
    # Count the singular values whose magnitude exceeds the tolerance eps.
    u, s, vh = numpy.linalg.svd(A)
    return len([x for x in s if abs(x) > eps])

Notice that eps depends on your application: most would agree that 1e-12 corresponds to zero, but you may witness numerical instability even for eps=1e-9.

Using your example, the answer is three. If you change the second row to [2, 6, 14] (linearly dependent on row one), the answer is two (the "zero" singular value is 4.9960E-16).

Arrieta