ansaurus

Question

Computing the null space of a matrix as fast as possible

Answer 1

A:

"seems very expensive" - what data do you have that supports this?

Maybe Block Lanczos is the answer you seek.

Or maybe this.

Both JAMA and Apache Commons Math have SVD implementations in Java. Why not take those and try them out? Get some real data for your case instead of impressions. It won't cost you much, since the code is already written and tested.

duffymo 2010-02-02 02:02:00

Seems expensive in that the complexity is much higher than a row reduction method for example. I am not an expert but I can calculate the null space using Gaussian elimination by hand and as far as I am aware its unsuitability is down solely to rounding errors.

zenna 2010-02-02 02:14:59

"seems" is the operative word here. This article suggests that it's O(m^3): http://www.cs.rpi.edu/~drinep/Papers/Drineas_PCI_01.pdf.

duffymo 2010-02-02 02:28:44

I give this post a -1 for incorrect information.Gaussian Elimination can be used for solving Ax = 0. Isn't that just the null space?

Moron 2010-02-02 05:12:07

Fair enough, you're correct.

duffymo 2010-02-02 10:58:12

Answer 2

+3 A:

Gaussian elimination is plenty fast for 4x3 matrices. IIRC I've done about 5 million per second with Java without parallelism. With such a small problem, your best bet is to code the routine (row reduce etc.) yourself; otherwise you'll waste most of the time putting the data into the right format for the external routine.

Rex Kerr 2010-02-02 02:28:54

By 'not an option' I was referring to errors due to rounding being a problem or not. Or does it depend heavily on the application?

zenna 2010-02-02 02:37:15

It depends heavily on the application. If it is okay to treat nearly null and actually null both as null, you set some epsilon that is considered "close enough to zero" and use Gaussian elimination. If it matters very much to you when things are close to singular but not quite, your numerical accuracy will be a lot better with SVD (typically). The larger your matrix, the worse it gets (typically), so now that you say 8x9, I'd more seriously consider SVD. Why not try out both methods with non-CUDA code and see if SVD is required?

Rex Kerr 2010-02-02 12:54:19

Answer 3

+1 A:

I think the most important thing for CUDA is to find an algorithm that doesn't depend on conditional branching (which is quite slow on graphics hardware). Simple if statements that can be optimized into conditional assignment are much better (or you can use the ?: operator).

If necessary, you should be able to do some form of pivoting using conditional assignment. It might actually be harder to determine how to store your result: if your matrix is rank-deficient, what do you want your CUDA program to do about it?

If you assume your 4x3 matrix is not actually rank-deficient, you can find your (single) null-space vector without any conditionals at all: the matrix is small enough that you can use Cramer's rule efficiently.

Actually, since you don't actually care about the scale of your null vector, you don't have to divide by the determinant -- you can just take the determinants of the minors:

    x1 x2 x3
M = y1 y2 y3
    z1 z2 z3
    w1 w2 w3

         |y1 y2 y3|        |x1 x2 x3|       |x1 x2 x3|        |x1 x2 x3|
->  x0 = |z1 z2 z3|  y0 = -|z1 z2 z3|  z0 = |y1 y2 y3|  w0 = -|y1 y2 y3|
         |w1 w2 w3|        |w1 w2 w3|       |w1 w2 w3|        |z1 z2 z3|

Note that these 3x3 determinants are just triple products; you can save computation by reusing the cross products.

comingstorm 2010-02-02 06:51:59

Answer 4

A:

Apparently it is possible to do Gaussian Elimination (and algorithms based on that) using CUDA constructs.

Here is one paper: http://academic.research.microsoft.com/Paper/4908208.aspx

Hope that helps.

Moron 2010-02-02 07:11:52

Answer 5

+1 A:

I wondered if the matrixes are related rather than just being random, so that the null spaces you are seeking can be considered to be like 1-dimensional tangents to a curve in N-space (N = 9). If so, you may be able to speed things up by using Newton's method to solve successive instances of the system of quadratic equations Ax = 0, |x|^2 = 1, starting from a previous null space vector. Newton's method uses first derivatives to converge to a solution, and so would use Gaussian elimination to solve 9x9 systems. Using this technique would require that you be able to make small steps from matrix to matrix by say varying a parameter.

So the idea is that you initialize using SVD on the first matrix, but thereafter you step from matrix to matrix, using the null space vector of one as the starting point for the iteration for the next one. You need one or two iterations to get convergence. If you don't get convegence you use SVD to restart. If this situation is what you have, it is much faster than starting fresh on each matrix.

I used this a long time ago to map contours in the solutions of sets of 50 x 50 quadratic equations associated with the behavior of electric power systems.

Permaquid 2010-02-05 21:41:16

Interesting, in this case the matrices are not related but this method will surely be useful elsewhere, and after reading your response I have seen a lot more literature on this kind of updating method.

zenna 2010-02-10 11:57:18

Oh yes, I used QR transformations, not SVD. There are lots of efficiencies to be had there. It's not so easy to remember details after more than 25 years though.

Permaquid 2010-02-15 03:09:16

Answer 6

A:

To answer your question directly... yes! QR decomposition!

Let A be an m-by-n matrix with rank n. QR decomposition finds orthonormal m-by-m matrix Q and upper triangular m-by-n matrix R such that A = QR. If we define Q = [Q1 Q2], where Q1 is m-by-n and Q2 is m-by-(m-n), then the columns of Q2 form the null space of A^T.

QR decomposition is computed either by Gram-Schmidt, Givens rotations, or Householder reflections. They have different stability properties and operation counts.

You are right: SVD is expensive! I can't speak for what state-of-the-art stuff uses, but when I hear "compute null space" (EDIT: in a way that is simple for me to understand), I think QR.

Steve 2010-02-08 01:42:17

Thanks, Coincidentally after a lot of reading (much of which was through old school parallel implementations on vector machines in the 1980s) I had decided to attempt using the QR decomposition with givens rotations, will update this if it works out well.

zenna 2010-02-10 11:57:56

Great! Glad to hear. Although you are using CUDA, I have Matlab code if you need any assistance.

Steve 2010-02-10 14:18:50

I implemented a QR decomposition using givens rotations, I parallelised at 3 levels 1. The matrix multiplication between a row pair of matrix A and the 2 x 2 Givens matrix: using 16 threads for each element of the multiplication product 2. I do 4 row pairs in parallel, 2 for Q and 2 for A, and 3. I do the 25000 matrices in parallel. Overall I halved the run time from 20 ms to 9.5ms when compared to the SVD. So success!

zenna 2010-02-17 23:45:47

Wow, that's awesome! Yes, these rotations should definitely be parallelizable. Now I want to try this myself. :-)

Steve 2010-02-17 23:47:38

Yeah! get in touch if you need any assistance, was quite a complex job, might right a block post on it or something

zenna 2010-02-17 23:52:25

This link looks really useful for this exact problem: http://www.cse.illinois.edu/courses/cs554/notes/11_qr.pdf

Steve 2010-02-17 23:52:27

... and this one: http://www.cs.umd.edu/~oleary/reprints/j32.pdf (I took numerical optimization with Prof. O'Leary.)

Steve 2010-02-17 23:54:05

Yeah I have read them both, that first one set me on a paper trail. Deciding which order to do givens rotations in is not trivial and there are quite a few papers just on that topic. I have collected some of the papers in a mendeley collection, http://www.mendeley.com/collections/1105691/Matrix-Parallel/

zenna 2010-02-18 12:03:47

ansaurus

tags:

views:

answers:

Computing the null space of a matrix as fast as possible

related questions