I am trying to work out whether I can parallelise the training aspect of a machine learning algorithm. The computationally expensive part of the training involves Cholesky-decomposing a positive-definite matrix (a covariance matrix). I'll try to frame the question purely in terms of the matrix algebra. Let me know if you need any more info.

Let's say we have a block matrix (a covariance matrix, but that's not relevant to the problem)

M = [ A  B  ]
    [ B* C ]

where A and C relate to training data from two different sets. Both A and C are positive definite. Let's also assume for simplicity that A and C each have size n x n.

There is a formula for carrying out a block Cholesky decomposition. See http://en.wikipedia.org/wiki/Block_LU_decomposition. Summarising, we have the following result:

M = L L*

where (* indicates transpose)

L = [ A^{1/2}      0       ]
    [ B* A^{-*/2}  Q^{1/2} ]

and Q is the Schur complement

Q = C - B* A^{-1} B

Now let's say the training related to matrices A and C has already been carried out, so we already have the Cholesky factors A^{1/2} and C^{1/2} (it is therefore straightforward to calculate the inverses A^{-1/2} and C^{-1/2} using forward substitution).

Rewriting Q in terms of these quantities, we now have

Q = Q^{1/2} Q^{*/2} = C^{1/2} C^{*/2} - B* A^{-*/2} A^{-1/2} B
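To make the set-up concrete, here is a small numpy sketch (the random test data and variable names are mine) that builds the block factor from A^{1/2} and Q^{1/2} and checks that it reproduces M:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Build a random symmetric positive-definite block matrix M = [[A, B], [B*, C]].
G = rng.standard_normal((2 * n, 2 * n))
M = G @ G.T + 2 * n * np.eye(2 * n)
A, B, C = M[:n, :n], M[:n, n:], M[n:, n:]

a = np.linalg.cholesky(A)        # A^{1/2} (lower triangular)
beta = np.linalg.solve(a, B)     # A^{-1/2} B; a triangular solve would be cheaper
Q = C - beta.T @ beta            # Q = C - B* A^{-1} B
q = np.linalg.cholesky(Q)        # Q^{1/2}

# Assemble L = [[A^{1/2}, 0], [B* A^{-*/2}, Q^{1/2}]].
L = np.block([[a, np.zeros((n, n))], [beta.T, q]])

print(np.allclose(L @ L.T, M))   # L L* reproduces M -> True
```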

My question is this: given this set-up, is it possible to algebraically calculate Q^{1/2} without having to apply a Cholesky decomposition to Q? In other words, can I use C^{1/2} to help me calculate Q^{1/2}? If this were possible it would then be easy to parallelise the training.

Thanks in advance for any replies. Sorry about the matrix typesetting. Is there any sensible way to typeset maths, or matrices in particular?

Matt.

A: 

I think I've come to an answer, although it is not exactly what I'd hoped for.

Removing the machine learning context, my question boiled down to whether knowing C^{1/2} would help in the calculation of Q^{-1/2}. I'll go into more detail below, but to cut to the chase: the answer is yes, but only with respect to stability, not computation (I can't prove this to be the case currently, but I'm fairly certain).

To see why the answer is yes with respect to stability, note that the definition of Q from the original question can be rearranged as follows.

Q = C - B* A^{-1} B = (C^{1/2} + B*A^{-*/2})(C^{1/2} - B*A^{-*/2})*

By knowing C^{1/2} beforehand, we can calculate Q without having to invert A directly. Direct inversion is not numerically stable.

Sadly, although I have done a fair amount of research on the subject, it does not appear that C^{1/2} helps with respect to computation in the exact calculation of Q^{-1/2}. The best approach appears to be to calculate Q using C^{1/2} as above, then use a Cholesky decomposition to obtain Q^{1/2}, and then forward substitution to calculate Q^{-1/2}.
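For reference, that recipe can be sketched in numpy/scipy (real matrices assumed; the random test data and variable names are mine):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
n = 5

# Random positive-definite block matrix; its Schur complement Q = C - B* A^{-1} B
# is then automatically positive definite.
G = rng.standard_normal((2 * n, 2 * n))
M = G @ G.T + 2 * n * np.eye(2 * n)
A, B, C = M[:n, :n], M[:n, n:], M[n:, n:]

a = np.linalg.cholesky(A)                           # A^{1/2}, assumed already computed
beta = solve_triangular(a, B, lower=True)           # A^{-1/2} B by forward substitution
Q = C - beta.T @ beta                               # no explicit inverse of A is formed

q = np.linalg.cholesky(Q)                           # Q^{1/2}: one Cholesky is still needed
q_inv = solve_triangular(q, np.eye(n), lower=True)  # Q^{-1/2} by forward substitution

print(np.allclose(q_inv @ Q @ q_inv.T, np.eye(n)))  # -> True
```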

Further Research

One area I did not look into in much detail was whether it was possible to use C^{1/2} to approximate Q^{-1/2}. Something along the lines of an iterative method using C^{1/2} as a starting point. I do not know of any such iterative approximation process, but I'll keep searching. I may even start a new question with that as the focus.

I'll update you all if I have any major breakthroughs.

Sigmoidal
It's been pointed out to me that the instability associated with inverting A directly is avoided in both formulations of Q. Having said that, there may still be some stability gains if Q is low rank, so there is no real harm in the reformulation of Q.
Sigmoidal
A: 

You can do this with a sequence of Cholesky downdates:

(Below I use ' for transpose to avoid confusion with multiplication)

If the Cholesky factor of A is a, and that of C is c, then the equation for Q can be written

Q = c*c' - beta'*beta   where beta = inverse(a)*B (i.e. solve a*beta = B for beta)

If we write b[i] for the i'th row of beta (as a column vector), then

Q = c*c' - Sum_i b[i]*b[i]'

Finding the Cholesky decomposition of

c*c' - x*x' (where x is a vector and c is lower triangular)

is known as a rank-1 Cholesky downdate. There is a stable algorithm for this in Golub and Van Loan.
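A rank-1 downdate routine isn't part of numpy/scipy's public API, so here is a minimal sketch of one (my own code, along the lines of the hyperbolic-rotation approach, not the textbook listing). Note that since Q = c*c' - beta'*beta, the outer products being subtracted come from the rows of beta:

```python
import numpy as np

def chol_downdate(L, x):
    """Given lower-triangular L with L L' = S, return the lower-triangular
    factor of S - x x', computed with hyperbolic rotations.
    Raises ValueError if S - x x' is not positive definite."""
    L = L.copy()
    x = np.asarray(x, dtype=float).copy()
    for k in range(x.size):
        r2 = L[k, k] ** 2 - x[k] ** 2
        if r2 <= 0.0:
            raise ValueError("downdated matrix is not positive definite")
        r = np.sqrt(r2)
        c, s = r / L[k, k], x[k] / L[k, k]   # cosh/sinh of the rotation, scaled
        L[k, k] = r
        L[k + 1:, k] = (L[k + 1:, k] - s * x[k + 1:]) / c
        x[k + 1:] = c * x[k + 1:] - s * L[k + 1:, k]
    return L

# Recover the factor of Q = c c' - Sum_i b[i] b[i]' via n rank-1 downdates of c.
rng = np.random.default_rng(2)
n = 4
G = rng.standard_normal((2 * n, 2 * n))
M = G @ G.T + 2 * n * np.eye(2 * n)   # positive definite, so Q below is too
A, B, C = M[:n, :n], M[:n, n:], M[n:, n:]

a = np.linalg.cholesky(A)
c = np.linalg.cholesky(C)
beta = np.linalg.solve(a, B)          # solve a*beta = B for beta

q = c
for i in range(n):
    q = chol_downdate(q, beta[i, :])  # b[i]: i'th row of beta

print(np.allclose(q, np.linalg.cholesky(C - beta.T @ beta)))  # -> True
```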

dmuir
Hi. Thanks for the reply, I really appreciate it :-). I just have a couple of questions. Although this certainly gives a method for computing the Cholesky factor of Q without factorising it directly, does it improve the process with respect to computation? In your set-up the expensive operation is solving for beta. Assuming the Cholesky factor a and the matrix B are both n x n matrices, am I right in saying that calculating beta would be an O(n^3) operation? If so, what does it offer over factorising Q directly? Thanks again for the reply. Matt.
Sigmoidal
Yes, solving for beta is O(n^3); the Cholesky downdates are O(n^2) each, and you'll be doing n of them. I suspect there's not much in it if the factor of Q is all you want. On the other hand, if you want all of M factored then you'll be computing beta anyway.
dmuir