I am trying to work out if I can parallelise the training aspect of a machine learning algorithm. The computationally expensive part of the training involves Cholesky decomposing a positive-definite matrix (a covariance matrix). I'll try to frame the question purely in terms of the matrix algebra. Let me know if you need any more info.
Let's say we have a block matrix (a covariance matrix, but that's not relevant to the problem)
M = [ A   B
      B*  C ]
where A and C relate to training data from two different sets. Both A and C are positive definite. Let's also assume for simplicity that A and C have size n x n.
There is a formula for carrying out a block Cholesky decomposition; see http://en.wikipedia.org/wiki/Block_LU_decomposition. Summarising, we have the following result.
M = LU = LL* (with U = L*)
where (* indicates transpose)
L = [ A^{1/2}       0
      B* A^{-*/2}   Q^{1/2} ]
where
Q = C - B*A^{-1}B
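To illustrate what I mean, here is a small numerical sketch I put together (NumPy/SciPy; the library calls and variable names are my own and not part of the formula above). It assembles L from the Cholesky factor of A and the Schur complement Q, and checks that LL* reproduces M:

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    rng = np.random.default_rng(0)
    n = 4

    # A random symmetric positive-definite block matrix M = [A B; B* C]
    X = rng.standard_normal((2 * n, 2 * n))
    M = X @ X.T + 2 * n * np.eye(2 * n)
    A, B, C = M[:n, :n], M[:n, n:], M[n:, n:]

    L_A = cholesky(A, lower=True)                 # A^{1/2}
    V = solve_triangular(L_A, B, lower=True)      # A^{-1/2} B (forward substitution)
    Q = C - V.T @ V                               # Schur complement C - B* A^{-1} B
    L_Q = cholesky(Q, lower=True)                 # Q^{1/2}

    L = np.block([[L_A, np.zeros((n, n))],
                  [V.T, L_Q]])                    # [A^{1/2} 0; B* A^{-*/2} Q^{1/2}]
    assert np.allclose(L @ L.T, M)                # L L* reproduces M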
Now let's say the training related to matrices A and C has already been carried out, so we have already computed the Cholesky decompositions of A and C, giving A^{1/2} and C^{1/2} (it is therefore straightforward to calculate the inverses A^{-1/2} and C^{-1/2} using forward substitution).
Rewriting Q in terms of these quantities, we now have
Q = Q^{1/2} Q^{*/2} = C^{1/2} C^{*/2} - B* A^{-*/2} A^{-1/2} B
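To make the question concrete in code (again just a rough NumPy/SciPy sketch with my own variable names): given the precomputed factors A^{1/2} and C^{1/2}, the right-hand side above is cheap to form with triangular solves, but as it stands I still end up calling a full Cholesky on Q itself, which is exactly the step I'd like to avoid:

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    rng = np.random.default_rng(1)
    n = 4
    X = rng.standard_normal((2 * n, 2 * n))
    M = X @ X.T + 2 * n * np.eye(2 * n)
    A, B, C = M[:n, :n], M[:n, n:], M[n:, n:]

    # Factors assumed to be already available from the two separate training runs
    L_A = cholesky(A, lower=True)                 # A^{1/2}
    L_C = cholesky(C, lower=True)                 # C^{1/2}

    # Right-hand side of the identity above, using only the precomputed factors
    V = solve_triangular(L_A, B, lower=True)      # A^{-1/2} B, by forward substitution
    Q = L_C @ L_C.T - V.T @ V                     # C^{1/2} C^{*/2} - B* A^{-*/2} A^{-1/2} B

    # The step I would like to avoid: a fresh Cholesky factorisation of Q,
    # ideally replaced by something that obtains Q^{1/2} from L_C and V directly.
    L_Q = cholesky(Q, lower=True)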
My question is this: given this setup, is it possible to calculate Q^{1/2} algebraically without having to apply a Cholesky decomposition to Q? Or, in other words, can I use C^{1/2} to help me in the calculation of Q^{1/2}? If this were possible, it would then be easy to parallelise the training.
Thanks in advance for any replies. Sorry about the matrix typesetting. Is there any sensible way to typeset maths, or matrices in particular?
Matt.