I know there are a lot of questions like this, but I can't find one specific to my situation. I have 4x4 matrices implemented as NIO float buffers (these matrices are used for OpenGL). Now I want to implement a multiply method that multiplies matrix A by matrix B and stores the result in matrix C. The code might look like this:

import java.nio.FloatBuffer;

class Matrix4f
{
    private FloatBuffer buffer = FloatBuffer.allocate(16);

    public Matrix4f multiply(Matrix4f matrix2, Matrix4f result)
    {
        // result = this * matrix2   <-- I need this code

        return result;
    }
}

What is the fastest possible code to do this multiplication? Some OpenGL implementations (like the OpenGL ES stuff in Android) provide native code for this, but others don't. So I want to provide a generic multiplication method for those implementations.

+2  A: 

Go through FloatBuffer.array() if that operation is supported (it is optional; hasArray() tells you whether the buffer is backed by an accessible array). Then perform the necessary multiplications on that array and return the resulting matrix.
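As a rough sketch of the array-based approach: the class and helper names below (ArrayBackedMultiply, toArray) are made up for illustration, and the code assumes the 16 floats are stored column-major starting at index 0, which is the layout OpenGL conventionally expects. It falls back to copying when the buffer has no accessible backing array (for example, a direct buffer), and it writes into a temporary array so that the result buffer may be the same as one of the inputs.

```java
import java.nio.FloatBuffer;

final class ArrayBackedMultiply {

    // Returns the backing array when it is accessible and starts at offset 0,
    // otherwise copies the first 16 floats out with absolute gets.
    static float[] toArray(FloatBuffer buf) {
        if (buf.hasArray() && buf.arrayOffset() == 0) {
            return buf.array(); // shared with the buffer, no copy
        }
        float[] copy = new float[16];
        for (int i = 0; i < 16; i++) {
            copy[i] = buf.get(i);
        }
        return copy;
    }

    // Computes c = a * b. Assumption: column-major storage,
    // i.e. element (row, col) lives at index col * 4 + row.
    static void multiply(FloatBuffer a, FloatBuffer b, FloatBuffer c) {
        float[] x = toArray(a);
        float[] y = toArray(b);
        float[] out = new float[16]; // temporary, so c may alias a or b
        for (int col = 0; col < 4; col++) {
            for (int row = 0; row < 4; row++) {
                float sum = 0f;
                for (int k = 0; k < 4; k++) {
                    sum += x[k * 4 + row] * y[col * 4 + k];
                }
                out[col * 4 + row] = sum;
            }
        }
        for (int i = 0; i < 16; i++) {
            c.put(i, out[i]); // absolute put works for heap and direct buffers
        }
    }

    public static void main(String[] args) {
        FloatBuffer a = FloatBuffer.allocate(16);
        FloatBuffer b = FloatBuffer.allocate(16);
        FloatBuffer c = FloatBuffer.allocate(16);
        for (int i = 0; i < 16; i++) {
            a.put(i, i);
            b.put(i, i % 5 == 0 ? 1f : 0f); // identity matrix
        }
        multiply(a, b, c);
        System.out.println(c.get(5)); // same value as a.get(5), since b is the identity
    }
}
```

The temporary array costs one small allocation per call; if that shows up in profiling, a reusable scratch array would be the obvious next step.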

Have a look at GameDev.net - Matrix Math for the exact computations.

If you want to optimize it further, you could try Strassen's algorithm. You wouldn't even need to pad your matrices, since they are square and their size is a power of 2.

aioobe
As the Wikipedia article on Strassen says: "Practical implementations of Strassen's algorithm switch to standard methods of matrix multiplication for small enough submatrices, for which they are more efficient. The particular crossover point for which Strassen's algorithm is more efficient depends on the specific implementation and hardware. It has been estimated that Strassen's algorithm is faster for matrices with widths from 32 to 128 for optimized implementations."
janneb
Good point. Thanks!
aioobe
+5  A: 

The real answer is of course to test different implementations and check which one is fastest.

My guess, without testing, would be that because the matrices are so small, unrolling the loops by hand would result in the fastest code. For example, something like:

result[0][0] = this[0][0] * matrix2[0][0] + this[0][1] * matrix2[1][0] 
             + this[0][2] * matrix2[2][0] + this[0][3] * matrix2[3][0];
result[0][1] = // ... and so forth

Alternatively, unroll only the innermost loop and keep the two outer ones, which saves some typing as well as instruction cache (I$).
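A minimal sketch of that middle ground, with the inner dot product written out and the two outer loops kept. The class name UnrolledMultiply is hypothetical, and the code assumes plain float[16] arrays in column-major order (element (row, col) at index col * 4 + row), as OpenGL conventionally uses. Whether this actually beats the plain triple loop depends on the JIT and should be measured.

```java
final class UnrolledMultiply {

    // Computes out = a * b for 4x4 column-major matrices.
    // Assumption: out is a different array from a and b, because
    // entries of out are written while a and b are still being read.
    static void multiply(float[] a, float[] b, float[] out) {
        for (int col = 0; col < 4; col++) {
            int c = col * 4;
            // Column `col` of b is reused for all four rows, so load it once.
            float b0 = b[c];
            float b1 = b[c + 1];
            float b2 = b[c + 2];
            float b3 = b[c + 3];
            for (int row = 0; row < 4; row++) {
                // Innermost loop over k = 0..3, unrolled by hand.
                out[c + row] = a[row] * b0
                             + a[4 + row] * b1
                             + a[8 + row] * b2
                             + a[12 + row] * b3;
            }
        }
    }

    public static void main(String[] args) {
        float[] a = new float[16];
        float[] identity = new float[16];
        for (int i = 0; i < 16; i++) {
            a[i] = i;
            identity[i] = (i % 5 == 0) ? 1f : 0f;
        }
        float[] out = new float[16];
        multiply(a, identity, out);
        System.out.println(java.util.Arrays.equals(a, out)); // a * I should reproduce a
    }
}
```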

janneb
Note that the JIT compiler is quite good at unrolling loops where necessary, so you might find there's not much in it.
Neil Coffey