views:

105

answers:

3

So, I'm using boolean matrices whose dimension is usually a couple dozen to a couple hundred, they are usually quite sparse (just some 2-4 non-zeros in most rows and columns) and my runtime is heavily dominated by their multiplication.

What datastructure suits best for speeding up multiplication under these circumstances?

Currently I'm storing each matrix in a contiguous bitset (array of 64-bit longs) row-wise, and multiplicating them with basically the standard algorithm, just sped up for sparsity with the fast operation of locating the next set bit in a word, and with vectorization by bitmask operations.

Should I perhaps use some sparse representation indeed?

A: 

I think it makes sense to use a sparse representation. Even with something as simple as set of indexes that are non-zero, I think you will get better performance.

For example, for a 100×100 matrix with 300 non-zero elements, using 2D array representation, multiplication requires 100×100×100=1,000,000 “operations”. Using set of indexes requires only 300×300=90,000 operations. (Of course those operations are not directly comparable.)

svick
Yeah, that's the point: these operations are really not directly comparable, because it is not obvious at all that 90000 operations on index sets is faster than 1000000 vectorized operations on 64-bit masks. I guess I'll just have to implement it and see.
jkff
A: 

A list of the 1's x's and y's. For instance:

[0,2,0,12,0,60,1,2,1,39 ... etc]

Which means that there's a 1 at 0,2 a 1 at 0,12, etc.

The nice part is that you don't really need a new data type, and it's pretty simply to parse.

To multiply, you'd look up all matching/partly matching indexes.

TaslemGuy
I think that creating new data type is much cleaner and easier to work with.
svick
That's the point: I'm not sure that it will be faster, considering that the current implementation is very low-level-efficient since it works through vector operations on 64-bit bitmasks. Looks like I'll just have to implement it.
jkff
+2  A: 

You might want to consider using a quadtree representation. The quadtree can encode a sparse matrix pretty well, and lends itself to a pretty easy (and cache efficient) recursive multiply implementation. Make the base case a 8x8 matrix so you can implement the multiply as a (assembly-optimized?) 64-bit by 64-bit operation.

Keith Randall