So, I'm using boolean matrices whose dimension is usually a couple dozen to a couple hundred, they are usually quite sparse (just some 2-4 non-zeros in most rows and columns) and my runtime is heavily dominated by their multiplication.
What datastructure suits best for speeding up multiplication under these circumstances?
Currently I'm storing each matrix in a contiguous bitset (array of 64-bit longs) row-wise, and multiplicating them with basically the standard algorithm, just sped up for sparsity with the fast operation of locating the next set bit in a word, and with vectorization by bitmask operations.
Should I perhaps use some sparse representation indeed?