tags:

views:

384

answers:

4

I have two matrices and I need to compare them, but I don't want to compare position by position, I think that is not the best way. I thought of hash functions, does anyone know how to calculate the hash of a matrix?

+3  A: 

You can compute a hash of the whole floating point array (as a byte sequence). If you want a comparison function able to cope with small differences in the coefficients, you can compare trivial scalars and vectors computed from each matrix. It makes sense if you have to compare each matrix with more than one matrix. Examples coming to mind:

trace of the matrix
vector of L0, L1, L2 norms of all columns or rows
diagonal of LU factorization
tridiagonal reduction (if symmetric)
diagonal of eigenvalues (if possible)
diagonal of SVD
Eric Bainville
+4  A: 

If your matrices are implemented as arrays, I'd suggest using memcmp() from string.h to determine if they are equal.

If floating point values are involved and the values result from actual computations, there's no way around checking them value by value, as you'll have to include epsilons to accomodate numeric errors.

Christoph
+1  A: 

First, a hash won't tell you if two matrices are equal, only can tell you if they're distinct; because there can be (and there will be, Murphy's Law is always lurking) collisions.

You can calculate a hash by chaining any function over the elements... if you can cast the elements to integer values (not truncation, but the binary representation), maybe you could XOR all of them (but keep in mind that this wouldn't work if some values are the same but with distinct representation like -0 and +0 or NaN).

So my advice is that you could have some kind of hash function (even the sum of all the elements could be valid) precalculated (this is important, there's no point in calculate the hash each time you want to make a comparison and then compare the hashes) to discard quickly some different matrices, but if the hash is the same, you will have to compare each position.

fortran
+1 for mentioning collisions
rascher
A: 

When you say hash i guess you want to checksum the matrices and compare the checksums to confirm equality. Assuming that each of your matrices is stored as a contiguous chunk of data, you could compute the start address and length (in bytes) of each chunk and then generate checksums for both (since you expect them to be equal sometimes, the length would be same). If the checksums are same with a very high probability the two matrices are also equal. If correctness is critical, you could run a comparison loop over the two matrices once their checksums match. That way you will not be invoking the cost of comparison unless equality is very likely.

wikipedia checksum reference

nik