First, I think the requirements are not quite clear. Suppose you hash three datasets c1, c2 and c3, then swap c1.copyNumber with c2.copyNumber and hash again. Should that give the same result or not?
If you swap c1.startLocation with c1.endLocation, should that result in the same hash or not?
I'm going to assume that you'd like different hash results in both cases, and that the only permutations that should not change the hash result are permutations of the datasets c1, c2, c3.
If that is the case then I'd propose to first hash the three datasets independently to smaller values. I.e.
h1 = H(c1)
h2 = H(c2)
h3 = H(c3)
where H can be any hash function (e.g., CRC32, Adler32, SHA1, etc.), depending on how much effort you want to spend on avoiding collisions.
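As an illustration of this first step, here is a minimal sketch of one possible H. The struct layout and the 32-bit field types are assumptions (only the field names come from the question), and 32-bit FNV-1a is swapped in as a simple stand-in for whichever hash you actually pick. The fields are fed in a fixed order, so swapping startLocation and endLocation changes the input and will almost certainly change the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout; only the field names come from the question. */
struct dataset {
    uint32_t copyNumber;
    uint32_t startLocation;
    uint32_t endLocation;
};

/* Mix one 32-bit field into a running 32-bit FNV-1a state, low byte first. */
uint32_t fnv1a_u32(uint32_t state, uint32_t value)
{
    for (int i = 0; i < 4; i++) {
        state ^= (value >> (8 * i)) & 0xFFu;
        state *= 16777619u;               /* FNV prime */
    }
    return state;
}

/* H(c): hash the fields of one dataset in a fixed order. */
uint32_t hash_dataset(const struct dataset *c)
{
    assert(c != NULL);
    uint32_t h = 2166136261u;             /* FNV offset basis */
    h = fnv1a_u32(h, c->copyNumber);
    h = fnv1a_u32(h, c->startLocation);
    h = fnv1a_u32(h, c->endLocation);
    return h;
}
```

With this, h1 = hash_dataset(&c1), and likewise for h2 and h3.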
The next step is to compute a commutative hash of h1, h2, h3. If you want to avoid all collisions except those where h1, h2, h3 are merely permuted, then the following works.
Compute the polynomial
- P(x) = (x-h1)(x-h2)(x-h3)
then hash the polynomial (or rather its coefficients) with any good hash function. That would be
- H(h1+h2+h3 || h1 * h2 + h1 * h3 + h2 * h3 || h1 * h2 * h3), where || is concatenation (of fixed-width encodings, so that the boundaries between the coefficients are unambiguous).
If you want to avoid any unnecessary collision at all cost, then the coefficients should be computed as multiprecision integers and a collision-resistant hash function should be used (SHA1 was the usual example, but it is no longer considered collision resistant; SHA-256 is a safer choice). Because of the unique factorisation property of polynomials, it follows that the coefficients of the polynomial differ whenever h1, h2, h3 differ as multisets, i.e. whenever they are not just a permutation of one another.
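For 32-bit h1, h2, h3 the coefficients fit in at most 96 bits, so a full multiprecision library is not strictly needed. The sketch below computes them exactly, assuming a compiler that provides the non-standard unsigned __int128 type (GCC and Clang do on 64-bit targets). The struct and function names are made up for illustration, and the final step of feeding the coefficients to a collision-resistant H is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Coefficients of P(x) = x^3 - e1*x^2 + e2*x - e3 for 32-bit h1, h2, h3. */
struct poly_coeffs {
    unsigned __int128 e1;   /* h1 + h2 + h3,             below 2^34 */
    unsigned __int128 e2;   /* h1*h2 + h1*h3 + h2*h3,    below 2^66 */
    unsigned __int128 e3;   /* h1*h2*h3,                 below 2^96 */
};

struct poly_coeffs poly_coeffs_from_hashes(uint32_t h1, uint32_t h2, uint32_t h3)
{
    /* Widen first so that no intermediate result can overflow. */
    unsigned __int128 a = h1, b = h2, c = h3;
    struct poly_coeffs p;

    p.e1 = a + b + c;
    p.e2 = a * b + a * c + b * c;
    p.e3 = a * b * c;

    /* A product of three 32-bit values always fits in 96 bits. */
    assert((p.e3 >> 96) == 0);
    return p;
}
```

Each coefficient is a symmetric function of h1, h2, h3, so any ordering of the three inputs yields the same struct; for example, (1, 2, 3) and (3, 1, 2) both give e1 = 6, e2 = 11, e3 = 6.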
But it seems that avoiding collisions at all cost is overkill in your case.
So rather than computing the polynomial P(x) symbolically, one could just evaluate it at an arbitrary value R. I.e., if h1, h2, h3 are just 32-bit values, then computing the following might be enough (some C-style pseudocode follows; in C# the 64-bit integer types are long and ulong):
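const unsigned long long R = SOME_RANDOM_64_BIT_CONSTANT;
unsigned long long hash0 = (R - h1) * (R - h2) * (R - h3);  /* unsigned, so overflow wraps modulo 2^64 */
unsigned int hash = (unsigned int) (hash0 >> 32);            /* keep the upper 32 bits */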
I'm using 64-bit multiplications here because they are fast enough on modern CPUs, and I'm using the upper 32 bits of hash0 rather than the lower 32 bits because the lower 32 bits are somewhat biased. For example, the least significant bit is much more likely to be 0 than 1: it is 1 only when all three factors are odd.
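The same idea written out as compilable C, as a sketch rather than a tuned implementation. The value of R below is just an arbitrary 64-bit constant chosen for illustration, and the function names are made up. Because unsigned 64-bit multiplication is commutative and associative modulo 2^64, every ordering of the three inputs gives the same result, which the small check function verifies for all six orderings.

```c
#include <assert.h>
#include <stdint.h>

/* Arbitrary constant standing in for SOME_RANDOM_64_BIT_CONSTANT. */
static const uint64_t R = 0x9E3779B97F4A7C15ULL;

/* Evaluate P(R) = (R-h1)(R-h2)(R-h3) modulo 2^64, keep the upper 32 bits. */
uint32_t combine3(uint32_t h1, uint32_t h2, uint32_t h3)
{
    uint64_t p = (R - h1) * (R - h2) * (R - h3);
    return (uint32_t)(p >> 32);
}

/* Assert that all six orderings of (a, b, c) produce the same hash. */
void check_order_independent(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t h = combine3(a, b, c);
    assert(combine3(a, c, b) == h);
    assert(combine3(b, a, c) == h);
    assert(combine3(b, c, a) == h);
    assert(combine3(c, a, b) == h);
    assert(combine3(c, b, a) == h);
}
```

Unlike the symbolic version, this can collide for inputs that are not permutations of each other, which is the trade-off for speed.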