views:

307

answers:

3

Assume I have two strings (or byte arrays) A and B which both have the same hash (with hash I mean things like MD5 or SHA1). If I concatenate another string behind it, will A+C and B+C have the same hash H' as well? What happens to C+A and C+B?

I tested it with MD5 and in all my tests, appending something to the end made the hash the same, but appending at the beginning did not.

Is this always true (for all inputs)?

Is this true for all (well-known) hash functions? If no, is there a (well-known) hash function, where A+C and B+C will not collide (and C+A and C+B do not either)?

(besides from MD5(x + reverse(x)) and other constructed stuff I mean)

A: 

This depends entirely on the hash function. Also, the probability that you have those collisions is really small.

so, do you know any references which list it for several hash functions?
mihi
+2  A: 

Details depend on the hash function H, but generally they work as follows:

  1. Consume a block of input X (say, 512 bits)
  2. Break the input into smaller pieces (say, 32 bits) and update hash internal state based on the input
  3. If there's more input, go to step 1
  4. At the end, spit the internal state out as the hash value H(X)

So, if A and B collide i.e. H(A) = H(B), the hash will be in the same state after consuming them. Updating the state further with the same input C can make the resulting hash value identical. This explains why H(A+C) is sometimes H(B+C). But it depends how A's and B's sizes are aligned to input block size and how the hash breaks the input block internally.

C+A and C+B can be identical if C is a multiple of the hash block size but probably not otherwise.

laalto
I do not agree with this description. If A and B (different inputs) collide on a particular hash compute, they do so because having run through different 'internal' states, they have reached the same final computation (by a very freak chance provided we have a good hash function).
nik
Now, if an extra input C is prefixed or suffixed to the two inputs, these 'internal' states seen in the compute sequence should change significantly to NOT reach the same final computation for (A,C) and (B,C). Where (X,Y) represents Y prefixing or suffixing X.
nik
@nik: Thanks, I've clarified my answer a bit.
laalto
@laalto, I think you are assuming a lot about the effect of block size on the input.
nik
I just ran a few MD5 tests, there is no size of C below 1024 (except the trival 0) where H(C+A) = H(C+B) for all C (created random Cs until finding the odd one out, usually it is the first one...) So, what is the hash block size of MD5? Or did I understand you incorrectly?
mihi
A: 

The hash functions being discussed here are typically cryptographic (SHA1, MD5). These hash functions have an Avalanche effect -- the output will change drastically with a slight change in the input.

The prefix and suffix extension of C will effectively make a longer input. So, adding anything to the front or rear of the input should change the effective hash outputs significantly.

I do not understand how you did the MD5 check, here is my test.

echo "abcd" | md5sum
70fbc1fdada604e61e8d72205089b5eb

echo "0abcd" | md5sum
f5ac8127b3b6b85cdc13f237c6005d80

echo "abcd0" | md5sum
4c8a24d096de5d26c77677860a3c50e3

Are you saying that you located two inputs which had the same MD5 hash and then appended something to the end or beginning of the input and found that adding at the end resulted in the same MD5 as that for the original input?

Please provide samples with your test results.

nik
I took the samples from http://stackoverflow.com/questions/933497/create-your-own-md5-collisions/933527#933527 which have the same MD5 (as stated in the question) and then appended random strings to it.
mihi
It is an observed property of the MD5 function that if MD5(A) == MD5(B) then it is true that MD5(A+C) == MD5(B+C) for any value of C. You need two colliding A and B values to test this, but it has been demonstrated by mathematical analysis of the MD5 function.
Dustin Fineout
@dustin: Any links/references about that?
mihi
@mihi, Ok, i get your sample from the other SO question. Would like to know if this is an academic interest, brute-force possibility analysis or just checking for a stray collision chance?
nik
I have been searching for references and will post as soon as I find a good one. I've found the fact mentioned several times and have been hunting through many whitepapers to find a good source. It's not a maybe, it's a property of the function that was observed mathematically before anyone could demonstrate it.
Dustin Fineout
Also was able to replicate your results using the sample at the other SO question, which I appended with various random messages and still received colliding hashes. Hopefully I can find that reference.
Dustin Fineout
@nik: Mainly academic interest, basically about some "wrongly-implemented" HMACs and what impact a collision attach would have for them.
mihi