I know how padding works in MD5, but what is the purpose of adding a 1 and several 0s to a message that is already the correct length?

Is this for security or just a marker?

+1  A: 

The message is padded so that its length in bits is divisible by 512. Remember that the hash is calculated over the bit representation of the message. Since the message has to be broken into 512-bit chunks, extra bits are added as padding. Check the Algorithm section on the Wiki for more details.

Gangadhar
Yes, I know this, but what if the message is already 512-64 bits long? Once the 64-bit message length is appended, the total is divisible by 512. Yet instead of stopping there, you still have to pad an extra 512 bits in the middle. What is the purpose of padding when the length is already divisible by 512 without it?
pclem12
If I understood you correctly, you are asking why any padding should be added when the message is (512-64) bits long, since the next 64 bits will be the length of the message represented as a 64-bit integer. Is that right? If so, are you referring to a particular implementation when you say that an extra 512 bits are added in the middle?
Gangadhar
Sorry for not responding quickly; I got busy. Yes, that is what I meant, and Thomas Pornin answered it below.
pclem12
+1  A: 

The padding procedure must not create collisions. If you have a message m it is padded into pm, which has a length multiple of 512. Now imagine pm as a message m' in itself, i.e. the padding bits already added as if they were part of the message. If padding just keeps m' unchanged, as you suggest, then m and m' would yield the same hash value, even though they are distinct messages. That would be a collision, also known as "not good at all".
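To make the collision concrete, here is a minimal sketch in Python. The `naive_pad` function is a hypothetical scheme invented for this illustration (zero-fill to a block boundary, add nothing when the message is already aligned); it is not part of MD5 or of any real library.

```python
BLOCK_BYTES = 64  # one 512-bit block


def naive_pad(message: bytes) -> bytes:
    """Hypothetical padding: zero-fill to a block boundary.

    A message that is already block-aligned is returned unchanged,
    which is exactly the behaviour the question proposes.
    """
    missing = -len(message) % BLOCK_BYTES
    return message + b"\x00" * missing


m = b"abc"
m_prime = naive_pad(m)  # the padded form of m, now treated as a message itself

print(m != m_prime)                        # the two messages are distinct
print(naive_pad(m) == naive_pad(m_prime))  # yet they pad to the same blocks
```

Since the rest of the hash function only ever sees the padded blocks, two inputs with identical padded forms necessarily get identical hash values.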

Generally speaking, the padding procedure must be such that it could potentially be unambiguously removed: you must be able to look at a padded message, and decide without hesitation which bits are from the message itself, and which were added as padding. Nothing in the course of the hash function actually removes the padding, but it must be conceptually feasible. This is kind of mathematically impossible if messages of length multiple of 512 are "padded" by adding no bit at all.
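The "removable in principle" property can be sketched for byte-oriented messages as follows. The names `md5_pad` and `md5_unpad` are made up for this example, and the code assumes the message is a whole number of bytes; it covers only the padding step, not the hash itself.

```python
import struct

BLOCK_BYTES = 64  # one 512-bit block


def md5_pad(message: bytes) -> bytes:
    """Append the '1' bit (0x80), zero bytes, and the 64-bit little-endian bit length."""
    bit_length = (8 * len(message)) % 2**64
    zeros = (55 - len(message)) % BLOCK_BYTES  # leaves exactly 8 bytes for the length
    return message + b"\x80" + b"\x00" * zeros + struct.pack("<Q", bit_length)


def md5_unpad(padded: bytes) -> bytes:
    """Recover the message: drop the length, the zeros, then the 0x80 marker."""
    body = padded[:-8].rstrip(b"\x00")
    if not body.endswith(b"\x80"):
        raise ValueError("not a validly padded message")
    return body[:-1]


for msg in (b"", b"abc", b"trailing zeros\x00\x00", b"a" * 56, b"a" * 64):
    padded = md5_pad(msg)
    assert len(padded) % BLOCK_BYTES == 0
    assert md5_unpad(padded) == msg
```

Because the `0x80` marker is always present, trailing zero bytes that belong to the message are never confused with padding, even when the message already ends on a block boundary.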

The above is generic to all hash functions. MD5 and a few functions of the same general family (including SHA-1, SHA-256...), using the Merkle-Damgård construction, also need the input data length to be encoded in the padding (this is necessary to achieve some security proofs). In MD5, the length is encoded as a 64-bit number. With the '1' bit, there are at least 65 padding bits for any message (and at most 576: the '1' bit, up to 511 zero bits, and the 64-bit length).
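As a quick arithmetic check, the number of bits MD5 appends (the '1' bit, the zero bits, and the 64-bit length field) can be computed from the message length in bits. The helper name `padding_bits` is an assumption made for this sketch.

```python
def padding_bits(message_bits: int) -> int:
    """Total bits appended to a message of the given bit length."""
    # After the '1' bit, add zeros until the length is 448 modulo 512.
    zeros = (447 - message_bits) % 512
    return 1 + zeros + 64


counts = [padding_bits(n) for n in range(2048)]
print(min(counts), max(counts))

# The case from the question: a message of 512 - 64 = 448 bits.
print(padding_bits(448))
```

The smallest value occurs for a message length of 447 modulo 512 and the largest for 448 modulo 512, which is the case raised in the question: the '1' bit no longer fits before the length field, so a whole extra block is needed.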

Thomas Pornin
Thank you very much. This is the answer I was looking for
pclem12