
I am basically creating an API in PHP, and one of the parameters it will accept is an MD5-encrypted value. I don't know much about other programming languages, or about MD5. So my basic question is: if I accept MD5-encrypted values, will the value be the same no matter which programming language generated it (.NET, Java, Perl, Ruby, etc.)?

Or are there limitations or validations I need to account for?

+5  A: 

Yes. MD5 isn't an encryption function; it's a hash function that uses a specific algorithm.
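To make the distinction concrete, here is a small Python sketch (Python is used only for illustration; the asker's API is in PHP). It shows that MD5 takes no key and always returns a fixed 16-byte digest, however long the input is, so there is nothing to decrypt. The helper name `md5_hex` is illustrative.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()

small = b"a"
large = b"a" * 1_000_000

# The digest is always 16 bytes (128 bits), whatever the input length,
# so it cannot hold the original data and there is nothing to "decrypt".
print(len(hashlib.md5(small).digest()), len(hashlib.md5(large).digest()))  # 16 16

# No key is involved: the same bytes always produce the same digest.
print(md5_hex(small) == md5_hex(b"a"))  # True
```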

Daniel Vandersluis
+3  A: 

Yes, MD5 hashes will always be the same regardless of their origin, as long as the underlying algorithm is correctly implemented.
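One practical way to confirm that an implementation is correct is to run it against the test suite published in RFC 1321, the MD5 specification. A minimal Python sketch, assuming the implementation under test is wrapped as a function from bytes to a lowercase hex string (`check_md5_implementation` is a hypothetical helper):

```python
import hashlib

# A subset of the test vectors from the MD5 test suite in RFC 1321.
RFC1321_VECTORS = {
    b"": "d41d8cd98f00b204e9800998ecf8427e",
    b"a": "0cc175b9c0f1b6a831c399e269772661",
    b"abc": "900150983cd24fb0d6963f7d28e17f72",
    b"message digest": "f96b697d7cb7938d525a2f31aaf161d0",
}

def check_md5_implementation(md5_hex) -> bool:
    """Return True if `md5_hex(bytes) -> str` matches every listed vector."""
    return all(md5_hex(data) == expected for data, expected in RFC1321_VECTORS.items())

print(check_md5_implementation(lambda b: hashlib.md5(b).hexdigest()))  # True
```

The same vectors can be checked from .NET, Java, Perl, Ruby or PHP; if every side passes, any remaining mismatch comes from the input bytes, not the algorithm.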

bunting
+18  A: 

Yes, a correct implementation of MD5 will produce the same result; otherwise MD5 would not be useful as a checksum. Differences can come from text encoding and byte order. You must be sure that the text is encoded to exactly the same sequence of bytes.
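A minimal Python sketch of both pitfalls: the same five-character string hashed as UTF-8, UTF-16 little-endian and UTF-16 big-endian gives three different digests, and the integer 1 packed as a 32-bit big-endian versus little-endian value gives MD5 two different inputs.

```python
import hashlib
import struct

text = "hello"

# The same text as three different byte sequences.
utf8 = text.encode("utf-8")          # 5 bytes
utf16_le = text.encode("utf-16-le")  # 10 bytes, low byte of each unit first
utf16_be = text.encode("utf-16-be")  # 10 bytes, high byte of each unit first

for label, data in [("utf-8", utf8), ("utf-16-le", utf16_le), ("utf-16-be", utf16_be)]:
    print(label, hashlib.md5(data).hexdigest())

# Byte order matters for numbers too: the integer 1 as an unsigned 32-bit value.
big = struct.pack(">I", 1)     # b"\x00\x00\x00\x01"
little = struct.pack("<I", 1)  # b"\x01\x00\x00\x00"
print(hashlib.md5(big).hexdigest() == hashlib.md5(little).hexdigest())  # False
```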

Andrey
It should be mentioned that getting things to have *exactly* the same bytes is a non-trivial problem. Text encoding, byte order, the list goes on.
Travis Gockel
Also line endings: I had a really annoying bug once where MD5 sums were not matching across multiple systems, and it turned out that some systems were removing the trailing newline from the input text and others were not.
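A Python sketch of the trailing-newline problem, with a hypothetical `canonical_bytes` helper. The canonical form chosen here (LF endings, no trailing newline, UTF-8) is only an assumption; what matters is that every system applies the same rule before hashing.

```python
import hashlib

variants = ["hello", "hello\n", "hello\r\n"]
for v in variants:
    # Three visually "identical" texts, three different digests.
    print(repr(v), hashlib.md5(v.encode("utf-8")).hexdigest())

def canonical_bytes(text: str) -> bytes:
    """Hypothetical canonical form: LF line endings, no trailing newline, UTF-8."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n").rstrip("\n")
    return normalized.encode("utf-8")

# After canonicalization all three variants hash identically.
print(len({hashlib.md5(canonical_bytes(v)).hexdigest() for v in variants}))  # 1
```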
Ether
Your answer puts the cart before the horse. MD5 is supposed to give you a way to guarantee you got the exact same bytes. Travis is right that it is a non-trivial problem. That's why these checksums exist: to make sure you get the exact same bytes.
glowcoder
@glowcoder you slightly misunderstood me. The horse is in its place, don't worry. I said that a correct implementation of MD5 will produce the same hash for the same input bytes. What is not trivial is converting the same text into the same bytes.
Andrey
+3  A: 

A vital point of secure hash functions, such as MD5, is that they always produce the same value for the same input.

However, it does require you to encode the input data into a sequence of bytes (or bits) in the same way. For instance, there are many ways to encode a string.
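Even with the same encoding, one string can be written as more than one sequence of code points. A Python sketch using Unicode normalization: "café" written with a precomposed é and with e plus a combining accent looks identical but hashes differently until both are normalized (NFC is an arbitrary choice here; any form works if both sides use it). The helper `nfc_utf8` is hypothetical.

```python
import hashlib
import unicodedata

composed = "caf\u00e9"     # "café" with é as one code point (U+00E9)
decomposed = "cafe\u0301"  # "café" as e + combining acute accent (U+0301)

# They render identically but are different code point sequences,
# so even with the same encoding (UTF-8) the bytes differ.
print(composed == decomposed)  # False
print(hashlib.md5(composed.encode("utf-8")).hexdigest() ==
      hashlib.md5(decomposed.encode("utf-8")).hexdigest())  # False

def nfc_utf8(text: str) -> bytes:
    """Normalize to NFC, then encode as UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")

# After normalization both produce the same bytes, hence the same hash.
print(nfc_utf8(composed) == nfc_utf8(decomposed))  # True
```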

Tom Hawtin - tackline
+10  A: 

It will, but there's a but.

It will because it's spec'd to reliably produce the same result given the same series of bytes. The point is that we can then compare the results to check the bytes haven't changed, or perhaps digitally sign only the MD5 result rather than the entire source.
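For the API in the question, comparing results on the receiving side might look like this Python sketch (PHP's built-in `md5()` likewise returns a 32-character lowercase hex string by default). The function name and the decision to accept upper-case hex are assumptions, not part of any spec; rejecting malformed input is one possible answer to the "validations" part of the question.

```python
import hashlib
import re

# Hypothetical validation rule: exactly 32 hex digits after normalization.
_MD5_HEX = re.compile(r"[0-9a-f]{32}")

def md5_matches(payload: bytes, supplied_hex: str) -> bool:
    """Return True if `supplied_hex` is the MD5 of the bytes actually received."""
    # Clients in other languages may send upper-case hex; accept it (assumption).
    candidate = supplied_hex.strip().lower()
    if not _MD5_HEX.fullmatch(candidate):
        raise ValueError("expected an MD5 digest as 32 hexadecimal characters")
    return hashlib.md5(payload).hexdigest() == candidate

body = "hello".encode("utf-8")
print(md5_matches(body, "5D41402ABC4B2A76B9719D911017C592"))          # True
print(md5_matches(body + b"\n", "5d41402abc4b2a76b9719d911017c592"))  # False
```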

The but is that a common source of bugs is making assumptions about how strings are encoded. MD5 works on bytes, not characters, so if we're hashing a string, we're really hashing a particular encoding of that string. Some languages (and more so, some runtimes) favour particular encodings, and some programmers are used to making assumptions about that encoding. Worse yet, some specs make assumptions about encodings. This can cause bugs where two different implementations produce different MD5 hashes for the same string, especially when characters fall outside the range U+0020 to U+007F (and since U+007F is a control character, that one has its own issues).
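A Python sketch of why that range matters: ASCII-only text encodes to identical bytes in UTF-8 and Latin-1, so the hashes agree, while accented characters encode differently and the hashes diverge. The helper `md5_under` is hypothetical.

```python
import hashlib

def md5_under(text: str, encoding: str) -> str:
    """Hash `text` after encoding it with the given codec."""
    return hashlib.md5(text.encode(encoding)).hexdigest()

for text in ["plain ascii", "naïve café"]:
    same = md5_under(text, "utf-8") == md5_under(text, "latin-1")
    print(repr(text), "same hash in UTF-8 and Latin-1:", same)
# 'plain ascii' same hash in UTF-8 and Latin-1: True
# 'naïve café' same hash in UTF-8 and Latin-1: False
```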

All this applies to other cryptographic hashes, such as the SHA family of hashes.
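The same experiment with SHA-1 and SHA-256 alongside MD5, sketched in Python with `hashlib.new`: switching algorithms does not help if the two sides encode the text differently. The helper `digests` is hypothetical.

```python
import hashlib

def digests(text: str, encoding: str) -> dict:
    """MD5, SHA-1 and SHA-256 hex digests of `text` under one encoding."""
    data = text.encode(encoding)
    return {name: hashlib.new(name, data).hexdigest() for name in ("md5", "sha1", "sha256")}

utf8_digests = digests("naïve", "utf-8")
latin1_digests = digests("naïve", "latin-1")
for name in utf8_digests:
    # Every algorithm disagrees, because the input bytes differ.
    print(name, utf8_digests[name] == latin1_digests[name])
```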

Jon Hanna
@Jon Hanna: Thanks for explaining. As I understand your point, as long as the strings are encoded the same way, it will produce the same MD5 hash regardless of programming language. Thanks.
jtanmay
Yep, and the newlines have to be the same too. MD5 guarantees the correct result for the same set of bytes; bugs come about from not feeding the right bytes into it.
Jon Hanna