I am trying to port a Python program to C#. Here is the line that is supposed to be a walkthrough but is currently tormenting me:

hash = hashlib.md5(inputstring).digest()

After generating the MD5 hash in C#, it is absolutely vital that I produce the same hash string as the original Python program, or my whole application will fail.

My confusion lies in which encoding to use when converting the hash to a string in C#, i.e.

?Encoding enc = new ?Encoding();
string Hash = enc.GetString(HashBytes); // HashBytes is my generated hash

I ask because I cannot get two matching hashes when using Encoding.Default, i.e.

string Hash = Encoding.Default.GetString(HashBytes);

So I'm thinking that knowing the default encoding of hash.digest() in Python would help.

EDIT

OK, maybe some more code will articulate my problem better. After the hash is calculated in the Python program, some calculations are carried out, i.e.

hash = hashlib.md5(inputstring).digest()

for i in range(0, 6):
    value += ord(hash[i])

return value

Now can you see why two different hash strings will be problematic? Some of the characters that appear when the Python program is run are replaced by a '?' in C#.

+2  A: 

It is not encoded at all, it is just an array of bytes in both languages.

GregS
The problem is that the hash goes through a lot of calculations in its string form to produce an integer result. The Python hash string has strange characters that C# outputs only as ???, which affects the final integer result.
The_AlienCoder
They are *not really* characters at all; that is where you are confused. In Python, the string is just being used as a container for arbitrary binary data, an unfortunate design choice required by the lack of a bytearray type in earlier Python versions. In C#, there is no confusion: the output is just a byte array. If you take the ord() of each Python "character", you will see it is the same as the value of each of the C# bytes.
GregS
The _string_ has to be encoded. Hash algorithms operate on byte[] data, not on strings.
Henk Holterman
@Henk: I thought he was referring to the output of the hash.
GregS
+5  A: 

I presume you're using an earlier version of Python than 3, and your string is a normal str.

If you're talking about the output, the digest method returns a string consisting of raw bytes. The equivalent type in C# is byte[], which you already seem to have. It's not text, so using the Encoding class makes no sense.
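
To see why pushing the digest through a text encoding loses information, here is a minimal sketch. It is written for Python 3 (where digest() returns a bytes object rather than a str), and the input b"hello world" is just an example I made up, not the asker's data.

```python
import hashlib

# Hypothetical input, chosen only for illustration.
digest = hashlib.md5(b"hello world").digest()

# The digest is 16 raw bytes; each one is an integer from 0 to 255.
raw_values = list(digest)

# Decoding the bytes as ASCII text is lossy: values above 127 are not
# valid ASCII, so errors="replace" swaps them for a placeholder character.
as_text = digest.decode("ascii", errors="replace")

# Encoding the text back turns every placeholder into b"?", so the
# original byte values cannot be recovered.
round_tripped = as_text.encode("ascii", errors="replace")

print(raw_values)
print(round_tripped == digest)
```

This is the same kind of damage as the '?' characters showing up on the C# side: once the bytes are treated as text in an encoding that cannot represent them, the original values are gone.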

If you're talking about the input, the md5 function takes in a normal str, which is a string of bytes. You'll have to look at the code before that to figure out what encoding the data is in.
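
As a rough illustration of why the input encoding matters, the sketch below (Python 3, with a made-up sample string) hashes the same text under two encodings. Which encoding the original program's data is actually in is something only the surrounding code can tell you.

```python
import hashlib

# Hypothetical sample text containing one non-ASCII character (e with acute).
text = "caf\u00e9"

# The same characters become different byte sequences per encoding...
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# ...and MD5 only ever sees the bytes, so the digests differ too.
utf8_digest = hashlib.md5(utf8_bytes).digest()
latin1_digest = hashlib.md5(latin1_bytes).digest()

# Pure ASCII text encodes to identical bytes under both encodings,
# so in that case the digests agree.
ascii_text = "cafe"
ascii_same = (
    hashlib.md5(ascii_text.encode("utf-8")).digest()
    == hashlib.md5(ascii_text.encode("latin-1")).digest()
)

print(utf8_digest != latin1_digest, ascii_same)
```

So if the input is plain ASCII, the choice between these encodings does not matter; as soon as it is not, both programs have to agree on the same bytes.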

Edit:

Regarding the code you posted, all it does is take the values of the first six bytes in the hash and add them together. You should be able to figure out how to do that in C#.
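
For reference, here is how that calculation might look when written directly against bytes. This is a sketch in Python 3, where indexing a bytes object already yields integers, so ord() is not needed. The function name first_six_byte_sum is my own invention, and I am assuming value starts at 0 in the original program.

```python
import hashlib


def first_six_byte_sum(data: bytes) -> int:
    """Add up the integer values of the first six bytes of the MD5 digest."""
    digest = hashlib.md5(data).digest()
    # Slicing bytes gives bytes; sum() adds the integer value of each byte.
    return sum(digest[:6])


# Hypothetical input, for illustration only.
print(first_six_byte_sum(b"hello world"))
```

The C# version has the same shape: loop over the first six elements of the byte[] that the MD5 class returns and add them to an int, with no Encoding involved.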

And make sure you learn the difference between a string of bytes and a string of characters.

Matti Virkkunen
I'm still confused. Let me post some more code so you can really understand my problem.
The_AlienCoder
@The_AlienCoder: See my update.
Matti Virkkunen
Sorry, I have zero knowledge of Python (2 days old!). My impression was that characters were being manipulated in the code. Let me try working with bytes and compare the outputs.
The_AlienCoder
+1 Works beautifully. Thanks for clearing things up.
The_AlienCoder
A: 

According to the documentation, Python strings are ASCII by default. Alternate encodings must be explicitly specified. Therefore, you should be able to pass an ASCII string to the C# MD5 library and get the same hash bytes as if you passed the string to the Python MD5 library.

jsumners
Python 2 strings are strings of bytes, and I doubt there's any convention for which encoding you should use. This is why we now have a proper string type in Python 3. Those links are related to parsing Python code files, not this.
Matti Virkkunen