views:

82

answers:

2

I'm working on a python project in 2.6 that also has future support for python 3 being worked in. Specifically I'm working on a digest-md5 algorithm.

In python 2.6 without running this import:

from __future__ import unicode_literals

I am able to write a piece of code such as this:

a1 = hashlib.md5("%s:%s:%s" % (self.username, self.domain, self.password)).digest() 
a1 = "%s:%s:%s" %(a1, challenge["nonce"], cnonce )

Without any issues, my authentication works fine. When I try the same line of code with the unicode_literals imported I get an exception:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xa8 in position 0: unexpected code byte

Now I'm relatively new to python so I'm a bit stuck in figuring this out. if I replace the %s in the formatting string as %r I am able to concatenate the string, but the authentication doesn't work. The digest-md5 spec that I had read says that the 16 octet binary digest must be appended to these other strings.

Any thoughts?

+1  A: 

The problem is that "%s:%s:%s" became a unicode string once you imported unicode_literals. The output of the hash is a "regular" string. Python tried to decode the regular string into a unicode string and failed (as expected. The hash output is supposed to look like noise). Change your code to this:

a1 = a1 + str(':') + str(challenge["nonce"]) + str(':') + str(cnonce)

I'm assuming cnonce and challenge["nonce"] are regular strings. To have more control over their conversion to strings (if needed), use:

a1 += str(':') + challenge["nonce"].encode('UTF-8') + str(':') + cnonce.encode('UTF-8')
Tal Weiss
This solution and explanation also works. Thank you.
Macdiesel
+1  A: 

The reason for the behaviour you observed is that from __future__ import unicode_literals switches the way Python works with strings:

  • In the 2.x series, strings without the u prefix are treated as sequences of bytes, each of which may be in the range \x00-\xff (inclusive). Strings with the u prefix are ucs-2 encoded unicode sequences.
  • In Python 3.x -- as well as in the unicode_literals future, strings without the u prefix are unicode strings encoded in either UCS-2 or UCS-4 (depends on the compiler flag used when compiling Python). Strings with the b prefix are literals for the data type bytes which are rather similar to pre-3.x non-unicode strings.

In either version of Python, byte-strings and unicode-strings must be converted. The conversion performed by default depends on your system's default charset; in your case this is UTF-8. Without setting anything, it should be ascii, which rejects all characters above \x7f.

The message digest returned by hashlib.md5(...).digest() is a bytes-string, and I suppose you want the result of the whole operation to be a byte-string as well. If you want that, convert the nonce and cnonce-strings to byte-strings.:

a1 = hashlib.md5("%s:%s:%s"  % (self.username, self.domain, self.password)).digest()
# note that UTF-8 may not be the encoding required by your counterpart, please check
a1 = b"%s:%s:%s" %(a1, challenge["nonce"].encode("UTF-8"), cnonce.encode("UTF-8") )

Alternatively, you can convert the byte-string coming from the call to digest() to a unicode string (not recommended). As the lower 8 bit of UCS-2 are equivalent to ISO-8859-1, this might serve your needs:

a1 = hashlib.md5("%s:%s:%s"  % (self.username, self.domain, self.password)).digest()
a1 = "%s:%s:%s" %(a1.decode("ISO-8859-1"), challenge["nonce"], cnonce)
nd
The first solution worked with the code. Thank you for your insightful answer.
Macdiesel