In Python 2.5, I have the following hash function:

def __hash__(self):
  return hash(str(self))

It has worked well for my needs, but now I have started getting the following error message. Any idea what is going on?

return hash(str(self))
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 16: ordinal not in range(128)

How could I fix this?

Thanks!

+2  A: 

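The problem is that you are trying to convert a unicode string that contains non-ASCII characters into a byte string. When str() is given a unicode object, it encodes it with the default codec, which is ASCII. The character in your error, u'\ufeff' (the byte order mark), is outside the ASCII range (0-127).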
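Here is a minimal sketch that reproduces the failing conversion. On Python 2, str() on a unicode object implicitly encodes it with the ASCII codec. The sketch calls .encode('ascii') explicitly so it behaves the same on Python 2.6+ and Python 3. The helper name first_non_ascii and the sample string are hypothetical.

```python
def first_non_ascii(text):
    """Return (position, character) for the first character the ASCII codec
    rejects, or None if the whole string is ASCII."""
    try:
        # On Python 2, str(text) performs this same encoding step implicitly.
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        # exc.start is the index reported as "position N" in the error message.
        return exc.start, text[exc.start]
    return None


# Hypothetical 16-character prefix followed by U+FEFF, mirroring "position 16".
sample = u'0123456789abcdef\ufeff'
print(first_non_ascii(sample))
```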

To fix this, either hash the unicode object directly, or encode it explicitly with the correct codec.
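The two fixes can be sketched as follows (Python 2.6+ and 3). The function names are hypothetical. UTF-8 is only an assumed default, chosen because it can represent every Unicode character. It is not necessarily the right codec for your data.

```python
def hash_text_directly(text):
    # Fix 1: hash the unicode object itself; no encoding step is involved.
    return hash(text)


def hash_text_encoded(text, encoding='utf-8'):
    # Fix 2: encode explicitly with a chosen codec, then hash the bytes.
    # UTF-8 is an assumed default here; it can encode any Unicode character.
    return hash(text.encode(encoding))


value = u'abc\ufeff'
print(hash_text_directly(value) == hash(value))
print(hash_text_encoded(value) == hash(value.encode('utf-8')))
```

Whichever option you pick, use it consistently, so that objects that compare equal always hash the same way.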

For example, if you are reading text from the console on a US-localized Windows system, you might do this:

return hash(mystring.encode("cp437"))

On the other hand, data from the registry or from API functions might be encoded with cp1252:

return hash(mystring.encode("cp1252"))

Note that the local system's encoding varies with the localization, so you will need to look it up with the locale module (for example, locale.getpreferredencoding()).
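This sketch checks a codec before relying on it. It falls back to locale.getpreferredencoding(), which assumes the locale matches how your data was produced. The function names are hypothetical. Note that u'\ufeff', the character in the question, has no mapping in cp437 or cp1252. Encoding that particular string with either codec therefore still raises UnicodeEncodeError, while UTF-8 can encode it.

```python
import locale


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def locale_hash(text, encoding=None):
    # Assumption: the locale's preferred encoding matches how the data was produced.
    if encoding is None:
        encoding = locale.getpreferredencoding()
    return hash(text.encode(encoding))


value = u'abc\ufeff'
for codec in ('cp437', 'cp1252', 'utf-8'):
    print(codec, can_encode(value, codec))
```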

Since you are calling str(self), you will need to override the __str__ method of the affected objects to do the encoding there, and probably __repr__ as well.
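Here is a sketch of that approach, written to run on Python 2.6+ and Python 3. The class Record and its text attribute are hypothetical, and UTF-8 is an assumed codec. On Python 2, __str__ encodes explicitly, so str(self) never falls back to ASCII. On Python 3, str is already Unicode, so __str__ returns the text unchanged. The class defines __eq__ alongside __hash__ so that equal objects hash equally.

```python
import sys


class Record(object):
    def __init__(self, text):
        self.text = text  # assumed to be a unicode (text) string

    def __unicode__(self):
        # Used by unicode(obj) on Python 2; ignored on Python 3.
        return self.text

    if sys.version_info[0] >= 3:
        def __str__(self):
            return self.text
    else:
        def __str__(self):
            # Python 2: __str__ must return a byte string, so encode
            # explicitly instead of letting str() use the ASCII codec.
            return self.text.encode('utf-8')

    def __repr__(self):
        # %r escapes non-ASCII characters, so this stays ASCII-safe on Python 2.
        return 'Record(%r)' % (self.text,)

    def __eq__(self, other):
        return isinstance(other, Record) and self.text == other.text

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        # Same shape as the original __hash__, but str(self) is now safe.
        return hash(str(self))
```

For example, set([Record(u'abc\ufeff'), Record(u'abc\ufeff')]) contains a single element.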

http://boodebr.org/main/python/all-about-python-and-unicode is a nice page with a lot of useful information about Python and unicode. See in particular the section "Why doesn't print work?"

Christopher
+1  A: 

The error doesn't seem to be in the __hash__ function, but in the __str__ function.

Try calling str(yourobject) on the object that has the problem and you'll see what I mean.
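Here is a small diagnostic along those lines. Calling obj.__str__() directly returns whatever your method produces, before str() attempts any conversion. On Python 2, this shows whether the value is a unicode object and which characters fall outside ASCII. The helper inspect_str and the Sample class are hypothetical.

```python
def inspect_str(obj):
    # Call __str__ directly: on Python 2 this skips the implicit ASCII
    # conversion that str(obj) applies to a unicode return value.
    raw = obj.__str__()
    non_ascii = [(i, ch) for i, ch in enumerate(raw) if ord(ch) > 127]
    return type(raw).__name__, non_ascii


class Sample(object):
    def __str__(self):
        return u'title\ufeff'


print(inspect_str(Sample()))
```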

Please edit the question and add your __str__ function (and the relevant data) so we can show you how to correct it.

nosklo