ansaurus

Question

UnicodeDecodeError when using socket.gethostname() result

Answer 1

A:

Yes, if either the hostname or the dirname is a unicode string, it is likely to give you that error. The best solution is typically to make sure both are unicode, and not just one of them.

Lennart Regebro 2009-08-17 21:50:43

So how would I do that?

Frank Niessink 2009-08-17 21:57:11

You decode an encoded string with "thestring".decode('encoding'). You'll have to debug to find out which of the two strings is a byte string and which is unicode, and likewise which encoding to use; I don't know, so you'll have to check.
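
One way to do that debugging, as a rough sketch: log the type and repr of each value just before the join. The helper name describe is made up for illustration. In Python 2 the type names will be str and unicode; in Python 3 they are bytes and str.

```python
def describe(value):
    """Return the type name and repr of value, e.g. for a debug log line."""
    return "%s %r" % (type(value).__name__, value)

# A byte string and a text string that look alike when printed
# are reported differently here.
as_bytes = describe(b'abc')
as_text = describe(u'abc')
```

Calling describe() on both the hostname and the directory name should show which one is the byte string.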

Lennart Regebro 2009-08-19 10:41:46

Answer 2

A:

You want a unique string based on the hostname, but it's got Unicode characters in it. There are a variety of ways to reduce a Unicode string to an ascii string, depending on how you want to deal with non-ascii characters. Here's one:

self.hostname = socket.gethostname().encode('ascii', 'replace').replace('?', '_')

This will replace all non-ascii characters with a question mark, then change those to underscores (since file systems don't like question marks in file names).
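
A small sketch of the same idea wrapped in a function, assuming the input is already a unicode (text) string; the name ascii_safe and the sample hostname are made up for illustration. Note that a literal question mark in the input is turned into an underscore as well.

```python
def ascii_safe(name):
    """Reduce a text string to ASCII, using '_' for anything non-ASCII."""
    # encode() with 'replace' substitutes '?' for each unencodable character.
    encoded = name.encode('ascii', 'replace')
    # Swap the '?' placeholders for '_' and go back to a text string.
    return encoded.replace(b'?', b'_').decode('ascii')

safe = ascii_safe(u'h\u00f6st')  # hypothetical hostname containing one non-ASCII letter
```

If the hostname is a byte string rather than unicode, it would need to be decoded first, which requires knowing its encoding.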

Ned Batchelder 2009-08-18 00:23:40

Ned, thanks. What I don't understand is *why* I'm getting this exception: why is os.path.join decoding with the ascii codec? I've added the traceback to my question (should've done that right away of course, sorry).

Frank Niessink 2009-08-18 08:12:25

Answer 3

A:

I don't think that there is a problem with the actual code that you've posted, even if socket.gethostname() returns a unicode object. There will be a problem when you use name in a way that converts it to a byte string:

import os
hostname = u'\u1306blah'
pid = os.getpid()
name = os.path.join(os.path.dirname('/tmp/blah.lock'), "%s.%s" % (hostname, pid))

>>> type(name)
<type 'unicode'>

>>> name
u'/tmp/\u1306blah.28292'

>>> print name
/tmp/ጆblah.28292

>>> str(name)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u1306' in position 5: ordinal not in range(128)

You can see that str(name) raises the exception that you're seeing, but everything looks OK up until that point. What are you doing with name once you've constructed it?

mhawke 2009-08-18 05:47:35

It's the os.path.join that seems to be throwing the exception; that would be the fourth line in your example code. I've added the traceback to my question (should've done that right away of course, sorry).

Frank Niessink 2009-08-18 08:09:08

Answer 4

+1 A:

I don't think gethostname() is necessarily giving you a unicode object. It could be the directory name of lockfile. Regardless, one of them is a standard string with a non-ASCII (higher than 127) char in it and the other is a unicode string.

The problem is that the join function in the ntpath module (the module Python uses for os.path on Windows) concatenates the arguments it is given. This causes Python to try to convert the byte string parts to unicode using the default ASCII codec. In your case the byte string appears to contain a non-ASCII character. This can't be reliably converted to unicode, so Python raises the exception.

A simpler way to trigger the problem:

>>> from ntpath import join
>>> join(u'abc', '\xff')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)

/home/msmits/<ipython console> in <module>()

/usr/lib64/python2.6/ntpath.pyc in join(a, *p)
    106                     path += b
    107                 else:
--> 108                     path += "\\" + b
    109             else:
    110                 # path is not empty and does not end with a backslash,

The traceback shows the problem line in ntpath.py.
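
The implicit conversion amounts to decoding the byte string with the ASCII codec. A minimal sketch that does the same decode explicitly (the byte value is only an illustration):

```python
raw = b'\xff'  # a byte string holding a single non-ASCII byte

try:
    raw.decode('ascii')   # what the implicit str-to-unicode conversion attempts
    failed = False
except UnicodeDecodeError:
    # 0xff is outside the ASCII range (0-127), so the decode fails.
    failed = True
```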

You could work around this by converting the arguments to join() to standard strings first, as other answers suggest. Alternatively, you could convert everything to unicode first. If a specific encoding is passed to decode(), high bytes can be converted to unicode.

For example:

>>> '\xff'.decode('latin-1')
u'\xff'
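
Putting that together, a rough sketch of the "convert everything to unicode first" approach. The helper to_text, the sample values and the choice of latin-1 are all assumptions for illustration; the right encoding depends on where the byte string came from.

```python
from ntpath import join

def to_text(value, encoding):
    """Decode byte strings with the given encoding; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

dirname = u'abc'          # already unicode
hostname = b'h\xf6st'     # byte string, assumed here to be Latin-1 encoded
pid = 1234                # stand-in for os.getpid()

# Every argument is unicode by the time join() sees it, so no implicit
# ASCII decoding is needed.
name = join(to_text(dirname, 'latin-1'),
            u"%s.%s" % (to_text(hostname, 'latin-1'), pid))
```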

Menno Smits 2009-11-19 11:20:23
