I'm processing a UTF-8 file in Python, and have used simplejson to load it into a dictionary. However, I'm getting a UnicodeEncodeError when I try to turn one of the dictionary values into a string:

import simplejson as json

f = open('my_json.json', 'r')
master_dictionary = json.load(f)
# some JSON wrangling, then it fails on this line...
mysql_string += " ('" + str(v_dict['code'])

Traceback (most recent call last):
  File "my_file.py", line 25, in <module>
    str(v_dict['code']) + "'), "
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 35: ordinal not in range(128)

Why is Python even using ASCII? I thought it used UTF-8 by default, and the input is from a UTF-8 file.

$ file my_json.json 
my_json.json: UTF-8 Unicode English text

What is the problem?

+1  A: 

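One way to make this work would be to set the default encoding to UTF-8 explicitly. In Python 2, site.py removes sys.setdefaultencoding at startup, so the sys module has to be reloaded first:

import sys
reload(sys)  # restores setdefaultencoding, which site.py deletes at startup
sys.setdefaultencoding("utf-8")

This changes every implicit str/unicode conversion in the process, including those inside third-party libraries, so it can hide encoding bugs rather than fix them.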

A cleaner way could be to use the unicode function rather than str:

mysql_string += " ('" + unicode(v_dict['code'])

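or, if the value is a byte string (str) rather than a unicode object, decode it with an explicit encoding:

mysql_string += " ('" + unicode(v_dict['code'], "utf-8")

Note that this second form raises "TypeError: decoding Unicode is not supported" when the value is already a unicode object, which is the case here, as the UnicodeEncodeError shows.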
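Whether to pass an encoding to unicode() comes down to whether the value is bytes or text. A minimal sketch of the two directions, written to run on both Python 2 and Python 3; the sample value is an assumption chosen to contain the character from the traceback (u'\xf4') and is not taken from the asker's file:

```python
# Hypothetical sample data: the same word as UTF-8 bytes and as text.
raw_bytes = b"c\xc3\xb4te"   # how the word is stored in a UTF-8 file
text = u"c\xf4te"            # the kind of value a JSON parser returns

# Decoding turns bytes into text. In Python 2 this is equivalent to
# unicode(raw_bytes, "utf-8").
decoded = raw_bytes.decode("utf-8")

# Text is already decoded; the only remaining step is to encode it
# back to bytes when a byte string is needed.
encoded = text.encode("utf-8")

print(decoded == text)
print(encoded == raw_bytes)
```

Both comparisons hold: the byte string is five bytes long, the text is four characters long, and each converts cleanly into the other.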

danben
+1  A: 

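Python 2.x uses ASCII as the default codec for implicit conversions between str and unicode, so str() fails on a unicode object that contains non-ASCII characters. Use unicode.encode() if you want to turn a unicode into a str: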

v_dict['code'].encode('utf-8')
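As a rough illustration of why the original call fails, the sketch below makes the implicit ASCII step explicit. It is written to run on both Python 2 and Python 3, and the sample value is a made-up string containing the character from the traceback:

```python
# Hypothetical value containing the character from the traceback (u'\xf4').
value = u"c\xf4te"

# Python 2's str(value) implicitly performs value.encode("ascii"),
# which fails for any character above U+007F.
try:
    value.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly as UTF-8 succeeds for this value.
utf8_bytes = value.encode("utf-8")

print(ascii_failed)
```

The ASCII attempt raises the same UnicodeEncodeError as in the question, while the explicit UTF-8 encoding produces a byte string that can be concatenated with other byte strings.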
Ignacio Vazquez-Abrams
Thanks! To encode all the items in the dictionary, I did:

for k, v in v_dict.iteritems():
    if v_dict[k]:
        v_dict[k] = v_dict[k].encode('utf-8')
AP257
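The loop in the comment above calls .encode() on every truthy value, so a non-string value such as a number would raise AttributeError. A possible variant that only touches text values is sketched below; encode_text_values is a hypothetical helper name, the sample record is invented for illustration, and the code is written to run on both Python 2.7 and Python 3:

```python
def encode_text_values(mapping, encoding="utf-8"):
    """Return a copy of mapping with text values encoded to byte strings."""
    text_type = type(u"")  # unicode on Python 2, str on Python 3
    return {
        key: value.encode(encoding) if isinstance(value, text_type) else value
        for key, value in mapping.items()
    }


# Hypothetical record; the keys and values are made up for this example.
record = {"code": u"c\xf4te", "count": 3, "note": None}
encoded_record = encode_text_values(record)
```

Returning a new dictionary instead of modifying the original in place keeps the decoded text available for any later step that still needs it.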