On a Python-driven web app using a SQLite datastore, I got this error:

Could not decode to UTF-8 column 'name' with text '300µL-10-10'

From what I have read, it looks like I need to switch my text_factory to str and get bytestrings, but when I do this my HTML output looks like this:

300�L-10-10

I do have my content-type set as:

<meta http-equiv="content-type" content="text/html; charset=utf-8" />
+3  A: 

Unfortunately, the data in your datastore is not encoded as UTF-8; instead, it's probably either latin-1 or cp1252. To decode it automatically, try setting Connection.text_factory to your own function:

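def convert_string(s):
    # text_factory receives the raw bytes of each TEXT value.
    try:
        # Try UTF-8 first.
        u = s.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back to cp1252 for bytes that are not valid UTF-8.
        u = s.decode("cp1252")
    return u

# Apply the converter to every TEXT value read through this connection.
conn.text_factory = convert_string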
eswald
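For anyone who wants to try the recipe in isolation, here is a minimal sketch, assuming Python 3 and an in-memory database. The `items` table and the `CAST(? AS TEXT)` trick for planting a cp1252-encoded value are illustrative assumptions, not part of the original setup.

```python
import sqlite3

def convert_string(raw):
    """Decode raw bytes as UTF-8, falling back to cp1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")  # hypothetical table

# CAST(... AS TEXT) keeps the bytes unchanged but marks them as TEXT,
# simulating a row written by a client that used cp1252 (0xB5 is the micro sign).
conn.execute("INSERT INTO items VALUES (CAST(? AS TEXT))", (b"300\xb5L-10-10",))
# A second row stored normally, which sqlite3 writes as UTF-8.
conn.execute("INSERT INTO items VALUES (?)", ("300\u00b5L-10-10",))

conn.text_factory = convert_string
names = [row[0] for row in conn.execute("SELECT name FROM items ORDER BY rowid")]
print(names)
```

Both rows should come back as the same `str`, so the template can emit them as UTF-8 regardless of how each row was originally stored.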
Word of caution if you want to adapt this recipe: cp1252 and latin-1 will both successfully decode any bytestring, so you can't add more 'alternatives' after them.
Thomas Wouters
That would have produced a question mark rather than `xFFFD`.
BalusC
@Thomas: there are bytes `cp1252` won't decode, though not that many. (`iso-8859-1` of course does map every byte.)
bobince
@bobince: Ah, yes, forgot about those five bytes that cp1252 doesn't decode.
Thomas Wouters
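A quick way to check the point about undecodable bytes, sketched under the assumption that you are on CPython 3 with its bundled `cp1252` and `latin-1` codecs (other implementations of windows-1252 may map these bytes differently):

```python
# Collect every single-byte value that the cp1252 codec refuses to decode.
undecodable = []
for value in range(256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        undecodable.append(value)

print([hex(v) for v in undecodable])

# latin-1 (iso-8859-1) maps all 256 byte values, so this never raises.
latin1_text = bytes(range(256)).decode("latin-1")
print(len(latin1_text))
```

In practice this means a fallback placed after cp1252 would only ever be reached for those few bytes, and one placed after latin-1 would never be reached at all.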
@BalusC: U+FFFD ('REPLACEMENT CHARACTER') is what Unicode uses instead of a question mark when a byte sequence fails to decode correctly (using the 'replace' error handler). A question mark is only used when you use the 'replace' error handler while encoding to a bytestring.
Thomas Wouters
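The difference between the two directions can be seen with a small sketch (Python 3 assumed; the sample bytes are the cp1252 form of the value from the question):

```python
raw = b"300\xb5L-10-10"  # cp1252 bytes; 0xB5 is not valid UTF-8 on its own

# Decoding with 'replace' substitutes U+FFFD for the undecodable byte.
decoded = raw.decode("utf-8", errors="replace")
print(ascii(decoded))

# Encoding with 'replace' substitutes '?' for the unencodable character.
encoded = "300\u00b5L-10-10".encode("ascii", errors="replace")
print(encoded)
```

So a U+FFFD in the page points to a failed decode somewhere (here, the browser reading cp1252 bytes as UTF-8), while a literal `?` would point to a lossy encode.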
@eswald, this worked like a charm. I think the key to your answer is "data in your datastore is not encoded as UTF-8". I need to go back and fix that. Thanks.
Mark
@Mark: Glad it helped. The good news about this particular recipe is that you can do the conversion at your leisure, without having to update the code at the exact same time.
eswald
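If you do get around to fixing the stored data, one possible approach is sketched below. It assumes Python 3, and the helper name `reencode_as_utf8` plus the `items` table are hypothetical; it reads each value through the fallback decoder and writes it back as a `str`, which sqlite3 stores as UTF-8. Back up the database first, since this is only a sketch.

```python
import sqlite3

def convert_string(raw):
    """Decode raw bytes as UTF-8, falling back to cp1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252")

def reencode_as_utf8(conn, table, column):
    """Rewrite every text value in table.column as UTF-8 (hypothetical helper).

    table and column are interpolated into the SQL, so they must come
    from trusted code, never from user input.
    """
    previous = conn.text_factory
    conn.text_factory = convert_string
    try:
        rows = conn.execute(f'SELECT rowid, "{column}" FROM "{table}"').fetchall()
        for rowid, value in rows:
            if isinstance(value, str):  # skip NULLs, numbers and blobs
                conn.execute(
                    f'UPDATE "{table}" SET "{column}" = ? WHERE rowid = ?',
                    (value, rowid),
                )
        conn.commit()
    finally:
        conn.text_factory = previous  # restore the original factory

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")  # hypothetical table
conn.execute("INSERT INTO items VALUES (CAST(? AS TEXT))", (b"300\xb5L-10-10",))

reencode_as_utf8(conn, "items", "name")
# The default text_factory can now read the row without errors.
fixed = conn.execute("SELECT name FROM items").fetchone()[0]
print(fixed)
```

Because the fallback decoder handles both old and new rows, the application code can keep using it before, during, and after the conversion.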