I have a Django project that uses an SQLite database that can be written to by an external tool. The text is supposed to be UTF-8, but some rows contain invalid byte sequences. The text comes from an external source, so I cannot control the encoding. Yes, I know that I could write a "wrapping layer" between the external source and the database, but I would prefer not to, especially since the database already contains a lot of "bad" data.

The solution in sqlite is to change the text_factory to something like: lambda x: unicode(x, "utf-8", "ignore")
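For readers on Python 3: the lambda above is Python 2 (the `unicode` builtin no longer exists), and in Python 3 the stdlib `sqlite3` module passes `text_factory` the raw `bytes` of each TEXT value, so the equivalent is `bytes.decode` with an error handler. The sketch below is only an illustration under assumptions: an in-memory database, a hypothetical `notes` table, and a row made deliberately invalid by storing the bytes `Hel\xfflo` as TEXT.

```python
import sqlite3


def lenient_utf8(raw: bytes) -> str:
    """Python 3 counterpart of lambda x: unicode(x, "utf-8", "ignore")."""
    return raw.decode("utf-8", "ignore")


# In-memory database and "notes" table are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.text_factory = lenient_utf8
conn.execute("CREATE TABLE notes (Text TEXT)")
# CAST(blob AS TEXT) stores the bytes unvalidated; 0xFF is not valid UTF-8.
conn.execute("INSERT INTO notes (Text) VALUES (CAST(X'48656CFF6C6F' AS TEXT))")

rows = conn.execute("SELECT Text FROM notes").fetchall()
print(rows)  # [('Hello',)] because the 0xFF byte is silently dropped
```

Using `"replace"` instead of `"ignore"` keeps a U+FFFD marker where the bad byte was, which can make damaged rows easier to spot later.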

However, I don't know how to tell the Django model driver this.

The exception I get is:

'Could not decode to UTF-8 column 'Text' with text' in /var/lib/python-support/python2.5/django/db/backends/sqlite3/base.py in execute
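The same error can be reproduced without Django using only the stdlib `sqlite3` module with its default `text_factory` (`str`). This is a minimal sketch, assuming an in-memory database and a hypothetical `notes` table; the invalid row is created with `CAST(X'...' AS TEXT)`, which SQLite stores without checking that the bytes are valid UTF-8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the externally written file
conn.execute("CREATE TABLE notes (Text TEXT)")
# Simulate the external tool writing invalid UTF-8 (the 0xFF byte) into a TEXT column.
conn.execute("INSERT INTO notes (Text) VALUES (CAST(X'48656CFF6C6F' AS TEXT))")

error_message = None
try:
    # The decode happens inside the driver while rows are fetched,
    # before any application code can see the value.
    conn.execute("SELECT Text FROM notes").fetchall()
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(error_message)
```

This matches the situation described in the question: the failure occurs while the driver builds the row, so string helpers applied afterwards never get a chance to run.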

Somehow I need to tell the sqlite driver not to try to decode the text as UTF-8 (at least not using the standard algorithm, but it needs to use my fail-safe variant).

A: 

Pass the data through one of Django's string-conversion helpers:

smart_str(s, encoding='utf-8', strings_only=False, errors='strict')

or

smart_unicode(s, encoding='utf-8', strings_only=False, errors='strict')
maersu
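Helpers like these only operate on values your code already holds, so on their own they cannot help when the driver fails during the fetch. One way to combine the two ideas, sketched here as an assumption rather than a tested Django recipe, is to have `sqlite3` hand back raw `bytes` (`text_factory = bytes`) and decode them yourself. To keep the sketch runnable without Django, the hypothetical `to_text` below uses `bytes.decode` as a rough stand-in for `smart_unicode(value, errors=...)`. The table and data are illustrative.

```python
import sqlite3


def to_text(value: bytes, errors: str = "replace") -> str:
    """Hypothetical stand-in for smart_unicode(value, encoding='utf-8', errors=errors)."""
    return value.decode("utf-8", errors)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (Text TEXT)")
conn.execute("INSERT INTO notes (Text) VALUES (CAST(X'48656CFF6C6F' AS TEXT))")

# Ask the driver for undecoded bytes so the fetch itself cannot fail.
conn.text_factory = bytes
raw = conn.execute("SELECT Text FROM notes").fetchone()[0]

cleaned = to_text(raw)             # 0xFF becomes U+FFFD
dropped = to_text(raw, "ignore")   # 0xFF is removed
```

With `errors="strict"` (the default in the helper signatures above) the same bytes would raise `UnicodeDecodeError`, which is why a lenient handler is needed for the "bad" rows.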
I am sorry if I misunderstand you, but the problem is that the database already contains "bad" data, and I want to do the conversion when I read it. The page you refer to seems to deal with inputting strings into the database. The tool that imports data does not use Django, but works with the pysqlite module. It consists of legacy code that I am reluctant to change. Thanks for the response.
Krumelur
Have you tried passing the "bad" DB content through the two functions above?
maersu
Jose Boveda
Sorry, but I must admit you have me totally confused now. I don't understand how to use those functions at the database driver level. No matter how I read the docs, I can only see that they operate on strings, but SQLite raises an exception long before I get hold of the actual string. I have updated the question with the exception I get.
Krumelur
I realize now that my original question wasn't very clearly formulated. The problem is that I get an exception before I can even see the data. Just iterating over the records in the model is enough to trigger the exception.
Krumelur
A:

The solution in sqlite is to change the text_factory to something like: lambda x: unicode(x, "utf-8", "ignore")

However, I don't know how to tell the Django model driver this.

Have you tried

from django.db import connection
connection.connection.text_factory = lambda x: unicode(x, "utf-8", "ignore")

before running any queries?

zifot
Thanks for the input! The above worked with a few modifications (namely, one has to create a cursor first; otherwise DatabaseWrapper.connection is None). I've been tearing my hair out over this.
Krumelur
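The cursor step is needed because Django opens the underlying sqlite3 connection lazily, so `connection.connection` stays `None` until something forces it open. An alternative that avoids the ordering problem is to set the factory at the moment each connection is created; in Django, the `connection_created` signal in `django.db.backends.signals` is meant for this kind of per-connection setup. The sketch below shows the same idea in plain `sqlite3` so it runs without Django: a hypothetical `LenientConnection` subclass, passed via `sqlite3.connect(..., factory=...)`, installs the lenient decoder in its constructor. The in-memory database and `notes` table are illustrative assumptions.

```python
import sqlite3


class LenientConnection(sqlite3.Connection):
    """Hypothetical Connection subclass that decodes TEXT leniently from the start."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Set before any query can run, so there is no window with the strict default.
        self.text_factory = lambda raw: raw.decode("utf-8", "ignore")


conn = sqlite3.connect(":memory:", factory=LenientConnection)
conn.execute("CREATE TABLE notes (Text TEXT)")
conn.execute("INSERT INTO notes (Text) VALUES (CAST(X'48656CFF6C6F' AS TEXT))")
rows = conn.execute("SELECT Text FROM notes").fetchall()
```

Because the factory is tied to connection creation rather than to a later assignment, every new connection gets it, including ones opened again after a disconnect.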