I am connecting to an MS SQL server through SQLAlchemy, using the pyodbc module. Everything appeared to be working fine until I started having problems with encodings: some of the non-ASCII characters are being replaced with '?'.

The DB has the collation 'Latin1_General_CI_AS' (I've also checked the specific fields, and they keep the same collation). I started specifying the encoding 'latin1' in the call to create_engine, and that appears to work for Western European characters (like é in French or Spanish) but not for Eastern European ones. Specifically, I have a problem with the character ć.

I have been trying other encodings listed in the Python documentation, specifically the Microsoft ones like cp1250 and cp1252, but I keep facing the same problem.
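For illustration (a quick interpreter check of my own, not from the actual setup): the character ć simply cannot be represented in latin1 or cp1252 at all, though it can in cp1250:

# -*- coding: utf-8 -*-
# Quick check (Python 2, matching the era of this question): which of
# these code pages can represent the character ć (U+0107) at all?
for enc in ('cp1250', 'cp1252', 'latin1'):
    try:
        print enc, repr(u'\u0107'.encode(enc))
    except UnicodeEncodeError:
        print enc, 'cannot represent this character'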

Does anyone know how to resolve these differences? Does the collation 'Latin1_General_CI_AS' have an equivalent among Python's encodings?

The code for my current connection is the following:

import pyodbc
from sqlalchemy import *

def connect():
    return pyodbc.connect('DSN=database;UID=uid;PWD=password')

engine = create_engine('mssql://', creator=connect, encoding='latin1')
connection = engine.connect()

Clarifications and comments:

  • This problem happens when retrieving information from the DB. I don't need to store anything.
  • At the beginning I didn't specify an encoding, and the result was that whenever a non-ASCII character was encountered in the DB, pyodbc raised a UnicodeDecodeError. I worked around that by using 'latin1' as the encoding, but that doesn't solve the problem for all characters (see the short demo after this list).
  • I admit that the server is not in latin1; that comment was incorrect. I have checked both the database collation and the collations of the specific fields, and everything appears to be 'Latin1_General_CI_AS'. How, then, can ć be stored at all? Maybe I'm not understanding collations correctly.
  • I have corrected the question a little; specifically, I have tried more encodings than latin1, including cp1250 and cp1252 (the latter apparently being the one used by 'Latin1_General_CI_AS', according to MSDN).
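Short demo of the decode-side symptom mentioned in the second point above (the byte value is a made-up example, not taken from my database):

# -*- coding: utf-8 -*-
# If the DB hands back the single byte '\xe6' for ć, the codec chosen
# at decode time decides which unicode character you end up with:
print repr('\xe6'.decode('cp1250'))   # u'\u0107' -> ć (correct)
print repr('\xe6'.decode('cp1252'))   # u'\xe6'   -> æ (silently wrong)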

UPDATE:

OK, following the steps at http://bytes.com/topic/sql-server/answers/142972-characters-encoding, I get that the encoding used by the DB appears to be cp1252. Anyway, that turns out to be a bad assumption, as reflected in the answers.
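For what it's worth, SQL Server can also be asked directly which code page a collation maps to; a sketch using the connection from above (COLLATIONPROPERTY is a documented SQL Server function):

# Ask SQL Server which code page the collation maps to:
result = connection.execute(
    "SELECT COLLATIONPROPERTY('Latin1_General_CI_AS', 'CodePage')")
print result.fetchone()[0]   # prints 1252 for this collation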

UPDATE 2: After properly configuring the ODBC driver, I don't need to specify the encoding in the Python code at all.

+1  A: 

You should stop using code pages and switch to Unicode. That is the only way to get rid of this kind of problem.

Sorin Sbarnea
Unfortunately, we are not in control of that database, so we cannot decide how the data is stored :-(
Khelben
Does this relate to your problem: http://code.google.com/p/pyodbc/issues/detail?id=52 ? Check how your ODBC provider is configured; maybe you can make the encoding change there. Don't forget to test this outside pyodbc as well.
Sorin Sbarnea
Yes, that was it! The problem was that I hadn't configured the driver in UTF-8 mode. I had to add `client charset = UTF-8` to the /etc/freetds.conf configuration file.
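For reference, a sketch of the relevant freetds.conf section (host, port and TDS version are placeholders; yours will differ):

# /etc/freetds.conf -- sketch only; host, port and tds version
# below are placeholders and depend on your server
[database]
    host = sqlserver.example.com
    port = 1433
    tds version = 8.0
    # the line that fixed the problem:
    client charset = UTF-8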
Khelben
A: 

OK, per http://msdn.microsoft.com/en-us/library/ms174596(v=SQL.90).aspx, the encoding of Latin1_General_CI_AS is most probably cp1252, so you'd have to use encoding='cp1252'. But this solves only half of the problem, because you also have to output the values somewhere to see whether the characters survived.

So if you have some_db_value, which you extracted from the database, you have to call some_db_value.encode('proper-output-encoding') to get it right. The proper output encoding depends on where the output goes: on the console it is the console encoding, which on Windows can be anything like 'cp1252', 'cp437' or 'cp850'; on the web it is the encoding of the page, hopefully 'utf-8'.
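As a sketch of the whole round trip, assuming cp1252 really is the database encoding (the byte string is a made-up example):

# -*- coding: utf-8 -*-
raw = '\xe9t\xe9'              # made-up bytes as they might come from the DB
text = raw.decode('cp1252')    # -> u'\xe9t\xe9', i.e. u'été'
print text.encode('cp850')     # for e.g. a cp850 Windows console
print text.encode('utf-8')     # for a UTF-8 web page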

Edit: please read John Machin's answer, as it is not clear whether 'cp1252' really is the correct database encoding.

knitti
+1  A: 

Try connecting to the DB with the pyodbc.connect() parameter unicode_results=True, e.g. from SQLAlchemy:

engine = create_engine('mssql://yourdb', connect_args={'unicode_results': True})

This should make sure that all the results you get (and not only those from nvarchar columns etc.) are unicode, correctly converted from whatever encoding is used in the DB.
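Note that the question connects through a creator function, which, as far as I know, bypasses connect_args; a sketch passing the flag to pyodbc.connect() directly instead (same DSN as in the question):

import pyodbc
from sqlalchemy import create_engine

def connect():
    # pass unicode_results directly to pyodbc, since a creator
    # bypasses connect_args
    return pyodbc.connect('DSN=database;UID=uid;PWD=password',
                          unicode_results=True)

engine = create_engine('mssql://', creator=connect)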

As for writing to the DB, just always use unicode. If I'm not mistaken (I'll check later), pyodbc will make sure it gets written to the DB correctly as well.

(Of course, if the DB uses an encoding that does not support the characters you want to write, you will still get errors: if you want the columns to support any kind of character, you will have to use unicode columns in the DB too.)

Steven
+2  A: 

Original comment turned into an answer:

cp1250 and cp1252 are NOT "latin1 encodings". A collation is not an encoding. Re your comment: who says that "the server is encoded in latin1"? If the server expects all input/output to be encoded in latin1 (which I doubt), then you quite simply can't get some Eastern European characters into your database (nor Russian, Chinese, Greek, etc.).

Update:

You need to look further afield than the collation. """msdn.microsoft.com/en-us/library/ms174596(v=SQL.90).aspx suggests that for Latin1_General_CI_AS the encoding used is cp1252""" is codswallop. The table provides an LCID (locale ID), default collation, and codepage for each locale. Yes, the collation "Latin1_General_CI_AS" is listed in association with the cp1252 codepage for several locales. But for two locales (Armenian and Georgian), it is listed in association with the "Unicode" codepage (!!!).

Quite simply, you need to find out what codepage the database is using.

Try to extract data from the database without specifying an encoding at all. Don't bother encoding that in whatever encoding you guess your console may be using -- that only adds another source of confusion. Instead, use print repr(data). Report back here what you get from the repr() where you expect non-Latin1 characters.
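A sketch of that diagnostic (table and column names are placeholders):

row = connection.execute(
    "SELECT problem_column FROM problem_table").fetchone()
print repr(row[0])
# e.g. u'\u0107' means the driver already decoded correctly,
# '\xe6' is a raw byte still needing a decode, and u'\xe6'
# suggests it was decoded with the wrong codec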

John Machin
+1 you are correct, and I didn't read it fully.
knitti