I am connecting to an MS SQL Server through SQLAlchemy, using the pyodbc module. Everything appeared to be working fine until I began having problems with encodings: some of the non-ASCII characters are being replaced with '?'.
The DB has the collation 'Latin1_General_CI_AS' (I've also checked the specific fields, and they use the same collation). I started by selecting the encoding 'latin1' in the call to create_engine, and that appears to work for Western European characters (French or Spanish characters like é) but not for Eastern European characters. Specifically, I have a problem with the character ć.
I have tried selecting other encodings listed in the Python documentation, specifically the Microsoft ones like cp1250 and cp1252, but I keep facing the same problem.
Does anyone know how to solve those differences? Does the collation 'Latin1_General_CI_AS' have an equivalent among the Python encodings?
The code for my current connection is the following:
import pyodbc
from sqlalchemy import *

def connect():
    # Raw pyodbc connection through a configured DSN
    return pyodbc.connect('DSN=database;UID=uid;PWD=password')

engine = create_engine('mssql://', creator=connect, encoding='latin1')
connection = engine.connect()
Clarifications and comments:
- This problem happens when retrieving information from the DB. I don't need to store anything.
- At the beginning I didn't specify the encoding, and the result was that whenever a non-ASCII character was encountered in the DB, pyodbc raised a UnicodeDecodeError. I corrected that by using 'latin1' as the encoding, but that doesn't solve the problem for all characters.
- I admit that the server is not on latin1; the comment is incorrect. I have checked both the database collation and the collations of the specific fields, and they all appear to be 'Latin1_General_CI_AS'. How, then, can ć be stored? Maybe I'm not understanding collations correctly.
- I corrected the question a little: I have tried more encodings than latin1, namely also cp1250 and cp1252 (which is apparently the one used by 'Latin1_General_CI_AS', according to MSDN).
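To illustrate the symptom outside the database, here is a minimal sketch. The helper name can_encode is hypothetical, and I am only assuming that the '?' comes from a lossy code page conversion somewhere between the server and Python; the snippet does not prove where that conversion happens. It checks which of the encodings I tried can represent é and ć at all:

```python
# Characters from the question, written as escapes to avoid source-encoding issues
char_e = u"\u00e9"  # é
char_c = u"\u0107"  # ć


def can_encode(ch, codec):
    """Return True if the character exists in the given code page."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


for codec in ("latin1", "cp1252", "cp1250"):
    print(codec, can_encode(char_e, codec), can_encode(char_c, codec))

# A lossy conversion substitutes '?' for any character the code page lacks
lossy = char_c.encode("cp1252", errors="replace")
print(lossy)
```

If that assumption holds, it would explain why é survives with latin1 or cp1252 while ć does not: ć simply has no code point in those two code pages, although it does in cp1250.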
UPDATE:
OK, following these steps, I find that the encoding used by the DB appears to be cp1252: http://bytes.com/topic/sql-server/answers/142972-characters-encoding However, that appears to be a bad assumption, as reflected in the answers.
UPDATE2: After properly configuring the ODBC driver, I no longer need to specify the encoding in the Python code.