Hi, I have a problem reading a text file whose contents I insert into a MySQL table. Here is a snippet of the code:

The first line of the file contains: "aclaración"

body = ''
archivo = open('file.txt', "r")
for line in archivo.readlines():
    body = body + line
archivo.close()
model = MyModel(body=body)
model.save()

I get a DjangoUnicodeDecodeError:

'utf8' codec can't decode bytes in position 8: invalid data. You passed in 'aclaraci\xf3n' (type 'str') Unicode error hint

The string that could not be encoded/decoded was: araci�n.

I tried body.decode('utf-8'), body.decode('latin-1') and body.decode('iso-8859-1'), without solution.

Can you help me please? Any hint is appreciated :)

+4  A: 

Judging from the \xf3 code for 'ó', it does look like the data is encoded in ISO-8859-1 (or some close relative). So body.decode('iso-8859-1') should be a valid Unicode string (you don't specify what "without solution" means -- what error message do you get, and where?); if what you need is a utf-8 encoded bytestring instead, body.decode('iso-8859-1').encode('utf-8') should give you one!
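
A minimal sketch of the decode/encode round trip described above. It assumes the file bytes really are ISO-8859-1, and it uses a `b'...'` literal so the snippet runs on Python 2.6+ and Python 3 (on the Python 2.5 of this thread, a plain `'...'` string behaves the same way). The names `raw`, `text` and `utf8_bytes` are illustrative only.

```python
# Raw bytes as they would come out of the file, assuming ISO-8859-1.
raw = b'aclaraci\xf3n'

# Decoding as UTF-8 fails: 0xf3 announces a multi-byte sequence,
# but the next byte ('n') is not a continuation byte.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding as ISO-8859-1 maps every byte to exactly one code point.
text = raw.decode('iso-8859-1')

# Re-encode if a UTF-8 bytestring is what is needed downstream.
utf8_bytes = text.encode('utf-8')

print(utf8_ok)
print(repr(utf8_bytes))
```

Either `text` (a Unicode string) or `utf8_bytes` (a UTF-8 bytestring) should then be acceptable as the model field value, unlike the original undecoded bytes.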

Alex Martelli
Thanks Alex, answering your question, here:

>manage.py shell
Python 2.5.4
(InteractiveConsole)
>>> a = 'á'
>>> a
'\xa0'
>>> a.decode('iso-8859-1').encode('utf-8')
'\xc2\xa0'
>>> test = unicode(a)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
panchicore
In `test=unicode(a)` you're implicitly using the ascii codec, as the error message so clearly tells you, so of course it fails. Use `unicode(a,'iso-8859-1')` if you know a's encoded in ISO-8859-1. If you assigned the results of the encode/decode sequence to another variable, say b, `unicode(b, 'utf-8')` would then work. Etc, etc. Maybe you're just calling encode and decode this way and that and NOT assigning and then using their results...?! Remember strings are immutable, so method calls don't CHANGE them: they return RESULTS (assign them and use them!-).
Alex Martelli
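
A short sketch of the two points in the comment above: results of `decode`/`encode` have to be assigned, and the ASCII codec cannot handle the byte in question. It starts from the byte shown in the console session; `a.decode('ascii')` is written out explicitly here as a stand-in for what `unicode(a)` does implicitly on Python 2. The names `b`, `u`, `still_bytes` and `ascii_ok` are illustrative, and the `b'...'` literals assume Python 2.6+ or Python 3.

```python
a = b'\xa0'  # the byte the console session above shows for `a`

# Calling decode() and discarding the result changes nothing:
# strings are immutable, so `a` is still the same bytestring.
a.decode('iso-8859-1')
still_bytes = (a == b'\xa0')

# Assign the results and keep working with the new objects.
b = a.decode('iso-8859-1').encode('utf-8')  # UTF-8 bytestring
u = b.decode('utf-8')                       # Unicode text again

# The ASCII codec rejects any byte above 0x7f.
try:
    a.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(still_bytes, repr(b), ascii_ok)
```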