Hello,

I have the following code in my view:

def view(request):
    body = u""  
    for filename, f in request.FILES.items():
        body = body + 'Filename: ' + filename + '\n' + f.read() + '\n'

In some cases I get "UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 7470: ordinal not in range(128)". What am I doing wrong? (I am using Django 1.1.)

Thank you.

+2  A: 

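You are appending the result of f.read(), a byte string, directly to a unicode string without decoding it. Python then tries to decode the bytes implicitly with the ascii codec, which fails on the first non-ASCII byte. If the data you are reading from the file is UTF-8 encoded, use utf-8; otherwise use whatever encoding the file is actually in.

Decode it first and then append it to body, e.g.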

data = f.read().decode("utf-8")
body = body + 'Filename: ' + filename + '\n' + data + '\n'
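
To see the difference outside of Django, here is a minimal sketch. The byte string stands in for what f.read() returns; its content and the filename 'notes.txt' are made up for illustration.

```python
# Stand-in for the bytes returned by f.read(); the content is made up.
raw = u'na\u00efve caf\u00e9'.encode('utf-8')

error = None
try:
    raw.decode('ascii')      # what the implicit conversion attempts
except UnicodeDecodeError as exc:
    error = exc              # 'ascii' codec can't decode byte 0xc3 ...

text = raw.decode('utf-8')   # explicit decode with the real encoding
body = u'Filename: ' + u'notes.txt' + u'\n' + text + u'\n'
```

The ascii decode fails on the first byte above 0x7f, while the explicit utf-8 decode gives a unicode string that can be concatenated safely.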
Anurag Uniyal
+2  A: 

Anurag's answer is correct. However, another problem here is that you can't know for certain the encoding of the files that users upload. It may be useful to loop over a tuple of the most common encodings until one of them decodes the data without error:

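raw = f.read()  # read once; a second f.read() would return an empty string
encodings = ('windows-xxx', 'iso-yyy', 'utf-8',)  # placeholders, fill in real names
data = None  # stays None if no encoding in the tuple matches
for e in encodings:
    try:
        data = raw.decode(e)
        break
    except UnicodeDecodeError:
        pass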
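
A self-contained variant of the same idea might look like the sketch below. The helper name decode_with_fallback and the concrete encoding list are assumptions, not part of the answer above. utf-8 is tried first because it is strict, and latin-1 goes last because it accepts any byte sequence. A decode that succeeds does not prove the guess was right.

```python
def decode_with_fallback(raw, encodings=('utf-8', 'cp1252', 'latin-1')):
    """Return (text, encoding) for the first encoding that decodes raw.

    Hypothetical helper; the default encoding list is only an assumption
    about what uploads are likely to contain.
    """
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError('none of the candidate encodings matched')
```

In the view this would be called as data, used = decode_with_fallback(f.read()).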
shanyu
+2  A: 

If you are not in control of the encoding of the files that can be uploaded, you can guess what encoding a file is in using chardet, the Universal Encoding Detector module.
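
A rough sketch of how that could be wired in, assuming chardet is installed. The helper name decode_guessing and the utf-8 default are assumptions. chardet.detect() returns a dict whose 'encoding' entry may be None when it cannot guess, so the sketch falls back to the default in that case, and also when chardet is not importable.

```python
try:
    import chardet  # third-party package, assumed to be installed
except ImportError:
    chardet = None


def decode_guessing(raw, default='utf-8'):
    """Decode raw bytes using chardet's guess (hypothetical helper)."""
    guess = chardet.detect(raw)['encoding'] if chardet else None
    try:
        # 'replace' avoids an exception if the guess turns out to be wrong
        return raw.decode(guess or default, 'replace')
    except LookupError:
        # the guessed name is not a codec Python knows about
        return raw.decode(default, 'replace')
```

The detection is only a statistical guess, so short or ambiguous files can still come out wrong.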

mhawke
+1 This has been helpful.
shanyu
+2  A: 

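Django has some utilities that handle this (smart_unicode, force_unicode, smart_str). Generally you just need smart_unicode. Note that it decodes byte strings as UTF-8 by default, so for uploads in another encoding you still need to pass the right encoding argument.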

from django.utils.encoding import smart_unicode
def view(request):
    body = u""  
    for filename, f in request.FILES.items():
        body = body + 'Filename: ' + filename + '\n' + smart_unicode(f.read()) + '\n'
Silfheed
Thanks, I am going to upvote all of you once I have registered =)
seyfettin sipsak