In one part of my Django webapp, users can upload a text file in which each line contains a string to be processed. The file is not stored on the server.
My code looks like this:
roFile = request.FILES['uploadFileName']
ros = roFile.read().strip()
ros = ros.split('\n')
ros = [t.strip() for t in ros]
Until now this has worked fine. Today a user uploaded a file that caused problems. Using the resulting strings in Django raises the following error:
ProgrammingError: ERROR: invalid byte sequence for encoding "UTF8":0xff
The user has told me that he saved the file as UTF-16.
In plain Python, outside of Django, I can do the following:
import codecs
from django.utils.encoding import *
fo = codecs.open('filename', 'r', 'utf-16')
zz = fo.readlines()
and the resulting values seem to be usable, but I can't get the same result with the uploaded file.
What is the appropriate way to handle the data in request.FILES so that files in different character encodings are dealt with correctly?
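For reference, here is a minimal sketch of the kind of thing I am after. It assumes the upload begins with a byte-order mark (BOM); the 0xff in the error matches the first byte of the UTF-16 little-endian BOM (FF FE). The helper names decode_upload and upload_lines are hypothetical and not part of Django, and a UTF-16 file saved without a BOM would still fail the UTF-8 fallback.

```python
import codecs

# Hypothetical helpers, not part of Django: guess the encoding from a
# leading byte-order mark (BOM) and fall back to UTF-8 when none is found.
# UTF-32 is checked first because its little-endian BOM (FF FE 00 00)
# starts with the UTF-16 little-endian BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_upload(raw, default="utf-8"):
    """Decode raw uploaded bytes to text, using the BOM if one is present."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            # These codecs consume the BOM themselves while decoding.
            return raw.decode(encoding)
    return raw.decode(default)


def upload_lines(raw):
    """Return the stripped lines of an uploaded text file."""
    text = decode_upload(raw)
    return [line.strip() for line in text.strip().splitlines()]
```

In the view this would presumably be called as ros = upload_lines(request.FILES['uploadFileName'].read()), so the bytes are decoded in memory instead of being reopened from disk with codecs.open.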