The issue is that you haven't specified how the browser should encode the POST data, so you are getting whatever encoding the browser guesses it should use, usually ISO-8859-1 rather than UTF-8.
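To make the mismatch concrete, here is a small sketch of what happens when one side encodes a filename in one charset and the other side decodes it in another. The filename is made up for illustration and is not taken from the question.

```python
# Hypothetical filename containing non-ASCII characters ("résumé.txt").
name = "r\u00e9sum\u00e9.txt"

# The same text produces different bytes depending on the encoding used.
utf8_bytes = name.encode("utf-8")         # each "é" becomes two bytes
latin1_bytes = name.encode("iso-8859-1")  # each "é" becomes one byte

# Decoding with the wrong charset does not fail here; it silently
# produces garbled text.
garbled = utf8_bytes.decode("iso-8859-1")

print(len(utf8_bytes), len(latin1_bytes))  # 12 10
print(garbled)                             # rÃ©sumÃ©.txt
```

Because the wrong decode raises no error in this direction, the problem usually only shows up later, as odd-looking filenames.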
The HTML 4.01 spec for the FORM element includes the "accept-charset" attribute which allows you to specify your preference for which encoding to POST data with:
accept-charset = charset list [CI]
This attribute specifies the list of character encodings for input data that is accepted by the server processing this form. The value is a space- and/or comma-delimited list of charset values. The client must interpret this list as an exclusive-or list, i.e., the server is able to accept any single character encoding per entity received.
The default value for this attribute is the reserved string "UNKNOWN". User agents may interpret this value as the character encoding that was used to transmit the document containing this FORM element.
In other words, if you serve a page encoded in UTF-8, the browser will normally default to posting form data back in UTF-8 as well.
The best fix is to specify the character encoding for all your pages by either including the appropriate encoding in your response headers, or including something like the following in your HTML within the HEAD section:
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
The HTML 4.01 spec has a section on how to specify which character encoding you are serving.
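As a rough, framework-agnostic sketch of the first fix, the plain WSGI app below declares UTF-8 in the response header, in the META tag, and in the form's accept-charset attribute. The page markup and the name `app` are assumptions made for illustration; in Django you would set the charset through your own response and template instead.

```python
# Hypothetical upload page. The META tag and accept-charset both say UTF-8,
# matching the charset declared in the HTTP response header below.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Upload</title>
</head>
<body>
<form method="post" enctype="multipart/form-data" accept-charset="UTF-8">
<input type="file" name="image">
<input type="submit" value="Upload">
</form>
</body>
</html>
"""


def app(environ, start_response):
    """Minimal WSGI app that serves PAGE as UTF-8 and says so in the header."""
    body = PAGE.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("localhost", 8000, app)` can serve this app. Declaring the charset in all three places is redundant, but it keeps the header, the document, and the form in agreement.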
An alternative, but inferior, fix is to not specify the character encoding anywhere, and instead decode your filename manually, assuming the browser is sending it in the default encoding of ISO-8859-1:
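def upload_file(request):
    if request.method == 'POST':
        form = UploadFileForm(request.POST, request.FILES)
        if form.is_valid():
            # cleaned_data is a dict, so look the field up by key.
            # Decode manually, assuming the name arrived as ISO-8859-1 bytes.
            filename = form.cleaned_data['image'].name.decode('iso-8859-1')
            ...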
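If you do end up decoding by hand, a slightly more defensive variant is to try UTF-8 first and only fall back to ISO-8859-1 when that fails. This is a sketch under the assumption that you have the raw filename bytes available; `decode_filename` is a hypothetical helper, not part of Django.

```python
def decode_filename(raw, encodings=("utf-8", "iso-8859-1")):
    """Decode raw filename bytes, trying each candidate encoding in order.

    Hypothetical helper: UTF-8 is tried first because it rejects most byte
    sequences that are not valid UTF-8, while ISO-8859-1 accepts any byte
    sequence and so only makes sense as the last resort.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if the caller passed a list without a catch-all encoding.
    return raw.decode("ascii", errors="replace")


# Both byte forms of the same (made-up) name decode to the same text.
from_utf8 = decode_filename(b"r\xc3\xa9sum\xc3\xa9.txt")
from_latin1 = decode_filename(b"r\xe9sum\xe9.txt")
print(from_utf8 == from_latin1)  # True
```

This is still guesswork about what the browser sent, which is why declaring the encoding up front remains the better fix.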