I don't know whether this is expected behavior, but if I create a project with a single model that has an ImageField and upload a photo with the filename "árvórés", the uploaded file is saved under an incomprehensible filename (converted to ASCII, I presume). As a direct result, the photo can no longer be retrieved from the site.

Is this normal? If so, how can I allow filenames like this?

+2  A: 

The issue is that you haven't specified how the browser should encode the POST data, so you get whatever encoding the browser guesses it should use, usually ISO-8859-1 rather than UTF-8.
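To see why a mismatch between the two encodings garbles the name, here is a minimal Python 3 sketch. It is not taken from Django. The variable names are only for illustration, and the filename is written with escapes so the example does not depend on the source file's own encoding.

```python
# The filename from the question, "árvórés", written with escapes.
name = "\u00e1rv\u00f3r\u00e9s"

# The same text produces different bytes in each encoding.
utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("iso-8859-1")
print(len(latin1_bytes), len(utf8_bytes))  # 7 10

# Decoding UTF-8 bytes as ISO-8859-1 does not raise an error;
# it silently yields two wrong characters per accented letter.
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled == name)  # False

# Decoding with the encoding that was actually used restores the name.
restored = utf8_bytes.decode("utf-8")
print(restored == name)  # True
```

Because the wrong decode succeeds without an exception, the damage only shows up later, for example when the stored filename no longer matches the one requested.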

The HTML 4.01 spec for the FORM element includes the "accept-charset" attribute which allows you to specify your preference for which encoding to POST data with:

accept-charset = charset list [CI]

This attribute specifies the list of character encodings for input data that is accepted by the server processing this form. The value is a space- and/or comma-delimited list of charset values. The client must interpret this list as an exclusive-or list, i.e., the server is able to accept any single character encoding per entity received.

The default value for this attribute is the reserved string "UNKNOWN". User agents may interpret this value as the character encoding that was used to transmit the document containing this FORM element.

In other words, if you serve a page encoded in UTF-8, the browser will typically default to posting form data in UTF-8.

The best fix is to specify the character encoding for all your pages by either including the appropriate encoding in your response headers, or including something like the following in your HTML within the HEAD section:

<META http-equiv="Content-Type" content="text/html; charset=UTF-8">

The HTML 4.01 spec has a section on how to specify which character encoding you are serving.
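As a rough illustration of the response-header approach, here is a minimal WSGI sketch in Python 3. It is not Django-specific. The names `app` and `call_app` are hypothetical, and the form markup is only an example that also sets the accept-charset attribute discussed above.

```python
def app(environ, start_response):
    """Minimal WSGI app that declares UTF-8 in its Content-Type header."""
    html = (
        '<form method="post" enctype="multipart/form-data" '
        'accept-charset="UTF-8">'
        '<input type="file" name="image">'
        '<input type="submit">'
        '</form>'
    )
    # Encode the body with the same charset that the header announces.
    body = html.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(application):
    """Call a WSGI app directly and capture its status, headers, and body."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(application({"REQUEST_METHOD": "GET"}, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(app)
print(headers["Content-Type"])  # text/html; charset=UTF-8
```

The point is that the charset in the header and the encoding of the body must agree. A browser that receives this page then has a clear signal for which encoding to use when it posts the form back.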

An alternative, but inferior, fix is to not specify the character encoding anywhere and instead decode the filename manually, assuming the browser is sending it in the default encoding of ISO-8859-1:

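def upload_file(request):
    if request.method == 'POST':
        form = UploadFileForm(request.POST, request.FILES)
        if form.is_valid():
            # cleaned_data is a dict, so look the field up by key.
            # Assumes the name arrives as an ISO-8859-1 byte string.
            filename = form.cleaned_data['image'].name.decode('iso-8859-1')
            ...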
Tyson
The problem occurred in the Django admin. The explanation is helpful, since I will probably need to write my own forms, but I find it strange that Django doesn't handle this.
Ricardo B.