I have a Web site with a message board. The board lets people post messages and include attachments. I had a problem where my site was hiccuping every time someone wrote a post with non-ASCII characters. In an effort to solve it, I changed my HTML form code from

enctype="multipart/form-data"

(as I'm accepting file uploads) to:

enctype="multipart/form-data;charset=UTF-8"

This solved the character problem, but it broke the file upload capability in Firefox 2 through 3.5. Firefox accepts all the text the user submits but not the file data: the submission otherwise behaves normally, as if no file had been attached. Everything works fine in Safari.

I also tried

enctype="multipart/form-data" accept-charset="UTF-8"

...but that had no effect on the character problem.

Any ideas for ways around this?

A:

‘charset’ is not a registered parameter for the ‘multipart/form-data’ media type. It shouldn't do anything.

According to RFC2388, the charset of the submitted fields should actually be passed by the browser in a ‘Content-Type’ header of the form-data subpart. In practice no browser does this.
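If a server wanted to honour that RFC 2388 header anyway, it could use the per-part charset when a client sends one and otherwise fall back to the charset the form page was served in. The following is a minimal sketch using Python's standard `email` package (the old `cgi` module was removed in Python 3.13). `parse_form_data` is a hypothetical helper, the UTF-8 fallback assumes the form page was itself served as UTF-8, and only plain text fields are handled, not file uploads.

```python
from email import policy
from email.parser import BytesParser


def parse_form_data(body: bytes, content_type: str, default_charset: str = "utf-8") -> dict:
    """Decode the text fields of a multipart/form-data request body.

    Hypothetical helper: honours an RFC 2388 per-part charset if the client
    sent one, otherwise assumes default_charset (the charset the form page
    was served in). File parts get no special treatment.
    """
    # The email parser needs the request's Content-Type (with its boundary)
    # in front of the body so that it recognises the body as multipart.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser(policy=policy.HTTP).parsebytes(raw)

    fields = {}
    for part in message.iter_parts():
        disposition = part["Content-Disposition"]
        if disposition is None:
            continue  # not a form-data part
        name = disposition.params.get("name")
        # Raw bytes of the part body (no transfer encoding is involved here).
        data = part.get_payload(decode=True)
        # Per-part charset if present (RFC 2388), else the page's charset.
        charset = part.get_content_charset() or default_charset
        fields[name] = data.decode(charset)
    return fields


body = (
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="msg"\r\n'
    b"\r\n"
    + "héllo wörld".encode("utf-8") + b"\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="legacy"\r\n'
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"\r\n"
    + "café".encode("iso-8859-1") + b"\r\n"
    b"--XyZ--\r\n"
)
fields = parse_form_data(body, "multipart/form-data; boundary=XyZ")
print(fields)
```

Since browsers do not send the per-part header in practice, the fallback branch is the one that normally runs, which is why the page's own charset matters.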

‘accept-charset’ can't be used because it's broken in IE: instead of choosing the charset for the submission, it specifies an alternative charset to use, on a per-field basis, when characters do not fit in the primary charset (which is the charset of the current page). This effectively mangles your strings, because you cannot find out which charset IE actually used.

The only effective way to make browsers submit your forms as UTF-8 is to serve the page containing the form as UTF-8, by sending a ‘Content-Type: text/html;charset=utf-8’ header, by including an equivalent <meta http-equiv> tag, or both (doing both is a good idea, since a user who saves the page to disc loses the header information).
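A minimal WSGI sketch that does both: it sends the `Content-Type` header with `charset=utf-8` and also includes the `<meta http-equiv>` tag, while leaving `enctype` as plain `multipart/form-data`. The `app` function, the `/post` action and the field names are hypothetical.

```python
# Hypothetical form page; the markup and field names are illustrative only.
FORM_PAGE = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Post a message</title>
</head>
<body>
<form action="/post" method="post" enctype="multipart/form-data">
  <textarea name="message"></textarea>
  <input type="file" name="attachment">
  <input type="submit" value="Post">
</form>
</body>
</html>
"""


def app(environ, start_response):
    """WSGI app serving the form page as UTF-8 via header and meta tag."""
    body = FORM_PAGE.encode("utf-8")
    start_response("200 OK", [
        # The header is what browsers use to pick the submission charset.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, pass `app` to `wsgiref.simple_server.make_server` and call `serve_forever()` on the result.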

bobince
Hmm. Don't know what to tell you. charset works like a charm. The page already has <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> in the head. Are you suggesting something else?
burton
If you've put that meta tag in, your forms will always be submitted as UTF-8, period. If you are nonetheless having problems dealing with non-ASCII in the submission, you need to fix that at the server side. Probably what happened with the multipart ‘charset’ was that you simply broke the ‘enctype’, making Firefox ignore it and submit the form as application/x-www-form-urlencoded instead (hence the file uploads going missing).
bobince
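The fallback bobince describes matches how the current HTML Living Standard defines `enctype`: it is an enumerated attribute whose only keywords are `application/x-www-form-urlencoded`, `multipart/form-data` and `text/plain`, matched ASCII case-insensitively, and any other value (including one with a `;charset=` suffix) falls back to `application/x-www-form-urlencoded`. The sketch below models that rule. `resolve_enctype` is a hypothetical helper, and Firefox 2 through 3.5 predate that spec, so treat this as an illustration of the observed fallback rather than of their actual code.

```python
# Keywords of the enctype enumerated attribute (HTML Living Standard).
ENCTYPE_KEYWORDS = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}
# Used both when enctype is absent and when its value is invalid.
DEFAULT_ENCTYPE = "application/x-www-form-urlencoded"


def ascii_lower(s: str) -> str:
    """Lowercase only A-Z, as HTML's ASCII case-insensitive matching does."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def resolve_enctype(attr):
    """Return the encoding a submission would use for this enctype value.

    Hypothetical helper modelling the spec rule; attr is None when the
    attribute is absent.
    """
    if attr is None:
        return DEFAULT_ENCTYPE
    value = ascii_lower(attr)
    return value if value in ENCTYPE_KEYWORDS else DEFAULT_ENCTYPE


print(resolve_enctype("multipart/form-data"))                # multipart/form-data
print(resolve_enctype("multipart/form-data;charset=UTF-8"))  # application/x-www-form-urlencoded
```

Under `application/x-www-form-urlencoded`, current browsers send only a file input's filename, never its contents, which is consistent with the uploads disappearing while the text fields still arrive.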
A: 

The problem is not the form data but the filename field, which simply does not work if you need both UTF-8 and file data. If you need to process the filename on the server, which is common, you are stuck. If you set enctype="multipart/form-data;charset=UTF-8" in your form, Tomcat 6 converts this to the content type application/x-www-form-urlencoded, which is the problem.

It has taken me ages to track this down, but it looks like it is broken in general. I have tested this with HTTP requests from web browsers and also from .NET, with the same effect.
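One common way a UTF-8 filename goes wrong on the server: browsers serving a UTF-8 page typically put the filename into the part's `Content-Disposition` header as raw UTF-8 bytes, and a server component that decodes header bytes as ISO-8859-1 turns it into mojibake. Whether that is what happens in Tomcat 6 here depends on its configuration, so this is an assumption. Because ISO-8859-1 maps every byte to exactly one character, such a mis-decode can be reversed; `recover_utf8` below is a hypothetical helper sketching that repair.

```python
def recover_utf8(misdecoded: str) -> str:
    """Undo UTF-8 bytes that were wrongly decoded as ISO-8859-1.

    Hypothetical helper. If the text does not round-trip (it contains
    characters outside ISO-8859-1, or the bytes are not valid UTF-8),
    it is returned unchanged.
    """
    try:
        # Re-encoding as ISO-8859-1 restores the original bytes one-to-one.
        return misdecoded.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return misdecoded


# Simulate the failure: the browser sent UTF-8, the server read ISO-8859-1.
sent = "résumé.pdf".encode("utf-8")
garbled = sent.decode("iso-8859-1")
print(garbled)                # rÃ©sumÃ©.pdf
print(recover_utf8(garbled))  # résumé.pdf
```

This is a heuristic: a genuine ISO-8859-1 string whose bytes happen to form valid UTF-8 would also be altered, so configuring the server to decode the header as UTF-8 in the first place is the more robust fix.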

Colin Manning