How do you do a file upload in an HTML form without running into mojibake?

I have a form that has three fields:

  1. a file field
  2. a required text field
  3. a text field which accepts Japanese characters

I've set up my HTML form with the attribute enctype='multipart/form-data'. When the submission fails validation because the required field is missing, I am redirected back to the same page, but my second text field (the one that accepts Japanese characters) already shows mojibake.

However, if I remove the enctype or change it to anything else, the Japanese characters come back intact (no mojibake) when the submission fails. The problem is that when the submission then succeeds, I am unable to read the uploaded files.

Any ideas on how to fix this?

+1  A: 

Mojibake (mangled display of Japanese characters) can have two causes:

  1. The data on the page is in the right character encoding, but the browser does not recognize it.

  2. Some characters on the page use the wrong encoding (the server wrote them in an incorrect encoding).

If the other characters on the page (outside of your form) show correctly, you produced broken output on your server.

If everything is clobbered, and you can fix it by manually setting a different encoding from the browser's menu, then the page encoding is not properly specified.

What kind of content-type headers and HTML meta tags do you use?

Thilo
We already have `<meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>` in our HTML. Also, with Firefox's Tamper Data plugin, I can see that the submitted POSTDATA is already garbled (for enctype=multipart/form-data). So I guess it's already broken before it reaches our server.
Franz See
Is my encoding incorrect?
Franz See
Is this a publicly accessible page? Can we have a look?
Thilo
Sorry. It's not yet publicly accessible. And I'm just changing a small part of a huge functionality so I'm not sure how I can strip it down to show as a code snippet.
Franz See
+1  A: 

I've figured it out by reverse-engineering AppFuse (appfuse.org), whose file upload form does not seem to be affected by mojibake.

AppFuse avoids the problem by setting the character encoding to UTF-8 on the server side (with Spring's org.springframework.web.filter.CharacterEncodingFilter). So I guess multipart/form-data really does screw up the character encoding (or at least it does for Java).
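
To illustrate what the filter is probably correcting, here is a minimal standalone sketch (not the AppFuse or Spring code). It assumes the browser posts the field as UTF-8 bytes and that the container falls back to ISO-8859-1 when no request encoding has been set. The class name MojibakeDemo and the helper decode are made up for this example. As far as I understand it, CharacterEncodingFilter calls request.setCharacterEncoding("UTF-8") before any parameter is read, which corresponds to the "right" branch below.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Hypothetical helper: decode raw request bytes with the given charset,
    // roughly what a container does when it parses a posted text field.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // "Japanese" written in kanji, as Unicode escapes so the source file
        // encoding does not matter.
        String original = "\u65e5\u672c\u8a9e";

        // Assumption: the browser submits the field as UTF-8 bytes.
        byte[] posted = original.getBytes(StandardCharsets.UTF_8);

        // No request encoding set: bytes are read as ISO-8859-1 (mojibake).
        String wrong = decode(posted, StandardCharsets.ISO_8859_1);

        // Request encoding forced to UTF-8: the text survives.
        String right = decode(posted, StandardCharsets.UTF_8);

        System.out.println("wrong matches original: " + wrong.equals(original));
        System.out.println("right matches original: " + right.equals(original));

        // ISO-8859-1 maps every byte to one char, so the damage is reversible
        // as long as the string has not been altered since.
        String repaired = new String(
                wrong.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
        System.out.println("repaired matches original: " + repaired.equals(original));
    }
}
```

The last step (re-encoding as ISO-8859-1 and decoding as UTF-8) is only a workaround for data that has already been misread; setting the request encoding up front avoids the problem altogether.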

Franz See
Setting the character encoding for reading posted request bodies (whether www-encoded or form-data) in Servlet/JSP is annoying and non-standardised, unfortunately, and the default of ISO-8859-1 is becoming increasingly outdated.
bobince