I'm converting a legacy app from ISO-8859-1 to UTF-8, and I've used a number of resources to determine what I need to set to get this to work. However, after several configuration, code, and environment changes, my Servlet (in Tomcat 5) doesn't seem to process submitted HTML form content as UTF-8.
Here's what I've set up for configuration.
- System properties
[user@server ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
- tomcat5 server.xml
<Connector protocol="HTTP/1.1" ... URIEncoding="UTF-8" useBodyEncodingForURI="true"/>
- JSP file
<%@ page language="java" pageEncoding="UTF-8" contentType="text/html;charset=UTF-8" %>
...
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
- Servlet filter
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) {
    if (request.getCharacterEncoding() == null) {
        request.setCharacterEncoding("UTF-8");
    }
    ...
With some debug logs I know the following:
System.getProperty("file.encoding"): "UTF-8"
java.nio.charset.Charset.defaultCharset(): "UTF-8"
new OutputStreamWriter(new ByteArrayOutputStream()).getEncoding(): "UTF8"
However, when I submit my form with an input containing "Бить баклуши", I see the following (from my logs):
request.getParameter("myParameter") = Ð\221иÑ\202Ñ\214 баклÑ\203Ñ\210Ð
I know that the request's character encoding (request.getCharacterEncoding()) was null, so it was explicitly set to "UTF-8" in my servlet filter. Also, I'm viewing my logs from a terminal whose encoding I know is set to UTF-8 as well.
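For what it's worth, the garbage above looks like what you get when UTF-8 bytes are decoded as ISO-8859-1: each Cyrillic letter turns into two characters, the first being "Ð" or "Ñ". Here is a minimal standalone sketch of that effect, not taken from my app. The class and method names (MojibakeDemo, misdecode, repair) are made up for illustration, and it assumes Java 7+ for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encode the text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    // This mimics a container that parses the request body with the wrong charset.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverse the mistake: recover the original bytes via ISO-8859-1,
    // then decode them as the UTF-8 they really were.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Бить баклуши" written with Unicode escapes so the source file
        // encoding does not matter when compiling.
        String original = "\u0411\u0438\u0442\u044c \u0431\u0430\u043a\u043b\u0443\u0448\u0438";
        String garbled = misdecode(original);

        System.out.println("original length: " + original.length());
        System.out.println("garbled length:  " + garbled.length());
        System.out.println("round trip ok:   " + original.equals(repair(garbled)));
    }
}
```

The repair step only works as a diagnostic. The real fix is to make sure the request is decoded as UTF-8 in the first place.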
What am I missing here? What else do I need to set for the Servlet to correctly process my input as UTF-8? If more information will help, I'll be glad to add more debugging and update this question with it.
Edit:
- I'm not using Windows Terminal (I'm using PuTTY), so I'm pretty certain the problem is not with what I'm viewing the logs in. This is supported by the fact that when I send the submitted content back to the browser in my response, it shows the same garbage as above.
- The form's being submitted from IE8.
Solution:
My web.xml definition for my CharsetFilter was too far down (below my servlet configurations and other filters). I moved the filter definition to the very top of the web.xml document and everything worked correctly. See the accepted answer below.
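For anyone hitting the same thing, a rough sketch of the ordering that matters is below. setCharacterEncoding() only takes effect if it is called before anything reads the request parameters, and filters mapped by URL pattern run in the order their filter-mapping elements appear in web.xml. The class name com.example.CharsetFilter and the /* pattern are placeholders, not my actual configuration.

```xml
<!-- Declare the charset filter before any other filter. -->
<filter>
    <filter-name>CharsetFilter</filter-name>
    <filter-class>com.example.CharsetFilter</filter-class> <!-- hypothetical class name -->
</filter>
<!-- other <filter> declarations go here -->

<!-- Its mapping comes first, so it runs before any filter that might read parameters. -->
<filter-mapping>
    <filter-name>CharsetFilter</filter-name>
    <url-pattern>/*</url-pattern> <!-- assumed pattern -->
</filter-mapping>
<!-- other <filter-mapping> elements, then <servlet> and <servlet-mapping> elements -->
```

If another filter earlier in the chain calls request.getParameter(), the body has already been decoded with the default encoding by the time the charset filter runs.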