I'm converting a legacy app from ISO-8859-1 to UTF-8, and I've used a number of resources to determine what I need to set to get this to work. However, after several configuration, code, and environment changes, my Servlet (in Tomcat 5) doesn't seem to process submitted HTML form content as UTF-8.

Here's what I've set up for configuration.

  • System properties
[user@server ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
  • tomcat5 server.xml
<Connector protocol="HTTP/1.1"
    ...
    URIEncoding="UTF-8"
    useBodyEncodingForURI="true"/>
  • JSP file
<%@ page language="java" pageEncoding="UTF-8" contentType="text/html;charset=UTF-8" %>
...
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
  • Servlet filter
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
        throws IOException, ServletException
{
    if(request.getCharacterEncoding() == null)
    {
        request.setCharacterEncoding("UTF-8");
    }
    ...

With some debug logs I know the following:

System.getProperty("file.encoding"): "UTF-8"
java.nio.charset.Charset.defaultCharset(): "UTF-8"
new OutputStreamWriter(new ByteArrayOutputStream()).getEncoding(): "UTF8"

However, when I submit my form with an input containing "Бить баклуши", I see the following (from my logs):

request.getParameter("myParameter") = Ð\221иÑ\202Ñ\214 баклÑ\203Ñ\210Ð

I know that the request's character encoding was null (request.getCharacterEncoding() returned null), so it was explicitly set to "UTF-8" in my servlet filter. Also, I'm viewing my logs from a terminal whose encoding I know is set to UTF-8 as well.
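
The garbled value in the log is consistent with UTF-8 bytes being decoded as ISO-8859-1: "Б" is the byte pair D0 91 in UTF-8, and read as ISO-8859-1 those two bytes become "Ð" followed by the control character 0x91, which the terminal shows as the octal escape \221. Below is a minimal Java sketch of that effect using only the standard library. The class and method names are illustrative and not part of the application in question, and it assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: reproduces the mis-decoding seen in the log.
final class MojibakeDemo {

    // Encode as UTF-8, then (wrongly) decode the bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverse the mistake: recover the raw bytes and decode them as UTF-8.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Бить баклуши" written with Unicode escapes so the source file
        // encoding cannot affect the result.
        String original = "\u0411\u0438\u0442\u044C \u0431\u0430\u043A\u043B\u0443\u0448\u0438";
        String garbled = misdecode(original);

        // Print the code points of the first garbled characters in hex.
        for (int i = 0; i < 4; i++) {
            System.out.printf("U+%04X%n", (int) garbled.charAt(i));
        }
        System.out.println(repair(garbled).equals(original));
    }
}
```

The repair step is only a diagnostic aid: it confirms which wrong charset was applied. The actual fix is to make sure the container decodes the request as UTF-8 in the first place.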

What am I missing here? What else do I need to set for the Servlet to correctly process my input as UTF-8? If more information will help, I'll be glad to add more debugging and update this question with it.

Edit:

  • I'm not using Windows Terminal (I'm using PuTTY), so I'm pretty certain the problem is not what I'm viewing the logs with. This is supported by the fact that when I send the submitted content back to the browser in my response, it shows the same garbage as above.
  • The form's being submitted from IE8.

Solution:

My web.xml definition for my CharsetFilter was too far down (below my servlet configurations and other filters). I moved the filter definition to the very top of the web.xml document and everything worked correctly. See the accepted answer below.

+2  A: 

Edit4 (the final and corrected answer as requested)

Your servlet filter gets applied too late.
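
Why "too late" matters: according to the Servlet API documentation, setCharacterEncoding has no effect once request parameters have been read. If anything earlier in the chain reads a parameter first, the body has probably already been decoded with the container's default (ISO-8859-1). The sketch below imitates that behavior with a hypothetical FakeRequest class. It is a simplified stand-in, not the real javax.servlet API, and it skips URL-decoding of the form body.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative simulation of "encoding set after parameters were parsed".
final class LateEncodingDemo {

    // Hypothetical stand-in for a servlet request (not the real API).
    static final class FakeRequest {
        private final byte[] rawBody;
        private Charset encoding;     // null until a filter sets it
        private String parsedValue;   // cached on first parameter read

        FakeRequest(byte[] rawBody) {
            this.rawBody = rawBody;
        }

        void setCharacterEncoding(Charset charset) {
            // Silently ignored once the parameter has been parsed.
            if (parsedValue == null) {
                encoding = charset;
            }
        }

        String getParameter() {
            if (parsedValue == null) {
                // Fall back to ISO-8859-1 when no encoding was set in time.
                Charset effective = (encoding != null) ? encoding : StandardCharsets.ISO_8859_1;
                parsedValue = new String(rawBody, effective);
            }
            return parsedValue;
        }
    }

    // Returns what the servlet would see, depending on filter order.
    static String run(boolean charsetFilterFirst) {
        byte[] body = "\u0411\u0438\u0442\u044C".getBytes(StandardCharsets.UTF_8);
        FakeRequest request = new FakeRequest(body);
        if (charsetFilterFirst) {
            request.setCharacterEncoding(StandardCharsets.UTF_8);
            request.getParameter(); // some other filter reads a parameter
        } else {
            request.getParameter(); // some other filter reads a parameter
            request.setCharacterEncoding(StandardCharsets.UTF_8); // too late
        }
        return request.getParameter();
    }

    public static void main(String[] args) {
        System.out.println("charset filter first: " + run(true));
        System.out.println("charset filter last:  " + run(false));
    }
}
```

In this simulation only the run with the charset filter first returns the original text, which matches the behavior reported in the question.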

In web.xml, the charset filter should be declared before the other filters and the servlets. One possible ordering:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
    "http://java.sun.com/j2ee/dtds/web-app_2.3.dtd">

<web-app>
    <!--CharsetFilter start--> 
    <filter>
        <filter-name>Charset Filter</filter-name>
        <filter-class>CharsetFilter</filter-class>
        <init-param>
            <param-name>requestEncoding</param-name>
            <param-value>UTF-8</param-value>
        </init-param>
    </filter>
    <!-- The rest is omitted -->
kd304
Sample code coming.
kd304
This was my problem - my CharsetFilter was too far down in my web.xml. I moved it up to the very top and it worked. Can you update this answer to add some detail about the web.xml order? I'll accept this answer, but a more complete/verbose answer will help others with this problem. Thanks!
Rob Hruska
I don't know, my answer and your explanation together seem to be OK. But I will do it.
kd304
Thanks for your time and help.
Rob Hruska