I have a problem with UTF-8. My client (written in GWT) makes a request to my servlet with some parameters in the URL, as follows:

http://localhost:8080/servlet?param=value

When I retrieve the URL in the servlet, UTF-8 characters come out wrong. I use this code:

protected void service(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {

    request.setCharacterEncoding("UTF-8");

    String reqUrl = request.getRequestURL().toString();
    String queryString = request.getQueryString();
    System.out.println("Request: " + reqUrl + "?" + queryString);
    ...

So, if I call this url:

http://localhost:8080/servlet?param=così

the result is like this:

Request: http://localhost:8080/servlet?param=cos%C3%AC

How can I set up the character encoding properly?

+1  A: 

From the HttpServletRequest#getQueryString() javadoc:

Returns: a String containing the query string or null if the URL contains no query string. The value is not decoded by the container.

Note the last statement. So you need to URL-decode it yourself using java.net.URLDecoder.

String queryString = URLDecoder.decode(request.getQueryString(), "UTF-8");

However, the normal way to gather parameters is just using HttpServletRequest#getParameter().

String param = request.getParameter("param"); // così

If you have configured the servlet container to use the correct encoding, it has already URL-decoded the value for you. request.setCharacterEncoding() affects only the request body (POST), not the request URI (GET). Also see Mirage114's answer.
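To make the effect of the charset argument concrete, here is a minimal sketch that decodes the value from the question twice, once as UTF-8 and once as ISO-8859-1. The class name QueryDecodeDemo and the helper decode are made up for illustration; only URLDecoder.decode(String, String) comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of the servlet API.
class QueryDecodeDemo {

    // Percent-decodes 'raw' and interprets the resulting bytes in 'charset'.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // The still-encoded value, as getQueryString() returns it.
        String raw = "cos%C3%AC";

        // The bytes C3 AC read as UTF-8 form the single character U+00EC.
        System.out.println(decode(raw, "UTF-8"));

        // The same two bytes read as ISO-8859-1 become two characters,
        // U+00C3 and U+00AC, which is the typical garbled result.
        System.out.println(decode(raw, "ISO-8859-1"));
    }
}
```

What the console actually shows also depends on the console's own encoding, so comparing the strings in code is more reliable than eyeballing the output.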

BalusC
If I use URLDecoder it works, but when I retrieve just the parameter with getParameter(), it doesn't work. Any suggestions?
Gabriele
You need to set the server URI encoding as Mirage114 explains. Also see [this article](http://balusc.blogspot.com/2009/05/unicode-how-to-get-characters-right.html#JSPServletRequest)
BalusC
+1  A: 

I've run into this same problem before. Not sure what Java servlet container you're using, but at least in Tomcat 5.x (not sure about 6.x) the request.setCharacterEncoding() method doesn't really have an effect on GET parameters. By the time your servlet runs, GET parameters have already been decoded by Tomcat, so setCharacterEncoding won't do anything.

Two ways to get around this:

  1. Change the URIEncoding setting for your connector to UTF-8. See http://tomcat.apache.org/tomcat-5.5-doc/config/http.html.

  2. As BalusC suggests, decode the query string yourself and parse it manually into a parameter map (as opposed to using the ServletRequest APIs).
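Option 2 could look roughly like the sketch below. The class Utf8QueryParser and its parse method are hypothetical names, not part of any servlet API. One design choice worth noting: it splits on & and = first and only then decodes each name and value, so an encoded %26 or %3D inside a value is not mistaken for a separator.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses a raw query string, always decoding as UTF-8.
class Utf8QueryParser {

    // Returns parameter name -> list of values, in order of appearance.
    static Map<String, List<String>> parse(String rawQuery) {
        Map<String, List<String>> params = new LinkedHashMap<String, List<String>>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params; // getQueryString() returns null when there is no query
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = decode(eq < 0 ? pair : pair.substring(0, eq));
            String value = decode(eq < 0 ? "" : pair.substring(eq + 1));
            List<String> values = params.get(name);
            if (values == null) {
                values = new ArrayList<String>();
                params.put(name, values);
            }
            values.add(value);
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the map with the decoded value for "param".
        System.out.println(parse("param=cos%C3%AC&tag=a%26b&tag=c"));
    }
}
```

In a servlet this would be called as Utf8QueryParser.parse(request.getQueryString()). It covers only the query string; parameters sent in a POST body are not included.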

Hope this helps!

Mirage114
The URIEncoding setting in #1 is in Tomcat's server.xml. Other servlet containers should reasonably have the same kind of setting.
Mirage114
I ran into a problem with the server.xml setting. On Windows machines it worked correctly, but on our production Red Hat based machines Tomcat appeared to ignore the server.xml setting. We ended up having to implement our own query parameter parser that explicitly decoded it using UTF-8.
Herms
This is one of the many places where Java's over-reliance on the ‘default encoding’ causes heavy breakage. The encoding you want in URLs is almost always UTF-8, and almost never the server's default encoding.
bobince
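As an illustration of the default-encoding problem discussed in the comments: if the container decoded the UTF-8 bytes as ISO-8859-1 (an assumption; verify what your container actually does), the damage is reversible, because ISO-8859-1 maps every byte to exactly one character. A minimal sketch, with the made-up class name Latin1Repair and assuming Java 7 or later for StandardCharsets:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical workaround class, not part of the servlet API.
class Latin1Repair {

    // Turns the mis-decoded text back into its original bytes,
    // then decodes those bytes as UTF-8.
    static String repair(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What getParameter("param") would return for cos%C3%AC
        // if the container decoded it as ISO-8859-1.
        String fromContainer = "cos\u00C3\u00AC";
        System.out.println(repair(fromContainer));
    }
}
```

This is a stopgap rather than a fix: applied to a value that was already decoded correctly, it corrupts the text, so it is only safe when the container's URI encoding is known to be ISO-8859-1.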