How can I make a servlet accept non-ASCII (Arabic, Chinese, etc.) characters passed from JSPs?

I've tried adding the following to the top of the JSPs:

<%@page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

And adding the following to each post/get method in the servlet:

request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");

I've also tried adding a Filter that executes the two statements above, instead of calling them in the servlet.

To be quite honest, this was working in the past, but it doesn't work anymore.

I am using Tomcat 5.0.28/6.x.x on JDK 1.6, on both Windows and Linux boxes.

Here's an example: JSP Page:

<%@page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<html>
<head>
<title>Push Engine</title>
</head>
<body>
Hello ${requestScope['val']}
<form action="ControllerServlet" method="POST">
<table>
    <tr>
        <td>ABC</td>
        <td><input name="ABC" type="text" /></td>
    </tr>
    <tr>
        <td></td>
        <td><input type="submit" value="Submit"></td>
    </tr>
</table>
</form>

</body>
</html>

Servlet doPost method:

protected void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    request.setCharacterEncoding("UTF-8");
    String val = "request.getParameter('ABC') : " + request.getParameter("ABC");
    System.out.println(val);
    request.setAttribute("val", val);
    request.getRequestDispatcher("index.jsp").forward(request, response);
}

THE PROBLEM IS: the console prints "???", whereas the value returned to the JSP page contains the correct Unicode word.

The "???" printed to the console turned out to be a problem with the machine I ran this test on. I ran the same example on another machine, and it works properly!

A: 

Setting the content type of the page is communication from your server to the browser about what the server is sending it, and that's not really going to help you much. What you need to ensure is that your client-to-server communication has the right character encoding, and that your server is running with the correct locale. The precise way you set that up depends on the framework you're using and how your server is configured; the first thing to do would be to make sure that your server is launched with the right locale in the environment (the LC_ALL variable probably).

Note that the client may try to tell your server what locale it wants, and that's something your framework would probably help you with. (It'd be a header in the HTTP request.)

Pointy
+2  A: 

To the point, you need to set the request encoding.

For GET requests (where the parameters are passed through the request URL), you need to configure this at the appserver level. In Tomcat 6.0, for example, it suffices to set the URIEncoding attribute of the <Connector> element in /conf/server.xml to UTF-8.

<Connector (...) URIEncoding="UTF-8" />
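
To illustrate why this setting matters, here is a minimal standalone sketch using only the JDK (no servlet container involved). It percent-encodes a word as UTF-8, the way a browser would for a form on a UTF-8 page, and then decodes it once as UTF-8 and once as ISO-8859-1, which is what Tomcat 6 falls back to when URIEncoding is not set. The class and method names (UriEncodingDemo, encode, decode) are made up for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UriEncodingDemo {
    // Percent-encodes a value as UTF-8, as a browser would on a UTF-8 page.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Decodes a percent-encoded value using the given charset name.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String original = "\u0645\u0631\u062D\u0628\u0627"; // an Arabic word, 5 characters
        String encoded = encode(original);
        System.out.println("Encoded: " + encoded);
        // Decoding with the charset the browser used restores the word.
        System.out.println("UTF-8 round trip ok: " + decode(encoded, "UTF-8").equals(original));
        // Decoding with a different charset yields one garbage char per byte.
        System.out.println("ISO-8859-1 round trip ok: " + decode(encoded, "ISO-8859-1").equals(original));
    }
}
```

The second decode does not fail; it silently produces different characters, which is why this kind of misconfiguration tends to show up as garbled text rather than as an exception.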

For POST requests (where the parameters are passed "invisibly" through the request body), you need to call ServletRequest#setCharacterEncoding() with UTF-8 before reading any request parameter. The best place to do this is in a filter that is called as the very first filter in the chain:

if (request.getCharacterEncoding() == null) {
    request.setCharacterEncoding("UTF-8");
}
chain.doFilter(request, response);
BalusC
Oh, thanks very much, this is exactly what I wanted. My problem was that I sent a GET request, not a POST.
Mohammed
You're welcome.
BalusC
So, isn't there any programmatic (as opposed to configuration-based) way to solve this GET issue?
Mohammed
You could parse the [HttpServletRequest#getQueryString()](http://java.sun.com/javaee/5/docs/api/javax/servlet/http/HttpServletRequest.html#getQueryString%28%29) yourself. It's not decoded by the container. To abstract this more, you could provide a [HttpServletRequestWrapper](http://java.sun.com/javaee/5/docs/api/javax/servlet/http/HttpServletRequestWrapper.html) implementation which does exactly that on all the getParameter() methods.
BalusC
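
A rough sketch of the do-it-yourself parsing suggested in the comment above, using only the JDK. It assumes the browser percent-encoded the parameters as UTF-8. QueryStringParser and parse are hypothetical names, and for simplicity only the first value of a repeated parameter is kept. In a servlet, the input would be the result of request.getQueryString().

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryStringParser {
    // Splits a raw (still percent-encoded) query string and decodes
    // each name and value as UTF-8. The first value of a repeated name wins.
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = decode(eq < 0 ? pair : pair.substring(0, eq));
            String value = eq < 0 ? "" : decode(pair.substring(eq + 1));
            if (!params.containsKey(name)) {
                params.put(name, value);
            }
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // The same parameter name as in the question's form, with an Arabic value.
        Map<String, String> params = parse("ABC=%D9%85%D8%B1%D8%AD%D8%A8%D8%A7&x=1");
        System.out.println(params.get("ABC").length()); // number of decoded characters
    }
}
```

An HttpServletRequestWrapper could delegate its getParameter() to a map built this way; that wiring is left out here because it needs the servlet API on the classpath.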
You need to configure the console to output characters as UTF-8 as well. Also see http://balusc.blogspot.com/2009/05/unicode-how-to-get-characters-right.html#DevelopmentEnvironment (read the entire article though).
BalusC
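
A small sketch of where the "???" can come from, assuming the console stream uses a charset that cannot represent the characters. System.out encodes text with a platform-dependent charset; when a character has no mapping in that charset, the encoder substitutes "?". The helper printedAs below is a hypothetical name; it captures what a PrintStream with a given charset would emit. One possible workaround, shown in main, is to wrap System.out in a PrintStream that encodes as UTF-8, which only helps if the terminal itself interprets the bytes as UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ConsoleEncodingDemo {
    // Prints text through a PrintStream using the given charset, then reads
    // the resulting bytes back with the same charset.
    static String printedAs(String text, String charset) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(sink, true, charset);
            out.print(text);
            out.flush();
            return new String(sink.toByteArray(), charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) throws Exception {
        String word = "\u0645\u0631\u062D"; // three Arabic letters

        // US-ASCII has no mapping for these letters, so each becomes '?'.
        System.out.println(printedAs(word, "US-ASCII"));

        // Wrap System.out so that output is always encoded as UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(word);
    }
}
```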
It was actually a problem with my system! Thanks.
Mohammed
You're welcome.
BalusC