I recently had an encoding problem with web pages generated by servlets. It occurred when the servlets were deployed under Tomcat, but not under Jetty. I did a bit of research and reduced the problem to the following servlet:

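import java.io.IOException;
import java.io.Writer;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// HttpServlet already implements Servlet, so no extra "implements" clause is needed
public class TestServlet extends HttpServlet {
    @Override
    public void service(HttpServletRequest request, HttpServletResponse response) throws IOException {
        // No charset is given here, so the container's default encoding applies
        response.setContentType("text/plain");
        Writer output = response.getWriter();
        output.write("öäüÖÄÜß");
        output.flush();
        output.close();
    }
}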

If I deploy this under Jetty and point the browser at it, it returns the expected result. The data is returned as ISO-8859-1, and if I look at the headers, Jetty returns:

Content-Type: text/plain; charset=iso-8859-1

The browser detects the encoding from this header. If I deploy the same servlet in Tomcat, the browser shows strange characters. Tomcat also returns the data as ISO-8859-1; the difference is that no header says so. So the browser has to guess the encoding, and it guesses wrong.
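To see why a wrong guess produces strange characters, here is a minimal plain-Java sketch (no servlet container involved). It encodes the test string as ISO-8859-1, as Tomcat does here, and then decodes the same bytes with the right and a wrong charset. The class name `MojibakeDemo` is made up, and the sketch assumes the browser guessed UTF-8; a browser that guessed some other charset would show different garbage.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode the way the container does by default: ISO-8859-1, one byte per character.
    static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Decode with a guessed charset; this sketch assumes the client guessed UTF-8.
    // new String(byte[], Charset) replaces malformed input with U+FFFD.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00f6\u00e4\u00fc\u00d6\u00c4\u00dc\u00df"; // the test string from the question
        byte[] body = encodeLatin1(original);
        System.out.println("bytes on the wire: " + body.length); // bytes on the wire: 7
        // Right guess: decoding as ISO-8859-1 restores the text exactly.
        System.out.println(new String(body, StandardCharsets.ISO_8859_1).equals(original)); // true
        // Wrong guess: these bytes are not valid UTF-8, so they become replacement characters.
        System.out.println(decodeAsUtf8(body).equals(original)); // false
    }
}
```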

My question is: is this behaviour of Tomcat correct, or is it a bug? And if it is correct, how can I avoid the problem? Sure, I can always add response.setCharacterEncoding("UTF-8"); to the servlet, but that means I set a fixed encoding that the browser might or might not understand. The problem is more relevant if the servlet is accessed not by a browser but by another service. So how should I deal with the problem in the most flexible way?

A: 

If you don't specify the encoding, Tomcat is free to encode your characters however it likes, and the browser is free to guess which encoding Tomcat picked. You are correct that the way to solve the problem is response.setCharacterEncoding("UTF-8").

You shouldn't worry about the chance that the browser won't understand the encoding, as virtually all browsers released in the past 10 years support UTF-8. If you're really worried, though, you can inspect the "Accept-Charset" header sent by the user agent.
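A minimal sketch of picking a response charset from that header, assuming you prefer UTF-8 and fall back to ISO-8859-1 only when the client explicitly rules UTF-8 out. `AcceptCharsetChooser.choose` is a hypothetical helper, and its parser handles only the simple `name;q=value` form. Per HTTP (RFC 7231), a request without an Accept-Charset header means any charset is acceptable, so the sketch returns UTF-8 in that case.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class AcceptCharsetChooser {
    // Hypothetical helper: prefer UTF-8, fall back to ISO-8859-1 only if the
    // Accept-Charset value rules UTF-8 out (q=0, or not listed and no "*").
    static Charset choose(String acceptCharset) {
        if (acceptCharset == null || acceptCharset.trim().isEmpty()) {
            // No header: per HTTP, any charset is acceptable.
            return StandardCharsets.UTF_8;
        }
        double utf8Q = -1;     // q-value given for utf-8, -1 if not listed
        double wildcardQ = -1; // q-value given for "*", -1 if not listed
        for (String entry : acceptCharset.split(",")) {
            String[] fields = entry.trim().split(";");
            String name = fields[0].trim().toLowerCase(Locale.ROOT);
            double q = 1.0; // default weight when no q parameter is present
            for (int i = 1; i < fields.length; i++) {
                String param = fields[i].trim().toLowerCase(Locale.ROOT);
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0.0; // treat an unparsable weight as "not acceptable"
                    }
                }
            }
            if (name.equals("utf-8")) {
                utf8Q = q;
            } else if (name.equals("*")) {
                wildcardQ = q;
            }
        }
        // An explicit utf-8 entry takes precedence over the wildcard.
        double effective = utf8Q >= 0 ? utf8Q : wildcardQ;
        return effective > 0 ? StandardCharsets.UTF_8 : StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(choose(null).name());                      // UTF-8
        System.out.println(choose("utf-8, iso-8859-1;q=0.5").name()); // UTF-8
        System.out.println(choose("iso-8859-1, utf-8;q=0").name());   // ISO-8859-1
    }
}
```

In a servlet you could pass it request.getHeader("Accept-Charset") and then call response.setCharacterEncoding(charset.name()) before response.getWriter(), so the chosen charset is both used and declared.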

Will
That's not correct; the specification requires ISO-8859-1 as the default encoding.
Tim Jansen
I have no problem with Tomcat picking an encoding; my problem is that Tomcat doesn't tell the browser which encoding it chose. And as I wrote, modern browsers may support ISO and Unicode encodings, but other programs may also access services provided by servlets.
Dishayloo
@Tim: Which specification would that be? I'd say it's probably irrelevant in this case.
Rasmus Kaj
@Rasmus Kaj: Servlet 2.5 Spec, SRV.5.4: "If the servlet does not specify a character encoding before the getWriter method of the ServletResponse interface is called or the response is committed, the default ISO-8859-1 is used."
Tim Jansen
@Tim Jansen: OK, that is relevant in this case; I misunderstood the original question as one where it would not be relevant. Sorry.
Rasmus Kaj
A: 

If you don't specify an encoding, the Servlet specification requires ISO-8859-1. However, AFAIK it does not require the container to set the encoding in the content type, at least not if you set it to "text/plain". This is what the spec says:

Calls to setContentType set the character encoding only if the given content type string provides a value for the charset attribute.

In other words, only if you set the content type like this

response.setContentType("text/plain; charset=XXXX")

Tomcat is required to set the charset. I haven't tested whether this works, though.
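From the receiving side, the charset parameter is the only thing in the response that tells a client how to decode the body. A minimal sketch of how a non-browser client might read it, assuming a simple Content-Type value; `ContentTypeCharset.charsetOf` and its fallback parameter are hypothetical. When the header says only text/plain, as in the Tomcat case from the question, the client has nothing to go on but its fallback.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {
    // Hypothetical helper: returns the charset named in a Content-Type value,
    // or the fallback when no usable charset parameter is present.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Strip optional quotes, e.g. charset="utf-8"
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: keep looking, then fall back.
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.UTF_8; // the client's guess when nothing is declared
        // What Jetty sent in the question:
        System.out.println(charsetOf("text/plain; charset=iso-8859-1", fallback).name()); // ISO-8859-1
        // No charset parameter, so the client can only use its guess:
        System.out.println(charsetOf("text/plain", fallback).name()); // UTF-8
    }
}
```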

In general, I would recommend always setting the encoding to UTF-8 (it causes the least trouble, at least in browsers) and, for text/plain, stating the encoding explicitly to prevent browsers from falling back to a system default.
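One way to follow this recommendation without letting the declared charset and the actual bytes disagree is to derive both from a single Charset object. The sketch below is plain Java with hypothetical names (`ExplicitCharsetResponse`, `contentType`, `encodeBody`). In a servlet the rough equivalent would be calling response.setContentType("text/plain; charset=UTF-8") or response.setCharacterEncoding("UTF-8") before response.getWriter(), since, per the spec quoted above, the encoding has to be fixed before the writer is obtained.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetResponse {
    // Build the Content-Type value from the same Charset that encodes the body,
    // so the declared charset and the actual bytes cannot disagree.
    static String contentType(String mimeType, Charset charset) {
        return mimeType + "; charset=" + charset.name();
    }

    // Encode the body with exactly the charset that the header declares.
    static byte[] encodeBody(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        Charset charset = StandardCharsets.UTF_8;
        String text = "\u00f6\u00e4\u00fc\u00d6\u00c4\u00dc\u00df"; // the test string from the question
        System.out.println(contentType("text/plain", charset)); // text/plain; charset=UTF-8
        System.out.println(encodeBody(text, charset).length);  // 14: two bytes per character in UTF-8
    }
}
```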

Tim Jansen
Hmm, is Jetty's behaviour incorrect, then? Jetty makes things much easier in this case, as it works as expected.
Dishayloo
I think so. Or at least I can't find anything in the spec that says that Jetty should modify the content type in this case.
Tim Jansen