I have a small webapp which handles a lot of Spanish text.

At one point in the code, a JSP page responds with a JSON string containing some of this text. If I print the string to the console, it looks like gibberish. But if I examine the headers and content of the response in Chrome Developer Tools, it looks correct, so it is transferred in the correct encoding. This part of the webapp functions as expected.

At another point in the code, a different JSP page responds with HTML. Some of this HTML contains more of the Spanish text. This time, the text is transferred (and displayed) as gibberish.

What are potential reasons that this could be happening? Both times, I'm just printing the text using out.print. Why does it work at one point, but not at the other?

Examples:

// In a file whose only output is the JSON string
String jsonString = ...
System.err.println(jsonString); // prints gibberish
out.println(jsonString); // looks correct when the response is viewed in Chrome Developer Tools, and looks correct in a browser

...

// In a file whose output is a complete HTML page
String spanishText = ...
out.println("<label>" + spanishText + "</label>"); // looks like gibberish when the response is viewed in Chrome Developer Tools, and shows up as gibberish in a browser
+1  A: 

You need to set the encoding which the JSP/Servlet response should use to print the characters, and instruct the web browser to use the same encoding.

This can be done by putting this at the top of your JSP:

<%@ page pageEncoding="UTF-8" %>

Or if you're actually doing this in a Servlet:

response.setCharacterEncoding("UTF-8");
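
To see why both sides have to agree, here is a minimal sketch in plain Java (no servlet container needed). The class name `EncodingMismatchDemo` and the helper `roundTrip` are made up for this illustration; it only mimics what happens when the bytes are written in one charset and read back in another.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Encode the text with one charset and decode the resulting bytes with
    // another, roughly what happens when server and browser disagree.
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // "útiles", written with a Unicode escape so that the encoding of
        // this source file does not affect the demo.
        String word = "\u00FAtiles";

        String same = roundTrip(word, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mixed = roundTrip(word, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // Print booleans and lengths only, so the console encoding cannot
        // distort the result.
        System.out.println(same.equals(word));  // true
        System.out.println(mixed.equals(word)); // false
        System.out.println(mixed.length());     // 7, one char more than the original 6
    }
}
```

In the mismatched case the single character ú comes back as the two characters Ã and º, which is the typical look of UTF-8 bytes being read as ISO-8859-1.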

The "gibberish" when using System.err is a different problem. You need to set the encoding of the console or logfile to which this information is being printed. In Eclipse, for example, you can set it under Window > Preferences > General > Workspace > Text File Encoding.
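
If changing the console setting alone is not enough, one option is to wrap the stream in a `PrintStream` with an explicit charset. This is only a sketch: the class `Utf8Console` and its `utf8` helper are hypothetical names, and the console or log viewer still has to interpret the bytes as UTF-8 for the text to display correctly.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {
    // Wrap a stream so that print()/println() always emit UTF-8 bytes,
    // regardless of the JVM's default charset.
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream err = utf8(System.err);
        err.println("\u00FAtiles"); // "útiles", sent to stderr as UTF-8 bytes
    }
}
```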

BalusC
I tried this, but it doesn't seem to work. If I view the response headers in Chrome, the `Content-Type` is specified correctly (`text/html;charset=UTF-8`), but if I view the response content, it is still gibberish (i.e. a garbled form of `útiles`), and it shows up as gibberish in the browser.
Matthew
The `pageEncoding="UTF-8"` is important as well. When you're reading the string from another source using `FileReader`/`FileInputStream` or the like, then you need to take its encoding into account as well. Also see [this chapter](http://balusc.blogspot.com/2009/05/unicode-how-to-get-characters-right.html#TextFiles) of the linked article. Give it a read from top to bottom and you'll understand better what happens "under the hood". This understanding is pretty important to nail down the problem.
BalusC
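
To illustrate the point about reading from a file: a minimal sketch that names the charset explicitly instead of relying on the platform default. It assumes the file really is stored as UTF-8; the class `ExplicitCharsetRead` and the method `readFirstLine` are hypothetical names used only here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetRead {
    // Read the first line of a file, decoding its bytes as UTF-8 rather
    // than with whatever the platform default charset happens to be.
    static String readFirstLine(Path file) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file, then read it back with the same charset.
        Path tmp = Files.createTempFile("spanish", ".txt");
        try {
            Files.write(tmp, "\u00FAtiles".getBytes(StandardCharsets.UTF_8));
            System.out.println("\u00FAtiles".equals(readFirstLine(tmp))); // true
        } finally {
            Files.delete(tmp);
        }
    }
}
```

If the file was saved in a different encoding (for example ISO-8859-1), that charset would have to be passed to the reader instead.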