I have a web application in JSP running on JBoss Application Server. I use Servlets for friendly URLs, and I send search parameters through my JSPs and Servlets. The search form has a text box whose value is submitted to a Servlet.

The first Servlet uses request.getParameter() to get the text, and sends it to another Servlet with response.sendRedirect (masking the URL to something "friendly"). This final Servlet uses request.getRequestDispatcher().forward() to send the parameters to the JSP in the "ugly" way: searchResults.jsp?searchParameters=Parameters.

Now, when the Search Results page is displayed, the URL displays the correct search term with "friendly url". Example: http://site.com/search/My-Search-Query even when using special characters like: http://site.com/search/Busqué-tildes-y-eñies. But when I try to use that search term in my JSP, the special characters are not displayed correctly.

The whole system uses i18n, and we've had no problems with special characters so far. But when the information is sent through the form (say from index.jsp to searchResults.jsp) special characters are not correctly displayed:

á - á
é - é
í - Ã
ó - ó
ú - ú
ñ - ñ

The whole code base is supposed to be in UTF-8, but apparently I'm missing something when passing the parameters. As I said, they are correctly displayed in the URL, but not inside the JSP.

I was thinking of converting those á manually, but I guess there's a better way to do it correctly, using the correct encoding. Besides, there can be new characters later which I may not be aware of right now (French, Spanish, etc.)

Just in case, I'll let you know I have these lines on each JSP:

<?xml version="1.0" encoding="UTF-8" ?>
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

+2  A: 

First off, I have no idea how to solve this, since I don't know much about Java and JSP.

Having said that: the characters on the right-hand side of your table are the UTF-8 encoding of the left-hand side. That is, somewhere in your code, you're interpreting bytes as Latin-1 (or whatever your default encoding is), where they actually represent UTF-8 encoded characters...
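
To illustrate the point, here is a minimal sketch in plain Java (no servlet container involved) that reproduces the table from the question. The class and method names are made up for the demonstration, and ISO-8859-1 is assumed as the wrong charset; your server's actual default may differ.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode the text as UTF-8, then decode those bytes as ISO-8859-1,
    // which is the mistake a misconfigured decoding step would make.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Undo the mistake: recover the original bytes via ISO-8859-1,
    // then decode them as the UTF-8 they really are.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e1"; // "á"
        String garbled = misdecode(original);
        // One character became two: U+00C3 followed by U+00A1.
        System.out.println(garbled.length());
        System.out.println(repair(garbled).equals(original));
    }
}
```

The round trip through `repair` works because ISO-8859-1 maps every byte to a character, but it only patches the symptom. Finding the place where the wrong charset is applied is the cleaner fix.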

Arnout
+2  A: 

Just a wild guess. Try this inside your JSP/Servlet:

if(request.getCharacterEncoding() == null) {
   request.setCharacterEncoding("UTF-8");
}

You need to be sure that the correct encoding is passed to your servlet.

kgiannakakis
+1  A: 

I think the problem might be that the browser does not declare the form post as UTF-8. There is a lot to read about form posts and encodings on the web, and several web frameworks provide character encoding filters to 'fix' this issue, much like the fix you had in mind. See for example http://static.springframework.org/spring/docs/2.5.x/api/org/springframework/web/filter/CharacterEncodingFilter.html

Simon Groenewolt
+1  A: 

The problem is that the information sent by the browser hasn't got a well-defined encoding and there's no way in HTTP to specify it.

Luckily most browsers will use the encoding of the page that contains the form. So if you use UTF-8 in all your pages, then most browsers will send all data in UTF-8 encoding as well (and your examples show that that's exactly how it is sent).
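
As a rough illustration of what "sent in UTF-8" means on the wire, the sketch below uses java.net.URLEncoder as a stand-in for the browser's form encoding. The class name is hypothetical, and a real browser's behaviour is only approximated here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FormEncodingDemo {
    // Percent-encode a form value with the given charset, roughly as a browser
    // does for application/x-www-form-urlencoded data.
    static String encodeAs(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String term = "Busqu\u00e9"; // "Busqué"
        System.out.println(encodeAs(term, "UTF-8"));      // Busqu%C3%A9
        System.out.println(encodeAs(term, "ISO-8859-1")); // Busqu%E9
    }
}
```

The same character produces different bytes depending on the page encoding, and nothing in the encoded value itself says which charset was used. That is why the receiving side has to be told.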

Unfortunately the most common Java application servers don't really handle the case (can't blame them, it's mostly guesswork anyway).

You can tell your application server to treat any input as UTF-8, by calling

request.setCharacterEncoding("UTF-8");

Depending on your coding style and the frameworks you use, it might be too late by the time the control flow reaches your code (the encoding has to be set before any parameter is read), so it may be better to do that in a javax.servlet.Filter.

Joachim Sauer
+1  A: 

Thanks for your answers. I tried a few things, but nothing has fixed the problem.

Here's what I've done:

  • I added a ServletRequestListener which sets the session's character encoding to UTF-8, and a Filter for every Http request, which does the same.

  • As I said, everything in the JSPs is encoded with UTF-8 (see headers in question).

  • I printed the Servlets' character encoding to the console. It was null by default, so I set it to UTF-8 as kgiannakakis and saua suggested.

None of these actions fixes the problem. I'm wondering if there's something else wrong with this...

Fernando
+3  A: 

Check the Connector settings in your Tomcat config. There is an option (URIEncoding) you can set so that URIs are decoded as UTF-8. By default they are decoded as ISO-8859-1.
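
A small sketch of why that setting matters, using java.net.URLDecoder as a stand-in for the container's URI decoding (it is not Tomcat's actual implementation). The class name is hypothetical, and the percent-encoded path is the example URL from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UriDecodingDemo {
    // Decode a percent-encoded string with the given charset.
    static String decodeAs(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "Busqué-tildes-y-eñies" as a browser would percent-encode it in UTF-8.
        String path = "Busqu%C3%A9-tildes-y-e%C3%B1ies";

        // Decoded with the matching charset, the original text comes back.
        System.out.println(decodeAs(path, "UTF-8"));

        // Decoded as ISO-8859-1, each accented letter turns into two characters,
        // which matches the garbling shown in the question.
        System.out.println(decodeAs(path, "ISO-8859-1"));
    }
}
```

This would also explain why the URL looks right in the browser while the JSP shows garbage: the browser displays the bytes as UTF-8, while the server side decodes the same bytes with a different charset.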

+1  A: 

We had a similar problem. It was solved once all JSPs were saved with the UTF-8 BOM.

+3  A: 

Try to set URIEncoding in {jboss.server}/deploy/jboss-web.deployer/server.xml.

Ex:

<Connector port="8080" address="${jboss.bind.address}"    
     maxThreads="250" maxHttpHeaderSize="8192"
     emptySessionPath="true" protocol="HTTP/1.1"
     enableLookups="false" redirectPort="8443" acceptCount="100"
     connectionTimeout="20000" disableUploadTimeout="true" URIEncoding="UTF-8" />
A: 

Do you use RequestDumper? If it is configured in deploy/jboss-web.deployer/server.xml then try to remove it and then test your encoding.

mgamer
A: 

response.setCharacterEncoding("UTF-8");

will put this one to bed!

Cheesle
This answer was already given.
BalusC
If you look carefully, this answer has not already been given correctly, hence my post. Joachim Sauer suggested using request.setCharacterEncoding(...), which does not solve the problem. In actual fact it is response.setCharacterEncoding(...) you need to use, as it is the response that needs setting and not the request! Nearly caught me out too!!
Cheesle