A requirement of the product that we are building is that its URL endpoints are semantically meaningful to users in their native language. This means that we need UTF-8 encoded URLs to support every alphabet under the sun.

We would also prefer not to provide installation and configuration documentation for every application server and version that we support, so it would be nice if we could accomplish this in code. This might not be possible, since by the time the Servlet has received the request, it's already been decoded by the app server, etc.

I've gotten this working (for my first use case using ISO-Latin non-US ASCII characters) by reconstituting the request's path info with:

String pathInfoEncoded = new String(httpServletRequest.getPathInfo().getBytes(), "UTF-8");

and then parsing that.
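Note that the one-liner above calls getBytes() with no argument, so it depends on the platform default charset. A minimal sketch of the same idea with both charsets named explicitly follows. It assumes the container decoded the raw path bytes as ISO-8859-1 (a common default, but an assumption here), and the class and method names are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Servlet API.
final class PathInfoRedecode {

    // Recover the original bytes by encoding as ISO-8859-1 (which maps
    // chars 0-255 to the same byte values), then decode them as UTF-8.
    // Assumption: the container decoded the path as ISO-8859-1.
    static String redecodeAsUtf8(String misdecoded) {
        byte[] rawBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "ö" in UTF-8 is the bytes C3 B6; read as ISO-8859-1 they show up
        // as the two characters U+00C3 U+00B6.
        String misdecoded = "/b\u00c3\u00b6rse";
        System.out.println(redecodeAsUtf8(misdecoded).equals("/b\u00f6rse")); // prints true
    }
}
```

In a servlet this would be called as redecodeAsUtf8(httpServletRequest.getPathInfo()), but only when the path was not already decoded as UTF-8 by the container.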

However, this doesn't work after redirecting from a POST to a GET using sendRedirect(). The request's path comes in already escaped (so ö is encoded as %F6) and my method above doesn't work.

So I guess my question is: am I going about this all wrong? And if so, what's the antidote to my ignorance? :)

Update: found the solution. The problem is that the Servlet API has some weird behaviour with regard to URL encoding before sending the redirect. You have to URL-encode (escape the UTF-8 characters) BEFORE you call sendRedirect(). The encodeRedirectURL() method doesn't do it for you.

This page discusses it: http://www.whirlycott.com/phil/2005/05/11/building-j2ee-web-applications-with-utf-8-support/

+2  A: 

A couple things to investigate and experiment with:

  • Have a look at your ./conf/server.xml file (this one is Tomcat-specific) and ensure that the connector has the URIEncoding attribute set to "UTF-8".

E.g.:

<Connector port="8080" 
           protocol="HTTP/1.1" 
           URIEncoding="UTF-8"/>
  • Use some sort of browser-based tool (e.g. Tamper Data for Firefox) to see what your browser is sending to the server; it very well may be escaping the URL for you. If this is the case, you can decode it on the server with URLDecoder.decode().
  • Instead of using response.sendRedirect(), manually set the headers and response code.

E.g.:

response.setHeader("Location", myUtf8unencodedUrl);
response.setStatus(HttpServletResponse.SC_MOVED_TEMPORARILY);
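For the second point above, decoding an escaped path on the server could look roughly like the sketch below. The class and method names are made up for illustration. It also shows why the %F6 from the question is a problem: that is ö escaped as ISO-8859-1, so decoding it as UTF-8 does not give ö back.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper for illustration only.
final class EscapedPathDecoder {

    // Thin wrapper around java.net.URLDecoder that turns the checked
    // exception into an unchecked one. Note that URLDecoder also turns
    // '+' into a space, which matters if the path can contain '+'.
    static String decode(String escaped, String charsetName) {
        try {
            return URLDecoder.decode(escaped, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // %C3%B6 is "ö" escaped as UTF-8.
        System.out.println(decode("%C3%B6", "UTF-8").equals("\u00f6"));   // prints true
        // %F6 is "ö" escaped as ISO-8859-1 (the form seen after the redirect).
        System.out.println(decode("%F6", "ISO-8859-1").equals("\u00f6")); // prints true
        // Decoding the ISO-8859-1 escape as UTF-8 does not yield "ö".
        System.out.println(decode("%F6", "UTF-8").equals("\u00f6"));      // prints false
    }
}
```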

No promises, but this is what I would try out if it were me. :)

Stu Thompson
+2  A: 

Found the solution. The problem is that the Servlet API has some weird behaviour with regard to URL encoding before sending the redirect. You have to URL-encode (escape the UTF-8 characters) BEFORE you call sendRedirect(). The encodeRedirectURL() method doesn't do it for you.

This page discusses it: http://www.whirlycott.com/phil/2005/05/11/building-j2ee-web-applications-with-utf-8-support/
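As a rough illustration of escaping before the redirect, here is a minimal sketch that uses java.net.URI from the JDK to percent-encode a path as UTF-8. This is one possible way to do it, not necessarily what the linked page does, and the class and method names are made up for the example:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only.
final class RedirectPathEscaper {

    // The multi-argument URI constructor quotes illegal characters such as
    // spaces, and toASCIIString() percent-encodes non-ASCII characters as
    // UTF-8 bytes. Path separators ('/') are left alone.
    static String escapePath(String path) {
        try {
            return new URI(null, null, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid path: " + path, e);
        }
    }

    public static void main(String[] args) {
        String escaped = escapePath("/fran\u00e7ais/Banane.html");
        System.out.println(escaped); // prints /fran%C3%A7ais/Banane.html
    }
}
```

The escaped string would then be handed to sendRedirect(), e.g. response.sendRedirect(RedirectPathEscaper.escapePath(path)).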

ubermensch
A: 

We have the same situation here: our product is also required to show meaningful URLs to the user in potentially every language on earth. All our tools and techniques support UTF-8, so that part is no problem. Escaping the UTF-8 characters technically works, but IE (7, 8) shows the ugly escaped URLs whereas Firefox unescapes them and displays nice URLs; for example, '/français/Banane.html' is displayed in IE as '/fran%C3%A7ais/Banane.html'. GET after POST (redirecting after a form submit) did not work at all, whether we sent raw UTF-8 URLs or escaped UTF-8 URLs. We also tried XML-style numeric entity encoding, without success.

However, we finally found a way to redirect successfully after a POST: take the UTF-8 bytes of the URL and decode them byte by byte as ISO-8859-1. None of us really understands how this can work (how can the browser know how to decode it, given that the number of bytes per UTF-8 character varies, and how does it know the string was originally UTF-8?), but it does.
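One plausible explanation, offered as an assumption rather than something confirmed in this thread: if the container writes header values to the wire as ISO-8859-1 bytes, the "disguised" string turns back into exactly the original UTF-8 bytes, so the browser simply receives raw UTF-8 in the Location header. The round trip itself can be checked without a server; the class and method names below are made up for the sketch:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class for illustration only.
final class Latin1RoundTrip {

    // The trick from the answer: UTF-8 bytes reinterpreted as ISO-8859-1 chars.
    static String utf8BytesAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Assumption: this models how the container serializes a header value.
    static byte[] wireBytes(String headerValue) {
        return headerValue.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String url = "/fran\u00e7ais/Banane.html";
        byte[] sent = wireBytes(utf8BytesAsLatin1(url));
        // ISO-8859-1 maps every byte to one char and back, so nothing is lost.
        System.out.println(Arrays.equals(sent, url.getBytes(StandardCharsets.UTF_8))); // prints true
    }
}
```

Under that assumption the browser never has to guess the byte count per character; it only has to interpret the bytes it receives as UTF-8.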

Here's a simple servlet to try that out:


package springapp.web.servlet;

import java.io.IOException;
import java.io.InputStream;

import javax.servlet.ServletContext;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.commons.io.IOUtils;

public class TestServlet extends HttpServlet {

    private static final long serialVersionUID = -1743198460341004958L;

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {

        // Fallback redirect target, used if the resource file is missing.
        String url = "çöffte.html";
        try {
            ServletContext context = req.getSession().getServletContext();
            // Read the UTF-8 encoded Russian URL; the stream is null if the file is absent.
            InputStream is = context.getResourceAsStream("/WEB-INF/ru_url.txt");
            if (is != null) {
                try {
                    // trim() drops a trailing newline an editor may have appended.
                    url = IOUtils.toString(is, "UTF-8").trim();
                    System.out.println(String.format("Redirecting to [%s]", url));
                } finally {
                    is.close();
                }
            }
        } catch (IOException ioEx) {
            ioEx.printStackTrace();
        }

        // Reinterpret the UTF-8 bytes of the URL as an ISO-8859-1 string.
        byte[] utfBytes = url.getBytes("UTF-8");
        String result = new String(utfBytes, "ISO-8859-1");
        resp.sendRedirect(result);

        // does not work:
        //resp.sendRedirect(url);
        //resp.sendRedirect(Utf8UrlEscaper.escapeUtf8(url));
        //resp.sendRedirect(Utf8UrlEscaper.escapeToNumericEntity(url));
    }
}

For the redirect target, copy and paste any native-language URL (e.g. from Wikipedia) into a UTF-8 encoded file (without BOM!) and save it in the WEB-INF directory. In our example we took a Russian URL (http://ru.wikipedia.org/wiki/Заглавная_страница) and saved it in a file named 'ru_url.txt'.

We created a simple Spring MVC application mapping any *.abc URL to the test servlet. Now if you start the app and enter something like 'localhost:8080/springmvctest/a.abc', you should be redirected to the Russian Wikipedia site, and the browser (IE and Firefox; Safari and others possibly not) should show a nice UTF-8 encoded, native Russian URL.

Cpt.Nut