I'm making a small project in Google AppEngine but I'm having problems with international chars. My program takes data from the user through the url "page.html?data1&data2..." and stores it for displaying later.

But when the user enters international characters like åäö, they get encoded as %E5, %E4 and %F6. I assume it is because only the first 128(?) characters of the ASCII table are allowed in HTTP requests.

Is there anyone who has a good solution for this? Any simple way to decode the text? And is it better to decode it before I store the data, or should I decode it when displaying it to the user?

+1  A: 

URLs can carry any characters, but they have to be encoded. In Java you can use URLEncoder and URLDecoder to encode and decode URL parameters with the desired character encoding.

Keep in mind that these classes are actually meant for HTML form encoding (application/x-www-form-urlencoded). They can be applied to the query string (the parameters) of a URL, but do not use them on the whole URL, only on the individual parameters.
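As a minimal sketch of the above, assuming Java 10 or later (for the overloads that take a Charset) and UTF-8 as the agreed encoding, a single parameter value can be round-tripped like this. The class name QueryParamCodec and its helper methods are made up for illustration. The sample text is written with Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QueryParamCodec {

    // Encode ONE parameter value (never a whole URL) as UTF-8 form data.
    public static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Decode one percent-encoded parameter value, assuming it was encoded as UTF-8.
    public static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u00e5\u00e4\u00f6" is the text "åäö".
        String original = "\u00e5\u00e4\u00f6 & more";
        String encoded = encode(original);

        // Spaces become '+', '&' becomes %26, and each non-ASCII
        // character becomes the percent-encoded bytes of its UTF-8 form.
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(original));
    }
}
```

Only the value is passed to the encoder. The "?", "&" and "=" that separate the parameters are added afterwards, otherwise they would be escaped as well.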

Bozho
Got some question marks instead of the %-codes. But I should be able to solve that somehow. Thanks for the help!
Irro
For others with my problem: I got it working by using ISO-8859-1 decoding. For some reason UTF-8 didn't work.
Irro
The character encoding used in the URL depends on the browser, and on the encoding of the page that contained the URL or the form. Try explicitly serving up the page containing the form as UTF-8. ISO-8859-1 may solve your immediate problem, but it will make it impossible for users to enter the vast majority of Unicode characters.
Nick Johnson
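The mismatch described in these comments can be reproduced with a small sketch. It assumes the browser sent single-byte ISO-8859-1 sequences (%E5%E4%F6 for "åäö") and that Java 10 or later is available. The class name CharsetMismatchDemo is hypothetical. Decoding those bytes as UTF-8 produces replacement characters, which are often displayed as question marks, while decoding them as ISO-8859-1 recovers the text.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Decode a percent-encoded value with an explicitly chosen charset.
    public static String decodeAs(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // Bytes E5 E4 F6 are "åäö" in ISO-8859-1, but they are not valid UTF-8.
        String sentByBrowser = "%E5%E4%F6";

        String asLatin1 = decodeAs(sentByBrowser, StandardCharsets.ISO_8859_1);
        String asUtf8 = decodeAs(sentByBrowser, StandardCharsets.UTF_8);

        // The ISO-8859-1 result matches the original text.
        System.out.println(asLatin1.equals("\u00e5\u00e4\u00f6"));
        // The UTF-8 result contains U+FFFD, the Unicode replacement character.
        System.out.println(asUtf8.contains("\uFFFD"));
    }
}
```

If the page with the form is served as UTF-8, the browser would normally send %C3%A5%C3%A4%C3%B6 instead, and the UTF-8 decode is then the one that works.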
A: 

The URI spec (RFC 3986) restricts the characters that can be used in URIs (see the ABNF) and defines a percent-encoding scheme for transmitting "unsafe" characters. As Bozho says, the query part of the URL is usually encoded as per the HTML spec (application/x-www-form-urlencoded).

The doc for App Engine says:

App Engine uses the Java Servlet standard for web applications.

So, you should probably let the Servlet API decode the parameters for you. See the parameter methods on HttpServletRequest, such as getParameter. This sort of encoding should generally be kept to the view layer, so the data would be stored unencoded.

If you do things manually, have a look at this blog post on character handling in URIs.
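For the manual route, here is a rough sketch of decoding a raw query string by hand. It is not what the Servlet container does internally. It assumes UTF-8 percent-encoding, Java 10 or later, and "name=value" pairs separated by "&". The class QueryStringParser is a hypothetical helper, and repeated parameter names simply overwrite earlier ones.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryStringParser {

    // Split the RAW query string first, then decode each name and value.
    // Decoding before splitting would corrupt values that contain an
    // encoded '&' (%26) or '=' (%3D).
    public static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate stray separators such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String rawName = eq < 0 ? pair : pair.substring(0, eq);
            String rawValue = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(
                URLDecoder.decode(rawName, StandardCharsets.UTF_8),
                URLDecoder.decode(rawValue, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = parse("name=%C3%A5%C3%A4%C3%B6&note=a%26b");
        // Store the decoded values; re-encode only when building output.
        System.out.println(params.get("name").equals("\u00e5\u00e4\u00f6"));
        System.out.println(params.get("note"));
    }
}
```

The decoded strings are what would be stored. Escaping for HTML or for a new URL is then applied again at display time, in line with keeping encoding in the view layer.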

McDowell
Got another problem instead but this was really helpful. Thanks!
Irro