I'm having trouble sending redirects from a servlet to URLs that contain Unicode characters.

For example, consider the following Turkish URL:

http://türkçeisimtescil.com

It works if you paste it into your browser's address bar. However, the browser translates it to

http://xn--trkeisimtescil-ijb74a.com

before sending the request.

Let's say I have the first URL, with its non-ASCII characters, and I read it successfully from the DB. I want my servlet to redirect to that URL.

However, when I simply call response.sendRedirect(url); it redirects me (according to the response headers) to www.t%1frk%e7eisimtescil.com

I even tried response.sendRedirect("http://www.t\u011Frk\u00E7eisimtescil.com"); (with inline Unicode escapes) and the response is exactly the same.

Maybe if the header contained türkçeisimtescil.com itself, the browser would convert it to the xn--.. format and the redirect would succeed.

I could not figure out where the encoding gets broken. Any help is appreciated.

A: 

Solved.

The java.net.IDN class solves this by producing the Punycode (xn--..) form of the name:

java.net.IDN.toASCII(url)

3 self-answering in a row ftw :)

Ahmet Alp Balkan
You're quick! Note that you'd like to convert only the domain (host) part. See my answer.
BalusC
Actually, I need to convert the whole URL (authority, port, query string, domain). Of course, the IDN Punycode conversion should apply only to the domain. I am looking for a convenient solution, because right now it produces something like xn--http://.. which I did not expect. Any ideas?
Ahmet Alp Balkan
Interestingly, it generates `http://xn--` for `http://A.türkçe` or `http://w.türkçe` just fine, but `xn--http://` for `http://türkçeblabla...`. I'm pulling my hair out.
Ahmet Alp Balkan
Simply put, here's the problem: `String idn = IDN.toASCII("http://türkçeisimtescil.com"); System.out.println(idn);` generates `xn--http://`, which is wrong. I should convert only the domain without losing anything in `authentication:data@and:port/orFile?plus=querystrings#and_refs`
Ahmet Alp Balkan
Solved as follows http://pastebin.com/Q1pQDYXB (again, talking to myself lol)
Ahmet Alp Balkan
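For readers who cannot open the link, here is a minimal sketch of one way to convert only the host and keep everything else. It is not necessarily what the linked paste contains. The `xn--http://` result comes from IDN.toASCII splitting its input on dots and encoding each label that contains non-ASCII characters, so `http://türkçeisimtescil` is treated as a single label. The class name `IdnUrls` and the method `toAsciiUrl` are hypothetical names chosen for illustration. The sketch assumes an http-style URL and that only the host contains non-ASCII characters (the path and query are copied through unchanged).

```java
import java.net.IDN;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for illustration, not part of any library.
final class IdnUrls {

    // Returns the given URL with only its host converted to Punycode.
    // User info, port, path, query and fragment are copied unchanged.
    static String toAsciiUrl(String unicodeUrl) {
        URL url;
        try {
            // java.net.URL parses leniently, so a non-ASCII host is accepted.
            url = new URL(unicodeUrl);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + unicodeUrl, e);
        }

        StringBuilder sb = new StringBuilder();
        sb.append(url.getProtocol()).append("://");
        if (url.getUserInfo() != null) {
            sb.append(url.getUserInfo()).append('@');
        }
        // Only the host goes through IDN.toASCII.
        sb.append(IDN.toASCII(url.getHost()));
        if (url.getPort() != -1) {
            sb.append(':').append(url.getPort());
        }
        // getFile() is the path plus "?query" when a query is present.
        sb.append(url.getFile());
        if (url.getRef() != null) {
            sb.append('#').append(url.getRef());
        }
        return sb.toString();
    }
}
```

In a servlet this would be used as response.sendRedirect(IdnUrls.toAsciiUrl(url));. Non-ASCII characters in the path or query would still need percent-encoding, which this sketch does not attempt.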
+3  A: 

That's an Internationalized Domain Name (IDN). Its conversion between ASCII and Unicode is specified in RFC 3490. In Java, you can use java.net.IDN to convert between the two. You can use java.net.URL to obtain the host part from the URL.

String host = new URL("http://türkçeisimtescil.com").getHost();
String idn = IDN.toASCII(host);
String newURL = "http://" + idn;
BalusC
Already found that out, thanks anyway.
Ahmet Alp Balkan