Hi,

I have a URI string like the following:

http://www.christlichepartei%F6sterreichs.at/steiermark/

I'm creating a java.lang.URI instance with this string and it succeeds but when I want to retrieve the host it returns null. Opera and Firefox also choke on this URL if I enter it exactly as shown above. But shouldn't the URI class throw a URISyntaxException if it is invalid? How can I detect that the URI is illegal then?

It also behaves the same when I decode the string using URLDecoder which yields

http://www.christlicheparteiösterreichs.at/steiermark/

Now this is accepted by Opera and Firefox but java.net.URI still doesn't like it. How can I deal with such a URL?

thanks

+1  A: 

The correct way to encode non-ASCII characters in hostnames is known as "Punycode".
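As a small illustration of what that encoding looks like, here is a minimal sketch using java.net.IDN (available since Java 6). The class name PunycodeDemo and the constant UNICODE_HOST are made up for this example, and the ö is written as \u00F6 so that the result does not depend on the source file encoding.

```java
import java.net.IDN;

public class PunycodeDemo {
    // Hypothetical constant for this sketch: the hostname from the question,
    // with "ö" written as the escape \u00F6.
    static final String UNICODE_HOST = "www.christlichepartei\u00F6sterreichs.at";

    public static void main(String[] args) {
        // Unicode hostname -> ASCII-compatible ("xn--" prefixed) form
        String ascii = IDN.toASCII(UNICODE_HOST);
        System.out.println(ascii);

        // ASCII-compatible form -> Unicode hostname again
        System.out.println(IDN.toUnicode(ascii));
    }
}
```

Only the label that contains the non-ASCII character is rewritten; "www" and "at" pass through unchanged.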

MSalters
+2  A: 

Java 6 has the IDN class for working with internationalized domain names. The following produces a URI with an encoded hostname:

URI u = new URI(IDN.toASCII("http://www.christlicheparteiösterreichs.at/steiermark/"));
axtavt
excellent, thanks!
Raoul Duke
+1  A: 

URI throws a URISyntaxException when you choose the appropriate constructor:

URI someUri=new URI("http","www.christlicheparteiösterreichs.at","/steiermark",null);

java.net.URISyntaxException: Illegal character in hostname at index 28: http://www.christlicheparteiösterreichs.at/steiermark

You can use IDN to fix this:

URI someUri=new URI("http",IDN.toASCII("www.christlicheparteiösterreichs.at"),"/steiermark",null);
System.out.println(someUri);
System.out.println("host: "+someUri.getHost());

Output:

http://www.xn--christlicheparteisterreichs-5yc.at/steiermark

host: www.xn--christlicheparteisterreichs-5yc.at

UPDATE regarding the chicken-and-egg problem:

You can let URL do the job:

public static URI createSafeURI(final URL someURL) throws URISyntaxException
{
    // Rebuild the URI part by part; only the host is converted to its ASCII (Punycode) form.
    return new URI(someURL.getProtocol(),
                   someURL.getUserInfo(),
                   IDN.toASCII(someURL.getHost()),
                   someURL.getPort(),
                   someURL.getPath(),
                   someURL.getQuery(),
                   someURL.getRef());
}


URI raoul=createSafeURI(new URL("http://www.christlicheparteiösterreichs.at/steiermark/readme.html#important"));

This is just a quick shot; I have not checked all the issues involved in converting a URL to a URI. Use it as a starting point.
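For the detection part of the question (noticing a bad host when only the full string is available), one possible approach is URI.parseServerAuthority(), which re-parses the authority strictly and throws a URISyntaxException if it is not a valid server-based authority. The following is a rough sketch under that assumption; the class name HostCheck and the method hasValidHost are invented for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HostCheck {
    /**
     * Hypothetical helper: returns true only if the string parses as a URI
     * and its authority can be parsed into a usable host.
     */
    static boolean hasValidHost(String s) {
        try {
            // new URI(String) may accept the string and leave the host null;
            // parseServerAuthority() insists on a server-based authority.
            URI uri = new URI(s).parseServerAuthority();
            return uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasValidHost(
            "http://www.christlichepartei%F6sterreichs.at/steiermark/"));
        System.out.println(hasValidHost(
            "http://www.xn--christlicheparteisterreichs-5yc.at/steiermark/"));
    }
}
```

A URI without any authority (for example a mailto: address) also yields false here, so this check only makes sense where a host is expected.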

Michael Konietzka
Hi. Thanks for your answer, but how does the URI constructor help me when I don't have the individual parts of the URL? It's a bit of a chicken-and-egg problem :)
Raoul Duke
You are right. It depends on where you get your data from. If you get a String like "http://www.christlicheparteiösterreichs.at/steiermark/" as input, you cannot pass it to new URI(String), because the Javadoc states that it expects an already valid URI string, and this string is not one. You have to check where in the data flow the string gets "corrupted". Where does this string come from?
Michael Konietzka
Hi, thanks for taking the time to look into this. The suggestion in your update looks promising; I can probably work with that. Thanks again!
Raoul Duke