views:

938

answers:

4

I'm trying to parse the following URI: http://translate.google.com/#zh-CN|en|你

but I get this error message:

java.net.URISyntaxException: Illegal character in fragment at index 34: http://translate.google.com/#zh-CN|en|你
        at java.net.URI$Parser.fail(URI.java:2809)
        at java.net.URI$Parser.checkChars(URI.java:2982)
        at java.net.URI$Parser.parse(URI.java:3028)

It's having a problem with the "|" character. If I get rid of the "|", the Chinese character at the end doesn't cause any problem. What's the right way to handle this?

My method looks like this:

  public static void displayFileOrUrlInBrowser(String File_Or_Url)
  {
    try { Desktop.getDesktop().browse(new URI(File_Or_Url.replace(" ","%20").replace("^","%5E"))); }
    catch (Exception e) { e.printStackTrace(); }
  }

Thanks for the answers, but BalusC's solution seems to work only for this one instance of the URL. My method needs to work with any URL I pass to it. How would it know where to cut the URL into two parts so that only the second part gets encoded?

+9  A: 

The pipe character is "considered unsafe" for use in URLs. You can fix it by replacing the | with its percent-encoded hex equivalent, which is "%7C".

However, replacing individual characters in a URL is a brittle solution that does not work very well when you consider that, in any given URL, there could potentially be quite a number of different characters that may need to be replaced. You are already replacing spaces, carets, and pipes.... but what about brackets, and accent marks, and quotation marks? Or question marks and ampersands, which may or may not be valid parts of a URL, depending on how they are used?

Thus, a superior solution would be to use the language's facility for encoding URLs, rather than doing it manually. In the case of Java, use URLEncoder, as per the example in BalusC's answer to this question.

Spike Williams
FYI: `URLEncoder` (despite the name) should not be used to encode URLs. The doc says: _This class contains static methods for converting a String to the application/x-www-form-urlencoded MIME format._ This is not the same as the encoding used by URIs/URLs.
McDowell
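To illustrate the difference mentioned in the comment above, here is a small sketch of what `URLEncoder` does to a few inputs. The class name `FormEncodingDemo` and the helper `formEncode` are made up for this example. The point is only that form encoding turns spaces into `+` and also escapes `:` and `/`, so passing a whole URL through it does not give you a usable URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FormEncodingDemo {

    // Hypothetical helper: form-encodes a string as UTF-8 and hides the checked exception.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Space becomes '+', which is form encoding, not the %20 a URI path expects.
        System.out.println(formEncode("a b|c"));                      // a+b%7Cc
        // A whole URL gets its ':' and '/' escaped too, so it is no longer a URL.
        System.out.println(formEncode("http://yahoo.com/abc/123/"));  // http%3A%2F%2Fyahoo.com%2Fabc%2F123%2F
    }
}
```

So `URLEncoder` can be reasonable for a single query parameter value, but it is not a general "fix this URL" function.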
BalusC's solution seems to work for this instance of the URL, but I need the method to work for all URLs I pass to it. How would it know from what starting point to parse the rest of the URL? The URL could be any of the following: www.yahoo.com/abc/xyz, http://yahoo.com/abc/123/, yahoo.com/abc/123/...
Frank
I think you would need to split the URL into pieces... domain, path, query string, and fragment. The domain should not get encoded. The path, you would have to split up by slashes, and encode each part of the path, then put it back together. For the query string, you would need to encode each parameter name and value. You would also have to encode the fragment. Then, reassemble the URL.
Spike Williams
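One possible way to do the split-and-reassemble described in the comment above is to cut the string into scheme, authority, path, query and fragment yourself and hand the pieces to the multi-argument `java.net.URI` constructor, which quotes illegal characters in each component. This is only a sketch: the class name `UriEncodingSketch` and the method `toUri` are invented here, defaulting to `http` when no scheme is present is an assumption, and the splitting is naive (it does not handle local file paths, for example).

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriEncodingSketch {

    // Hypothetical helper: splits a raw URL string into components and lets
    // java.net.URI quote whatever is illegal in each one.
    static URI toUri(String raw) {
        String rest = raw;
        String scheme = "http"; // assumption: default scheme when none is given
        String query = null;
        String fragment = null;

        // Fragment: everything after the first '#'.
        int hash = rest.indexOf('#');
        if (hash >= 0) {
            fragment = rest.substring(hash + 1);
            rest = rest.substring(0, hash);
        }
        // Query: everything after the first '?' in what is left.
        int qmark = rest.indexOf('?');
        if (qmark >= 0) {
            query = rest.substring(qmark + 1);
            rest = rest.substring(0, qmark);
        }
        // Scheme: whatever precedes "://", if present.
        int sep = rest.indexOf("://");
        if (sep >= 0) {
            scheme = rest.substring(0, sep);
            rest = rest.substring(sep + 3);
        }
        // Authority (host) up to the first '/', path from there on.
        int slash = rest.indexOf('/');
        String authority = slash >= 0 ? rest.substring(0, slash) : rest;
        String path = slash >= 0 ? rest.substring(slash) : "";

        try {
            return new URI(scheme, authority, path, query, fragment);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build a URI from: " + raw, e);
        }
    }

    public static void main(String[] args) {
        // \u4F60 is the Chinese character from the question.
        URI uri = toUri("http://translate.google.com/#zh-CN|en|\u4F60");
        System.out.println(uri.toASCIIString());
        System.out.println(toUri("www.yahoo.com/abc/x y"));
    }
}
```

One caveat: these constructors always quote `%`, so a string that is already partly encoded (say it contains `%20`) would come out double-encoded as `%2520`. The sketch therefore assumes the input has not been escaped by hand first.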
A: 

Alright, I found out how to do it, like this:

try { Desktop.getDesktop().browse(new URI(File_Or_Url.replace(" ","%20").replace("^","%5E").replace("|","%7C"))); }
catch (Exception e) { e.printStackTrace(); }
Frank
Use URLEncoder.
Software Monkey
+3  A: 

Aren't you better off using URLEncoder than selectively encoding stuff?

Geo
+4  A: 

You should use java.net.URLEncoder to URL-encode the query with UTF-8. You don't necessarily need regex for this. You don't want to have a regex to cover all of those thousands of Chinese glyphs, do you? ;)

String query = URLEncoder.encode("zh-CN|en|你", "UTF-8");
String url = "http://translate.google.com/#" + query;
Desktop.getDesktop().browse(new URI(url));
BalusC