
I am trying to get a java.net.URI object from a String. The string contains some characters that need to be replaced by their percent-escape sequences. But when I use URLEncoder to encode the String with UTF-8 encoding, even the / characters are replaced with their escape sequences.

How can I get a valid encoded URL from a String object?

http://www.google.com?q=a b gives http%3A%2F%2Fwww.google.com..., whereas I want the output to be http://www.google.com?q=a%20b

Can someone please tell me how to achieve this?

I am trying to do this in an Android app, so I only have access to a limited number of libraries.

+1  A: 

You could use URLEncoder on just the "a b" part and then concatenate the encoded value with the rest of the URL.
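A minimal sketch of this approach, assuming the prefix up to "q=" is already a valid URL and only the value needs escaping. EncodeQueryValue and buildUrl are hypothetical names. Note that URLEncoder does HTML form encoding, so a space becomes "+" rather than %20; the sketch swaps those for %20 to match the output asked for in the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeQueryValue {
    // Hypothetical helper: encode ONLY the value, then append it to a prefix
    // that is already a valid URL (e.g. "http://www.google.com?q=").
    static String buildUrl(String prefix, String rawValue) {
        try {
            // URLEncoder does HTML form encoding: a space becomes '+', not %20.
            String encoded = URLEncoder.encode(rawValue, "UTF-8");
            // A literal '+' in rawValue was already escaped to %2B, so every '+'
            // left here stands for a space and can safely become %20.
            return prefix + encoded.replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http://www.google.com?q=", "a b"));
    }
}
```

Most servers decode "+" as a space in query strings anyway, so the replace step is only needed if you specifically want %20.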

schnaader
+3  A: 

You might try org.apache.commons.httpclient.util.URIUtil.encodeQuery from the Apache Commons HttpClient project.

Like this:

URIUtil.encodeQuery("http://www.google.com?q=a b")

will become:

http://www.google.com?q=a%20b

You can of course do it yourself, but URI parsing can get pretty messy...

Hans Doggen
Thanks Hans. I am trying to do this in an Android app. So I have access to a limited number of libraries. Do you have any other suggestions? Thanks again
lostInTransit
Perhaps you could have a look at the source of the URIUtil class (it is open source after all). I would assume that it is possible to extract the necessary code from that class.
Hans Doggen
+1  A: 

You can use the multi-argument constructors of the URI class. From the URI javadoc:

The multi-argument constructors quote illegal characters as required by the components in which they appear. The percent character ('%') is always quoted by these constructors. Any other characters are preserved.

So if you use

URI uri = new URI("http", "www.google.com?q=a b", null);

then you get http:www.google.com?q=a%20b, which isn't quite right, but it's a little closer.
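If you already have the host and query as separate pieces, the five-argument constructor URI(scheme, authority, path, query, fragment) quotes each component for you and avoids string splitting altogether. A minimal sketch, assuming the scheme is always http; UriFromComponents and build are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriFromComponents {
    // Hypothetical helper: the five-argument constructor
    // (scheme, authority, path, query, fragment) quotes illegal characters
    // in each component. Path may be null, or must start with '/'.
    static URI build(String host, String path, String query) {
        try {
            return new URI("http", host, path, query, null);
        } catch (URISyntaxException e) {
            // Mirror URI.create(), which wraps the checked exception.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("www.google.com", null, "q=a b"));
        System.out.println(build("www.google.com", "/search", "q=a b"));
    }
}
```

The getters still see the unescaped text (getQuery() returns "q=a b"), while toString() and getRawQuery() return the quoted form.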

If you know that your string will not have URL fragments (e.g. http://example.com/page#anchor), and also that your string won't have additional : characters, then you can use the following code to get what you want:

String s = "http://www.google.com?q=a b";
String[] parts = s.split(":");
URI uri = new URI(parts[0], parts[1], null);

To be safe, you should scan the string for additional : or # characters, but this should get you started.
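One way to handle those extra : and # characters is sketched below: split at the first ':' only (so a port such as :8080 stays in the scheme-specific part) and peel off any fragment before calling the three-argument constructor. This is an illustrative sketch, not a complete URI parser; UriFromString and toUri are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriFromString {
    // Hypothetical helper: split at the FIRST ':' only (so ports like :8080
    // stay in the rest) and peel off any '#fragment' before quoting.
    static URI toUri(String s) {
        int colon = s.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("no scheme in: " + s);
        }
        String scheme = s.substring(0, colon);
        String ssp = s.substring(colon + 1);
        String fragment = null;
        int hash = ssp.indexOf('#');
        if (hash >= 0) {
            fragment = ssp.substring(hash + 1);
            ssp = ssp.substring(0, hash);
        }
        try {
            // The (scheme, ssp, fragment) constructor quotes illegal characters,
            // and always quotes '%', so pass in unescaped text only.
            return new URI(scheme, ssp, fragment);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toUri("http://www.google.com?q=a b"));
        System.out.println(toUri("http://example.com:8080/page#my anchor"));
    }
}
```

Because these constructors always quote '%', the input must not already be escaped: a string that contains %20 would come out as %2520.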

Jason Day
A: 

The java.net blog had a class the other day that might have done what you want (but it is down right now so I cannot check).

This code here could probably be modified to do what you want:

http://svn.apache.org/repos/asf/incubator/shindig/trunk/java/common/src/main/java/org/apache/shindig/common/uri/UriBuilder.java

Here is the one I was thinking of from java.net: https://urlencodedquerystring.dev.java.net/

TofuBeer
A: 

If you don't like libraries, how about this?

Note that you should not use this function on the whole URL; instead, use it on the individual components (e.g. just the "a b" component) as you build up the URL. Otherwise the code cannot know which characters are supposed to have a special meaning and which are supposed to have a literal meaning.

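import java.nio.charset.StandardCharsets;

/** Converts a string into something you can safely insert into a URL. */
public static String encodeURIcomponent(String s)
{
    StringBuilder o = new StringBuilder();
    // Work on UTF-8 bytes so non-ASCII characters become valid %XX sequences
    // (a char can be larger than 255, which the two hex digits cannot hold).
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
        int ch = b & 0xFF; // treat the byte as unsigned (0..255)
        if (isUnsafe(ch)) {
            o.append('%');
            o.append(toHex(ch / 16));
            o.append(toHex(ch % 16));
        }
        else o.append((char) ch);
    }
    return o.toString();
}

private static char toHex(int ch)
{
    return (char)(ch < 10 ? '0' + ch : 'A' + ch - 10);
}

private static boolean isUnsafe(int ch)
{
    // Non-ASCII bytes, DEL and ASCII control characters are always escaped.
    if (ch >= 127 || ch < 32)
        return true;
    // Reserved and otherwise disallowed ASCII characters.
    return " %$&+,/:;=?@<>#\"{}|\\^[]`".indexOf(ch) >= 0;
}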
Tim Cooper