Hello Friends,

I am trying to process the following URL via the HttpGet method:

https://graph.facebook.com/search?q=Cafe++Bakery&type=event&access_token=&type=event&access_token=239090718395|lqqOnRWlcJOb3QGp3G4HW2aqhlc.

And I get the following exception:

    java.lang.IllegalArgumentException: Invalid uri 'https://graph.facebook.com/search?q=Cafe++Bakery&type=event&access_token=&type=event&access_token=239090718395|lqqOnRWlcJOb3QGp3G4HW2aqhlc.': Invalid query
            at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:222)
            at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)

Now when I cut & paste that URL into the browser it works just fine. I am guessing that some sort of URL encoding needs to occur, but I am not sure what I have to change to call this URL from HttpClient.

Thanks in Advance

+3  A: 

Use URLEncoder.encode() to encode the URL

Don
A: 

The URL you are trying to connect to is not a valid URL according to RFC 1738. The character '|' cannot appear unencoded in a URL; see section 2.2.

Using URLEncoder.encode() is NOT the answer. The problem is that URLEncoder.encode() is not designed for this task. Rather it is designed for encoding raw character data to the "application/x-www-form-urlencoded" MIME format. This will:

  • %-encode characters such as '/', ':', '?' and so on, in addition to the troublesome '|',
  • %-encode any '%' characters ... resulting in double %-encoding, and
  • replace any space characters with '+' characters, resulting in URL mangling.

(Refer to the javadoc for URLEncoder for a precise spec of what characters are encoded, and how.)
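To make the three effects listed above concrete, here is a minimal sketch that runs URLEncoder.encode() over a whole URL. The example.com URL and the class name UrlEncoderDemo are made up for illustration; they are not from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncoderDemo {
    // Form-encodes the entire string, delimiters included.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every JVM is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical URL containing a space, a '|' and an existing %20 escape.
        String url = "http://example.com/search?q=a b|c%20d";
        System.out.println(formEncode(url));
        // prints: http%3A%2F%2Fexample.com%2Fsearch%3Fq%3Da+b%7Cc%2520d
    }
}
```

The ':', '/', '?' and '=' delimiters have all been escaped, the space has become '+', and the existing %20 has become %2520, so the result is no longer usable as a URL.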

All of these incorrect / over-zealous encodings can be harmful, depending on how the web server handles the URLs. In the name of security, a lot of webservers cope with URLs where syntactically significant characters have been encoded unnecessarily, and will repeatedly decode until no valid-looking %-encoding sequences remain. So in a lot of cases, you can get away with using URLEncoder.

But no webserver should attempt to turn '+' characters into space characters. And some of the defensive tricks can be problematic; e.g. repeated decoding gets in the way if you really need to send a '%' data character in the URL.

So what is the real solution? Unfortunately it is difficult. The correct thing to do is to parse the URL into its constituent parts using a parser that is tolerant of URL syntax errors, and then put it back together, relying on the URL (or URI) class to encode the components correctly as required by the URL / URI specifications.
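As a rough sketch of that idea, the code below splits a URL with plain indexOf() calls and hands the pieces to the multi-argument java.net.URI constructor, which quotes characters that are illegal in each component. The class name LenientUrlRebuilder, the hand-rolled splitting, and the example.com URL are assumptions for illustration, not a proper tolerant parser. It assumes the input has the form scheme://authority/path?query#fragment and contains no existing %-escapes, because this constructor always quotes '%'.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class LenientUrlRebuilder {
    // Naively splits the raw URL, then lets java.net.URI quote each part.
    static URI rebuild(String raw) {
        String rest = raw;

        // Fragment: everything after the first '#'.
        String fragment = null;
        int hash = rest.indexOf('#');
        if (hash >= 0) {
            fragment = rest.substring(hash + 1);
            rest = rest.substring(0, hash);
        }

        // Query: everything after the first '?'.
        String query = null;
        int qmark = rest.indexOf('?');
        if (qmark >= 0) {
            query = rest.substring(qmark + 1);
            rest = rest.substring(0, qmark);
        }

        int schemeEnd = rest.indexOf("://");
        if (schemeEnd < 0) {
            throw new IllegalArgumentException("Expected scheme://authority in: " + raw);
        }
        String scheme = rest.substring(0, schemeEnd);
        String afterScheme = rest.substring(schemeEnd + 3);

        // Authority runs up to the first '/', the path is the remainder.
        int slash = afterScheme.indexOf('/');
        String authority = slash < 0 ? afterScheme : afterScheme.substring(0, slash);
        String path = slash < 0 ? "" : afterScheme.substring(slash);

        try {
            return new URI(scheme, authority, path, query, fragment);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot repair: " + raw, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical URL with an unencoded '|' in the query.
        String raw = "http://example.com/search?q=Cafe++Bakery&access_token=123|abc";
        System.out.println(rebuild(raw).toASCIIString());
        // prints: http://example.com/search?q=Cafe++Bakery&access_token=123%7Cabc
    }
}
```

Only the '|' is escaped; the '+', '&' and '=' characters are legal in a query and are left alone. The resulting string could then be passed to the HTTP client in place of the raw URL.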

Alternatively, reject the URL. After all, it is invalid.

Stephen C