Hi everyone,

I can't understand why Java's HttpURLConnection doesn't follow a redirect. I use the following code to fetch the page http://bit.ly/4hW294:

import java.net.URL;
import java.net.HttpURLConnection;
import java.io.InputStream;

public class Tester {

    public static void main(String argv[]) throws Exception{
        InputStream is = null;

        try {
            String bitlyUrl = "http://bit.ly/4hW294";
            URL resourceUrl = new URL(bitlyUrl);
            HttpURLConnection conn = (HttpURLConnection)resourceUrl.openConnection();
            conn.setConnectTimeout(15000);
            conn.setReadTimeout(15000);
            conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.0; ru; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)");
            conn.connect();
            is = conn.getInputStream();
            String res = conn.getURL().toString();
            if (res.toLowerCase().contains("bit.ly"))
                System.out.println("bit.ly is after resolving: "+res);
        }
        catch (Exception e) {
            System.out.println("error happened: "+e.toString());
        }
        finally {
            if (is != null) is.close();
        }
    }
}

Moreover, here are the request and the response I get (the response seems absolutely right!):

GET /4hW294 HTTP/1.1
Host: bit.ly
Connection: Keep-Alive
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; ru-RU; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)

HTTP/1.1 301 Moved
Server: nginx/0.7.42
Date: Thu, 10 Dec 2009 20:28:44 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Location: https://www.myganocafe.com/CafeMacy
MIME-Version: 1.0
Content-Length: 297

Unfortunately, the 'res' variable still contains the same URL, and the stream contains the following (obviously, Java's HttpURLConnection doesn't follow the redirect!):

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML>
<HEAD>
<TITLE>Moved</TITLE>
</HEAD>
<BODY>
<H2>Moved</H2>
<A HREF="https://www.myganocafe.com/CafeMacy">The requested URL has moved here.</A>
<P ALIGN=RIGHT><SMALL><I>AOLserver/4.5.1 on http://127.0.0.1:7400</I></SMALL></P>
</BODY>
</HTML>

+1  A: 

HttpURLConnection is not responsible for interpreting the response. It is performing as expected: it fetches the content of the requested URL. It is up to you, the user of the class, to interpret the response. It cannot guess the developer's intentions unless you specify them.

monksy
Why does it have setInstanceFollowRedirects in that case? ))
Shcheklein
My guess is that it was a feature added later on, which makes sense. My comment was more about the design: the class is meant to go and fetch web content and bring it back, and people may want to see non-HTTP-200 responses.
monksy
+1  A: 

Has something called HttpURLConnection.setFollowRedirects(false) by any chance?

You could always call

conn.setInstanceFollowRedirects(true);

if you want to make sure you don't affect the rest of the behaviour of the app.
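
To illustrate the difference between the two switches, here is a minimal sketch using only the JDK. The class name FollowRedirectFlags and the example.com URL are placeholders of mine, not from the question. openConnection() does not touch the network, so this only inspects the flags: a new connection copies the global setting when it is created, and the per-instance setter then overrides it for that connection alone.

```java
import java.net.HttpURLConnection;
import java.net.URI;

final class FollowRedirectFlags {

    // Returns { flag inherited from the global setting, flag after the per-instance override }.
    static boolean[] flags() throws Exception {
        boolean previous = HttpURLConnection.getFollowRedirects();
        try {
            // Simulate some other code having switched redirects off globally.
            HttpURLConnection.setFollowRedirects(false);

            // Placeholder URL; no request is sent by openConnection().
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://example.com/").toURL().openConnection();
            boolean inherited = conn.getInstanceFollowRedirects();

            // Override for this connection only; the global flag is untouched.
            conn.setInstanceFollowRedirects(true);
            boolean overridden = conn.getInstanceFollowRedirects();

            return new boolean[] { inherited, overridden };
        } finally {
            // Restore the global flag so the rest of the app is not affected.
            HttpURLConnection.setFollowRedirects(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] f = flags();
        System.out.println("inherited=" + f[0] + ", overridden=" + f[1]);
    }
}
```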

Jon Skeet
It is set to true (by default), and adding conn.setInstanceFollowRedirects(true); doesn't help. Moreover, the same code works perfectly with bit.ly/6mwKLw !
Shcheklein
Ooo... didn't know about that... Nice find... I was about to look up the class in case there was logic like that.... It makes sense that it would be returning that header, given the single responsibility principle.... now go back to answering C# questions :P [I'm kidding]
monksy
A: 

It's the correct response, but you now have to get the new Location from the response and use that as the URL.
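
A rough sketch of what that could look like, assuming plain GET requests. The class name ManualRedirect, the helper names and the limit of 5 hops are my own choices, not anything prescribed by the JDK. It switches off the built-in handling, checks the status code, reads the Location header and opens a new connection to it, resolving relative locations against the current URL.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

final class ManualRedirect {

    private static final int MAX_REDIRECTS = 5; // assumed limit, to avoid redirect loops

    static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || status == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || status == HttpURLConnection.HTTP_SEE_OTHER    // 303
            || status == 307
            || status == 308;
    }

    // The Location header may be absolute or relative to the current URL.
    static URL resolveLocation(URL current, String location) throws IOException {
        try {
            return current.toURI().resolve(location).toURL();
        } catch (URISyntaxException | IllegalArgumentException e) {
            throw new IOException("Bad Location header: " + location, e);
        }
    }

    // Returns a connection whose response is not a redirect; assumes http/https targets only.
    static HttpURLConnection open(String start) throws IOException {
        URL url = URI.create(start).toURL();
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(15000);
            conn.setReadTimeout(15000);
            conn.setInstanceFollowRedirects(false); // every redirect is handled below
            int status = conn.getResponseCode();
            if (!isRedirect(status)) {
                return conn;
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("Redirect without Location header from " + url);
            }
            url = resolveLocation(url, location);
        }
        throw new IOException("Too many redirects starting at " + start);
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = open("http://bit.ly/4hW294");
        System.out.println("Final URL: " + conn.getURL());
        conn.disconnect();
    }
}
```

The caller reads the body from the returned connection with getInputStream() as usual and closes it afterwards.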

sp
+5  A: 

I don't think that it will automatically redirect from HTTP to HTTPS (or vice-versa).

Even though we know it mirrors HTTP, from the HTTP protocol point of view, HTTPS is just some other, completely different, unknown protocol. It would be unsafe to follow the redirect without user approval.

For example, suppose the application is set up to perform client authentication automatically. The user expects to be surfing anonymously because he's using HTTP. But if his client follows HTTPS without asking, his identity is revealed to the server.
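
If the application wants to make that decision explicit, a check along these lines might help. This is only a sketch: RedirectPolicy, changesProtocol, mayFollow and the userApproved flag are hypothetical names of mine, and whether a protocol change is acceptable is something each application has to decide for itself.

```java
import java.net.URI;
import java.net.URL;

final class RedirectPolicy {

    // True when the redirect switches scheme, e.g. http -> https or https -> http.
    static boolean changesProtocol(URL from, URL to) {
        return !from.getProtocol().equalsIgnoreCase(to.getProtocol());
    }

    // Same-protocol redirects are followed; a protocol change needs explicit approval.
    static boolean mayFollow(URL from, URL to, boolean userApproved) {
        return !changesProtocol(from, to) || userApproved;
    }

    public static void main(String[] args) throws Exception {
        // The two URLs from the question: the short link and its Location header.
        URL from = URI.create("http://bit.ly/4hW294").toURL();
        URL to = URI.create("https://www.myganocafe.com/CafeMacy").toURL();
        System.out.println(changesProtocol(from, to));   // prints "true"
        System.out.println(mayFollow(from, to, false));  // prints "false"
    }
}
```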

erickson
Thanks. I've just found confirmation: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4620571. Namely: "After discussion among Java Networking engineers, it is felt that we shouldn't automatically follow redirect from one protocol to another, for instance, from http to https and vise versa, doing so may have serious security consequences. Thus the fix is to return the server responses for redirect. Check response code and Location header field value for redirect information. It's the application's responsibility to follow the redirect."
Shcheklein
+1 and +1 for the comment too, I wasn't aware of that
Pascal Thivent