
Hello. I am trying to download an .xml.gz file from a remote server with HttpsURLConnection in Java, but I am getting an empty response. Here is a sample of my code:

import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

URL server = new URL("https://www.myurl.com/path/sample_file.xml.gz");
HttpsURLConnection connection = (HttpsURLConnection) server.openConnection();
connection.connect();

When I try to get an InputStream from the connection, it is empty: connection.getInputStream().read() returns -1. The file I am expecting is approximately 50 MB.

To test my sanity, I also tried entering the exact same URL in my browser, and it did return the file I needed. Am I missing something? Do I have to set some parameter on the connection? Any help or direction is much appreciated.

A:

Is any exception being logged? Is the website presenting a self-signed SSL certificate, or one that is not signed by a CA? There are several reasons why it might work fine in your browser (the browser might have been told to accept self-signed certs from that domain) and not in your code.

What are the results of using curl or wget to fetch the URL?

The fact that the InputStream is empty (InputStream.read() returns -1) means there is nothing in the stream to read, which suggests the request did not come back with the response you expected.

Update: See this page for some info on how to deal with invalid or self-signed certificates in your connection code. Alternatively, if the site is presenting a certificate that Java does not trust, you can import it into the truststore used by your JVM so that Java will accept the certificate. See this page for more info.
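For the truststore route, a minimal sketch (not the only approach): load a truststore into which the server's certificate has been imported (for example with keytool -importcert) and use it for this one connection, instead of disabling certificate checks. The class name, the path /path/to/truststore and the password changeit are hypothetical placeholders; passing null to contextTrusting falls back to the JVM's default trust store.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

class TrustStoreExample {

    /** Loads a truststore file, e.g. one the server's certificate was imported into. */
    static KeyStore loadTrustStore(String path, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = new FileInputStream(path)) {
            ks.load(in, password);
        }
        return ks;
    }

    /** Builds an SSLContext trusting the certificates in 'trustStore' (null = JVM defaults). */
    static SSLContext contextTrusting(KeyStore trustStore) throws GeneralSecurityException {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical path and password: replace with your own truststore.
        KeyStore ks = loadTrustStore("/path/to/truststore", "changeit".toCharArray());
        SSLContext ctx = contextTrusting(ks);

        URL server = new URL("https://www.myurl.com/path/sample_file.xml.gz");
        HttpsURLConnection connection = (HttpsURLConnection) server.openConnection();
        // Only this connection uses the custom trust settings.
        connection.setSSLSocketFactory(ctx.getSocketFactory());
        System.out.println("HTTP " + connection.getResponseCode());
    }
}
```

Setting the socket factory per connection keeps the rest of the application on the default trust settings, which is safer than a JVM-wide change.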

matt b
The third party that is supplying the file originally told me to use curl as follows: curl --location -C - --digest -k https://www.myurl.com/path/sample_file.xml.gz -o sample_file.xml.gz This works fine too!
Zakir Hemraj
The -k switch to curl means "Allow connections to SSL sites without certs", so I think it's safe to assume that the site isn't presenting a valid certificate. You'll have to update your code to account for this.
matt b
The curl command works without the -k switch too. I'm guessing that means the cert is valid.
Zakir Hemraj
A:
  1. Verify that the response code is 200.
  2. Check connection.getContentType() to verify that the content type is recognized.
  3. You may need to add a ContentHandler for the gzip MIME type, which I can't recall off the top of my head.
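Steps 1 and 2 can be checked directly on the connection. For step 3, rather than registering a ContentHandler, it is usually simpler to save the raw bytes and decompress them with java.util.zip.GZIPInputStream; note this swaps in a different technique from the ContentHandler route. A minimal sketch; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

class GzDownload {

    /** Checks status (step 1) and content type (step 2), then saves the raw .gz bytes. */
    static void download(URL url, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            int code = conn.getResponseCode();
            // Anything other than 200 means the body is not the file (e.g. a 302 redirect).
            if (code != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP " + code
                        + " (Location: " + conn.getHeaderField("Location") + ")");
            }
            // null here means the response carried no Content-Type header.
            System.out.println("Content-Type: " + conn.getContentType());
            try (InputStream in = conn.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            conn.disconnect();
        }
    }

    /** Opens the saved file as decompressed bytes; no ContentHandler is involved. */
    static InputStream openDecompressed(Path gzFile) throws IOException {
        return new GZIPInputStream(Files.newInputStream(gzFile));
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get("sample_file.xml.gz");
        download(new URL("https://www.myurl.com/path/sample_file.xml.gz"), target);
        try (InputStream xml = openDecompressed(target)) {
            // Hand 'xml' to an XML parser here.
            System.out.println("first byte: " + xml.read());
        }
    }
}
```

Because openDecompressed returns a stream, the uncompressed XML can be parsed incrementally instead of being held in memory.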

Update, after the comment reporting a 3xx response code:

  1. Call connection.setFollowRedirects(true)

That should fix it.
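Note that setFollowRedirects is a static method of HttpURLConnection, so calling it through connection changes the JVM-wide default for connections created afterwards; the per-connection switch is setInstanceFollowRedirects. A minimal sketch setting both; the class and method names are illustrative and the URL is the question's placeholder. openConnection() does not contact the server until getResponseCode() or connect() is called.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class RedirectSettings {

    /** Opens (but does not yet connect) a connection with redirect following enabled. */
    static HttpURLConnection open(String address) throws IOException {
        // Static default, applied to connections created after this call (true unless changed).
        HttpURLConnection.setFollowRedirects(true);
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        // Per-connection setting; this is what governs 'conn' itself.
        conn.setInstanceFollowRedirects(true);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = open("https://www.myurl.com/path/sample_file.xml.gz");
        System.out.println("follows redirects: " + conn.getInstanceFollowRedirects());
        // getURL() reports the final address if a redirect was followed.
        System.out.println("HTTP " + conn.getResponseCode() + " from " + conn.getURL());
    }
}
```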

Ken Gentle
The response code is 302, which means "the data requested actually resides under a different URL". I wonder if this is my problem... I guess this is why the provider told me to use the --location parameter when downloading the file with curl. Is there any way around this?
Zakir Hemraj
'connection.setFollowRedirects(true)' should do it.
Ken Gentle
The followRedirects property is set to true by default, and I was able to see the redirected URL. But, following your original answer, I noticed that connection.getContentType() returns null. Does this mean I have to create a ContentHandler[Factory] for the MIME type "application/x-gzip"?
Zakir Hemraj
A:

It turns out the download wasn't working because the remote server was redirecting me to a new URL to download the file. Even though connection.setFollowRedirects(true) was set, I still had to manually open a new connection to the redirected URL, as follows:

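if (connection.getResponseCode() == HttpURLConnection.HTTP_MOVED_TEMP) { // 302
    String location = connection.getHeaderField("Location");
    if (location != null) {
        // Resolve against the original URL in case the Location header is relative
        URL server2 = new URL(server, location);
        HttpURLConnection connection2 = (HttpURLConnection) server2.openConnection();
        connection2.connect();
        InputStream in = connection2.getInputStream();
        // ... read the file from 'in' here; it goes out of scope after this block
    }
}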

After that, I was able to retrieve the file from the input stream. Thanks for all your help guys!
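The snippet above handles a single 302. A slightly more general sketch follows redirects by hand; the hop limit of 5, the set of codes treated as redirects, and the class and method names are assumptions for illustration. One plausible reason the built-in following did not kick in here: HttpURLConnection does not follow a redirect that changes protocol (for example https to http), so if the provider's redirect switches scheme, it has to be followed manually as below.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

class ManualRedirects {

    static final int MAX_HOPS = 5; // hypothetical limit to avoid redirect loops

    /** True for the 3xx codes that normally carry a Location header to follow. */
    static boolean isRedirect(int code) {
        return code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
    }

    /** Resolves a possibly relative Location value against the URL that returned it. */
    static URL resolve(URL base, String location) throws IOException {
        return new URL(base, location);
    }

    /** Follows redirects by hand, including ones that switch between http and https. */
    static InputStream open(URL start) throws IOException {
        URL current = start;
        for (int hop = 0; hop <= MAX_HOPS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            conn.setInstanceFollowRedirects(false); // handle 3xx ourselves
            int code = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            if (isRedirect(code) && location != null) {
                current = resolve(current, location);
                conn.disconnect();
                continue;
            }
            if (code != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + code + " from " + current);
            }
            return conn.getInputStream(); // caller must close
        }
        throw new IOException("Too many redirects starting at " + start);
    }

    public static void main(String[] args) throws IOException {
        URL start = new URL("https://www.myurl.com/path/sample_file.xml.gz");
        try (InputStream in = open(start)) {
            System.out.println("first byte: " + in.read());
        }
    }
}
```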

Zakir Hemraj
I noticed this as well: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6810084
Jesse Glick