Hi,

I have to log in to an HTTPS web page and download a file using Java. I know all the URLs beforehand:

baseURL = // a https URL;
urlMap = new HashMap<String, URL>();
urlMap.put("login", new URL(baseURL, "exec.asp?login=username&pass=XPTO"));
urlMap.put("logout", new URL(baseURL, "exec.asp?page=999"));
urlMap.put("file", new URL(baseURL, "exec.asp?file=111"));

If I try all these links in a web browser such as Firefox, they work.

Now when I do:

urlConnection = urlMap.get("login").openConnection();
urlConnection.connect();
BufferedReader in = new BufferedReader(
    new InputStreamReader(urlConnection.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
    System.out.println(inputLine);
in.close();

I just get back the login page HTML again, and I cannot proceed to file download.

Thanks!

+2  A: 

I'd suggest having a look at Java CURL (http://sourceforge.net/projects/javacurl). I have used it before to log in to an HTTPS website and download files. It has features such as spoofing the browser ID, which might solve your issue of getting redirected back to the login page.

Although they provide an Eclipse plugin for it, I have used it without the plugin and it works fine.

Alternatively, you could use wget and call it from Java.
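
As a rough sketch of the wget route, the snippet below builds two command lines and shows how they could be run with ProcessBuilder: the first call logs in and saves the session cookie to a file, the second loads that file for the download. The host example.invalid, the file names cookies.txt and download.bin, and the class and method names are placeholders for illustration. The wget options used (--save-cookies, --keep-session-cookies, --load-cookies, -O) are standard ones, and wget is assumed to be on the PATH.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class WgetSketch {

    // wget call that logs in and writes the session cookie to cookieFile.
    // "-O -" sends the returned login page to standard output.
    static List<String> loginCommand(String loginUrl, String cookieFile) {
        return Arrays.asList("wget", "--save-cookies", cookieFile,
                "--keep-session-cookies", "-O", "-", loginUrl);
    }

    // wget call that reuses the saved cookie and stores the file as target.
    static List<String> downloadCommand(String fileUrl, String cookieFile, String target) {
        return Arrays.asList("wget", "--load-cookies", cookieFile, "-O", target, fileUrl);
    }

    // Runs one command, forwarding its output to this process, and returns the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        String base = "https://example.invalid/"; // placeholder host
        // Only prints the commands; pass each list to run(...) to execute them.
        System.out.println(String.join(" ",
                loginCommand(base + "exec.asp?login=username&pass=XPTO", "cookies.txt")));
        System.out.println(String.join(" ",
                downloadCommand(base + "exec.asp?file=111", "cookies.txt", "download.bin")));
    }
}
```

Without --keep-session-cookies, wget does not write session cookies (those with no expiry date) to the cookie file, so the second call would likely arrive unauthenticated.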

Mark Davidson
+3  A: 

Notwithstanding that you may have some other problem that's preventing the login request from getting you logged in, it's unlikely that you'll be able to proceed to the download page unless you store and return any cookies that the login page generates.

That's because HTTP itself is stateless, so in your current code there's no way for the remote server to tell that the second download request is from the same user that just logged in.
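
One way to get that behaviour with plain java.net, assuming Java 6 or later, is to install a java.net.CookieManager as the default CookieHandler before opening the login connection. The sketch below does not touch the network: it feeds a made-up Set-Cookie header (the cookie name SESSION and the host example.invalid are invented for illustration) to the manager and then asks which Cookie header it would attach to the download request.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SessionCookieSketch {

    // Installs a JVM-wide cookie store; every URLConnection opened afterwards
    // records Set-Cookie values and sends them back on later requests.
    static CookieManager install() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    // Offline simulation: hand a hypothetical login response to the manager,
    // then return the Cookie header values it would send with the file request.
    static List<String> simulate(CookieManager manager) {
        URI login = URI.create("https://example.invalid/exec.asp?login=username&pass=XPTO");
        URI file = URI.create("https://example.invalid/exec.asp?file=111");
        Map<String, List<String>> loginResponse = Collections.singletonMap(
                "Set-Cookie", Collections.singletonList("SESSION=abc123; path=/"));
        try {
            manager.put(login, loginResponse);
            List<String> sent = manager
                    .get(file, Collections.<String, List<String>>emptyMap())
                    .get("Cookie");
            return sent == null ? Collections.<String>emptyList() : sent;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(install()));
    }
}
```

In real code only install() is needed: call it once, then open the login URL and the file URL in that order, and the second request should carry the session cookie.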

Alnitak
+3  A: 

I agree with Alnitak that the problem is likely storing and returning cookies.

Another good option I have used is HttpClient from Jakarta Commons.

It's worth noting, as an aside, that if this is a server you control, you should be aware that sending the username and password as querystrings is not secure (even if you're using HTTPS). HttpClient supports sending parameters using POST, which you should consider.
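
For illustration, here is a minimal sketch of a form POST using only java.net.HttpURLConnection rather than HttpClient. The field names login and pass are taken from the question's URL; whether the server accepts them in a POST body is an assumption, and the class and method names are made up for this example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormPostSketch {

    // Encodes name/value pairs as application/x-www-form-urlencoded.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
        return body.toString();
    }

    // Sends the fields in the request body instead of the URL and
    // returns the HTTP status code.
    static int post(URL url, Map<String, String> fields) throws IOException {
        byte[] body = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = connection.getOutputStream()) {
            out.write(body);
        }
        return connection.getResponseCode();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("login", "username");
        fields.put("pass", "XPTO");
        // Prints the body that post(...) would send; no request is made here.
        System.out.println(encodeForm(fields));
    }
}
```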

JacobM
How does using GET make it less secure? As far as I'm aware, when using HTTPS, nothing transmitted over the wire is unencrypted, including the address of the page being requested.
Kibbee
How would URL-based query parameters not be secure if you're using HTTPS? The HTTP request is encrypted along with the rest of the exchange. It's generally not secure in a BROWSER as that information is typically stored in the history.
Alan Krueger
While the querystrings are encrypted during transmission, they may be exposed to browser plugins, browser history, other applications running on your own machine, and very likely in server logs. It's bad practice to include sensitive data in a URL without encrypting it within your application.
JacobM
I believe the best practice is HTTP Basic-Auth over HTTPS. You can of course roll your own form-based authentication if that results in less server-side code to maintain.
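
As a hedged sketch of what Basic-Auth looks like on the client side, the helper below builds the Authorization header with java.util.Base64 (Java 8 or later). It only applies if the server is configured for HTTP Basic authentication, which the question does not state; the class and method names are made up for this example.

```java
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthSketch {

    // Builds the Authorization header value: "Basic " + base64(user:password).
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Attaches the header to a connection; call this before connecting.
    static void authorize(URLConnection connection, String user, String password) {
        connection.setRequestProperty("Authorization", basicAuthHeader(user, password));
    }

    public static void main(String[] args) {
        System.out.println(basicAuthHeader("username", "XPTO"));
    }
}
```

Base64 is an encoding, not encryption, so this header is only as private as the HTTPS connection that carries it.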
Mark Renouf
+1  A: 

Perhaps you want to try HttpUnit. Although it was written with testing of websites in mind, it may be usable for your problem.

From their website:

"... Written in Java, HttpUnit emulates the relevant portions of browser behavior, including form submission, JavaScript, basic http authentication, cookies and automatic page redirection, and allows Java test code to examine returned pages either as text, an XML DOM, or containers of forms, tables, and links."

Mathias Weidner
+4  A: 

As has been noted, you must maintain the session cookie between requests (see CookieHandler).

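Here is a sample implementation. Install it with CookieHandler.setDefault(new MyCookieHandler()) before opening the first connection: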

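import java.io.IOException;
import java.net.CookieHandler;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal in-memory cookie store, keyed by host name.
class MyCookieHandler extends CookieHandler {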

    private Map<String, List<String>> cookies = new HashMap<String, List<String>>();

    @Override
    public Map<String, List<String>> get(URI uri,
            Map<String, List<String>> requestHeaders) throws IOException {
        String host = uri.getHost();
        Map<String, List<String>> ret = new HashMap<String, List<String>>();
        synchronized (cookies) {
            List<String> store = cookies.get(host);
            if (store != null) {
                store = Collections.unmodifiableList(store);
                ret.put("Cookie", store);
            }
        }

        return Collections.unmodifiableMap(ret);
    }

    @Override
    public void put(URI uri, Map<String, List<String>> responseHeaders)
            throws IOException {
        List<String> newCookies = responseHeaders.get("Set-Cookie");
        if (newCookies != null) {
            String host = uri.getHost();
            synchronized (cookies) {
                List<String> store = cookies.get(host);
                if (store == null) {
                    store = new ArrayList<String>();
                    cookies.put(host, store);
                }
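                // Keep only the name=value pair; attributes such as Path or
                // Expires must not be echoed back in the Cookie header.
                for (String cookie : newCookies) {
                    store.add(cookie.split(";", 2)[0]);
                }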
            }
        }
    }

}
McDowell