views:

5260

answers:

1

This subject is pretty often asked here and the Sun tutorial is too concise about the subject. So I thought, let's post a CW question and answer about this so that it can if necessary be referenced in other topics.

Others are of course free to add more hints and best practices here.

+21  A: 

First a disclaimer beforehand: the posted code snippets are all basic examples. You'll need to handle trivial IOExceptions and RuntimeExceptions like NullPointerException, ArrayIndexOutOfBoundsException and consorts yourself.


Preparing:

We first need to know at least the URL and the charset. The parameters are optional and depends on the functional requirements.

String url = "http://example.com";
String charset = "UTF-8";
String param1 = "value1";
String param2 = "value2";
// ...
String query = String.format("param1=%s&param2=%s", 
     URLEncoder.encode(param1, charset), 
     URLEncoder.encode(param2, charset));

The query parameters must be in name=value format and be concatenated by &. You would normally also URL-encode the query parameters with the specified charset using URLEncoder#encode().

The String#format() is just for convenience. I prefer it when I would need the String concatenation operator + more than twice.


Firing a HTTP GET request with (optionally) query parameters:

It's a trivial task. It's the default request method.

URLConnection connection = new URL(url + "?" + query).openConnection();
connection.setRequestProperty("Accept-Charset", charset);
InputStream response = connection.getInputStream();
// ...

Any query string should be concatenated to the URL using ?. The Accept-Charset header may hint the server what encoding the parameters are in. If you don't send any query string, then you can leave the Accept-Charset header away.

If the other side is a HttpServlet, then its doGet() method will be called and the parameters will be available by HttpServletRequest#getParameter().


Firing a HTTP POST request with query parameters:

Setting the URLConnection#setDoOutput() to true implicitly sets the request method to POST. The standard HTTP POST as web froms do is of type application/x-www-form-urlencoded wherein the query string is written to the request body.

URLConnection connection = new URL(url).openConnection();
connection.setDoOutput(true); // Triggers POST.
connection.setRequestProperty("Accept-Charset", charset);
connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded;charset=" + charset);
OutputStream output = null;
try {
     output = connection.getOutputStream();
     output.write(query.getBytes(charset));
} finally {
     if (output != null) try { output.close(); } catch (IOException logOrIgnore) {}
}
InputStream response = connection.getInputStream();
// ...

Note: whenever you'd like to submit a HTML form programmatically, don't forget to take the name=value pairs of any <input type="hidden"> elements into the query string and of course also the name=value pair of the <input type="submit"> element which you'd like to "press" programmatically (because that's usually been used in the server side to distinguish if a button was pressed and if so, which one).

If the other side is a HttpServlet, then its doPost() method will be called and the parameters will be available by HttpServletRequest#getParameter().


Actually firing the HTTP request:

You can fire the HTTP request explicitly with URLConnection#connect(), but the the request will automatically be fired on demand when you want to get any information about the HTTP response, such as the response body using URLConnection#getInputStream() and so on. The above examples does exactly that, so the connect() call is in fact superfluous.


Gathering HTTP response information:

1) HTTP response status:

int status = ((HttpURLConnection) connection).getResponseCode();

Yes, you need to cast back to HttpURLConnection here.

2) HTTP response headers:

for (Entry<String, List<String>> header : connection.getHeaderFields().entrySet()) {
    System.out.println(header.getKey() + "=" + header.getValue());
}

3) HTTP response encoding:

When the Content-Type contains a charset parameter, then the response body is likely text based and we'd like to process the response body with the server-side specified character encoding then.

String contentType = connection.getHeaderField("Content-Type");
String charset = null;
for (String param : contentType.replace(" ", "").split(";")) {
    if (param.startsWith("charset=")) {
        charset = param.split("=", 2)[1];
        break;
    }
}

if (charset != null) {
    BufferedReader reader = null;
    try {
        reader = new BufferedReader(new InputStreamReader(response, charset));
        for (String line; (line = reader.readLine()) != null;) {
            // ... System.out.println(line) ?
        }
    } finally {
        if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
    }
} else {
    // It's likely binary content, use InputStream/OutputStream.
}

Maintaining the session:

The server side session is usually backed by a cookie. Some web forms require that you're logged in and/or are tracked by a session. You need to grab all Set-Cookie headers from the response of the login or the first GET request and then pass this through the subsequent requests.

// First a GET to gather cookies (or submit a login form using POST).
URLConnection connection = new URL(url).openConnection();
List<String> cookies = connection.getHeaderFields().get("Set-Cookie");

// Then a subsequent POST with cookies.
connection = new URL(url).openConnection();
connection.setDoOutput(true);
for (String cookie : cookies) {
    connection.addRequestProperty("Cookie", cookie.split(";", 2)[0]);
}
// ...

The split(";", 2)[0] is there to get rid of cookie attributes which are irrelevant for the server side like expires, path, etc.


Streaming mode

The HttpURLConnection will by default buffer the entire request body before actually sending it, regardless of whether you've set a fixed content length yourself using connection.setRequestProperty("Content-Length", contentLength);. This may cause OutOfMemoryExceptions whenever you concurrently send large POST requests (e.g. uploading files). To avoid this, you would like to set the HttpURLConnection#setFixedLengthStreamingMode().

((HttpURLConnection) connection).setFixedLengthStreamingMode(contentLength);

But if the content length is really not known beforehand, then you can make use of chunked streaming mode by setting the HttpURLConnection#setChunkedStreamingMode() accordingly. This will set the HTTP Transfer-Encoding header to chunked which will force the request body being sent in chunks. The below example will send the body in chunks of 1KB.

((HttpURLConnection) connection).setChunkedStreamingMode(1024);

User-Agent

It can happen that a request returns an unexpected response, while it works fine with a real webbrowser. The chance is big that the server side blocks requests based on the User-Agent request header. The URLConnection will by default set it to Java/1.6.0_19 where the last part is obviously the JRE version. You can override this as follows:

connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401"); // Do as if you're using Firefox 3.6.3.

Error handling

If the HTTP response code is 4nn (Client Error) or 5nn (Server Error), then you may want to read the HttpURLConnection#getErrorStream() to see if the server has sent any useful error information.

InputStream error = ((HttpURLConnection) connection).getErrorStream();

If the HTTP response code is -1, then something went wrong with connection and response handling. The HttpURLConnection implementation is somewhat buggy with keeping connections alive. You may want to turn if off by setting the http.keepAlive system property to false. You can do this programmatically in the beginning of your application by:

System.setProperty("http.keepAlive", "false");

Uploading files.

You'd normally use multipart/form-data encoding for mixed POST content (binary and character data). The encoding is in more detail described in RFC2388.

String param = "value";
File textFile = new File("/path/to/file.txt");
File binaryFile = new File("/path/to/file.bin");
String boundary = Long.toHexString(System.currentTimeMillis()); // Just generate some unique random value.

URLConnection connection = new URL(url).openConnection();
connection.setDoOutput(true);
connection.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);
PrintWriter writer = null;
try {
    OutputStream output = connection.getOutputStream();
    writer = new PrintWriter(new OutputStreamWriter(output, charset), true); // true = autoFlush, important!

    // Send normal param.
    writer.println("--" + boundary);
    writer.println("Content-Disposition: form-data; name=\"param\"");
    writer.println("Content-Type: text/plain; charset=" + charset);
    writer.println();
    writer.println(param);

    // Send text file.
    writer.println("--" + boundary);
    writer.println("Content-Disposition: form-data; name=\"textFile\"; filename=\"" + textFile.getName() + "\"");
    writer.println("Content-Type: text/plain; charset=" + charset);
    writer.println();
    BufferedReader reader = null;
    try {
        reader = new BufferedReader(new InputStreamReader(new FileInputStream(textFile), charset));
        for (String line; (line = reader.readLine()) != null;) {
            writer.println(line);
        }
    } finally {
        if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
    }

    // Send binary file.
    writer.println("--" + boundary);
    writer.println("Content-Disposition: form-data; name=\"binaryFile\"; filename=\"" + binaryFile.getName() + "\"");
    writer.println("Content-Type: " + URLConnection.guessContentTypeFromName(binaryFile.getName());
    writer.println("Content-Transfer-Encoding: binary");
    writer.println();
    InputStream input = null;
    try {
        input = new FileInputStream(binaryFile);
        byte[] buffer = new byte[1024];
        for (int length = 0; (length = input.read(buffer)) > 0;) {
            output.write(buffer, 0, length);
        }
        output.flush(); // Important! Output cannot be closed. Close of writer will close output as well.
    } finally {
        if (input != null) try { input.close(); } catch (IOException logOrIgnore) {}
    }
    writer.println(); // Important! Indicates end of binary boundary.

    // End of multipart/form-data.
    writer.println("--" + boundary + "--");
} finally {
    if (writer != null) writer.close();
}

If the other side is a HttpServlet, then its doPost() method will be called and the parameters will be available by HttpServletRequest#getParts() (note, thus not getParameter() and so on!). The getParts() method is however relatively new, it's introduced in Servlet 3.0 (Glassfish 3, Tomcat 7, etc). Prior to Servlet 3.0, your best choice is using Apache Commons FileUpload to parse a multipart/form-data request. Also see this answer for examples of both the FileUpload and the Servelt 3.0 approaches.


Last words:

The Apache HttpComponents HttpClient is much more convenient in this all :)


Parsing and extracting HTML:

If all you want is parsing and extracting data from HTML, then better use a HTML parser like Jsoup

BalusC
You should place the apache link first, so people looking for a solution find it faster ;)
ZeissS
Awesome examples. Thank you!
Geo
very well explained.thank you
daedlus