I have the following Java code to fetch the entire contents of an HTML page at a given URL. Can this be done in a more efficient way? Any improvements are welcome.
public static String getHTML(final String url) throws IOException {
    if (url == null || url.length() == 0) {
        throw new IllegalArgumentException("url cannot be null or empty");
    }

    final HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    final BufferedReader buf = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    final StringBuilder page = new StringBuilder();
    final String lineEnd = System.getProperty("line.separator");
    String line;
    try {
        while (true) {
            line = buf.readLine();
            if (line == null) {
                break;
            }
            page.append(line).append(lineEnd);
        }
    } finally {
        buf.close();
    }
    return page.toString();
}
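For context, a typical call looks like this (the URL is just a placeholder):

// Placeholder URL; any reachable HTTP page would do.
final String html = getHTML("http://www.example.com/");
System.out.println("Fetched " + html.length() + " characters");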
I can't help but feel that the line reading is less than optimal. I know that I'm possibly masking a MalformedURLException thrown by the new URL(url) call, and I'm okay with that.
My function also has the side effect of normalizing the HTML's line terminators to the current system's. This isn't a requirement.
I realize that network I/O will probably dwarf the time it takes to read the HTML, but I'd still like to know whether this is optimal.
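For comparison, here's a rough sketch of the chunk-based read I had in mind instead of readLine (getHTMLChunked is just a name for the sketch; the 8192 buffer size is an arbitrary choice, and it keeps whatever line terminators the server sends, using the platform default charset as in my original version):

public static String getHTMLChunked(final String url) throws IOException {
    final HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    final Reader in = new InputStreamReader(conn.getInputStream());
    final StringBuilder page = new StringBuilder();
    final char[] chunk = new char[8192]; // arbitrary buffer size
    try {
        int read;
        // Append each chunk as it arrives instead of splitting on lines.
        while ((read = in.read(chunk)) != -1) {
            page.append(chunk, 0, read);
        }
    } finally {
        in.close();
    }
    return page.toString();
}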
On a side note: it would be awesome if StringBuilder had a constructor for an open InputStream that would simply take the entire contents of the InputStream and read it into the StringBuilder.
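To make that concrete, a helper along these lines is roughly what I imagine such a constructor doing internally (toStringBuilder is just a name I made up; it leans on the Scanner "\\A" delimiter trick and the platform default charset, and needs java.util.Scanner):

private static StringBuilder toStringBuilder(final InputStream in) {
    // "\\A" matches the beginning of input, so next() returns the whole stream.
    final Scanner scanner = new Scanner(in).useDelimiter("\\A");
    try {
        return new StringBuilder(scanner.hasNext() ? scanner.next() : "");
    } finally {
        scanner.close();
    }
}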