With the native Java API you can read from a URL using `java.net.URLConnection`. Here's a basic example:
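```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

URL url = new URL("http://www.stackoverflow.com");
URLConnection urlConnection = url.openConnection();
InputStream result = urlConnection.getInputStream();
// Decode as UTF-8 (an assumption; the server may use another charset); try-with-resources closes the reader and stream.
try (BufferedReader reader = new BufferedReader(new InputStreamReader(result, StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
}
```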
You can feed the `InputStream` to any DOM/SAX parser of your choice. Most parsers accept an `InputStream` as an argument, either directly or wrapped in something like an `org.xml.sax.InputSource`. JTidy is one of the better HTML parsers.
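As a minimal sketch of handing an `InputStream` straight to a parser, the example below uses the JDK's built-in DOM parser (`javax.xml.parsers.DocumentBuilder`, not JTidy). Because real-world HTML is often not well-formed XML, it parses a small hypothetical XHTML snippet held in memory rather than a live page; the class name `DomFromStream` and the helper `firstTitle` are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DomFromStream {

    // Hypothetical helper: parses the stream with the JDK's DOM parser and returns the text
    // of the first <title> element, or null if there is none. Input must be well-formed XML.
    static String firstTitle(InputStream in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in); // DocumentBuilder accepts an InputStream directly
        NodeList titles = doc.getElementsByTagName("title");
        return titles.getLength() > 0 ? titles.item(0).getTextContent() : null;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical in-memory XHTML; with well-formed remote content this could be url.openStream().
        String xhtml = "<html><head><title>Hello</title></head><body/></html>";
        InputStream in = new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8));
        System.out.println(firstTitle(in)); // prints Hello
    }
}
```

For typical tag-soup HTML, a lenient HTML parser such as JTidy is the safer choice; the stream-in, document-out pattern stays the same.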
For convenience, here's a shorthand to get an `InputStream` from a URL directly:
```java
InputStream result = new URL("http://www.stackoverflow.com").openStream();
```
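`URL.openStream()` is documented as shorthand for `openConnection().getInputStream()`. A minimal sketch of using it to read an entire response into a `String` might look like this; the class name `UrlShorthand` and the helper `fetch` are hypothetical, UTF-8 decoding is an assumption about the content's charset, and `InputStream.readAllBytes()` requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class UrlShorthand {

    // Hypothetical helper: opens the URL with openStream(), reads every byte,
    // and decodes the result as UTF-8 (an assumption about the charset).
    static String fetch(URL url) throws IOException {
        // try-with-resources closes the stream even if reading fails
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch(new URL("http://www.stackoverflow.com")));
    }
}
```

Because `openStream()` works for any scheme the JDK supports, the same helper also reads `file:` URLs, which is handy for testing without network access.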