Writing some additional classes for an existing GWT project. I need to:

  • Request a URL
  • Read in the returned webpage, in order to perform operations on it.

The returned page is very simple HTML, so parsing it shouldn't be difficult; I just need to get the data first.

How do I do this in Java? Which packages should I be looking at?

+1  A: 

For HTML pages you should use Apache HttpClient (a minimal sketch follows below).

For Web services, you need a framework like CXF.
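
A rough sketch of the HttpClient approach using the older Commons HttpClient 3.x, assuming commons-httpclient is on the classpath; the URL is just a placeholder:

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;

HttpClient client = new HttpClient();
GetMethod get = new GetMethod("http://www.example.com/page.html"); // placeholder URL
try {
    int status = client.executeMethod(get);          // send the GET request
    if (status == HttpStatus.SC_OK) {
        String html = get.getResponseBodyAsString(); // the whole page as a String
        // parse html here
    }
} finally {
    get.releaseConnection();                         // always release the connection
}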

kgiannakakis
A: 

Commons HttpClient, although very good, is considered obsolete. Apache HttpComponents (which includes HttpClient 4.x) is its successor.
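
If you go with HttpComponents, a minimal sketch with HttpClient 4.x could look like this (the URL is just an example):

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

HttpClient client = new DefaultHttpClient();
HttpGet get = new HttpGet("http://www.stackoverflow.com");
HttpResponse response = client.execute(get);
String html = EntityUtils.toString(response.getEntity()); // reads the response body into a String
// parse html here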

Bozho
+3  A: 

With the standard Java API you can read from a URL using java.net.URLConnection. Here's a basic example:

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

URL url = new URL("http://www.stackoverflow.com");
URLConnection urlConnection = url.openConnection();
InputStream result = urlConnection.getInputStream();

// Read the response line by line (platform default charset).
BufferedReader reader = new BufferedReader(new InputStreamReader(result));
String line = null;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
reader.close();

You could feed the InputStream to any DOM/SAX parser of your taste; most parsers can take an InputStream as an argument, directly or indirectly. JTidy is one of the better HTML parsers.
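
For example, a rough JTidy sketch (assuming the JTidy jar is on the classpath; the URL and the tag lookup are just for illustration):

import java.io.InputStream;
import java.net.URL;
import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

InputStream in = new URL("http://www.stackoverflow.com").openStream();
Tidy tidy = new Tidy();
tidy.setQuiet(true);           // don't print parse messages
tidy.setShowWarnings(false);   // sloppy HTML produces many warnings
Document document = tidy.parseDOM(in, null); // null: don't write the cleaned HTML anywhere
System.out.println("Links found: " + document.getElementsByTagName("a").getLength());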

For convenience, here's a shorthand to get an InputStream from a URL directly:

InputStream result = new URL("http://www.stackoverflow.com").openStream();

BalusC
A: 

If you want to do something like this on the client side, take a look at GWT's HTTP types (RequestBuilder and friends in com.google.gwt.http.client). But be aware that you are then subject to the same-origin policy.
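
A minimal sketch with RequestBuilder; the URL is a placeholder and must be same-origin, and the HTTP module (com.google.gwt.http.HTTP) has to be inherited in your module XML:

import com.google.gwt.http.client.Request;
import com.google.gwt.http.client.RequestBuilder;
import com.google.gwt.http.client.RequestCallback;
import com.google.gwt.http.client.RequestException;
import com.google.gwt.http.client.Response;

RequestBuilder builder = new RequestBuilder(RequestBuilder.GET, "/some/page.html"); // placeholder URL
try {
    builder.sendRequest(null, new RequestCallback() {
        public void onResponseReceived(Request request, Response response) {
            String html = response.getText(); // the page body, ready for parsing
        }
        public void onError(Request request, Throwable exception) {
            // request failed (network error, etc.)
        }
    });
} catch (RequestException e) {
    // the request could not be initiated
}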

wilth