I want to download the HTML source of a web page so I can parse some information out of it. How do I accomplish this in Java?

A: 

You can use the Java classes directly:

URL url = new URL("http://www.example.com");
URLConnection conn = url.openConnection();
InputStream in = conn.getInputStream();
...

However, Apache HttpClient is generally the better choice, since it handles a lot of things that you would otherwise have to do yourself with the built-in Java classes.
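If you do stay with the built-in classes, the part elided by `...` above could be filled in roughly like this. This is only a minimal sketch: the class name `PageDownloader`, the helper names `readAll` and `download`, the 10-second timeouts, and the assumption that the page is UTF-8 encoded are all illustrative choices, not part of the original snippet.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here only for illustration.
public class PageDownloader {

    // Reads everything from the stream and returns it as a single String.
    // The stream is closed when reading finishes or fails.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Opens a connection to the given address and returns the response body.
    // Assumption: the page is UTF-8; a real client would check the Content-Type header.
    static String download(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        conn.setConnectTimeout(10_000); // milliseconds, arbitrary value
        conn.setReadTimeout(10_000);    // milliseconds, arbitrary value
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(download("http://www.example.com"));
    }
}
```

Returning the page as a String rather than printing it line by line makes it easier to hand the result to whatever parsing step comes next. Timeouts, encoding detection, and error handling are exactly the kind of details you end up managing by hand with this approach.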

cletus
+2  A: 

Just attach a BufferedReader (or anything else that reads strings) to the InputStream returned by the URL's openStream() method.

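import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

// The class name is arbitrary.
public class PrintPage {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://stackoverflow.com/");
        // try-with-resources closes the reader (and the underlying stream) when done
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()))) {
            String s;
            // readLine() returns null once the end of the stream is reached
            while ((s = reader.readLine()) != null) {
                System.out.println(s);
            }
        }
    }
}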
Travis Gockel