views: 2768

answers: 5
Without the use of any external library, what is the simplest way to fetch a website's HTML content into a String?

+4  A: 

I'm currently using this:

import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;

String content = null;
URLConnection connection = null;
try {
    connection = new URL("http://www.google.com").openConnection();
    Scanner scanner = new Scanner(connection.getInputStream());
    scanner.useDelimiter("\\Z"); // \Z matches end of input, so next() returns the whole stream as one token
    content = scanner.next();
    scanner.close();
} catch (Exception ex) {
    ex.printStackTrace();
}
System.out.println(content);

But I'm not sure if there's a better way.

pek
+2  A: 

I just left this post in your other thread, though what you have above might work as well. I don't think either would be any easier than the other. The Apache classes can be pulled in with import org.apache.commons.httpclient.HttpClient at the top of your code.

Edit: Forgot the link ;)

Justin Bennett
Apparently you also have to install the JAR file :)
Seun Osewa
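
A minimal sketch of what the Commons HttpClient approach might look like, assuming the 3.x API with the commons-httpclient JAR and its dependencies on the classpath (the URL is just an example):

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

public class FetchWithHttpClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        GetMethod get = new GetMethod("http://www.google.com");
        try {
            int status = client.executeMethod(get);      // perform the GET request
            System.out.println("Status: " + status);
            String content = get.getResponseBodyAsString(); // response body as a String
            System.out.println(content);
        } finally {
            get.releaseConnection();                      // return the connection to the manager
        }
    }
}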
A: 

@Justin Sorry, I was refreshing and then thought that, since the question was answered, you wouldn't come back again. ;) Looking at your link right now.

pek
+2  A: 

This has worked well for me:

URL url = new URL(theURL);
InputStream is = url.openStream();
int ptr = 0;
StringBuffer buffer = new StringBuffer();
// read one byte at a time; casting each byte to char assumes a single-byte encoding
while ((ptr = is.read()) != -1) {
    buffer.append((char) ptr);
}
is.close();

I'm not sure whether the other solution(s) provided are any more efficient.

Scott Bennett-McLeish
Don't you need to include the following? import java.io.*; import java.net.*;
Seun Osewa
Sure, but they're core Java, so very simple. As for the actual code, the import statements are omitted for clarity.
Scott Bennett-McLeish
A: 

@Scott Yes, I also wonder whether using Scanner or StringBuffer has an impact on efficiency/performance.

If anybody knows, please comment on it.

Thank you.

pek
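
For comparison, a buffered variant is sketched below: it reads the stream line by line through a BufferedReader into a StringBuilder, which avoids the per-byte reads of the loop above and makes the character encoding explicit. The example URL and the UTF-8 charset are assumptions.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class FetchBuffered {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.google.com");   // example URL
        StringBuilder content = new StringBuilder();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8")); // charset is an assumption
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append('\n');    // readLine() strips the newline
            }
        } finally {
            reader.close();
        }
        System.out.println(content);
    }
}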