I'm trying to download a file over HTTP and store its contents in a String, as the title says. My approach is thus:

URL u = new URL("http://url/file.txt");

ByteArrayBuffer baf = new ByteArrayBuffer(32);
InputStream in = (InputStream) u.getContent(); 
BufferedInputStream bis = new BufferedInputStream(in);

int buffer;
while((buffer = bis.read()) != -1){
    baf.append((byte)buffer);
}

bis.close();
in.close();

The code fails when it tries to read from the stream, reporting the stream is closed.

Note that if you access the file through a browser, it isn't served as text but rather as a file to be downloaded.

I haven't gotten anywhere searching the web on this, so a little insight would be much appreciated!

Thanks.

+2  A: 

Check out HttpClient from Apache Commons, in particular the getResponseBodyAsString() method.

Hank Gay
I actually used response.getEntity().getContent() and it works like a charm
alkar
A: 

Here's a piece of code that does that for you. In addition to what you're attempting to do, it can also handle GZip compression (if you request it with an Accept-Encoding: gzip header; note that the code only decodes gzip, so don't advertise deflate unless you add handling for it) and automatically detects the encoding for you (required for handling strings).

private InputStream prepareInputStream(String urlToRetrieve) throws IOException
{
 URL url = new URL(urlToRetrieve);
 URLConnection uc = url.openConnection();
 if (timeOut > 0)
 {
  uc.setConnectTimeout(timeOut);
  uc.setReadTimeout(timeOut);
 }
 InputStream is = uc.getInputStream();
 // decompress, if necessary (only gzip is handled here)
 if ("gzip".equals(uc.getContentEncoding()))
  is = new GZIPInputStream(is);

 this.lastURLConnection = uc;
 return is;
}
// detects the encoding associated with the last URL connection, falling back to the default encoding
public String detectEncoding()
{
 if (forceDefaultEncoding)
  return defaultEncoding;
 String detectedEncoding = detectEncodingFromContentTypeHTTPHeader(lastURLConnection.getContentType());
 if (detectedEncoding == null)
  return defaultEncoding;

 return detectedEncoding;
}


public static String detectEncodingFromContentTypeHTTPHeader(String contentType)
{
 if (contentType != null)
 {
  int chsIndex = contentType.indexOf("charset=");
  if (chsIndex != -1)
  {
   String enc = StringTools.substringAfter(contentType , "charset=");
   if(enc.indexOf(';') != -1)
    enc = StringTools.substringBefore(enc , ";");
   return enc.trim();
  }
 }
 return null;
}


// retrieves the page into a String object
public String retrieve(String urlToRetrieve)
throws MalformedURLException , IOException
{
 InputStream is = prepareInputStream(urlToRetrieve);
 String encoding = detectEncoding();
 BufferedReader in = new BufferedReader(new InputStreamReader(is , encoding));
 StringBuilder output = new StringBuilder(BUFFER_LEN_STRING);
 String str;
 boolean first = true;
 while ((str = in.readLine()) != null)
 {
  if (!first)
   output.append("\n");
  first = false;
  output.append(str);
 }
 in.close();
 return output.toString();
}

The code is from info.olteanu.utils.retrieve.RetrievePage, Phramer project.
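For readers who don't have the Phramer helper classes (StringTools and the surrounding fields) on hand, the two header-driven steps above can be sketched with the JDK alone. This is only a rough sketch, not the Phramer implementation: the class name ResponseDecoding and the method names unwrap and charsetOf are made up for illustration, and the Content-Type parsing is deliberately simple (it splits on ';' and does not cope with a ';' inside a quoted value).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of Phramer or the JDK.
final class ResponseDecoding {

    /** Wraps the stream in a GZIPInputStream only when Content-Encoding is gzip. */
    static InputStream unwrap(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    /** Returns the charset named in a Content-Type value, or fallback if absent or unknown. */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        final String key = "charset=";
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // parameter names are case-insensitive, so compare ignoring case
            if (p.regionMatches(true, 0, key, 0, key.length())) {
                String name = p.substring(key.length()).trim();
                // the value may be quoted, e.g. charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // illegal or unsupported charset name: use the fallback
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

With a URLConnection uc, this would be used roughly as unwrap(uc.getInputStream(), uc.getContentEncoding()) and charsetOf(uc.getContentType(), StandardCharsets.UTF_8), where UTF-8 as the fallback is an assumption you may want to change.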

Marian
+2  A: 

Try this code. It might not compile since I haven't tested it, but it should work; not every possible exception is caught, though you can add that easily. Note the timeouts: NEVER use infinite timeouts, since your program will hang at some point in the future if the resource is not available. If you're doing more than retrieving a simple text file, have a look at HttpClient from Apache Commons.

 URL url = new URL("http://mydomain.com/file.txt");
 URLConnection urlConnection = url.openConnection();
 urlConnection.setConnectTimeout(1000); // milliseconds
 urlConnection.setReadTimeout(1000);    // milliseconds
 BufferedReader breader = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));

 StringBuilder stringBuilder = new StringBuilder();

 String line;
 while((line = breader.readLine()) != null) {
  // readLine() strips the line terminator, so add one back
  stringBuilder.append(line).append('\n');
 }
 breader.close();

 System.out.println(stringBuilder.toString());
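A variant of the same idea that passes an explicit charset and copies characters in blocks (so the original line terminators survive unchanged) might look like the sketch below. The class name TextFetcher and its method names are hypothetical, and which charset to pass is an assumption the caller has to make, for example from the Content-Type header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;

// Hypothetical helper for illustration only.
final class TextFetcher {

    /** Decodes the whole stream with the given charset and closes it afterwards. */
    static String readFully(InputStream in, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder out = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            // read() returns -1 at end of stream
            while ((n = reader.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
            return out.toString();
        }
    }

    /** Opens the URL with finite connect and read timeouts and returns the body as text. */
    static String fetch(String address, int timeoutMillis, Charset charset) throws IOException {
        URLConnection connection = new URL(address).openConnection();
        connection.setConnectTimeout(timeoutMillis);
        connection.setReadTimeout(timeoutMillis);
        return readFully(connection.getInputStream(), charset);
    }
}
```

A call such as TextFetcher.fetch("http://mydomain.com/file.txt", 1000, StandardCharsets.UTF_8) would then replace the loop above; the try-with-resources block closes the stream even if reading fails.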
Malax
Oh, there is no handling for Charsets. But this code should give you a starting point.
Malax
I have already tried this, it reads a null string. Let me check HTTPClient.
alkar