I have tried other methods of downloading info from a URL, but I need a faster one. I need to download and parse about 250 separate pages, and would like the app not to appear ridiculously slow. This is the code I am currently using to retrieve a single page; any insight would be great.

try 
{
    URL myURL = new URL("http://www.google.com");
    URLConnection ucon = myURL.openConnection();
    InputStream inputStream = ucon.getInputStream();
    BufferedInputStream bufferedInputStream = new BufferedInputStream(inputStream);
    ByteArrayBuffer byteArrayBuffer = new ByteArrayBuffer(50);
    int current = 0;
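    // Read the response one byte per read() call and append to the buffer.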
    while ((current = bufferedInputStream.read()) != -1) {
        byteArrayBuffer.append((byte) current);
    }
    tempString = new String(byteArrayBuffer.toByteArray());

} 
catch (Exception e) 
{
    Log.i("Error",e.toString());
}
+1  A: 

Try to keep the connection open if the requests are to the same server. Also, try to avoid reallocations in the buffer, and read as much as possible in one go.


final int APPROX_MAX_PAGE_SIZE = 300;
try 
{
    URL myURL = new URL("http://www.google.com");
    URLConnection ucon = myURL.openConnection();
    ucon.setRequestProperty("Connection", "keep-alive"); // (1)
    InputStream inputStream = ucon.getInputStream();
    BufferedInputStream bufferedInputStream = new BufferedInputStream(inputStream);
    ByteArrayBuffer byteArrayBuffer = new ByteArrayBuffer(APPROX_MAX_PAGE_SIZE); // (2)
    byte[] buf = new byte[APPROX_MAX_PAGE_SIZE];
    int read;
    do {
        read = bufferedInputStream.read(buf, 0, buf.length); // (3)
        if (read > 0) byteArrayBuffer.append(buf, 0, read);
    } while (read >= 0);
    tempString = new String(byteArrayBuffer.toByteArray());

} 
catch (Exception e) 
{
    Log.i("Error",e.toString());
}


  1. Set the keep-alive header (you may not need this; on J2SE it is also exposed as a configurable system property)
  2. Allocate what is "usually enough" in the buffer up front to avoid reallocation
  3. Read more than one byte at a time
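
For reference, the J2SE setting mentioned in (1) is the standard http.keepAlive networking property (it defaults to true); whether Android's URLConnection honors it is a separate question:

System.setProperty("http.keepAlive", "true");    // on by default on J2SE
System.setProperty("http.maxConnections", "5");  // idle connections kept per host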

Disclaimer: This was written "in the blind" without access to a Java compiler, so some parameters may be wrong; please feel free to edit if so.

Krumelur
They are all to the same server; any code suggestions?
cphil5
Something like ucon.setRequestProperty("Connection", "keep-alive"); connection reuse is handled internally. Also, try to read the buffer in chunks, not byte by byte.
Krumelur
The code above worked with just slight modification; I am going to benchmark it against the old routine and let you know. BTW, I parse one page of about 75K to get the 250 URIs for the individual pages.
cphil5
After 10 test runs of both sets of code, this is 35% faster than my original. Any other suggestions would be great.
cphil5
+1  A: 

Why don't you use the built-in Apache HTTP components?

HttpClient httpClient = new DefaultHttpClient();
HttpGet request = new HttpGet(uri);
HttpResponse response = httpClient.execute(request);

int status = response.getStatusLine().getStatusCode();

// Only read the body on success.
if (status == HttpStatus.SC_OK) {
    ByteArrayOutputStream ostream = new ByteArrayOutputStream();
    response.getEntity().writeTo(ostream);
} 
Schildmeijer
IMHO, URLConnection is higher-level than interfacing with the HTTP protocol directly.
Krumelur
This errors on HttpClient.execute(request); "Cannot make a static reference to the non-static method execute(HttpUriRequest) from the type HttpClient"
cphil5
The execute method in org.apache.http.client.HttpClient is not static. I updated the example above to include the creation of the HttpClient.
Schildmeijer
Thank you for the code correction; I was able to use it, but Krumelur's above was about 1/3 faster on average. I don't know if this is due to allocating the byte array to full size or not; I will try a few changes and see how it goes. I do like how clean this is, though; it is much more readable and easier to understand.
cphil5
Tried pre-allocating ostream to the full response size but have had no success with speed improvement, though this is faster than my original version. Any other ideas would be great.
cphil5
A: 

Use a pooled HttpClient and try to make 2 or 3 requests at once. Also try to reuse your buffers (a simple memory pool) to avoid allocations and the resulting GC stalls.
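
A minimal sketch of the pooled-client idea, assuming the Apache HttpClient 4.0 API bundled with Android: ThreadSafeClientConnManager is the real pooled connection manager, but fetchAll, the 3-thread executor, and the per-route limit of 3 are illustrative choices, not anything prescribed by this answer.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.conn.params.ConnManagerParams;
import org.apache.http.conn.params.ConnPerRouteBean;
import org.apache.http.conn.scheme.PlainSocketFactory;
import org.apache.http.conn.scheme.Scheme;
import org.apache.http.conn.scheme.SchemeRegistry;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager;
import org.apache.http.params.BasicHttpParams;
import org.apache.http.params.HttpParams;
import org.apache.http.util.EntityUtils;

public class PooledFetcher {
    // Hypothetical helper: shares one pooled client across a small thread
    // pool so 2-3 requests to the same server run in parallel.
    public static List<String> fetchAll(List<String> urls) throws Exception {
        HttpParams params = new BasicHttpParams();
        ConnManagerParams.setMaxConnectionsPerRoute(params, new ConnPerRouteBean(3));
        SchemeRegistry registry = new SchemeRegistry();
        registry.register(new Scheme("http", PlainSocketFactory.getSocketFactory(), 80));
        final HttpClient client =
            new DefaultHttpClient(new ThreadSafeClientConnManager(params, registry), params);

        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<Future<String>> pending = new ArrayList<Future<String>>();
        for (final String url : urls) {
            pending.add(pool.submit(new Callable<String>() {
                public String call() throws Exception {
                    HttpResponse response = client.execute(new HttpGet(url));
                    // EntityUtils.toString consumes the entity, which releases
                    // the connection back to the pool for reuse.
                    return EntityUtils.toString(response.getEntity());
                }
            }));
        }
        List<String> pages = new ArrayList<String>();
        for (Future<String> f : pending) pages.add(f.get());
        pool.shutdown();
        return pages;
    }
}

Since all 250 pages come from the same server, the per-route connection limit is what actually caps the parallelism here.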

Moss