I'm trying to perform a once-through read of a large file (~4GB) using Java 5.0 x64 (on Windows XP).

Initially the file read rate is very fast, but gradually the throughput slows down substantially, and my machine seems very unresponsive as time goes on.

I've used ProcessExplorer to monitor the File I/O statistics, and it looks like the process initially reads 500MB/sec, but this rate gradually drops to around 20MB/sec.

Any ideas on the best way to maintain file I/O rates, especially when reading large files with Java?

Here's some test code that shows the "interval time" continuing to increase. Just pass Main a file that's at least 500MB.

import java.io.File;
import java.io.RandomAccessFile;

public class MultiFileReader {

    public static void main(String[] args) throws Exception {
        MultiFileReader mfr = new MultiFileReader();
        mfr.go(new File(args[0]));
    }

    public void go(final File file) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        long fileLength = raf.length();
        System.out.println("fileLen: " + fileLength);
        raf.close();

        long startTime = System.currentTimeMillis();
        doChunk(0, file, 0, fileLength);
        System.out.println((System.currentTimeMillis() - startTime) + " ms");
    }

    public void doChunk(int threadNum, File file, long start, long end) throws Exception {
        System.out.println("Starting partition " + start + " to " + end);
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        raf.seek(start);

        long cur = start;
        byte[] buf = new byte[1000];
        int lastPercentPrinted = 0;
        long intervalStartTime = System.currentTimeMillis();
        while (true) {
            int numRead = raf.read(buf);
            if (numRead == -1) {
                break;
            }
            cur += numRead;
            if (cur >= end) {
                break;
            }

            // Print the elapsed time for each 5% slice of the partition,
            // then reset the interval clock.
            int percentDone = (int) (100.0 * (cur - start) / (end - start));
            if (percentDone % 5 == 0 && lastPercentPrinted != percentDone) {
                lastPercentPrinted = percentDone;
                System.out.println("Thread" + threadNum + " Percent done: " + percentDone
                        + " Interval time: " + (System.currentTimeMillis() - intervalStartTime));
                intervalStartTime = System.currentTimeMillis();
            }
        }
        raf.close();
    }
}

Thanks!

+5  A: 

I very much doubt that you're really getting 500MB per second from your disk. Chances are the data is cached by the operating system - and that the 20MB per second is what happens when it really hits the disk.

This will quite possibly be visible in the disk section of the Vista Resource Monitor - and a low-tech way to tell is to listen to the disk drive :)

Jon Skeet
This is the correct answer (as usual).
StaxMan
A: 

You could use JConsole to monitor your app, including memory usage. The 500 MB/sec sounds too good to be true.

Some more information about the implementation and VM arguments used would be helpful.

stili
I agree - 500MB/sec sounds too good to be true, but it also seems 20MB/sec is too slow! I'm not running with any special JVM args - just the default ones for Java 5.0. I'll try to throw together a simple code sample below.
+1  A: 

Depending on your specific hardware and what else is going on, you might need to work reasonably hard to do much more than 20MB/sec.

I think perhaps you don't realize how completely off-the-scale the 500MB/sec is...

What are you hoping for, and have you checked that your specific drive is even theoretically capable of it?

Will Dean
A: 

Check this example (MyTimer, TESTFILE and cnt3 are helpers defined elsewhere in the linked article):

static void read3() throws IOException {

    // read from the file with buffering
    // and with direct access to the buffer

    MyTimer mt = new MyTimer();
    FileInputStream fis = new FileInputStream(TESTFILE);
    cnt3 = 0;
    final int BUFSIZE = 1024;
    byte[] buf = new byte[BUFSIZE];
    int len;
    while ((len = fis.read(buf)) != -1) {
        // scan the chunk we just read for 'A' bytes
        for (int i = 0; i < len; i++) {
            if (buf[i] == 'A') {
                cnt3++;
            }
        }
    }
    fis.close();
    System.out.println("read3 time = " + mt.getElapsed());
}

from http://java.sun.com/developer/JDCTechTips/2002/tt0305.html

The best buffer size might depend on the operating system. Yours is maybe too small.
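As a rough way to find a good size on a given machine, here is a minimal sketch (a hypothetical BufferSizeTest class, not from the question) that times one full pass over a file for several buffer sizes - pass it the path of a large file:

import java.io.FileInputStream;
import java.io.IOException;

public class BufferSizeTest {

    public static void main(String[] args) throws IOException {
        // Try a range of buffer sizes against the same file and
        // print the elapsed wall-clock time for each full pass.
        int[] sizes = { 1000, 4 * 1024, 64 * 1024, 256 * 1024, 1024 * 1024 };
        for (int size : sizes) {
            long start = System.currentTimeMillis();
            readWholeFile(args[0], size);
            System.out.println("bufSize=" + size + " -> "
                    + (System.currentTimeMillis() - start) + " ms");
        }
    }

    private static void readWholeFile(String path, int bufSize) throws IOException {
        FileInputStream fis = new FileInputStream(path);
        byte[] buf = new byte[bufSize];
        while (fis.read(buf) != -1) {
            // discard the data; we only care about I/O throughput
        }
        fis.close();
    }
}

Bear in mind (per the accepted answer) that after the first pass the OS may have cached much of the file, giving the later sizes an unfair advantage; running each size in a fresh JVM against a cold cache gives cleaner numbers.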

kohlerm
+1  A: 

The Java Garbage Collector could be a bottleneck here.

I would make the buffer larger and private to the class so it is reused instead of allocated on each call to doChunk().

public class MultiFileReader {

   private byte buf[] = new byte[256*1024];

   ...

}
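For illustration, doChunk() would then read into the shared field rather than declaring a local array - a minimal sketch against the question's code, not a tested drop-in:

public void doChunk(int threadNum, File file, long start, long end) throws Exception {
    RandomAccessFile raf = new RandomAccessFile(file, "r");
    raf.seek(start);

    long cur = start;
    int numRead;
    // Reuse the instance-level buf instead of allocating a new byte[]
    // per call. Note: a single shared buffer is NOT safe if several
    // threads call doChunk() concurrently; each thread would need its own.
    while ((numRead = raf.read(buf)) != -1 && cur < end) {
        cur += numRead;
    }
    raf.close();
}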
Ville Krumlinde
Indeed. The right buffer size can speed things up dramatically.
Dev er dev