views:

202

answers:

3

Hi,

I have approx. 30,000 files (1 MB each) which I want to pass to a native method that takes just a byte array and its size as arguments.

I looked through some examples and benchmarks (like http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly), but all of them do other fancy things as well.

Basically I don't care about the contents of the file; I don't want to access anything in the file or the byte array, or do anything else with it. I just want to get a file into a native method that accepts a byte array, as fast as possible.

At the moment I'm using RandomAccessFile, but that's horribly slow (10 MB/s).

Is there anything like

byte[] readTheWholeFile(File file){ ... }

which I could put into

native void fancyCMethod(readTheWholeFile(myFile), myFile.length())

What would you suggest?

+1  A: 

I'm not entirely sure this is what you're asking, but it sounds like you want to efficiently pass the contents of a file as a byte array to a native method.

If that's the case, I suggest you read the file contents in Java using a BufferedInputStream and store them in a ByteBuffer allocated via ByteBuffer#allocateDirect; that way the buffer can be passed to the JNI side and accessed as a whole. Then, in the native method, you can call GetDirectByteBufferAddress to access the buffer's memory directly.
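The Java side of this suggestion might look something like the sketch below (the class and method names are mine, not from the answer): read the file through a BufferedInputStream and copy the bytes into a direct ByteBuffer, which the JNI side can then reach via GetDirectByteBufferAddress.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

public class DirectBufferRead {
    // Read a file's contents into a direct ByteBuffer so native code
    // can access the memory via GetDirectByteBufferAddress.
    static ByteBuffer readIntoDirectBuffer(File file) throws IOException {
        int size = (int) file.length();
        ByteBuffer buffer = ByteBuffer.allocateDirect(size);
        byte[] chunk = new byte[8192];
        try (BufferedInputStream in =
                 new BufferedInputStream(new FileInputStream(file))) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.put(chunk, 0, n);
            }
        }
        buffer.flip(); // position 0, limit = number of bytes read
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        // Small temp file standing in for one of the real 1 MB files.
        File tmp = File.createTempFile("demo", ".bin");
        tmp.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(new byte[]{1, 2, 3, 4});
        }
        ByteBuffer buf = readIntoDirectBuffer(tmp);
        System.out.println("direct=" + buf.isDirect() + " bytes=" + buf.remaining());
    }
}
```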

maerics
A: 

Here is a sample readFileFully that you could implement:

   public static byte[] readFileFully(String aFileName) throws IOException
   {
      File inputFile = new File(aFileName);
      if (!inputFile.exists() || !inputFile.canRead())
      {
         throw new IOException("INVALID FILE : " + aFileName);
      }

      // Read in the file data
      int size = (int) inputFile.length();
      byte[] retData = new byte[size];
      BufferedInputStream iStream = null;
      try
      {
         iStream = new BufferedInputStream(new FileInputStream(inputFile));
         int bytesRead = 0;
         while (bytesRead < size)
         {
            int n = iStream.read(retData, bytesRead, size - bytesRead);
            if (n == -1)
            {
               // The file shrank while we were reading it
               throw new IOException("Unexpected end of file: " + aFileName);
            }
            bytesRead += n;
         }
      }
      finally
      {
         if (iStream != null)
         {
            try
            {
               iStream.close();
            }
            catch (IOException e)
            {
               // ignore failure on close
            }
         }
      }
      return retData;
   }
Romain Hippeau
+1  A: 

Using regular arrays may be inefficient, since the VM may copy the array when passing it to native code, and may also use intermediate buffers during I/O.

For the fastest I/O, use ByteBuffer.allocateDirect to allocate a byte buffer. The underlying memory is "special" in that it is not part of the regular JVM heap; native code and I/O can access it directly.

To read data into the buffer, use:

ByteBuffer byteBuffer = ByteBuffer.allocateDirect((int) randomAccessFile.length());
randomAccessFile.getChannel().read(byteBuffer, 0);

To get the backing array to pass to JNI, use:

byte[] byteArray = byteBuffer.array();

You can then pass this array and the file length to JNI.

Direct buffers are relatively expensive to create. Since all your files are 1 MB (or thereabouts), you should be able to reuse the same buffer across multiple files.
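The buffer-reuse idea could be sketched as follows (the SHARED field and readInto method are illustrative names, not from the answer): allocate one direct buffer sized for the largest expected file, and clear() it before each read.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ReusedBuffer {
    // One direct buffer, sized for the largest expected file (1 MB in
    // the question), reused across reads to amortize the allocation cost.
    static final ByteBuffer SHARED = ByteBuffer.allocateDirect(1024 * 1024);

    static int readInto(File file) throws IOException {
        SHARED.clear(); // reset position and limit before reusing the buffer
        try (FileChannel channel = new RandomAccessFile(file, "r").getChannel()) {
            int total = 0;
            int n;
            while (SHARED.hasRemaining() && (n = channel.read(SHARED)) != -1) {
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        // Two small temp files read back-to-back through the same buffer.
        int[] sizes = {3, 5};
        int[] read = new int[sizes.length];
        for (int i = 0; i < sizes.length; i++) {
            File f = File.createTempFile("demo", ".bin");
            f.deleteOnExit();
            try (FileOutputStream out = new FileOutputStream(f)) {
                out.write(new byte[sizes[i]]);
            }
            read[i] = readInto(f);
        }
        System.out.println("read=" + read[0] + "," + read[1]);
    }
}
```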

Hope this helps!

mdma
Thanks for your answer mdma! I'm just wondering, how can I be sure that array() will work? The Javadoc says "Invoke the hasArray method before invoking this method in order to ensure that this buffer has an accessible backing array." And allocateDirect() tells me "Whether or not it has a backing array is unspecified." I wonder if that will work?
soc
That's the rub with some of the more platform-dependent features: it's VM dependent. You could catch the exception thrown by array() and obtain the array using ByteBuffer.get(byte[]) as a fallback. If you really need direct access on all VMs, you could code a small JNI stub method that takes the direct ByteBuffer instance and calls GetDirectByteBufferAddress, then forwards the pointer to your original JNI method. Even if the ByteBuffer has to copy the data once to a new array, it's going to be quick; these are optimized methods, and much quicker than reading a file piecemeal into a byte[].
mdma
One other point that may help your performance: use multi-threading. Even though your app is going to be I/O bound, the I/O will block waiting for data (e.g. for non-contiguous files). Having several threads reading different files simultaneously will give you a speedup, especially using async I/O. The ForkJoin framework (JSR 166) is really useful for this kind of work, and very easy to use: refactor the file operation as a task, create a task for each file you want to process, and put them all in the task queue. The task queue then runs these tasks at the level of parallelism you specify.
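A minimal sketch of the one-task-per-file idea, using a plain fixed thread pool rather than ForkJoin to keep it short (file names and sizes here are placeholders for the real 30,000 files):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelReader {
    public static void main(String[] args) throws Exception {
        // A few small temp files standing in for the real inputs.
        List<File> files = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            File f = File.createTempFile("part" + i, ".bin");
            f.deleteOnExit();
            try (FileOutputStream out = new FileOutputStream(f)) {
                out.write(new byte[i + 1]);
            }
            files.add(f);
        }
        // One task per file; the pool bounds the level of parallelism.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<byte[]>> results = new ArrayList<>();
        for (File f : files) {
            results.add(pool.submit(() -> Files.readAllBytes(f.toPath())));
        }
        long totalBytes = 0;
        for (Future<byte[]> r : results) {
            // Here you would hand each array to the native method instead.
            totalBytes += r.get().length;
        }
        pool.shutdown();
        System.out.println("total=" + totalBytes);
    }
}
```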
mdma