Alright, so I have a very large amount of binary data (let's say 10 GB) distributed over a bunch of files (let's say 5000) of varying lengths.
I am writing a Java application to process this data, and I want to settle on a good design for the data access. Typical usage looks like this:
- One way or another, all the data will be read during the course of processing.
- Each file is (typically) read sequentially, requiring only a few kilobytes at a time. However, it is often necessary to have, say, the first few kilobytes of every file at once, or the middle few kilobytes of every file at once, and so on.
- There are times when the application will want random access to a byte or two here and there.
Currently I am using the RandomAccessFile class to read into byte arrays (and ByteBuffers). My ultimate goal is to encapsulate the data access in a class such that it is fast and I never have to worry about it again. The basic functionality is that I will ask it to read frames of data from specified files, and I wish to minimize I/O operations given the considerations above.
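For reference, here is roughly what a single read looks like in my current approach (simplified, and the class and method names are just made up for this post):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;

// Simplified sketch of my current approach: open the file, seek to the
// frame's offset, and read the requested bytes.
public final class CurrentReader {

    /** Reads 'length' bytes from 'path' starting at byte 'offset'. */
    public static ByteBuffer readFrame(String path, long offset, int length)
            throws IOException {
        byte[] bytes = new byte[length];
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offset);
            raf.readFully(bytes); // throws EOFException if the file is too short
        }
        return ByteBuffer.wrap(bytes);
    }
}
```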
Examples of typical access (see the interface sketch after this list):
- Give me the first 10 kilobytes of all my files!
- Give me bytes 0 through 999 of file F, then bytes 1 through 1000, then bytes 2 through 1001, and so on (a sliding window).
- Give me a megabyte of data from file F starting at such and such byte!
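Put differently, the facade I have in mind would look something like this (purely hypothetical; every name here is invented for illustration, not an existing API):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical facade for the data access layer. None of these names
// exist anywhere; this is just the shape of what I want.
public interface FrameSource extends AutoCloseable {

    /** Bytes [offset, offset + length) of a single file. */
    ByteBuffer read(int fileId, long offset, int length) throws IOException;

    /** The same [offset, offset + length) slice of every file at once. */
    List<ByteBuffer> readAll(long offset, int length) throws IOException;

    @Override
    void close() throws IOException;
}
```

The idea is that callers only describe which slices they want, and the implementation is free to do whatever buffering, caching, or file-handle pooling keeps the I/O count down.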
Any suggestions for a good design?