I have written a parser class for a particular binary format (nfdump, if anyone is interested) which uses java.nio's MappedByteBuffer to read through files of a few GB each. The binary format is just a series of headers and mostly fixed-size binary records, which are fed out to the caller by calling nextRecord(), which pushes on the state machine and returns null when it's done. It performs well, and it works on my development machine.

On my production host, it can run for a few minutes or hours, but always seems to throw "java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code", pointing at one of the map's getInt or getShort methods, i.e. a read operation on the map.

The uncontroversial (?) code that sets up the map is this:

    /** Set up the map from the given filename and position */
    protected void open() throws IOException {
        // Set up buffer; is this all the flexibility we'll need?
        channel = new FileInputStream(file).getChannel();
        MappedByteBuffer map1 = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        map1.load(); // we want the whole thing, plus it seems to reduce the frequency of crashes?
        map = map1;
        // assumes the host writing the files is little-endian (x86); ought to be configurable
        map.order(java.nio.ByteOrder.LITTLE_ENDIAN);
        map.position(position);
    }

and then I use the various map.get* methods to read shorts, ints, longs and other sequences of bytes, until I hit the end of the file and close the map.
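For context, a minimal sketch of what those little-endian get* reads look like. The file contents and two-field "record" layout here are invented for illustration; the real nfdump record layout differs:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MapReadDemo {
    public static void main(String[] args) throws IOException {
        // Fabricated little-endian "record": a short followed by an int
        Path tmp = Files.createTempFile("demo", ".bin");
        Files.write(tmp, new byte[] {0x02, 0x00, 0x10, 0x00, 0x00, 0x00});
        try (FileChannel ch = new RandomAccessFile(tmp.toFile(), "r").getChannel()) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            map.order(ByteOrder.LITTLE_ENDIAN);
            short type = map.getShort();  // bytes 0-1, little-endian -> 2
            int length = map.getInt();    // bytes 2-5, little-endian -> 16
            System.out.println(type + " " + length);  // prints "2 16"
        }
        Files.delete(tmp);
    }
}
```

Each get* call advances the buffer position, which is what drives a sequential record-at-a-time state machine.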

I've never seen the exception thrown on my development host. The significant difference between my production host and development is that on the former, I am reading sequences of these files over NFS (probably 6-8TB eventually, and still growing). On my dev machine, I have a smaller selection of these files locally (60GB), but when it blows up on the production host it's usually well before it has read 60GB of data.

Both machines are running Java 1.6.0_20-b02, though the production host runs Debian/lenny while the dev host runs Ubuntu/karmic. I'm not convinced that makes any difference. Both machines have 16GB of RAM and run with the same Java heap settings.

I take the view that even if there is a bug in my code, the JVM ought to do me the courtesy of throwing a proper exception rather than an InternalError! But I think it is really a JVM implementation bug due to interactions between NFS and mmap, possibly a recurrence of bug 6244515, which is officially fixed.

I already tried adding a load() call to force the MappedByteBuffer to load its contents into RAM. This seemed to delay the error in the one test run I've done, but did not prevent it; then again, it could be coincidence that that was the longest it had gone before crashing!

If you've read this far and have done this kind of thing with java.nio before, what would your instinct be? Right now mine is to rewrite it without nio :)

+3  A: 

I would rewrite it without using mapped NIO. If you're dealing with more than one file, there is a problem: the mapped memory is never released, so you will eventually run out of virtual address space. Note that this won't necessarily show up as an OutOfMemoryError interacting with the garbage collector; it would be a failure to allocate a new mapped buffer. I would use a FileChannel instead.
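A rough sketch of the FileChannel route (the temp file and field layout are invented for illustration): read the bytes into a heap ByteBuffer with an explicit read loop, so an I/O fault on NFS surfaces as an ordinary checked IOException at the read call rather than an asynchronous InternalError inside a get*:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelReadDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".bin");
        Files.write(tmp, new byte[] {0x02, 0x00, 0x10, 0x00, 0x00, 0x00});
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            // Read the whole file into a heap buffer; read() may return
            // short counts, so loop until the buffer is full or EOF.
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (buf.hasRemaining() && ch.read(buf) != -1) { }
            buf.flip();
            buf.order(ByteOrder.LITTLE_ENDIAN);
            System.out.println(buf.getShort() + " " + buf.getInt());  // prints "2 16"
        }
        Files.delete(tmp);
    }
}
```

Once the bytes are on the heap, the same sequential get* parsing code works unchanged; only the way the bytes arrive differs.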

Having said that, large-scale operations on NFS files are always extremely problematic. You would be much better off redesigning the system so that each file is read by its local CPU. You will also get immense speed improvements this way, far more than the 20% you will lose by not using mapped buffers.

EJP
I did think of exhausting virtual address space, but as you say that ought to manifest itself as a mapping failure (plus I'm only reading one file at a time, and on a 64-bit system). I will probably rearrange the servers so that the files live on the same server as the Java process, and avoid whatever NFS issue this is. In the short term I will just read it all into a ByteBuffer, but because multiple threads are reading the same files, often at the same time, this is reimplementing something that mmap *ought* to be an elegant solution to!
Matthew Bloch
Yes, I was hoping for an answer that would let me keep mmap; I just needed someone else to say "it ain't gonna work" :) The open() code now just reads the whole lot into an allocated ByteBuffer. While my instinct was to worry about memory wastage (since several readers = several copies on the heap), I haven't seen a performance drop compared to previous runs, so I can't really complain. I've left the old code commented out in the hope that I can restore the "elegant" mmap, but assuming my nfdump files stay the same size I probably won't need it again.
Matthew Bloch
'several readers = several copies on the heap': only if you make those several copies. Can't you organize some kind of singleton access?
EJP
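One way the "singleton access" suggestion could look, sketched here with a wrapped array standing in for the real file contents: keep a single shared read-only buffer and hand each reader its own duplicate(), which shares the underlying bytes but has an independent position. The byte order is set explicitly on each view, since duplicate() may not carry it over:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SharedBufferDemo {
    public static void main(String[] args) {
        // One shared buffer holding two little-endian shorts: 1 and 2
        ByteBuffer shared = ByteBuffer.wrap(new byte[] {1, 0, 2, 0})
                                      .asReadOnlyBuffer();
        // Each reader gets an independent view: same bytes, no copy,
        // but its own position/limit/mark.
        ByteBuffer v1 = shared.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer v2 = shared.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int a = v1.getShort();  // advances v1's position to 2
        int b = v2.getShort();  // v2 still starts at 0, reads the same short
        int c = v1.getShort();  // v1 reads the second short
        System.out.println(a + " " + b + " " + c);  // prints "1 1 2"
    }
}
```

Each thread would hold its own duplicate, so the file's bytes exist on the heap exactly once regardless of how many readers there are.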