I need to store tens or even hundreds of millions of pieces of data on disk. Each piece of data contains information like:
id=23425
browser=firefox
ip-address=10.1.1.1
outcome=1.0
New pieces of data may be added at a rate of up to one per millisecond.
So it's a relatively simple set of key-value pairs, where the values can be strings, integers, or floats. Occasionally I may need to update the piece of data with a particular id, changing its outcome field from 0 to 1. In other words, I need to be able to do random key lookups by id and modify the data in place (only the floating-point "outcome" field ever changes, so the size of the value never changes).
The other requirement is that I need to be able to stream this data off disk efficiently (the order isn't particularly important). This means that the hard disk head should not need to jump around the disk to read the data; rather, the data should be read from consecutive disk blocks.
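To make the streaming requirement concrete, here is a rough sketch of the access pattern I mean. It assumes a hypothetical fixed-width layout of 24 bytes per record (id as a long, the browser as an int code, the IPv4 address packed into an int, and outcome as a double). The class name, the visitor interface, and the layout are all made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch: read every record front to back in large sequential chunks. */
public class SequentialScan {
    // Assumed layout: id(8) | browserCode(4) | ipv4(4) | outcome(8)
    public static final int RECORD_SIZE = 24;

    /** Hypothetical callback invoked once per record. */
    public interface RecordVisitor {
        void visit(long id, int browserCode, int ipv4, double outcome);
    }

    /** Scans the whole file in order and returns the number of records seen. */
    public static long scan(Path file, RecordVisitor visitor) throws IOException {
        long count = 0;
        // Buffer holds a whole number of records, so each read is one large sequential request.
        ByteBuffer buf = ByteBuffer.allocateDirect(RECORD_SIZE * 4096);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (true) {
                int n = ch.read(buf);       // reads from the channel's current position
                buf.flip();
                while (buf.remaining() >= RECORD_SIZE) {
                    visitor.visit(buf.getLong(), buf.getInt(), buf.getInt(), buf.getDouble());
                    count++;
                }
                buf.compact();              // keep any partial record for the next read
                if (n < 0) {
                    break;                  // end of file; a trailing partial record is ignored
                }
            }
        }
        return count;
    }
}
```

Usage would be something like `SequentialScan.scan(path, (id, browser, ip, outcome) -> ...)`. Because the records are fixed-width and read in file order, the disk never has to seek, provided the file itself is not badly fragmented.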
I'm writing this in Java.
I've thought about using an embedded database, but DB4O is not an option as it is GPL-licensed and the rest of my code is not. I also worry about the efficiency of an embedded SQL database, given the overhead of translating to and from SQL queries.
Does anyone have any ideas? Might I have to build a custom solution to this (where I'm dealing directly with ByteBuffers, and handling the id lookup)?
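For reference, the kind of custom solution I'm picturing would look roughly like the sketch below. It assumes fixed-width 24-byte records (id as a long, the browser as an int code, the IPv4 address packed into an int, outcome as a double) appended to a single file, plus an in-memory map from id to file offset. The class name, the layout, and the encoding of browser and IP are all assumptions for illustration. A plain `HashMap<Long, Long>` would probably be too memory-hungry at hundreds of millions of entries, and the index is not rebuilt when an existing file is reopened, so this only shows the shape of the idea.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Sketch: append-only file of fixed-width records with in-place outcome updates. */
public class RecordStore implements AutoCloseable {
    // Assumed layout: id(8) | browserCode(4) | ipv4(4) | outcome(8)
    public static final int RECORD_SIZE = 24;
    private static final int OUTCOME_OFFSET = 16;

    private final FileChannel channel;
    private final Map<Long, Long> offsetById = new HashMap<>(); // id -> record offset
    private final ByteBuffer record = ByteBuffer.allocate(RECORD_SIZE);
    private final ByteBuffer field = ByteBuffer.allocate(Double.BYTES);
    private long end; // offset at which the next record is appended

    public RecordStore(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        end = channel.size(); // note: records already in the file are not indexed here
    }

    /** Appends a record at the end of the file, keeping the file sequential. */
    public void append(long id, int browserCode, int ipv4, double outcome) throws IOException {
        record.clear();
        record.putLong(id).putInt(browserCode).putInt(ipv4).putDouble(outcome);
        record.flip();
        long pos = end;
        while (record.hasRemaining()) {
            pos += channel.write(record, pos);
        }
        offsetById.put(id, end);
        end = pos;
    }

    /** Overwrites only the 8-byte outcome field, so the record size never changes. */
    public void setOutcome(long id, double outcome) throws IOException {
        long pos = offsetOf(id) + OUTCOME_OFFSET;
        field.clear();
        field.putDouble(outcome);
        field.flip();
        while (field.hasRemaining()) {
            pos += channel.write(field, pos);
        }
    }

    /** Random lookup by id: one map lookup plus one positioned read. */
    public double getOutcome(long id) throws IOException {
        long pos = offsetOf(id) + OUTCOME_OFFSET;
        field.clear();
        while (field.hasRemaining()) {
            int n = channel.read(field, pos);
            if (n < 0) {
                throw new EOFException("record truncated for id " + id);
            }
            pos += n;
        }
        field.flip();
        return field.getDouble();
    }

    private long offsetOf(long id) {
        Long offset = offsetById.get(id);
        if (offset == null) {
            throw new IllegalArgumentException("unknown id " + id);
        }
        return offset;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

What I'd like to know is whether something like this is reasonable, or whether an existing library already does it better.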