I am using Delphi 2009. I have a very simple data structure, with 2 fields:
- A string that is the key field I need to retrieve by and is usually 4 to 15 characters in length.
- A string that is the data field that may be any size, from 1 character up to say 10,000 characters.
The difficulty is that I may have several million of these records, so they may total as much or more than 10 GB in size. Obviously, I'm looking for an on-disk solution rather than an in-memory solution.
My program needs to randomly retrieve these records, based on the key field. That's that part that needs to be made as efficient as possible.
Should I use a database for such a simple structure, and if so, which database would be best to handle this and be simplest to implement?
Alternatively, is there a simple on-disk data structure not requiring a full-blown database that would work just as well?
Well, all I needed was the one answer to kick me back into reality. I was looking for something simpler than even a simple database. But when the no-duh answer is to use a database, then I realize I've already answered this question with my own answer to another question: Best database for small applications and tools.
My answer was DISQLite3 for the reasons I specified there. And that's what I'll probably with for my implementation.
A few more good answers with some possibilities. That's great. I'll be able to try a few different methods to see what works best.
More contemplation, and I have had to change the accepted answer to the GpStructuredStorage solution.
In my case, a million records totalling several Gigabytes will put a strain on a database structure. Specifically, the B* tree that is used to store the index in most databases is fast, but will slow down for some operations such as reindexing a million values.
The only thing you'll find faster than B* for an index is a hash table. And that is precisely what is provided in gabr's suggested addition to the GpStructuredStorage solution. I think it's quite elegant the way he segmented the hash value to give a 4 level directory structure.
The key reason why I can go to a hash solution is that I only need random access by the keys. I do not need to sort by the keys. If sorting was needed, then the gains of the speed of the hash table would be lost and the database system would be a no-brain winner.
When I get down to implementing this, I should do a comparison of this technique versus a database. Maybe I'll compare with both Firebird and SQLite which would both be worthy opponents.