Generally, in-memory data structures such as B-Tree or STL map usually has good performance on both lookup and update. However, the contained data are lost once the process is closed or power is shutdown. On the other hand, on-disk data structures such as embedded database and file mapping persist data, at the cost of low performance because data and index must be serialized before writing and deserialized after reading. Cache is introduced to address the problem. But in order to get comparable performance, the size of cache also becomes "comparable" large. The more cache is, the less persistence is, because the large cache needs rebuilding once process is closed. Complicated cache hierarchies could also lead to bad performance because of locking and mutex.
My question is, is there a good balance between high performance and yet persistent? Is there standard solution or library in common OS (Windows, Linux) and imperial programming languages (C, C++; I guess dynamic programming languages would never hit the performance requirement, right?) If not, is there good paper to start with? I'm asking because I usually have the demand to store index or cache in my projects, and I'm tired of making ad-hoc solutions.
Please note that the scale of the data here discuss should be small enough fit completely in memory (100% not persistent), i.e. at maximum of 2GB, 4GB. Otherwise, client/server database would be the only appropriate way to manage data.
Furthermore, I think this question should be divided to specific access patterns, because each pattern might have different approach and optimization techniques.
- No original data to begin with. Most data are generated based on user actions. It can be expected of a lot of writing and few reading especially at early stage.
- Similar to 1. but data are only inserted, never updated or deleted. Deletion and vacuum are done separately. This is possible when data is generated deterministically while its usage is based on user actions. For example, store displayed bitmaps which are selected by user.
- Data is ready on disk. Few writing and most reading. Locking is still required.
- Read-only.
Thank you!