Problem setting
- Entities arrive for processing and pass through a series of steps that operate on those entities, and possibly on other related entities, and generate some results;
- Some of the entities are required to be processed in real-time, without any database access;
- Currently the implementation simply looks up entities in the database, without any caching.
Optimisation time :-)
Possible approaches
Simple cache
A simple in-memory cache has two flaws:
- it may overflow, since we are talking about a large number of entities;
- it does not guarantee that the required entities will be found in the cache, and it offers no way to query whether they are available or to ask it to "preload" them.
So this is a no-go.
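To make the second flaw concrete, here is a minimal sketch of what I mean by a simple cache. The `ReadThroughCache` class and the `load_from_db` callable are names I made up for illustration; they are not part of the real code base. A miss silently falls through to the database, which is exactly what the real-time entities must avoid, and nothing bounds the size of the dictionary.

```python
class ReadThroughCache:
    """Plain read-through cache: unbounded, and a miss silently hits the database."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # hypothetical database lookup
        self._entries = {}                 # grows without limit (flaw 1)
        self.db_hits = 0

    def get(self, entity_id):
        if entity_id not in self._entries:
            # Flaw 2: the caller cannot know in advance that this will happen.
            self.db_hits += 1
            self._entries[entity_id] = self._load_from_db(entity_id)
        return self._entries[entity_id]


cache = ReadThroughCache(lambda entity_id: {"id": entity_id})
cache.get(1)
cache.get(1)
cache.get(2)
print(cache.db_hits)  # 2
```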
Entity analysis + preloading
I'm considering building some sort of analyser that works out which data needs to be retrieved for a given entity, even in large forms, and issues a request for the caches to load that data out-of-band.
The steps would be:
- Entity arrives. If it's required to be processed in-memory, send a cache load request;
- Entity is placed in a cache waiting queue until the cache loaded response is received. This may be immediate if the data is available;
- Entity is sent for processing and makes use of the loaded data;
- Caches are cleared. There is scope for smarter clearing policies here, but I'm not concerned about those at the moment.
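The steps above can be sketched roughly as follows. This is only an illustration under several assumptions: `PreloadingCache`, `required_keys`, `run` and the shape of the entity dictionaries are all hypothetical, every entity passed in is assumed to need in-memory processing, and "out-of-band" loading is simulated by loading one pending key per loop iteration so that everything stays on a single thread.

```python
from collections import deque


class PreloadingCache:
    """Cache that can be asked whether data is available and told to load it."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # hypothetical database lookup
        self._entries = {}
        self._pending = deque()            # keys requested but not yet loaded

    def has_all(self, keys):
        return all(key in self._entries for key in keys)

    def request_load(self, keys):
        for key in keys:
            if key not in self._entries and key not in self._pending:
                self._pending.append(key)

    def load_one(self):
        """Load a single pending key; stands in for out-of-band loading."""
        if self._pending:
            key = self._pending.popleft()
            self._entries[key] = self._load_from_db(key)

    def get(self, key):
        # A KeyError here means the analyser missed a dependency.
        return self._entries[key]

    def clear(self):
        self._entries.clear()


def required_keys(entity):
    """Hypothetical analyser: the entity itself plus its related entities."""
    return [entity["id"], *entity["related"]]


def run(entities, cache, process):
    waiting = deque()
    results = []
    for entity in entities:
        cache.request_load(required_keys(entity))    # send a cache load request
        waiting.append(entity)                       # park in the waiting queue
    while waiting:
        entity = waiting[0]
        if cache.has_all(required_keys(entity)):     # "cache loaded" response
            waiting.popleft()
            results.append(process(entity, cache))   # process from memory only
        else:
            cache.load_one()                         # let loading make progress
    cache.clear()                                    # clear the caches
    return results


db = {1: "a", 2: "b", 3: "c"}
entities = [{"id": 1, "related": [2]}, {"id": 3, "related": [1]}]


def process(entity, cache):
    return [cache.get(key) for key in required_keys(entity)]


results = run(entities, PreloadingCache(db.__getitem__), process)
print(results)  # [['a', 'b'], ['c', 'a']]
```

In this sketch the processing step never touches the database itself: it either finds everything in the cache or is not started at all, which is the guarantee the simple cache could not give.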
Questions
What are your opinions about this approach? Am I missing some well-known data access patterns which can be applied in this case?
Update 1: I forgot to mention that the whole processing is single-threaded, which restricts the options considerably.