I have several FastCGI processes that are supposed to share data. The data is bound to a session (a unique session-id string) and should survive a server reboot. Depending on the number of sessions, the shared data might be too big to fit into main memory. Ideally, when the shared data exceeds a certain threshold, the data bound to the least active sessions should exist on disk only, while the data of the most active sessions stays available in main memory. After a session has been inactive for some time, its data should be destroyed.

My questions are (being a newbie to C/C++):

Are there any approaches or libraries that can help me tackle this rather hairy problem?

Is it possible to use mmap() with shared memory, given the requirement that inactive session data must be destroyed?
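
For reference, here is roughly what I imagine the mmap() route would look like (untested sketch; the directory layout and the function name are just placeholders): one file per session, mapped MAP_SHARED so every FastCGI process sees the same bytes and the kernel pages them back to disk.

    // Untested sketch: one file per session, mapped MAP_SHARED so all
    // FastCGI processes share the same bytes and the kernel writes them
    // back to disk. The path scheme below is a placeholder.
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <string>

    void* open_session_blob(const std::string& session_id, size_t size) {
        // session_id would need sanitizing before being used in a path
        std::string path = "/var/lib/myapp/sessions/" + session_id;
        int fd = open(path.c_str(), O_RDWR | O_CREAT, 0600);
        if (fd < 0) return nullptr;
        if (ftruncate(fd, static_cast<off_t>(size)) != 0) { close(fd); return nullptr; }
        void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);  // the mapping stays valid after close()
        return (p == MAP_FAILED) ? nullptr : p;
    }

Destroying an inactive session would then be munmap() plus unlink() of its file.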

+2  A: 

Well, most people would use a SQL database for this, and either implement their own cache or rely on the database to do the recently-used caching. Destroying inactive sessions would be the job of a background thread, and at reboot you'd need to clean out the leftovers from old, dead sessions.
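
For instance, with something like SQLite (just one possible choice; the table layout below is only illustrative), the background thread's job boils down to a single DELETE:

    // Illustrative only: a possible sessions table and the purge that a
    // background thread would run. Column names are made up.
    #include <sqlite3.h>

    static const char* kSchema =
        "CREATE TABLE IF NOT EXISTS sessions ("
        "  id          TEXT PRIMARY KEY,"
        "  data        BLOB,"
        "  last_active INTEGER)";  // unix timestamp, updated on each access

    void purge_inactive(sqlite3* db, int max_idle_seconds) {
        sqlite3_stmt* stmt = nullptr;
        const char* sql =
            "DELETE FROM sessions WHERE last_active < strftime('%s','now') - ?";
        if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) == SQLITE_OK) {
            sqlite3_bind_int(stmt, 1, max_idle_seconds);
            sqlite3_step(stmt);
        }
        sqlite3_finalize(stmt);
    }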

The 'weight' of a solution is a funny thing. If you use a database, you'll write a lot less code of your own, but you'll have something between a dolphin and a blue whale swimming along behind you. If you build a persistence mechanism from scratch, you'll have a lot of code.

Have a look at bdb as an intermediate alternative.

bmargulies
Thank you for answering! I am looking for something much more lightweight than a full-blown (or even half-blown) RDBMS...
I sincerely hope I don't have to build a persistence mechanism from scratch :). Oracle says: "Berkeley DB is a C library that runs in the same process as your application, avoiding the interprocess communication delays of using a remote database server" - does this mean I can access the data that BDB keeps in memory directly, as shared memory?
No, not directly. That would not be feasible -- however, you only pay for a memory copy.
Hassan Syed
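
To illustrate the memory-copy point (sketch only, error handling omitted): a Berkeley DB read copies the record out of BDB's cache into a buffer owned by your own process; there is no server round trip.

    // Sketch only: DB->get() copies the record out of BDB's cache into
    // memory owned by this process; no IPC is involved.
    #include <db.h>
    #include <cstdlib>
    #include <cstring>
    #include <string>

    bool load_session(DB* db, const std::string& session_id, std::string& out) {
        DBT key, val;
        std::memset(&key, 0, sizeof key);
        std::memset(&val, 0, sizeof val);
        key.data = const_cast<char*>(session_id.data());
        key.size = static_cast<u_int32_t>(session_id.size());
        val.flags = DB_DBT_MALLOC;              // BDB allocates, caller frees
        if (db->get(db, nullptr, &key, &val, 0) != 0)
            return false;
        out.assign(static_cast<char*>(val.data), val.size);
        std::free(val.data);
        return true;
    }
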
+2  A: 

After your comment to bmargulies I should caution you that I tried to do what you are describing myself, and I found that I was writing an ACID database. To recap, you have asked for:

  • statistical caching
  • data persistence
  • data sharing between processes

This is the role of a database system, and it is far better to use one written by others. IMO your choices are SQLite and Berkeley DB. SQLite is not built for parallel access; Berkeley DB, on the other hand, is very scalable, but it uses a string-to-string dictionary as its data model.

BDB can keep databases entirely in memory, or run the normal way: serialized to disk and cached in memory. You can also tune the ACID semantics to suit your particular needs -- i.e., you can disable durable writes, which gives you near-instant writes while sacrificing bullet-proof data durability.
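
As a sketch of that kind of tuning (the flag and cache-size choices are illustrative, not a recommendation): open a shared environment that several FastCGI processes can join, with per-commit log flushes turned off.

    // Illustrative sketch: a shared BDB environment several FastCGI
    // processes can join, with synchronous log flushes disabled
    // (DB_TXN_NOSYNC) to trade durability for write speed.
    #include <db.h>

    DB_ENV* open_env(const char* home) {
        DB_ENV* env = nullptr;
        if (db_env_create(&env, 0) != 0) return nullptr;
        env->set_cachesize(env, 0, 64 * 1024 * 1024, 1);  // 64 MB cache
        env->set_flags(env, DB_TXN_NOSYNC, 1);            // no fsync per commit
        u_int32_t env_flags = DB_CREATE | DB_INIT_MPOOL | DB_INIT_LOCK |
                              DB_INIT_LOG | DB_INIT_TXN;
        if (env->open(env, home, env_flags, 0600) != 0) {
            env->close(env, 0);
            return nullptr;
        }
        return env;
    }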

There are loads of more advanced solutions, but those are for larger-scale problems -- i.e., when you have to build a cluster.

Hassan Syed
Hi Hassan! Thanks for your warning and the pointer to tuning bdb. I will take a thorough look at berkeley db.
thanks for the accept :P
Hassan Syed