What are some popular techniques for adding durability to in-memory data structures, i.e., so that if the process crashes, all previously executed operations on the data structure are preserved?

If my data structure involves just a list of tuples, then I would just store them in a SQL DB and that would give me durability for free. But what if my data structure was a graph or a tree?

The one thing I can think of is to explicitly log every operation to disk (an append-only log) and, in the event of a crash, replay the log to restore the previous state. If the log grows too big, a compaction step shrinks it. I'm guessing this is what a database engine does internally for durability (I believe this process is called checkpointing)?
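
The log-and-replay idea above can be sketched roughly as follows. This is a minimal illustration, not how any particular database engine does it: the `DurableGraph` class, its method names, and the JSON-lines log format are all assumptions made for the example, and nodes are assumed to be strings.

```python
import json
import os


class DurableGraph:
    """Adjacency-set graph; every mutation is journaled before it is applied.

    Hypothetical sketch: the class, method names, and log format are assumptions.
    """

    def __init__(self, log_path):
        self.log_path = log_path
        self.edges = {}
        self._replay()
        self._log = open(log_path, "ab")

    def _apply(self, op):
        kind, u, v = op
        if kind == "add":
            self.edges.setdefault(u, set()).add(v)
        elif kind == "remove":
            self.edges.get(u, set()).discard(v)

    def _replay(self):
        """Rebuild state from the log and cut off a torn final record, if any."""
        if not os.path.exists(self.log_path):
            return
        good = 0  # byte offset of the end of the last complete record
        with open(self.log_path, "rb") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # crash happened in the middle of this write
                try:
                    op = json.loads(raw)
                except ValueError:
                    break
                self._apply(op)
                good += len(raw)
        with open(self.log_path, "r+b") as f:
            f.truncate(good)

    def _record(self, op):
        self._log.write(json.dumps(op).encode("utf-8") + b"\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # force the record to disk before applying
        self._apply(op)

    def add_edge(self, u, v):
        self._record(["add", u, v])

    def remove_edge(self, u, v):
        self._record(["remove", u, v])

    def compact(self):
        """Rewrite the log as the minimal set of 'add' records for the current state."""
        tmp = self.log_path + ".tmp"
        with open(tmp, "wb") as f:
            for u, vs in self.edges.items():
                for v in sorted(vs):
                    f.write(json.dumps(["add", u, v]).encode("utf-8") + b"\n")
            f.flush()
            os.fsync(f.fileno())
        self._log.close()
        os.replace(tmp, self.log_path)  # atomic swap of old log for compacted log
        self._log = open(self.log_path, "ab")

    def close(self):
        self._log.close()
```

Calling `fsync` on every operation is the expensive part of this sketch. Batching several operations per `fsync` trades a small window of possible loss for throughput.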

Btw, note that in this scenario the entire dataset does fit in memory.

A: 

The word you're looking for is "serialization".

Rik
I have obviously heard of the word "serialization". And obviously, naively serializing the entire data structure to disk after every single operation will work in theory but not in practice. I was talking about doing this efficiently (if not, what's the point of having the data in memory?)
Harish
Ah! Perhaps some caching mechanism would suit you better then?
Rik
A: 

You could come up with some way to serialize your structure, whether with XML, YAML, JSON, etc. Then you could either store that in the DB, or perhaps put one big try/catch around the program's main entry point. Then if some uncaught exception happens, which would cause the program to crash, you could serialize your data, as well as log any error messages, stack traces, etc.
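
A rough sketch of the try/catch idea, assuming the state is JSON-serializable (for example, a tree of nested dicts and lists). The function name `run_with_crash_dump` and the dump layout are made up for illustration. Note that this only covers uncaught exceptions; it cannot run if the process is killed outright or the machine loses power.

```python
import json
import traceback


def run_with_crash_dump(main, state, dump_path):
    """Run main(state); on an uncaught exception, save state plus traceback, then re-raise.

    Hypothetical helper: `state` is assumed to be JSON-serializable.
    """
    try:
        main(state)
    except Exception:
        with open(dump_path, "w", encoding="utf-8") as f:
            json.dump({"state": state, "traceback": traceback.format_exc()}, f)
        raise
```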

pkaeding
A: 

Yes, you would want to serialize the data to some format: XML, binary, whatever. Depending on the programming language, this may be built in for you. Java has ObjectStreams, .NET has XmlSerializer and also a BinaryFormatter.

Chris Marasti-Georg
A: 

Any answer to your question will entail doing something like what an ACID database system does. So I would say your best bet is to use an RDBMS to store your application state, updating whenever you have an (application) transaction that must not be lost.
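
As one possible illustration for a tree, here is a sketch using SQLite through Python's standard `sqlite3` module, with each node stored as a row pointing at its parent. The `node` table and the helper names `open_store`, `add_child`, and `load_tree` are assumptions for the example, not a prescribed schema.

```python
import sqlite3


def open_store(path):
    """Open (or create) the database and make sure the node table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES node(id),"
        " label TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def add_child(conn, parent_id, label):
    """Insert one node in its own transaction; parent_id=None makes it a root."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO node (parent_id, label) VALUES (?, ?)",
            (parent_id, label),
        )
    return cur.lastrowid


def load_tree(conn):
    """Rebuild the in-memory form: a mapping of parent id to list of (id, label)."""
    children = {}
    rows = conn.execute("SELECT id, parent_id, label FROM node ORDER BY id")
    for node_id, parent_id, label in rows:
        children.setdefault(parent_id, []).append((node_id, label))
    return children
```

After a restart, `load_tree` reads the committed rows back into memory, so the in-memory structure is only ever as stale as the last committed transaction.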

Jeff C
+4  A: 

You might want to try an object prevalence engine. For .NET, you might want to try Bamboo.Prevalence, which is a port of a similar engine called Prevayler for Java.

Justice
Object Prevalence Engines seem to work just like the way I suggested - changes journaled for system recovery. Anyways thanks for those links. They seem interesting!
Harish
Justice, did you use Prevayler? Could you answer my question about it? http://stackoverflow.com/questions/454294/what-are-synchronizing-strategies-for-prevayler
Sergey
In my experience, prevalence works very well for small, write-intensive data sets. It works less well with developers that don't intuitively understand the constraints demanded of prevalent objects -- in particular, determinism in the face of time- or space-shifted computation.
Jeffrey Hantin
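
To illustrate the determinism constraint mentioned in the last comment: a journaled command has to produce the same result when replayed later as it did when first executed, so it should not read the clock (or any other outside state) while it is being applied. The sketch below is a toy in-memory journal with hypothetical names (`AddNote`, `issue`, `replay`); it is not the Prevayler or Bamboo.Prevalence API.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class AddNote:
    """A journaled command. The timestamp is captured once, when it is issued."""
    text: str
    executed_at: float

    def apply(self, notes):
        # Deterministic: uses only the command's own fields, never time.time().
        notes.append((self.executed_at, self.text))


def issue(journal, notes, text, clock=time.time):
    """Create a command, journal it, then apply it to the live state."""
    cmd = AddNote(text, clock())
    journal.append(cmd)  # a real engine would write this durably before applying
    cmd.apply(notes)


def replay(journal):
    """Rebuild state from the journal; must match the state before the crash."""
    notes = []
    for cmd in journal:
        cmd.apply(notes)
    return notes
```

Had `apply` called `time.time()` itself, a replay after a crash would stamp every note with the recovery time instead of the original one, and the recovered state would silently differ from the state that was lost.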