tags:

views:

2843

answers:

19

For an open source project I am looking for a good, simple implementation of a Dictionary that is backed by a file. Meaning, if an application crashes or restarts the dictionary will keep its state. I would like it to update the underlying file every time the dictionary is touched. (Add a value or remove a value). A FileWatcher is not required but it could be useful.

class PersistentDictionary<T,V> : IDictionary<T,V>
{
    public PersistentDictionary(string filename)
    {

    } 
}

Requirements:

  • Open Source, with no dependency on native code (no sqlite)
  • Ideally a very short and simple implementation
  • When setting or clearing a value it should not re-write the entire underlying file, instead it should seek to the position in the file and update the value.

Similar Questions

+1  A: 

Sounds cool, but how will you get around changes to the stored value (if it was a reference type) itself? If its immutable then all is well but if not you're kinda stuffed :-)

If you're not dealing with immutable values, I would suspect a better approach would be to handle persistence at the value level and to just rebuild the dictionary as necessary.

(edited to add a clarification)

chrisb
Id be happy for only immutable data support (I could add an interface with an OnChanged for this kind of stuff, or perhaps proxy object which can get a bit yucky)
Sam Saffron
+1  A: 

Just use serialization. Look at the BinaryFormatter class.

leppie
That will not give me the performance Im after
Sam Saffron
You should edit the question to include your performance requirements.
spoulson
I did "When setting or clearing a value it should not re-write the entire underlying file, instead it should seek to the position in the file and update the value."
Sam Saffron
A: 

I don't know of anything to solve your problem. It will need to be a fixed size structure, so that you can meet the requirements of being able to rewrite records without rewriting the entire file.

This means normal strings are out.

Douglas Leeder
Perhaps one can specify the max length in the ctor and have it throw when people add values that are greater?
Quibblesome
A: 

Like Douglas said, you need to know the fixed size of your types (both T and V). Also, variable-length instances in the object grid referenced by any of those instances are out.

Still, implementing a dictionary backed by a file is quite simple and you can use the BinaryWriter class to write the types to disk, after inheriting or encapsulating the Dictionary<TKey, TValue> class.

Omer van Kloeten
The whole point would be supporting dynamically sized types... SQL does it and uses a single file so I see no reason why this is not possible
Sam Saffron
Use a database behind it then, instead of a file. I wouldn't go reinventing anything that's been done better before.
Omer van Kloeten
SQL uses caching and multi-threaded tasks to perform the way it does. To do that all for a single dictionary object is insane. You're requirements are unrealistic.
Cameron MacFarland
"Reinventing" something like this is a great way to learn lots of neat stuff. His stated performance does not need anything like multiple threads or caching logic... beyond just keeping the whole thing in memory while running.
darron
After all, that's how Dictionary works and he just wants it to persist.
darron
Dynamic size isn't too bad either. Memory instances keep file position pointers, writing a larger value deletes and rewrites at the end... Destructor rewrites the file to compress deleted gaps. (Additionally a "gap count" could flag this to happen occasionally if the app stays up for a while)
darron
Add some critical sections for thread safety and you're done.
darron
+15  A: 
  • The C5 Generic Collection Library

    C5 provides functionality and data structures not provided by the standard .Net System.Collections.Generic namespace, such as persistent tree data structures, heap based priority queues, hash indexed array lists and linked lists, and events on collection changes.

  • bplustreedotnet

    The bplusdotnet package is a library of cross compatible data structure implementations in C#, java, and Python which are useful for applications which need to store and retrieve persistent information. The bplusdotnet data structures make it easy to store string keys associated with values permanently.

  • ESENT Managed Interface

    Not 100% managed code but it's worth mentioning it as unmanaged library itself is already part of every windows XP/2003/Vista/7 box

    ESENT is an embeddable database storage engine (ISAM) which is part of Windows. It provides reliable, transacted, concurrent, high-performance data storage with row-level locking, write-ahead logging and snapshot isolation. This is a managed wrapper for the ESENT Win32 API.

lubos hasko
Thanks for the tip, added them to my list of useful libraries.
Tiberiu Ana
@lubos From C5 docs: *A persistent sorted collection implements interface IPersistentSorted<T> and is asorted collection of which one can efficiently make a read-only snapshot, or a “copy”that is not affected by updates to the original collection. It describes a method Snapshot() that returns a sorted collection with exactly the same items as the persistent sorted collection. The TreeSet<T> and TreeBag<T> collection classes are persistent sorted collections; their Snapshot() methods take constant time, but sub-sequent updates to the given tree take more time and space* - not what is required
Konstantin Spirin
C5 data structures are not persistent in that way. See http://en.wikipedia.org/wiki/Persistent_data_structure
Mauricio Scheffer
+5  A: 

one way is to use the Extensible Storage Engine built into windoows to store your stuff. It's a native win database that supports indexing, transactions etc...

Mladen Prajdic
ESE is good for this problem ... aynde has some wrappers for it.
Sam Saffron
A: 

Consider a memory mapped file. I'm not sure if there is direct support in .NET, but you could pinvoke the Win32 calls.

kenny
A: 

I haven't actually used it, but this project apparently provides an mmap()-like implementation in C#

Mmap

axel_c
A: 

I am not much of a programmer, but wouldn't creating a really simple XML format to store your data do the trick?

<dico> 
   <dicEntry index="x">
     <key>MyKey</key>
     <val type="string">My val</val>
   </dicEntry>
   ...
</dico>

From there, you load the XML file DOM and fill up your dictionary as you like,

XmlDocument xdocDico = new XmlDocument();
string sXMLfile;
public loadDico(string sXMLfile, [other args...])
{
   xdocDico.load(sXMLfile);
   // Gather whatever you need and load it into your dico
}
public flushDicInXML(string sXMLfile, dictionary dicWhatever)
{
   // Dump the dic in the XML doc & save
}
public updateXMLDOM(index, key, value)
{
   // Update a specific value of the XML DOM based on index or key
}

Then whenever you want, you can update the DOM and save it on disk.

xdocDico.save(sXMLfile);

If you can afford to keep the DOM in memory performance-wise, it's pretty easy to deal with. Depending on your requirements, you may not even need the dictionary at all.

He didn't want to rewrite the entire file on updates.
ck
Oups, looks like I wasn't paying enough attention.
+14  A: 

Let me analyze this:

  1. Retrieve information by key
  2. Persistant storage
  3. Do not want to write back the whole file when 1 value changes
  4. Should survive crashes

I think you want a database.

Edit: I think you are searching for the wrong thing. Search for a database that fits your requirements. And change some of your requirements, because I think it will be difficult to meet them all.

GvS
I understand what you are saying, but the database part is just another implementation detail, for a drop in solution that depends on sqlite you need to have code that handles x64 vs x32 you need a repository and a bunch of other things, so though, yes an embedded db may be a solution...
Sam Saffron
... the devil is in the detail, and there is not detail or code in this answer. I explicitly asked for an implementation of IDictionary
Sam Saffron
+1  A: 

I found this link and sound like what you are looking for. It is programmed in Java but it shouldn't be that hard to port it to C#:

http://www.javaworld.com/javaworld/jw-01-1999/jw-01-step.html

Igor Zelaya
+1 looks interesting
Sam Saffron
+1  A: 

I think your issue is likely to be that last point:

When setting or clearing a value it should not re-write the entire underlying file, instead it should seek to the position in the file and update the value.

This is exactly what a DB does - you're basically describing a simple file based table structure.

We can illustrate the problem by looking at strings.

Strings in memory are flexible things - you don't need to know the length of a string in C# when you declare its type.

In data storage strings and everything else are fixed sizes. Your saved dictionary on disk is just a collection of bytes, in order.

If you replace a value in the middle it either has to be exactly the same size or you will have to rewrite every byte that comes after it.

This is why most databases restrict text and blob fields to fixed sizes. New features like varchar(max)/varbinary(max) in Sql 2005+ are actually clever simplifications to the row only actually storing a pointer to the real data.

You can't use the fixed sizes with your example because it's generic - you don't know what type you're going to be storing so you can't pad the values out to a maximum size.

You could do:

class PersistantDictionary<T,V> : Dictionary<T,V>
    where V:struct

...as value types don't vary in storage size, although you would have to be careful with your implementation to save the right amount of storage for each type.

However your model wouldn't be very performant - if you look at how SQL server and Oracle deal with table changes they don't change the values like this. Instead they flag the old record as a ghost, and add a new record with the new value. Old ghosted records are cleaned up later when the DB is less busy.

I think you're trying to reinvent the wheel:

  • If you're dealing with large amounts of data then you really need to check out using a full-blown DB. MySql or SqlLite are both good, but you're not going to find a good, simple, open-source and lite implementation.

  • If you aren't dealing with loads of data then I'd go for whole file serialisation, and there are already plenty of good suggestions here on how to do that.

Keith
MSSQL (and I'd assume other RDBMS) use a row offset table to track beginning of rows from a page.
Mark Brackett
+1  A: 

I'd recommend SQL Server Express or other database.

  • It's free.
  • It integrates very well with C#, including LINQ.
  • It's faster than a homemade solution.
  • It's more reliable than a homemade solution.
  • It's way more powerful than a simple disk-based data structure, so it'll be easy to do more in the future.
  • SQL is an industry standard, so other developers will understand your program more easily, and you'll have a skill that is useful in the future.
Jay Bazuzi
SQL Express is not a drop in solution, its a messy complex installer, sql ce may be a solution ...
Sam Saffron
A reasonable concern.
Jay Bazuzi
+1  A: 

I wrote up an implementation myself based on a very similar (I think identical) requirement I had on another project a while ago. When I did it, one thing I realised was that most of the time you'll be doing writes, you only do a read rarely when the program crashes or when it's closed. So the idea is to make the writes as fast as possible. What I did was make a very simple class which would just write a log of all the operations (additions and deletions) to the dictionary as things occurred. So after a while you get a lot of repeating between keys. Because of that, once the object detects a certain amount of repetition, it'll clear the log and rewrite it so each key and its value only appears once.

Unfortunately, you can't subclass Dictionary because you can't override anything in it. This is my simple implementation, I haven't tested it though I'm sorry, I thought you might want the idea though. Feel free to use it and change it as much as you like.

class PersistentDictManager {
    const int SaveAllThreshold = 1000;

    PersistentDictManager(string logpath) {
        this.LogPath = logpath;
        this.mydictionary = new Dictionary<string, string>();
        this.LoadData();
    }

    public string LogPath { get; private set; }

    public string this[string key] {
        get{ return this.mydictionary[key]; }
        set{
            string existingvalue;
            if(!this.mydictionary.TryGetValue(key, out existingvalue)) { existingvalue = null; }
            if(string.Equals(value, existingvalue)) { return; }
            this[key] = value;

            // store in log
            if(existingvalue != null) { // was an update (not a create)
                if(this.IncrementSaveAll()) { return; } // because we're going to repeat a key the log
            }
            this.LogStore(key, value);
        }
    }

    public void Remove(string key) {
        if(!this.mydictionary.Remove(key)) { return; }
        if(this.IncrementSaveAll()) { return; } // because we're going to repeat a key in the log
        this.LogDelete(key);
    }

    private void CreateWriter() {
        if(this.writer == null) {
            this.writer = new BinaryWriter(File.Open(this.LogPath, FileMode.Open)); 
        }
    }

    private bool IncrementSaveAll() {
        ++this.saveallcount;
        if(this.saveallcount >= PersistentDictManager.SaveAllThreshold) {
            this.SaveAllData();
            return true;
        }
        else { return false; }
    }

    private void LoadData() {
        try{
            using(BinaryReader reader = new BinaryReader(File.Open(LogPath, FileMode.Open))) {
                while(reader.PeekChar() != -1) {
                    string key = reader.ReadString();
                    bool isdeleted = reader.ReadBoolean();
                    if(isdeleted) { this.mydictionary.Remove(key); }
                    else {
                        string value = reader.ReadString();
                        this.mydictionary[key] = value;
                    }
                }
            }
        }
        catch(FileNotFoundException) { }
    }

    private void LogDelete(string key) {
        this.CreateWriter();
        this.writer.Write(key);
        this.writer.Write(true); // yes, key was deleted
    }

    private void LogStore(string key, string value) {
        this.CreateWriter();
        this.writer.Write(key);
        this.writer.Write(false); // no, key was not deleted
        this.writer.Write(value);
    }

    private void SaveAllData() {
        if(this.writer != null) {
            this.writer.Close();
            this.writer = null;
        }
        using(BinaryWriter writer = new BinaryWriter(File.Open(this.LogPath, FileMode.Create))) {
            foreach(KeyValuePair<string, string> kv in this.mydictionary) {
                writer.Write(kv.Key);
                writer.Write(false); // is not deleted flag
                writer.Write(kv.Value);
            }
        }
    }

    private readonly Dictionary<string, string> mydictionary;
    private int saveallcount = 0;
    private BinaryWriter writer = null;
}
Ray Hidayat
If you wanted to improve it, you could make it detect idle periods and only flush the full file then. I know you wanted a system that would only update the relevant parts of the file, but not only would that be quite difficult to write, I'm sure this will be more than enough for your needs!
Ray Hidayat
+2  A: 

I was working on porting EHCache to .NET. Take a look at the project

http://sourceforge.net/projects/thecache/

Persistent caching is core functionality that is already implemented. All main Unit Tests are passing. I got a bit stuck on distributed caching, but you do not need that part.

Timur Fanshteyn
Theoretically this should work properly for the problem I have. Thanks
Sam Saffron
+1  A: 

check this blog out: http://ayende.com/Blog/archive/2009/01/17/rhino.dht-ndash-persistent-amp-distributed-storage.aspx

looks to be exactly you are looking for.

Mouk
I'm accepting this, because this is a drop in solution to my problem. I will probably end ups writing my own framework for this and once I do I will amend this answer
Sam Saffron
A: 

http://csharp.illusionalworld.com/2009/02/persistent-dictionary.html

It does all you want.

chikak
+2  A: 

I've implemented the kind of PersistedDictionary you are looking for. The underlying storage is the ESENT database engine which is built into windows. The code is available here:

http://managedesent.codeplex.com/

I am now trying out the PersistentDictionary, but I keep getting the following exception when I instantiate the dictionary: Error TempPathInUse (JET_errTempPathInUse, Temp path already used by another database instance).The documentation does not say I need to use it as a singelton, nor does it say I have to dispose it properly.So something must clearly be wrong.
Thomas Eyde
See the discussion on the ManagedEsent codeplex site.
Laurion Burchall