I'm thinking about using/implementing some kind of an embedded key-value (or document) store for my Windows desktop application. I want to be able to store various types of data (GPS tracks would be one example) and of course be able to query this data. The amount of data would be such that it couldn't all be loaded into memory at the same time.

I'm thinking about using SQLite as a storage engine for a key-value store, something like y-serial, but written in .NET. I've also read about FriendFeed's usage of MySQL to store schema-less data, which is a good pointer on how to use an RDBMS for non-relational data. SQLite seems to be a good option because of its simplicity, portability and library size.
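
To make the idea concrete, here is a minimal sketch of a key-value store on top of a single SQLite table. It is written in Python only because `sqlite3` ships with the standard library; the same table layout should carry over to a .NET SQLite binding. The class name `SqliteKeyValueStore`, the table name `kv`, and the choice of JSON for values are all assumptions made for illustration, not something y-serial or FriendFeed prescribes.

```python
import json
import sqlite3


class SqliteKeyValueStore:
    """Hypothetical minimal key-value store backed by one SQLite table."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def put(self, key, value):
        # Values are stored as JSON text, so no per-type schema is needed.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, json.dumps(value)),
            )

    def get(self, key, default=None):
        # Only the requested row is fetched, not the whole data set.
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return default if row is None else json.loads(row[0])

    def delete(self, key):
        with self.conn:
            self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))


# ":memory:" keeps the example self-contained; a file path would persist the data.
store = SqliteKeyValueStore(":memory:")
store.put("track:1", {"name": "Morning ride", "points": [[46.05, 14.51], [46.06, 14.52]]})
```

Records are looked up one at a time through the primary key, which is the property I'm after: the whole store never has to sit in memory.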

My question is whether there are any other options for an embedded non-relational store? It doesn't need to be distributable and it doesn't have to support transactions, but it does have to be accessible from .NET and it should have a small download size.

UPDATE: I've found an article titled "SQLite as a Key-Value Database", which compares SQLite with Berkeley DB, an embedded key-value store library.

+1  A: 

Have you considered simply serializing an ADO DataSet for your data store?

Paul Sasik
If I understand you correctly, this would mean all the data would have to be loaded into memory, which is not an option (too much data). I want to be able to access data from the disk on an individual record basis.
Igor Brejc
That's correct. I thought the size of your data might be smaller, given the way you described it.
Paul Sasik
I've updated the question. Anyway, if it could all fit into memory, I probably wouldn't really need a data store in the first place - I could simply binary serialize all the data objects into a file.
Igor Brejc
+1  A: 

Personally I would go for SQLite with NHibernate (and Fluent NHibernate). NHibernate can generate the database schema automatically for your classes, so you just need to specify what classes you want to persist, and that's quite easy with Fluent NHibernate. Furthermore, you can search for specific objects and you don't need to load all data to memory.

Oliver Hanappi
But he wanted a schema-less store....
Viktor Klang
Astor is right: I want to avoid the relational model. I want to be able to store practically any kind of data without first having to prepare the database schema for it. Also, having a strict relational model can be problematic if the data structure changes later - I would need to write SQL change scripts for the existing data in the store.
Igor Brejc
I know what he is looking for, but tools like NHibernate with schema generation hide the relational aspect almost completely. You do not need to define a schema, only the mapping for your classes (which is really straightforward with Fluent NHibernate), and when your classes change you will need some kind of update under any persistence strategy.
Oliver Hanappi
I appreciate that, but hiding the relational aspects isn't the same as not having a relational model at all. And in practice this hiding goes only so far: sooner or later you need to deal with it "manually" (as when the model changes). On the other hand, if you store the data as documents (which some NoSQL solutions do), you don't really need to update the old data; you just need to make sure you can still read data in the older form.
Igor Brejc
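
The "support reading the older form" point from the comment above can be sketched roughly as follows. This is only an illustration in Python: the `version` field, the `name` to `first_name`/`last_name` change, and the function names are all hypothetical. The stored documents are never rewritten; older ones are upgraded in memory as they are read.

```python
import json

CURRENT_VERSION = 2


def upgrade_v1_to_v2(doc):
    # Hypothetical change: version 1 stored one "name" string,
    # version 2 stores "first_name" and "last_name".
    first, _, last = doc["name"].partition(" ")
    upgraded = {k: v for k, v in doc.items() if k != "name"}
    upgraded.update(first_name=first, last_name=last, version=2)
    return upgraded


# One upgrade step per old version; new steps are added as the model evolves.
UPGRADES = {1: upgrade_v1_to_v2}


def load_document(raw):
    doc = json.loads(raw)
    version = doc.get("version", 1)  # documents without a version are treated as v1
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["version"]
    return doc


old = '{"name": "Ada Lovelace"}'
new = '{"version": 2, "first_name": "Alan", "last_name": "Turing"}'
```

Both `load_document(old)` and `load_document(new)` return a version 2 document, so the rest of the application only ever sees the current shape.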
A: 

You could use the document store MongoDB: it has a .NET driver and no schema. However, it is not embedded; MongoDB runs as a separate process. See http://github.com/samus/mongodb-csharp

tuinstoel
+3  A: 

Windows has a built-in embedded non-relational store. It is called ESENT and is used by several Windows applications, including Active Directory and Windows Desktop Search.

http://blogs.msdn.com/windowssdk/archive/2008/10/23/esent-extensible-storage-engine-api-in-the-windows-sdk.aspx

If you want .NET access you can use the ManagedEsent layer on CodePlex.

http://managedesent.codeplex.com/

That project has a PersistentDictionary class, a key-value store that implements the IDictionary interface but is backed by a database.

Laurion Burchall
@Laurion, I've seen ESENT and was initially very excited. The only problem is that it's Windows-only (think Mono + Linux/Mac).
Igor Brejc
+1  A: 

Applying the KISS principle to your problem I would recommend you use files.

As in filename is the key. File contents is the value. Windows folder is the index.

Simple, quick, efficient, flexible, and foolproof (providing the fools have low intelligence).
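
A rough sketch of what that looks like in code, in Python for brevity. The class name `FileStore` and the percent-encoding of keys are assumptions added for the example; reserved names and keys such as "." or ".." would still need extra handling in a real implementation.

```python
import tempfile
from pathlib import Path
from urllib.parse import quote


class FileStore:
    """Hypothetical key-value store: one file per key inside one folder."""

    def __init__(self, folder):
        self.folder = Path(folder)
        self.folder.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Percent-encode the key so separators such as "/" or ":" do not
        # end up in the filename.
        return self.folder / quote(key, safe="")

    def put(self, key, data: bytes):
        self._path(key).write_bytes(data)

    def get(self, key):
        path = self._path(key)
        return path.read_bytes() if path.exists() else None

    def open(self, key):
        # Large values can be streamed instead of read in one piece.
        return self._path(key).open("rb")


# A temporary folder keeps the example self-contained.
store = FileStore(tempfile.mkdtemp())
store.put("tracks/2010-05-01", b"<gpx>...</gpx>")
```

`get` reads a whole value, while `open` hands back an ordinary file stream for values that are too big to load at once.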

James Anderson
Nice approach, although I feel using files for storing values would be a bit over-the-top for simple values (a single integer, for example).
Igor Brejc
The question sort of implies that what is being stored can be quite large (documents / too much data to be loaded into memory). One of the advantages of the file approach is that you get a nice set of Stream handling classes for free, which is very useful when dealing with big chunks of data and much cleaner than, for example, splitting data into arbitrary nMB blobs and storing them in a database.
James Anderson
True. What about the physical limits of the file system? How would such a store behave when the number of records exceeds 100,000? Also: when I talked about "too much data", I meant the _whole_ database. I mentioned this to avoid answers like object tree serialization and similar.
Igor Brejc
+1  A: 

Thanks for your kind mention of y_serial... more precisely, it is a Python module:

warehouse Python objects with SQLite

"Serialization + persistance :: in a few lines of code, compress and annotate Python objects into SQLite; then later retrieve them chronologically by keywords without any SQL. Most useful "standard" module for a database to store schema-less data."

http://yserial.sourceforge.net

In my experience, SQLite is a faster and more reliable choice than most databases (including PostgreSQL and Berkeley DB) for the majority of projects -- and of course, it does not need a server daemon.

yserial is very easy to implement (and far faster than the "filename is the key / file contents is the value" approach ;-)

code43
Yes, I really like y-serial's approach, especially since it uses sqlite. Keep up the good work! Maybe when I get some time from my other projects, I'll try to do something similar in C# :)
Igor Brejc
+1  A: 

Could you create a simple SQLite database containing a table with two columns:

==documents==
id|data

where data would hold the document as JSON.

You could also create a key table which would be:

==keys==
keyname|keyvalue|id

that would be indexed on keyname and keyvalue for quick lookups.

A single db file could be a collection, and you could create multiple db files for multiple collections.

You could use folders as "dbs" to match mongodb's hierarchy of db->collection->document
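
Something along these lines, sketched in Python with the standard `sqlite3` module. The table and column names follow the layout above; the helper names `insert_document` and `find_documents`, the index name, and the choice to index only top-level fields as strings are assumptions for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path would give one collection per file
conn.executescript(
    """
    CREATE TABLE documents (id INTEGER PRIMARY KEY, data TEXT NOT NULL);
    CREATE TABLE keys (
        keyname  TEXT NOT NULL,
        keyvalue TEXT NOT NULL,
        id       INTEGER NOT NULL REFERENCES documents(id)
    );
    CREATE INDEX keys_lookup ON keys (keyname, keyvalue);
    """
)


def insert_document(doc, indexed_fields):
    """Store doc as JSON and index the chosen top-level fields."""
    with conn:
        cur = conn.execute(
            "INSERT INTO documents (data) VALUES (?)", (json.dumps(doc),)
        )
        doc_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO keys (keyname, keyvalue, id) VALUES (?, ?, ?)",
            [(f, str(doc[f]), doc_id) for f in indexed_fields if f in doc],
        )
    return doc_id


def find_documents(keyname, keyvalue):
    """Return the documents whose indexed field equals keyvalue."""
    rows = conn.execute(
        "SELECT d.data FROM documents d JOIN keys k ON k.id = d.id "
        "WHERE k.keyname = ? AND k.keyvalue = ? ORDER BY d.id",
        (keyname, str(keyvalue)),
    )
    return [json.loads(r[0]) for r in rows]


insert_document({"type": "track", "name": "Morning ride"}, ["type"])
insert_document({"type": "waypoint", "name": "Summit"}, ["type", "name"])
insert_document({"type": "track", "name": "Evening walk"}, ["type"])
```

`find_documents("type", "track")` then goes through the index on the keys table instead of scanning and parsing every JSON document.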

RobKohr
Just a note: you would create a template SQLite db file and copy it any time you needed to create a new collection. If anyone wants to create a PHP setup to handle this and open source it, let me know. I think it would be great, but I never bothered to make it myself.
RobKohr
@RobKohr, your suggestions are in the direction of how y-serial does things. Have you seen it? http://yserial.sourceforge.net/
Igor Brejc
Nope, but I am looking for a PHP solution myself.
RobKohr
+2  A: 

Take a look at RavenDB. It looks as though it can be embedded, is schemaless, and works with .NET.

From the website:

  • Scalable infrastructure: Raven builds on top of existing, proven and scalable infrastructure
  • Simple Windows configuration: Raven is simple to setup and run on windows as either a service or IIS7 website
  • Transactional: Raven support System.Transaction with ACID transactions. If you put data in it, that data is going to stay there
  • Map/Reduce: Easily define map/reduce indexes with Linq queries
  • .NET Client API: Raven comes with a fully functional .NET client API which implements Unit of Work and much more
  • RESTful: Raven is built around a RESTful API
Jafin