I am evaluating a number of NoSQL implementations (RavenDB and MongoDB at the moment) as a means of solving a specific set of requirements that involve storage/retrieval of data that is schema-less. I want to get some feedback on whether NoSQL is the direction I should be looking in, or if there are other (potentially simpler) options.
Essentially, we have a software product that (among other things) defines a basic domain model consisting of a few related entities, each of which has a number of attributes (key/value pairs). As we release to a customer, we work with them to set up the attributes and values, which is essentially the configuration of the system. This is fairly straightforward: because the design is known up front, we don't need anything dynamic to achieve this and make it perform (we will use an RDBMS). The attributes themselves are not known up front, but again this is not a problem, as this part of the system pretty much revolves around an attribute model.
The problem is that for different customers, and AFTER we have released and are in production, we find that we need to query for specific sets of attribute data that we knew nothing about when we compiled and released the code (and before we configured the attributes for the customer). We basically need to produce data from the attribute maps, store it (we won't know its structure up front), and then query that stored data later in ways we can't anticipate. The thinking right now is that we can create hooks that get hit during processing and allow us to plug in libraries (likely via MEF) that create the data so it gets stored, and then query it later when needed (not for reporting; usually to create additional data/attributes).
(Note that creating the hooks and plug-in libraries is a separate problem, and is not intended to be part of this question.)
A common scenario might be: "I want to know how many times xxx occurred in the last 10 days". So I would create a plug-in that would recognize that xxx has occurred, and write it to a data store with a date/time. Then I would create another plug-in (probably in the same DLL) that would perform the query, and add an attribute to the model called "CountOfxxxInLast10Days".
Another scenario might be to create configurable lookups. So I might have a plug-in that runs at startup to create/update a table of lookup data that could convert one attribute value to another, or (more likely) convert a range of values to a single lookup value. So the conversion plug-in might add a table with columns bottom_value, top_value, and multiplier, and the query plug-in would query the table using an attribute value, like "SELECT multiplier FROM table WHERE [attribute_value] BETWEEN bottom_value AND top_value". The result would then be added to the model as an attribute called "Multiplier".
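To make the two scenarios concrete, here is a minimal sketch of what the plug-ins would need from whatever store is chosen. It is only an illustration of the requirement, not a proposed solution: it uses Python and an in-memory SQLite database as stand-ins, and the table names (`occurrences`, `multiplier_lookup`), the function names, and the sample values are all hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta


def create_store():
    """Create an in-memory store with the two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE occurrences (event_name TEXT, occurred_at TEXT)")
    conn.execute(
        "CREATE TABLE multiplier_lookup "
        "(bottom_value REAL, top_value REAL, multiplier REAL)"
    )
    return conn


def record_occurrence(conn, event_name, when):
    """'Writer' plug-in: store that an event occurred, with its date/time."""
    conn.execute(
        "INSERT INTO occurrences VALUES (?, ?)",
        (event_name, when.isoformat(timespec="seconds")),
    )


def count_in_last_days(conn, event_name, days, now):
    """'Query' plug-in: count occurrences within the last `days` days."""
    cutoff = (now - timedelta(days=days)).isoformat(timespec="seconds")
    row = conn.execute(
        "SELECT COUNT(*) FROM occurrences "
        "WHERE event_name = ? AND occurred_at >= ?",
        (event_name, cutoff),
    ).fetchone()
    return row[0]


def lookup_multiplier(conn, attribute_value):
    """'Query' plug-in: map an attribute value to a multiplier by range."""
    row = conn.execute(
        "SELECT multiplier FROM multiplier_lookup "
        "WHERE ? BETWEEN bottom_value AND top_value",
        (attribute_value,),
    ).fetchone()
    return row[0] if row else None


# Usage with made-up data: three occurrences, two of them inside the window.
now = datetime(2024, 1, 15, 12, 0, 0)
conn = create_store()
for days_ago in (1, 9, 11):
    record_occurrence(conn, "xxx", now - timedelta(days=days_ago))

# The lookup table would be created/updated by a plug-in at startup.
conn.executemany(
    "INSERT INTO multiplier_lookup VALUES (?, ?, ?)",
    [(0, 9.99, 1.0), (10, 50, 1.5)],
)

# The results become new attributes on the model.
model = {
    "CountOfxxxInLast10Days": count_in_last_days(conn, "xxx", 10, now),
    "Multiplier": lookup_multiplier(conn, 25),
}
```

The point of the sketch is that neither table exists when the product is compiled and released; both would be defined by a plug-in after deployment, which is the part I'm unsure how best to support.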
In certain cases, old data could be purged after a specified period of time. In the first scenario described above, it might be desirable to remove data older than ten days from the store/cache.
In other cases, data would need to be persisted permanently, as in the second scenario above. It's possible, though, that this data could simply be re-created at startup rather than held in a permanent store.
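As a rough sketch of the purge requirement, again using Python and SQLite purely as stand-ins, with a hypothetical `occurrences` table and a hypothetical `purge_older_than` function:

```python
import sqlite3
from datetime import datetime, timedelta


def purge_older_than(conn, days, now):
    """Delete occurrences older than `days` days; return how many were removed."""
    cutoff = (now - timedelta(days=days)).isoformat(timespec="seconds")
    cur = conn.execute("DELETE FROM occurrences WHERE occurred_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Usage with made-up data: only the 11-day-old row falls outside the window.
now = datetime(2024, 1, 15, 12, 0, 0)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occurrences (event_name TEXT, occurred_at TEXT)")
for days_ago in (1, 9, 11):
    when = (now - timedelta(days=days_ago)).isoformat(timespec="seconds")
    conn.execute("INSERT INTO occurrences VALUES (?, ?)", ("xxx", when))

removed = purge_older_than(conn, 10, now)
```

Whether the store does this itself (for example through some built-in expiry feature) or a plug-in has to run it on a schedule is something I'd be happy to leave to the chosen technology.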
Additional requirements:
- The datastore/cache can be backed up and restored while online
- Can be replaced/recovered from the last backup in the case of a crash
- Data survives events like machine reboot
- Proven/production-tested technology
We are pretty committed to the .NET platform at this point, so any option would have to have a solid .NET client/API.