views:

230

answers:

2

I have a certain object type that is stored in a database. This type now gets additional information associated with it which will differ in structure among instances. Although for groups of instances the information will be identically structured, the structure will only be known at runtime and will change over time.

I decided to just add a blob field to the table and store the key/value pairs there in some serialized format. From your experience, what format is most advisable?

In the context of my application, the storage space for this is secondary. There's one particular operation that I want to be fast, which is looking up the correct instance for a given set of key / value pairs (so it's a kind of variable-field composite key). I guess that means, is there a format that plays particularly well with typical database indexing?

Additionally, I might be interested in looking for a set of instances that share the same set of keys (an adhoc "class", if you wish).

I'm writing this in Java and I'm storing in various types of SQL databases. I've got JSON, GPB and native Java serialization on my radar, favouring the cross-language formats. I can think of two basic strategies:

  • store the set of values in the table and add a foreign key to a separate table that contains the set of keys
  • store the key/value pairs in the table
+1  A: 

Not really an anwser to your question, but did you considered looking at the Java Edition of BerkeleyDB ? Duplicate keys and serialized values can be stored with this (fast) engine.

Pierre
Thanks, this is an interesting link, although I'm really not sure how this would help me in my current situation.
Hanno Fietz
+1  A: 

If your goal is to take advantage of database indexes, storing the unstructured data in a BLOB is not going to be effective. BLOBs are essentially opaque from the RDBMS's perspective.

I gather from your description that the unstructured part of the data takes the form of an arbitrary set of key-value pairs associated with the object, right? Well, if the types of all keys are the same (e.g. they're all strings), I'd recommend simply creating a child table with (at least) three columns: the key, the value, and a foreign key to the parent object's row in its table. Since the keys will then be stored in the database as a regular column, they can be indexed effectively. The index should also include the foreign key to the parent table.

A completely different approach would be to look at a "schemaless" database engine like CouchDB, which is specifically designed to deal with unstructured data. I have zero experience with such systems and I don't know how well the rest of your application would lend itself to this alternative storage strategy, but it might be worth looking into.

Rob H
If you decide to take the schemaless approach, Amazon.com SimpleDB is something else you can look into - http://aws.amazon.com/simpledb/
Jack Leow
Parts of my data are already schemaless, I serialize some objects completely and store them in a generic table, much like FriendFeed does (see http://bret.appspot.com/entry/how-friendfeed-uses-mysql). I do this where I don't need the RDBM features and a strict schema would make things harder, but some parts of my data are perfectly suitable for a standard RDBM. Since I use this hybrid strategy, I don't want to switch to CouchDB etc.
Hanno Fietz
Thanks for your comments on the key-value table. I had already decided against that approach, but after reading your post, I reconsidered and it seems I'm changing my mind.
Hanno Fietz