views:

124

answers:

3

Imagine a system that works with data, which structure changes over time. Say, for example, today your User object contains Name : String and Email : String, but tomorrow you need to add Age : Integer and Address which consists of Country, Zip code, etc. Then you may want to create new field User.Contacts and move Email and Address to that field, like refactoring. And it should be done in runtime, without coding and redeployment, because it will be done by customers or administrators, not developers.

What approaches and tools would you consider for storing such data? Will it be separate table for each class of objects and altering table each time structure is changed, or 1-to-many relationship between object and it's property values (like table StringProperties with fields ObjectID, PropertyID, StringValue); or one big table for all objects (with generic fields StringField1, NumericField2, etc)

How would you make indexing work?

Would you consider using less mainstream tools like CouchDB? Are there any other tools I should know about?

Are there any existing examples of application with similar idea - allowing users to define and augment their own data structures?

EDIT: I do not expect anyone to solve my entire design problem. Rough ideas or links to tools like CouchDB or Prevayler are more than welcome. Any article links are welcome too.

+1  A: 

I think this would largely depend on the longevity of the data and the language you're in.

For a short lived structure, in a dynamic language, then I would be tempted to go low-brow and use a Hash of Lists.

On the other end of the scale - something you need to persist and you really want a relational database, then I would probably move over to a more modular architecture, whereby the client code takes care of the whole lifecycle of the data - up to and including the create table statements, the marshalling and unmarshalling, and querying of the data.

For the marshalling/unmarshalling/query problem, there's another fork in the road which may be using a ORM tool, or using a more low-tech/raw SQL approach. Either way, you would need some kind of staged approach which is part of the modular design.

Of course, how you arrange your data structure when it's in memory could be a straight forward Map of Lists, or more type safe approach, such as found with Eclipse's IAdaptable "pattern".

Otherwise, you're in the territory of tools like Prevayler which are more advanced serialized to disk tools than RDBMS.

On a side note, you could do a lot worse than CouchDB.

jamesh
I am considering Hash and Lists to store data in runtime for sure, but it needs to be persisted anyway.
Pavel Feldman
Prevayler looks interesting, thanks.
Pavel Feldman
+1  A: 

If the requirement is that you need the ability to create name-value pairs at random, then in some form or other, you'll end up with one or more tables of name-value pairs:

ID     USER_ID   PROPERTY_NAME         PROPERTY_VALUE
---------------------------------------------------------------
1      1         Name                  Chris
2      1         Occupation            Developer
3      2         Name                  Joe
4      2         Hair Color            Brown

... and so on. Of course, these name-value tables grow like crazy with time, so indexing and partitioning are important; to the extent you can classify property types into separate tables while still retaining the flexibility you need, you'll be able to keep the table size under relative control. I've worked on projects that used this approach, with table sizes stretching into the tens of millions of rows (on SQL Server & Oracle -- I can't vouch personally for others) without issue.

I'm no DBA, but in terms of indexing, my understanding is that you'll want one clustered index, to keep associated records in relative proximity on disk, and in my example, you'd probably want a non-clustered, non-unique index on USER_ID also, since you'd be querying directly against it, but beyond that, I don't have specific recommendations -- other than maybe to read Stephane Faroult's excellent book The Art of SQL, which offers way more insight than I ever could on this complex topic. Good luck!

Christian Nunciato
Looking at the other answers, it looks like I might've only answered part of your question; I skated right past the ORM piece (for which I don't have a strong personal rec anyway). Hope it's not totally off the mark.
Christian Nunciato
+1  A: 

If you want the end user / administrator to be making the changes, you really cant have them accessing the schema and adding / removing relationships and tables etc, they will break something.

Previously I've used an XML based generic schema for storing information such as this. You end up with something like this (psydo-xml):

<Data>
<Name>Something</Name>
<DOB>2008-01-01</DOB>
<Address>
<Street>1a Foo St</Street>
...
</Address>
</Data>

To be able to render this to HTML (or anything else) a metadata file was used to specify what each type was, (i.e. DOB = date, Address/Street = string) and the correct renderer used to render the output to screen in the same heirachal appearance as the XML

Not sure if this is suitable for your needs, as because its so generic, you have limited ability to specify layout and those sorts of things. The above approach was used for rendering generic configuration files.

RM