views: 3204
answers: 12

What is a recommended architecture for providing storage for a dynamic logical database schema?

To clarify: Where a system is required to provide storage for a model whose schema may be extended or altered by its users once in production, what are some good technologies, database models or storage engines that will allow this?

A few possibilities to illustrate:

  • Creating/altering database objects via dynamically generated DML
  • Creating tables with large numbers of sparse physical columns and using only those required for the 'overlaid' logical schema (a rough sketch follows this list)
  • Creating a 'long, narrow' table that stores dynamic column values as rows that then need to be pivoted to create a 'short, wide' rowset containing all the values for a specific entity
  • Using a BigTable/SimpleDB PropertyBag type system
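For example, the second possibility might look something like this as a sketch using SQL Server's SPARSE columns (the table and column names are invented purely for illustration):

-- A wide table of pre-allocated sparse physical columns; a separate mapping
-- table would record which slots the 'overlaid' logical schema actually uses
CREATE TABLE entity_values (
    entity_id        INT NOT NULL PRIMARY KEY,
    custom_string_01 NVARCHAR(255) SPARSE NULL,
    custom_string_02 NVARCHAR(255) SPARSE NULL,
    custom_number_01 DECIMAL(18,4) SPARSE NULL,
    custom_date_01   DATETIME SPARSE NULL
    -- ...more pre-allocated slots as required
);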

Any answers based on real-world experience would be greatly appreciated.

+2  A: 

A strongly typed XML field in MSSQL has worked for us.
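For illustration, the column is typed against an XML schema collection along these lines (the collection name and XSD below are just a sketch, not our actual schema):

-- A hypothetical schema collection describing the user-extensible fields
CREATE XML SCHEMA COLLECTION CustomFieldsSchema AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="fields">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="field" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="name"  type="xs:string" />
            <xs:attribute name="value" type="xs:string" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>';

-- The XML column is then strongly typed: documents that violate the XSD are rejected
CREATE TABLE product (
    id            INT IDENTITY PRIMARY KEY,
    custom_fields XML(CustomFieldsSchema)
);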

Bloodhound
Voted up to protect you; it seems like people who disagree tend to vote down even if the answer is relevant.
Sklivvz
I did the same for you.
Bloodhound
+2  A: 

The whole point of having a relational DB is keeping your data safe and consistent. The moment you allow users to alter the schema, there goes your data integrity...

If your need is to store heterogeneous data, for example in a CMS scenario, I would suggest storing XML validated by an XSD in a row. Of course you lose performance and easy search capabilities, but it's a good trade-off IMHO.
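To give a feel for the search trade-off, filtering on something inside the XML means XQuery rather than a plain WHERE clause; the table and element names below are assumptions for the sketch (SQL Server syntax):

-- Find rows whose XML payload contains a matching element
SELECT id
FROM cms_item
WHERE content.exist('/article/author[. = "Smith"]') = 1;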

Sklivvz
A: 

Sounds to me like what you really want is some sort of "meta-schema", a database schema which is capable of describing a flexible schema for storing the actual data. Dynamic schema changes are touchy and not something you want to mess with, especially not if users are allowed to make the change.

You're not going to find a database that is more suited to this task than any other, so your best bet is just to select one based on other criteria. For example: what platform are you using to host the DB? What language is the app written in? And so on.

To clarify what I mean by "meta-schema":

CREATE TABLE data (
    id    INTEGER NOT NULL AUTO_INCREMENT,
    `key` VARCHAR(255),  -- the logical field name (backticks because KEY is a reserved word)
    data  TEXT,          -- the value for that field, stored as text

    PRIMARY KEY (id)
);

This is a very simple example; you would likely have something more specific to your needs (and hopefully a little easier to work with), but it does serve to illustrate my point. You should consider the database schema itself to be immutable at the application level; any structural changes should be reflected in the data (that is, in the instantiation of that schema).
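To make that concrete: adding a user-defined field never touches the physical schema; it is just more rows (the key naming convention here is made up for the sketch):

-- Adding a user-defined 'phone' field to entity 42 requires no DDL at all
INSERT INTO data (`key`, data) VALUES ('entity:42:phone', '555-0100');

-- Reading it back is an ordinary query against the unchanged physical schema
SELECT data FROM data WHERE `key` = 'entity:42:phone';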

Daniel Spiewak
A: 

In the past I've chosen the third option: creating a 'long, narrow' table that stores dynamic column values as rows that then need to be pivoted to create a 'short, wide' rowset containing all the values for a specific entity. However, I was using an ORM, and that REALLY made things painful. I can't think of how you'd do it in, say, LINQ to SQL. I guess I'd have to create a Hashtable to reference the fields.
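For reference, a minimal sketch of that 'long, narrow' layout and the pivot back to a 'short, wide' rowset (table and attribute names are invented):

-- One row per (entity, attribute) pair
CREATE TABLE entity_attribute (
    entity_id  INTEGER NOT NULL,
    attr_name  VARCHAR(100) NOT NULL,
    attr_value VARCHAR(255),
    PRIMARY KEY (entity_id, attr_name)
);

-- Pivot the narrow rows into one wide row per entity
SELECT entity_id,
       MAX(CASE WHEN attr_name = 'first_name' THEN attr_value END) AS first_name,
       MAX(CASE WHEN attr_name = 'phone'      THEN attr_value END) AS phone
FROM entity_attribute
GROUP BY entity_id;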

@Sklivvz: I'm guessing he's more interested in allowing users to create user-defined fields.

Danimal
A: 

I know that the models indicated in the question are used in production systems all over. A rather large one is in use at the large university/teaching institution I work for; they specifically use the 'long, narrow' table approach to map data gathered by many varied data acquisition systems.

Also, Google recently released their internal data-interchange format, Protocol Buffers, as open source via their code site. A database system modeled on this approach would be quite interesting.

Check the following:

  • Entity-attribute-value model
  • Google Protocol Buffers

A: 

SQL already provides a way to change your schema: the ALTER command.

Simply have a table that lists the fields that users are not allowed to change, and write a nice interface for ALTER.
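In other words, after a request passes the allow-list check, the interface just generates statements like this (table and column names are hypothetical):

-- Generated by the admin interface after validating the request
ALTER TABLE customer ADD COLUMN loyalty_tier VARCHAR(50) NULL;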

longneck
+3  A: 

I did it once in a real project:

The database consisted of one table with one field that was an array of 50 elements. It had a 'word' index set on it. All the data was typeless, so the 'word' index worked as expected. Numeric fields were represented as characters, and the actual sorting was done on the client side. (It is still possible to have several array fields, one per data type, if needed.)

The logical schema for the logical tables was held within the same database, distinguished by a row 'type' (the first array element). It also supported simple versioning in copy-on-write style using the same 'type' field.
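As a very rough approximation in a more common RDBMS (PostgreSQL arrays here, purely as an assumption; our system used PROGRESS array fields with a word index), the layout would look something like:

-- One physical table; element 1 of the array carries the logical row 'type',
-- the remaining elements carry the typeless column values
CREATE TABLE generic_row (
    id     BIGSERIAL PRIMARY KEY,
    fields TEXT[]
);

-- A GIN index over the array plays roughly the role of the 'word' index
CREATE INDEX generic_row_fields_idx ON generic_row USING GIN (fields);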

Advantages:

  1. You can rearrange and add/delete your columns dynamically, with no need to dump/reload the database. Any new column can be set to its initial value in (virtually) zero time.
  2. Fragmentation is minimal, since all records and tables are the same size; sometimes this gives better performance.
  3. The entire table schema is virtual. Any logical schema structure is possible (even recursive or object-oriented).
  4. It is good for "write-once, read-mostly, no-delete/mark-as-deleted" data (most Web apps actually are like that).

Disadvantages:

  1. Indexing is only by full words, with no abbreviations.
  2. Complex queries are possible, but with slight performance degradation.
  3. It depends on whether your preferred database system supports arrays and word indexes (this was implemented in the PROGRESS RDBMS).
  4. The relational model exists only in the programmer's mind (i.e., only at run-time).

And now I'm thinking the next step could be to implement such a database at the file system level. That might be relatively easy.

Thevs
+8  A: 

What you are proposing is not new. Plenty of people have tried it... most have found that they chase "infinite" flexibility and instead end up with much, much less than that. It's the "roach motel" of database designs -- data goes in, but it's almost impossible to get it out. Try and conceptualize writing the code for ANY sort of constraint and you'll see what I mean.

The end result typically is a system that is MUCH more difficult to debug and maintain, and full of data consistency problems. This is not always the case, but more often than not, that is how it ends up. Mostly because the programmer(s) don't see this train wreck coming and fail to defensively code against it. It also often turns out that the "infinite" flexibility really isn't that necessary; it's a very bad "smell" when the dev team gets a spec that says "Gosh, I have no clue what sort of data they are going to put here, so let 'em put WHATEVER"... and the end users are just fine having pre-defined attribute types that they can use (code up a generic phone #, and let them create any # of them -- this is trivial in a nicely normalized system and maintains flexibility and integrity!).
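For example, the 'generic phone number' case stays fully normalized with something like this (names are illustrative only):

-- Users can attach any number of phone numbers without any schema change
CREATE TABLE contact_phone (
    contact_id   INTEGER NOT NULL,      -- FK to the contacts table
    phone_type   VARCHAR(20) NOT NULL,  -- 'home', 'work', 'mobile', ...
    phone_number VARCHAR(30) NOT NULL,
    PRIMARY KEY (contact_id, phone_type, phone_number)
);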

If you have a very good development team and are intimately aware of the problems you'll have to overcome with this design, you can successfully code up a well-designed, not terribly buggy system. Most of the time.

Why start out with the odds stacked so much against you, though?

Don't believe me? Google "One True Lookup Table" or "single table design". Some good results: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056

http://thedailywtf.com/Comments/Tom_Kyte_on_The_Ultimate_Extensibility.aspx?pg=3

http://www.dbazine.com/ofinterest/oi-articles/celko22

http://thedailywtf.com/Comments/The_Inner-Platform_Effect.aspx?pg=2

Matt Rogish
Yep. I take your point. The question was really to see if there were any patterns or systems I wasn't aware of, given that this is one of those requirements that seems to come up over and over again. Thanks.
Fake Jim
Sure, makes sense. Good luck!
Matt Rogish
A: 

Create two databases:

  • DB1 contains static tables, and represents the "real" state of the data.
  • DB2 is free for users to do with as they wish - they (or you) will have to write code to populate their odd-shaped tables from DB1.
AJ
+3  A: 

Like some others have said, don't do this unless you have no other choice. One case where this is required is if you are selling an off-the-shelf product that must allow users to record custom data. My company's product falls into this category.

If you do need to allow your customers to do this, here are a few tips:

  • Create a robust administrative tool to perform the schema changes, and do not allow these changes to be made any other way.
  • Make it an administrative feature; don't allow normal users to access it.
  • Log every detail about every schema change. This will help you debug problems, and it will also give you CYA data if a customer does something stupid.

If you can do those things successfully (especially the first one), then any of the architectures you mentioned will work. My preference is to dynamically change the database objects, because that allows you to take advantage of your DBMS's query features when you access the data stored in the custom fields. The other three options require you to load large chunks of data and then do most of your data processing in code.
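As one hedged sketch of the logging tip above, the admin tool can write every change it makes to its own audit table (the table and column names here are assumptions):

-- Every DDL statement issued by the admin tool gets a row here
CREATE TABLE schema_change_log (
    id          INTEGER NOT NULL AUTO_INCREMENT,
    changed_at  TIMESTAMP NOT NULL,
    changed_by  VARCHAR(100) NOT NULL,
    ddl_issued  TEXT NOT NULL,          -- the exact ALTER/CREATE statement executed
    PRIMARY KEY (id)
);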

Josh Yeager
A: 

I have a similar requirement and decided to use MongoDB.

MongoDB (from "humongous") is an open source, scalable, high-performance, schema-free, document-oriented database written in the C++ programming language. (Wikipedia)

Highlights:

  • has rich query functionality (maybe the closest to SQL DBs)
  • production-ready (Foursquare and SourceForge use it)

Lowdarks (stuff you need to understand, so you can use mongo correctly):

clyfe
A: 

I have the same problem in my ongoing thesis project. The solution I propose is to create dynamic queries based on user-entered data (in this case, for naming the fields of a table), plus a system that generates the table names.

So far I haven't had any problems in my app testing process, but my questions are:

  1. How can I describe the dynamic side in my E-R diagram?
  2. Since it is a thesis project, what arguments can I use to defend the design if an examiner considers it a design mistake?

Thanks.

agan
Put this on the "Ask a question" page (top-right button); this is the answers section. Also, your question makes absolutely no sense whatsoever; please rephrase.
clyfe