views:

765

answers:

9

If you have to create an application like - let's say a blog application, creating the database schema is relatively simple. You have to create some tables, tblPosts, tblAttachments, tblCommets, tblBlaBla… and that's it (ok, i know, that's a bit simplified but you understand what i mean).

What if you have an application where you want to allow users to define parts of the schema at runtime. Let's say you want to build an application where users can log any kind of data. One user wants to log his working hours (startTime, endTime, project Id, description), the next wants to collect cooking recipes, others maybe stock quotes, the weekly weight of their babies, monthly expenses they spent for food, the results of their favorite football teams or whatever stuff you can think about.

How would you design a database to hold all that very very different kind of data? Would you create a generic schema that can hold all kind of data, would you create new tables reflecting the user data schema or do you have another great idea to do that?

If it's important: I have to use SQL Server / Entity Framework

A: 

I would just give them a copy of SQL Server Management Studio, and say, "go nuts!" Why reinvent a wheel within a wheel?

RedFilter
Not a good idea since the user has no idea what "SQL" or a "Management Studio" is. Funny answer but not really helpful.
JRoppert
+3  A: 

On the surface, a schema-less or document-oriented database such as CouchDB or SimpleDB for the custom user data sounds ideal. But I guess that doesn't help much if you can't use anything but SQL and EF.

Joel Mueller
+3  A: 

I'm not familiar with the Entity Framework, but I would lean towards the Entity-Attribute-Value (http://en.wikipedia.org/wiki/Entity-Attribute-Value_model) database model.

So, rather than creating tables and columns on the fly, your app would create attributes (or collections of attributes) and then your end users would complete the values.

But, as I said, I don't know what the Entity Framework is supposed to do for you, and it may not let you take this approach.

Antony
FWIW, Entity Framework is a nascent ORM library for the Microsoft .NET framework.
Bill Karwin
Thanks for that Bill - I'm not a .NET guy so had never heard of it. I'm thinking, though, that an ORM would struggle with EAV?
Antony
It's possible with ORM (in fact easier!), you can have a business object interface which is composed of some EAV interface property. e.g. IEntity composed of List<IAttribute> where IAttribute is composed of List<IValue> (and also links back to IEntity). So long as your defining your entity -> attribute -> values in the standard way your ORM tool does, it's just like any other relationship.
Si
A: 

Check out this post you can do it but it's a lot of hard work :) If performance is not a concern an xml solution could work too though that is also alot of work.

mhenrixon
I know that article but i have no clue why that should be an answer to my question!?
JRoppert
Fair enough, was a quick answer. You could of course leave the "dynamic" parts of the schema to some other engine that is the beauty of domain driven design anyways but I have a hard time figuring out how you would solve that with EF or L2S. I mean what type of application would that be targeting logging everyones various needs anyway? Is it just for the fun of it or is it a real project?
mhenrixon
+3  A: 

There's no way you can predict how complex their data requirements will be. Entity-Attribute-Value is one typical solution many programmers use, but it might be be sufficient, for instance if the user's data would conventionally be modeled with multiple tables.

I'd serialize the user's custom data as XML or YAML or JSON or similar semi-structured format, and save it in a text BLOB.

I'm a critic of the Entity-Attribute-Value anti-pattern.


Re comment from @Si:

Here's an SO answer where I list some problems with Entity-Attribute-Value: "Product table, many kinds of products, each product has many parameters."

Here's a blog I posted the other day with some more discussion of EAV problems: "EAV FAIL."

Some more examples can be found in my "SQL Antipatterns" slides here: http://www.slideshare.net/billkarwin

Bill Karwin
Care to elaborate on your criticisms of EAV Bill? I think it has its place, although there are constraints of course, but see the link in my response.
Si
a blob for everything? yuck! that'll make even simple reporting damn near impossible, plus it'll be harder to get good performance.
Booji Boy
@Booji Boy: I agree, picking apart XML (or whatever) is hard in SQL, but the OP didn't say he needed to do any reporting. It's just as likely that the whole blob would have to be loaded into the application to do whatever the user wanted to do with it. And that's hard to anticipate anyway.
Bill Karwin
@Bill, I agree there's no perfect solution. There's some interesting cross over with auditing too. One point I'd like to make though, the 'Value' part (column or separate table) doesn't just have to consist of a single column, each data type can exist as a separate column, and it's the responsibility of the 'Attribute' to determine which column (data type) to use.
Si
@Si: Right, I've seen that solution too. It solves the problem of data-type validation, but none of the other problems of EAV. Plus it makes queries even more unintelligible.
Bill Karwin
+4  A: 

I would go for a Hybrid Entity-Attribute-Value model, so like Antony's reply, you have EAV tables, but you also have default columns (and class properties) which will always exist.

Here's a great article on what you're in for :)

As an additional comment, I knocked up a prototype for this approach using Linq2Sql in a few days, and it was a workable solution. Given that you've mentioned Entity Framework, I'd take a look at version 4 and their POCO support, since this would be a good way to inject a hybrid EAV model without polluting your EF schema.

Si
That's a great article you found Si - bookmarked for next time!
Antony
+1  A: 

Not as a critical comment, but it may help save some of your time to point out that this is one of those Don Quixote Holy Grail type issues. There's an eternal quest for probably over 50 years to make a user-friendly database design interface.

The only quasi-successful ones that have gained any significant traction that I can think of are 1. Excel (and its predecessors), 2. Filemaker (the original, not its current flavor), and 3. (possibly, but doubtfully) Access. Note that the first two are limited to basically one table.

I'd be surprised if our collective conventional wisdom is going to help you break the barrier. But it would be wonderful.

le dorfier
Second this you can either be very limited and totally non performant with some entity attribute scheme, or, you can come up with something flexiable, complex and a nightmare to maintain which is really just a piss poor implementation of data defnition language.
James Anderson
+1  A: 

Rather than re-implement sqlservers "CREATE TABLE" statement, which was done many years ago by a team of programmers who were probably better than you or I, why not work on exposing SQLSERVER in a limited way to the users -- let them create thier own schema in a limited way and leverage the power of SQLServer to do it properly.

James Anderson
+2  A: 

Let's try again.

If you want them to be able to create their own schema, then why not build the schema using, oh, I dunno, the CREATE TABLE statment. You have a full boat, full functional, powerful database that can do amazing things like define schemas and store data. Why not use it?

If you were just going to do some ad-hoc properties, then sure.

But if it's "carte blanche, they can do whatever they want", then let them.

Do they have to know SQL? Umm, no. That's your UIs task. Your job as a tool and application designer is to hide the implementation from the user. So present lists of fields, lines and arrows if you want relationships, etc. Whatever.

Folks have been making "end user", "simple" database tools for years.

"What if they want to add a column?" Then add a column, databases do that, most good ones at least. If not, create the new table, copy the old data, drop the old one.

"What if they want to delete a column?" See above. If yours can't remove columns, then remove it from the logical view of the user so it looks like it's deleted.

"What if they have eleventy zillion rows of data?" Then they have a eleventy zillion rows of data and operations take eleventy zillion times longer than if they had 1 row of data. If they have eleventy zillion rows of data, they probably shouldn't be using your system for this anyway.

The fascination of "Implementing databases on databases" eludes me.

"I have Oracle here, how can I offer less features and make is slower for the user??"

Gee, I wonder.

Will Hartung