views:

423

answers:

9

I am currently experimenting with an idea for a site where I want to let my users create "tables" that hold data, and then allow them to query over that data (in a less geeky way than writing SQL queries, and hopefully more easily than using Excel).

My idea, so far, is to represent this in my database using a handful of tables: one representing the virtual tables themselves, one representing their columns, one representing each row, and finally one representing the values. Something akin to the following (sketched in SQL):

CREATE TABLE Tables (
   Id INT NOT NULL PRIMARY KEY,
   Name VARCHAR(255)
);

CREATE TABLE TableColumns (
   Id INT NOT NULL PRIMARY KEY,
   TableId INT NOT NULL REFERENCES Tables(Id),
   Name VARCHAR(255)
);

CREATE TABLE TableRows (
   Id INT NOT NULL PRIMARY KEY,
   TableId INT NOT NULL REFERENCES Tables(Id),
   RowNumber INT NOT NULL
);

CREATE TABLE TableValues (
   RowId INT NOT NULL REFERENCES TableRows(Id),
   ColumnId INT NOT NULL REFERENCES TableColumns(Id),
   Value VARCHAR(255),
   PRIMARY KEY (RowId, ColumnId)
);

(Note that TableValues has a composite primary key across RowId and ColumnId: one value exists per row/column pair. The DDL is only meant to sketch the idea.)

I did a bit of testing with this and was able to do simple querying successfully (filtering, ordering and so forth). My approach was to first query the TableRows table; for filtering I then removed the rows whose columns did not match the criteria, and for sorting I ordered the row ids based on the content of the relevant column. This produced a list of row ids in the desired order, and from there it was simply a matter of selecting what was needed.
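To illustrate, a filter such as "Age greater than 30, sorted by Name" could be pushed into a single query against the sketched schema. The table id 5 and column ids 1 and 2 below are hypothetical, and the CAST assumes every value is stored as text:

```sql
-- Rows in virtual table 5 where the 'Age' column (ColumnId = 2)
-- exceeds 30, ordered by the 'Name' column (ColumnId = 1).
SELECT r.Id, r.RowNumber
FROM TableRows r
JOIN TableValues age  ON age.RowId  = r.Id AND age.ColumnId  = 2
JOIN TableValues name ON name.RowId = r.Id AND name.ColumnId = 1
WHERE r.TableId = 5
  AND CAST(age.Value AS INT) > 30
ORDER BY name.Value;
```

One join per referenced column is the usual cost of this design; queries touching many columns multiply the joins accordingly.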

All this works fine, but I am a bit stuck from here. I'd like to be able to represent different data types somehow (which is really my main issue) and also, later on, work out how to do joins.

While thinking this through, I started to wonder whether there is a better way of doing it. Performance is of course a factor, but I am not planning to support virtual tables with hundreds of thousands of rows; perhaps around 1,000 rows per virtual table, although the system as a whole needs to be capable of handling many such tables.

I know I could simply create real tables in my database, with the DDL generated on the fly in C#, and likewise query them with plain SQL. However, I have never been a fan of letting users "construct" queries against my database like this; it seems like a path down which many bugs would appear, and in the worst case it could allow a user to take the database down one way or another.

Then there is the issue of how to deal with this in a way that makes sense from a C# perspective. So far I am leaning towards using LINQ and writing my own extension methods on IQueryable to apply the needed functionality.

So what I would really like are: ideas for how this could be done, ideas for tuning performance, and ideas for handling separate data types in a table. (Obviously the type would be stored on the TableColumns row, but how do I store the value itself so that I can filter and sort by it, without just adding a TextValue, MoneyValue, etc. column for every type on my TableValues table?) And last but not least, hopefully some good discussion; I consider this an interesting topic, at the very least.

+2  A: 

At the risk of a bunch of downvotes, why not just have them install MS Access?

belgariontheking
This is intended as a web application, not a desktop app. And why not make a webpage that tells people to use MS Access? Because my mom wouldn't, my sister wouldn't, and honestly no one I know who doesn't already find computers fascinating would.
kastermester
@kastermester: Then I would send them to google docs.
Geoffrey Chetwood
@kaster: fair enough, but I'll keep my answer up anyway, as it might help someone come up with a better answer for you.
belgariontheking
@Rich B - I am sure both you and I, and everyone else here, could achieve exactly what we wanted using the spreadsheet in Google Docs, Microsoft Office, Open Office and so forth - but believe it or not, most people out there cannot. I am not setting out to make a gigantic look-alike of those products, but rather a small subset of the general querying we use daily, so that "normal" people can use this sort of thing too.
kastermester
@belgariontheking - yes please leave it, it was a fair comment/question/answer.
kastermester
@kastermester: Hahahaha, good luck.
Geoffrey Chetwood
You could use SQL Express and just create a new database for each person. Then allow them to do whatever they want in that database.
Matthew Whited
Or just use table prefixes, as you would in MySQL or Oracle.
Matthew Whited
+1  A: 

I've run into an approach like this in Microsoft's Business Contact Manager add-in for Outlook. The way they handle field types is to have a table defining the type for each field, then they store the actual field values in a table containing nothing but Varbinary columns. The conversion to/from varbinary is controlled by the field type table.
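A rough sketch of that approach (the table and column names here are my own invention, not Business Contact Manager's actual schema):

```sql
-- Each field's declared type lives in a metadata table...
CREATE TABLE FieldTypes (
   FieldId INT NOT NULL PRIMARY KEY,
   FieldName VARCHAR(255) NOT NULL,
   SqlType VARCHAR(50) NOT NULL   -- e.g. 'INT', 'DATETIME', 'VARCHAR(255)'
);

-- ...and every value is stored as raw bytes. The application layer
-- converts to and from VARBINARY based on the field's declared type.
CREATE TABLE FieldValues (
   RowId INT NOT NULL,
   FieldId INT NOT NULL REFERENCES FieldTypes(FieldId),
   Value VARBINARY(255),
   PRIMARY KEY (RowId, FieldId)
);
```

The price of this flexibility is that the database itself can no longer compare or index the values meaningfully; all type-aware filtering and sorting moves into application code.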

dsteele
+1  A: 

I'm not sure why all the hate, and why nobody's actually trying to answer your question. Even if it's ultimately a "how would you implement Google Docs" question, it's still a fair question.

Assuming you've thought through whether you really want to do this (what I'm assuming all the hate is about), here's at least a better idea:

CREATE TABLE DATA_TABLE (
    TABLEID INT,
    INT1 INT,
    INT2 INT,
    VARCHAR1 VARCHAR(255)
    -- ...more columns of each type, as many as you expect to need
);

Then have metadata stored somewhere that records what a specific TABLEID is named, which physical columns are used, what their user-facing names are, and so on. Joins are easy to support, as they're just self joins on the data table. Whether you store that metadata in the database or elsewhere is up to you.
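For example, joining two virtual tables could look like the following self join. The table ids 1 and 2 and the choice of INT1 as the shared join key are hypothetical; the metadata would tell you which physical column plays which role:

```sql
-- Join virtual table 1 to virtual table 2 on a user-defined
-- integer key that both store in the INT1 column.
SELECT a.VARCHAR1, b.VARCHAR1
FROM DATA_TABLE a
JOIN DATA_TABLE b
  ON b.INT1 = a.INT1        -- the user-defined join key
WHERE a.TABLEID = 1
  AND b.TABLEID = 2;
```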

This approach can work perfectly fine (I can attest that many successful .com sites have used this kind of schema; some I was involved with myself, others through people I knew), but to get optimal performance I'd recommend using CHAR over VARCHAR, aligning your tuples so they fit on pages well, and so on.

I think this is fairly (very) similar to dsteele's answer

Matt
Thanks Matt, that looks like an interesting and promising idea - I will look into that.
kastermester
I'm not sure I understand this answer. If everything is mapped to the data_table, how do I know how many fields I have, what types they are, etc.? Unless you're recommending a data_table per user/use-case?
Chris Kaminski
+9  A: 

For some reason, everybody encounters that idea at some point or another.

It seems right, it should work.

It would. Sort of.

The comments about TheDailyWTF have a point. Re-implementing a DBMS on top of a DBMS is really not a good idea. Going meta like that will give you

  • an underperforming system
  • a maintenance nightmare

If you really need that kind of flexibility (do you?), your time would be much better spent implementing a layer that stores metadata in a few tables and generates the schema for the actual tables in the database.
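A minimal sketch of that layer: capture the definition in metadata tables, then have the application emit real DDL from it. All names here are illustrative:

```sql
-- Metadata captured from the user's "create table" UI
CREATE TABLE MetaTables (
   Id INT NOT NULL PRIMARY KEY,
   UserName VARCHAR(255) NOT NULL,
   TableName VARCHAR(255) NOT NULL
);

CREATE TABLE MetaColumns (
   TableId INT NOT NULL REFERENCES MetaTables(Id),
   ColumnName VARCHAR(255) NOT NULL,
   SqlType VARCHAR(50) NOT NULL
);

-- The application then generates and executes real DDL, e.g.:
-- CREATE TABLE User42_Contacts (Name VARCHAR(255), Age INT);
```

The pay-off is that filtering, sorting and joining become ordinary SQL against real, properly typed, indexable columns.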

There are a few examples of this kind of system that I know of:

  • Microsoft OSLO (specifically the Repository system)
  • the ASAM-ODS server architecture (look for the ASAM-ODS package)

And I am sure there are others.

The up-side of that kind of design is that your database actually makes sense in the end, and it plays to the RDBMS's strengths. Also, since that kind of reconfiguration should be rare once a table has been created, it allows you to fine-tune the database if needed (mainly in terms of indexing).

I really strongly feel the only correct answer to the kind of system you propose is don't.

Denis Troller
Thanks, I will look into it :).
kastermester
+6  A: 

It's an interesting idea, but using SQL in this way is probably going to be very painful over time.
If I understand you correctly, you want users to be able to define data structures, save data into them, and then query it. I think there are a couple of other ways to go about this:

  • What about using XML? Allow each user to store an XML file per "table" and just maintain its schema. Each "row" would be an XML element with child elements. You could optionally stick the XML in SQL or store it by some other means. This wouldn't work very well with large datasets, but for thousands of records it is surprisingly fast; I did some tests with 20+ MB XML files in C# and was able to create, read and parse them in less than a second. Using LINQ to XML you could even construct fairly sophisticated queries and joins. I wouldn't use XML for a large enterprise system, but you'd be surprised how far it will go on modern machines with lots of memory and fast processors - and it is infinitely flexible.
  • Could you use an object-oriented database (Matisse etc) instead? I haven't got experience of these myself, but I think you can quite easily do something like the XML approach but with better performance.
  • Amazon Simple DB: If I remember correctly this is essentially a name/value pair-based database which you can use. Could your application use that in the background instead to avoid you having to deal with all the plumbing? If you had to pay for SQL Server then Amazon DB might be cheaper and has the muscle to scale large, but without some of the things like relational queries.
Frans
You've put up some very interesting ideas here, and I cannot believe I didn't think about XML before. I will most definitely look into that as well.
kastermester
Good idea about the key/value pair system. I'm not so much with you on the XML, though, especially because kastermester mentioned supporting a potentially pretty big number of rows. Using XML will require loading/saving the entire file for every little update, which might end up being prohibitive.
Denis Troller
For large sets, XML would be prohibitive, but for 1000-2000 records it is probably fine. What I found is that C# is incredibly fast at dealing with XML: my test program reads a 20 MB XML file (10,000 "records"), evaluates each node and outputs it to a CSV file in less than 0.4s on my laptop, and I would estimate half of that time is actually spent writing the CSV. I think SQL Server also has some built-in support for querying inside XML values, which may be faster.
Frans
You can store each record as a single XML value in a table, keyed by an object-type ID, with a secondary index table. You then get the column flexibility from the XML and the indexing from the DBMS.
Matthew Whited
+1  A: 

I once built a system that stored data in a very generic way, extremely similar to what you describe.

The system was write-few, read-many, and the complex joins needed to get the data out made it quite slow.

Now I know this goes against all database practice, but: I denormalised the data for each "Table" (as you have it) and physically created real tables in the database called Table_1, Table_2, and so on.

I created and dropped Table_1, Table_2, etc. via triggers on the Tables table, added and dropped their columns via triggers on the TableColumns table, inserted and deleted rows via triggers on the TableRows table, and updated values via triggers on the TableValues table.
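As a rough sketch, the first of those triggers might look something like this in T-SQL. The names are illustrative, it assumes single-row inserts, and a real version would build the column list dynamically from TableColumns:

```sql
-- When a row describing a new virtual table is inserted into the
-- metadata, create the corresponding physical table.
CREATE TRIGGER trg_Tables_Insert ON Tables
AFTER INSERT
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX);
    SELECT @sql = 'CREATE TABLE Table_' + CAST(Id AS NVARCHAR(10))
                + ' (RowNumber INT NOT NULL)'
    FROM inserted;
    EXEC sp_executesql @sql;
END;
```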

Yes, as expected, our performance on writes was greatly reduced, but our performance on reads (which was extremely important at the time) was greatly improved as we could read directly from these "denormalised" tables.

So basically you could actually create and drop the tables in your example and add and remove the columns accordingly. You can create the columns of the actual data types you need and then you can store whatever you want in them.

It might also make the queries you write a lot easier, as you're only ever looking at a single table per user.

Robin Day
+3  A: 

The design you show in your original question is a variation of the Entity-Attribute-Value design.

The impatience some people are expressing is probably due to the fact that practically every database developer "discovers" this design and tries to use it for exactly what you're using it for -- a flexible system to support an extensible database without the inconvenience of using CREATE TABLE and ALTER TABLE.

But EAV has a lot of downside. Here's just one: how would you make any given column mandatory (the equivalent of a NOT NULL constraint)?
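To make the problem concrete against the schema in the question: with EAV, a "mandatory" column can only be policed after the fact, by hunting for rows that lack the attribute; nothing stops the bad insert in the first place. (The table id 5 and column id 2 are hypothetical.)

```sql
-- Rows in virtual table 5 that are missing a value for "column" 2,
-- i.e. rows that would have violated a NOT NULL constraint.
SELECT r.Id
FROM TableRows r
WHERE r.TableId = 5
  AND NOT EXISTS (
      SELECT 1
      FROM TableValues v
      WHERE v.RowId = r.Id AND v.ColumnId = 2
  );
```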

A relational database assumes that you know the tables up front, and you can define them. RDBMS isn't the right tool for fully dynamic relations or relations with fully variable sets of attributes. There are a number of other technologies for that, such as XML, RDF, or CouchDB.

See also my answer to "The Next-gen Databases."

Bill Karwin
A: 

You definitely want to simplify this: allow your users to create tables, perhaps in TEMPDB. Simply putting a schema prefix on the "CREATE TABLE"d tables might keep them from bolloxing up REAL system tables, and it makes it easy for you to filter them as well. The problem is that most databases won't allow non-DBAs to create tables, so it requires some up-front setup. You could even create a schema in your database specifically for this, and put constraints on size and table count if your database supports them.
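A sketch of that setup in SQL Server terms (the schema and role names are illustrative):

```sql
-- A dedicated schema keeps user-created tables away from the real ones.
CREATE SCHEMA UserTables;

-- A restricted role that may create tables, but only inside that schema.
CREATE ROLE TableCreators;
GRANT CREATE TABLE TO TableCreators;
GRANT ALTER ON SCHEMA::UserTables TO TableCreators;
```

Size and count limits would still have to be enforced by your application or by DDL triggers, since plain permissions don't cover those.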

Don't reinvent the wheel.

Do it this way, and you automatically get joins, unions, etc., and it works with Crystal Reports and other tools without any of them needing to grok your specific schema.

Chris Kaminski
A: 

SharePoint does this and more on top of SQL Server. I would investigate how SharePoint uses SQL Server.

Matt Spradley