I am using SQL Server 2005 Express and Visual Studio 2008.

I have a database which has a table with 400 columns. Things were (just about) manageable until I had to perform bi-directional sync between several databases.

I am wondering: what are the arguments for and against a single 400-column table versus a 40-table design?

The table is not normalised and comprises mainly nvarchar(64) columns and some TEXT columns (there are no specific datatypes, as the data was converted from text files).

There is one other table that links to this table in a 1-1 relationship (i.e. one entry relates to one entry in the 400-column table).

The table is a list of files that contain parameters which are "plugged" into an application.

I look forward to your replies.

Thank you

+2  A: 

For having a wide table:

  • Quick to report on as it's presumably denormalized and so no joins are needed.
  • Easy to understand for end-consumers as they don't need to hold a data model in their heads.

Against having a wide table:

  • Probably need to have multiple composite indexes to get good query performance
  • More difficult to maintain data consistency, i.e. you need to update multiple rows when a piece of data that lives on multiple rows changes.
  • As you're having to update multiple rows and maintain multiple indexes, concurrent performance for updates may become an issue as locks escalate.
  • You might end up with records with loads of NULLs in columns where the attribute isn't relevant to the entity on that row, which can make handling results awkward.
  • If lazy developers do a SELECT * from the table you end up dragging loads of data across the network, so you generally have to maintain suitable subset views.

So it all really depends on what you're doing. If the main purpose of the table is OLAP reporting and updates are infrequent and affect few rows, then perhaps a wide, denormalized table is the right thing to have. In an OLTP environment it's probably not, and you should prefer narrower tables. (I generally design in 3NF and then denormalize for query performance as I go along.)

You could always take the approach of normalizing and providing a wide view for readers if that's what they want to see.
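For instance, a minimal sketch of that approach (all table, column, and view names here are invented for illustration): split the wide table into narrower normalized tables, then put a view over them so report writers still see one flat shape.

    -- Hypothetical normalized tables (names invented for illustration)
    CREATE TABLE FileEntry (
        FileId   INT IDENTITY PRIMARY KEY,
        FileName NVARCHAR(255) NOT NULL
    );

    CREATE TABLE FileDetail (
        FileId INT PRIMARY KEY REFERENCES FileEntry(FileId),
        Param1 NVARCHAR(64),
        Param2 NVARCHAR(64)
        -- ...further parameter columns, grouped by subject area
    );

    -- Wide view: readers query this as if it were the old wide table
    CREATE VIEW vFileWide
    AS
    SELECT e.FileId, e.FileName, d.Param1, d.Param2
    FROM FileEntry e
    JOIN FileDetail d ON d.FileId = e.FileId;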

Without knowing more about the situation it's not really possible to say more about the pros and cons in your particular circumstance.

Edit:

Given what you've said in your comments, have you considered just having a long & skinny name=value pair table, so you'd just have UserId, PropertyName, PropertyValue columns? You might want to add some other meta-attributes into it too: timestamp, version, or whatever. SQL Server is quite efficient at handling these sorts of tables, so don't discount a simple solution like this out of hand.
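A minimal sketch of such a table, assuming the column names suggested above plus a timestamp meta-attribute:

    -- Long & skinny name=value store (meta-attributes optional)
    CREATE TABLE UserProperty (
        UserId        INT           NOT NULL,
        PropertyName  NVARCHAR(64)  NOT NULL,
        PropertyValue NVARCHAR(MAX) NULL,
        LastUpdated   DATETIME      NOT NULL DEFAULT GETDATE(),
        CONSTRAINT PK_UserProperty PRIMARY KEY (UserId, PropertyName)
    );

    -- Reading one entity's properties back is a simple keyed lookup
    SELECT PropertyName, PropertyValue
    FROM UserProperty
    WHERE UserId = 42;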

Trevor Tippins
I did consider this, which led to the following question yesterday: http://stackoverflow.com/questions/1755562/sql-syntax-create-a-single-table-with-column-names-created-from-data-stored-over And things started to get complicated!
Belliez
I did try the long and skinny list approach but as the properties are all different data types it would be difficult to sort and make it usable.
Belliez
How about having a side table that records the names and types of properties (plus any other constraints like ranges, isNullable, etc.), then storing all the actual properties as varchar/text and having the consumer interpret them as necessary? (You might perhaps want both number and varchar/text columns in your name=value table, but that's an implementation detail.)
Trevor Tippins
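A rough sketch of that suggestion (all names here are hypothetical): a definition table describing each property, and a value table carrying both a text and a numeric column so the consumer can interpret each value per its declared type.

    CREATE TABLE PropertyDefinition (
        PropertyName NVARCHAR(64) PRIMARY KEY,
        DataType     NVARCHAR(20) NOT NULL,  -- e.g. 'BIT', 'INT', 'FLOAT', 'NVARCHAR'
        IsNullable   BIT          NOT NULL DEFAULT 1,
        MinValue     FLOAT        NULL,      -- optional range constraints
        MaxValue     FLOAT        NULL
    );

    CREATE TABLE PropertyValue (
        UserId       INT           NOT NULL,
        PropertyName NVARCHAR(64)  NOT NULL REFERENCES PropertyDefinition(PropertyName),
        TextValue    NVARCHAR(MAX) NULL,     -- descriptive values
        NumberValue  FLOAT         NULL,     -- numeric values, interpreted per DataType
        PRIMARY KEY (UserId, PropertyName)
    );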
+1  A: 

Based on your process description I would start with something like this. The model is simplified, does not capture history, etc., but it is a good starting point. Note: parameter = property.


- Setup is a collection of properties. One setup can have many properties, one property belongs to one setup only.
- Machine can have many setups, one setup belongs to one machine only.
- Property is of a specific type (temperature, run time, spindle speed), there can be many properties of a certain type.
- Measurement and trait are types of properties. Measurement is a numeric property, like speed. Trait is a descriptive property, like color or some text.

(ER diagram showing Machine, Setup, Property, and the Measurement and Trait subtypes; image not available.)
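One possible DDL rendering of the model above (a sketch only; the keys, names, and types are my assumptions, not the definitive design):

    CREATE TABLE Machine (
        MachineId INT IDENTITY PRIMARY KEY,
        Name      NVARCHAR(64) NOT NULL
    );

    CREATE TABLE Setup (
        SetupId   INT IDENTITY PRIMARY KEY,
        MachineId INT NOT NULL REFERENCES Machine(MachineId)
    );

    CREATE TABLE PropertyType (
        PropertyTypeId INT IDENTITY PRIMARY KEY,
        Name           NVARCHAR(64) NOT NULL  -- temperature, run time, spindle speed...
    );

    CREATE TABLE Property (
        PropertyId     INT IDENTITY PRIMARY KEY,
        SetupId        INT NOT NULL REFERENCES Setup(SetupId),
        PropertyTypeId INT NOT NULL REFERENCES PropertyType(PropertyTypeId)
    );

    -- Subtypes of Property: each property has a row in exactly one of these
    CREATE TABLE Measurement (
        PropertyId INT PRIMARY KEY REFERENCES Property(PropertyId),
        Value      FLOAT NOT NULL,
        Unit       NVARCHAR(20) NULL
    );

    CREATE TABLE Trait (
        PropertyId  INT PRIMARY KEY REFERENCES Property(PropertyId),
        Description NVARCHAR(MAX) NOT NULL
    );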

Damir Sudarevic
Wow, thanks for your efforts. I will have this design in mind when I go about the daunting task of re-arranging my tables this week... Thanks again, it is very much appreciated.
Belliez
One question: if my system has various datatypes for each property, would each become a new table, i.e. Trait = descriptive, Measurement = float, and a new table called Counter storing (int) datatypes, etc.? I am a little uncertain on this.
Belliez
A few options here. Option 1: as you described. Option 2: use int (bigint) as the datatype in `measurement` and add a field for the format, converting to decimal on read. This one is equivalent to storing the data in smaller units.
Damir Sudarevic
Thanks for your reply. The datatypes I am using are BIT, FLOAT, NVARCHAR(255), NVARCHAR(MAX) and INT. I like Option 2, keeping to a single table. Do you suggest a table structure like this: Value BIGINT, Format NVARCHAR(20), Unit NVARCHAR(20)? And if I need to store a decimal or float, should I read back the Format nvarchar, determine its datatype and cast appropriately in my software (.NET) or in SQL Server? Thanks again for your help and advice, it's much appreciated.
Belliez
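For illustration, a rough sketch of the cast-on-read idea from these comments (the table name, the scaled-integer convention, and the Format tags are all assumptions, not a confirmed design):

    -- Store every numeric value as BIGINT in small units;
    -- Format records how to interpret it on read
    CREATE TABLE MeasurementValue (
        PropertyId INT          PRIMARY KEY,
        Value      BIGINT       NOT NULL,
        Format     NVARCHAR(20) NOT NULL,  -- e.g. 'INT', 'FLOATx1000', 'BIT'
        Unit       NVARCHAR(20) NULL
    );

    -- Cast on read in SQL Server; e.g. values tagged FLOATx1000
    -- were stored scaled by 1000, so divide on the way out
    SELECT PropertyId,
           CASE Format
               WHEN 'FLOATx1000' THEN CAST(Value AS FLOAT) / 1000
               ELSE CAST(Value AS FLOAT)
           END AS DecimalValue
    FROM MeasurementValue;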