An application I am dealing with has a single table with 170 columns.

In my mind, it would make more sense to break this table up into smaller, more logical groupings, though that raises the issue of having to join numerous tables whenever you need data from columns spread across them.

What are the pros and cons of these two approaches? What would be the benefit of a single table with this many columns?

+2  A: 

The only advantage of that (170 columns) is poor design, and the disadvantage is poor design. Hehe.

You can actually build better queries with more tables and capture more meaning from the same information. You are absolutely correct in your second paragraph.

I would imagine a lot of those fields have to be null.

Daniel A. White
+5  A: 

Errrr yeah, of course you have to join tables together when you have more than one. That's what relational databases are for. If you're deciding what to do with this application--without trying to be rude--I would suggest that you're not qualified for this particular job as, from your question, I presume that your knowledge of databases, data normalization, entity-relationship modelling and relational algebra is little to none.

This isn't the sort of thing that anyone here can solve as a simple answer. This question is tantamount to "How do I create a data model?"

cletus
+3  A: 

If there's a 170 column table, odds are it's so denormalized it's not even funny. Without more information I can't be sure, but every time I've seen a table that big, I see things like 'address1', 'address2', 'address3', etc. While (in theory) this can improve performance, it typically leads to issues at some point - like when you have someone with a 4th foo, but you've only got foo1 through foo3. Properly normalized databases tend to be much more resistant to change when the requirements change.
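
For illustration, a minimal sketch of pulling a repeating group out into a child table (all names here are invented, not from the asker's schema):

CREATE TABLE person (
    id   NUMBER(10)    NOT NULL PRIMARY KEY,
    name VARCHAR2(100) NOT NULL
);

-- one row per address instead of address1..address3 columns
CREATE TABLE person_address (
    person_id NUMBER(10)    NOT NULL REFERENCES person(id),
    seq       NUMBER(2)     NOT NULL,  -- 1st, 2nd, 3rd... address
    address   VARCHAR2(200) NOT NULL,
    PRIMARY KEY (person_id, seq)
);

-- a 4th address is now just another row, not a schema change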

Harper Shelby
A: 

I guess the fewer joins you do, the better performance you'll get, but do you really need that tiny bit of extra performance at the price of bad design and lack of flexibility? I'd also guess it is not an efficient way to store your data. How many of the columns are empty in a typical row? Using multiple tables could mean less empty space.

Colin Desmond
+2  A: 

That table may have been designed by a non-programmer, most likely a business analyst or an eager project manager. You can optimize by breaking this table up into more meaningful tables based on business usage. If you are worried about performance, find out how many joins your database handles comfortably (some are optimized for around 6) and break it up accordingly.

Do you just have a datagrid that feeds off this one big master table with lots of filters? I am curious to know how this table is used.

CodeToGlory
+1  A: 

The advantage of the current design is... you can be lazy. You don't have to think about application architecture. Need another piece of information? Add another column.

If you separate your data into normalized tables, you're going to gain several things:

  1. Understandable Information Architecture

  2. Decreased database size (presumably, not all rows have all the columns filled; you've got space being taken up by all those null columns... with normalized data, you wouldn't have them).

  3. Increased performance (you'll have to join tables, yes... but you'll be able to create meaningful indexes on those tables, and when you query, you won't be dragging all 170 columns along anymore; see the sketch below).
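
As a rough sketch of that last point (names invented for illustration), a narrow normalized table lets you build an index that answers the query by itself:

-- hypothetical child table holding just billing data
CREATE TABLE customer_billing (
    customer_id  NUMBER(10)   NOT NULL,
    invoice_date DATE         NOT NULL,
    amount       NUMBER(12,2) NOT NULL
);

-- covering index: the query below never touches the wide row at all
CREATE INDEX ix_billing_cust_date
    ON customer_billing (customer_id, invoice_date, amount);

SELECT invoice_date, amount
FROM   customer_billing
WHERE  customer_id = :id
AND    invoice_date >= DATE '2009-01-01';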

Justin Niessner
+6  A: 

Just to go back to the text book a bit... is every non-key attribute non-transitively dependent on the whole primary key? It's hard to imagine a scenario that would lead to 170 attributes in a fully normalized table.
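
A contrived example of what I mean (not the asker's schema): if a table keyed on order_id also stores customer_city next to customer_id, the city depends on the key only transitively, through the customer, and belongs in its own table:

-- transitive dependency: order_id -> customer_id -> customer_city
CREATE TABLE customer (
    customer_id   NUMBER(10)   NOT NULL PRIMARY KEY,
    customer_city VARCHAR2(50) NOT NULL
);

CREATE TABLE orders (
    order_id    NUMBER(10) NOT NULL PRIMARY KEY,
    customer_id NUMBER(10) NOT NULL REFERENCES customer(customer_id)
);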

Maybe you can tell us what entity the table represents so we would understand better.

Vincent Ramdhanie
+1  A: 

I would go with multiple groupings; it will be faster to find the data you want to view and edit. If there is a query with joins that you need to run a lot, you can make a view for it.
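
For example (a minimal sketch; the table and column names are invented):

-- the view hides the join behind a single queryable name
CREATE VIEW customer_overview AS
SELECT c.customer_id, c.name, p.phone_number
FROM   customer c
JOIN   customer_phone p ON p.customer_id = c.customer_id;

-- callers then treat it like one table
SELECT * FROM customer_overview WHERE customer_id = 42;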

Scott
+1  A: 

Your main issue is row length.

It may or may not be a problem.

In Oracle, trailing NULL columns consume no space.

If your first 2 columns are non-NULL and the other 168 are NULL, then the row will occupy only the space required for the first 2 columns.

But if the 1st and the 170th columns are both non-NULL, then space must be reserved for all 170 columns.

When performing a FULL TABLE SCAN, you will have to read more data blocks. This is better than a JOIN if you need all columns, but worse if you need only some of them.

If you rarely need some columns, you had better create a separate CLUSTERED table for them and use JOINs.

Note that by a CLUSTERED table Oracle means something different from what other databases mean.

In Oracle, a clustered table means creating two or more tables that share a common key on which they will be joined.

When placing the table rows into the tablespace, Oracle will place rows with the same values of the key into the same data blocks. This vastly increases the speed of JOINs.

It looks like this:

CREATE CLUSTER data_of_170_columns (id NUMBER(10))
/
CREATE INDEX idx_data_of_170_columns ON CLUSTER data_of_170_columns  -- required before inserting rows
/
CREATE TABLE main_data_of_3_columns
  (id NUMBER(10) NOT NULL PRIMARY KEY, col2 VARCHAR2(20) NOT NULL, col3 VARCHAR2(50) NOT NULL)
  CLUSTER data_of_170_columns(id)
/
CREATE TABLE secondary_data_of_167_columns
  (id NUMBER(10) NOT NULL PRIMARY KEY, col4 VARCHAR2(100) NULL, …)
  CLUSTER data_of_170_columns(id)
/

From both tables, the rows with the same id will reside in the same data block, or close to it, and this query:

SELECT  *
FROM    main_data_of_3_columns d3, secondary_data_of_167_columns d167
WHERE   d3.id = d167.id

will be almost as efficient as selecting from a single large table, while this one:

SELECT  *
FROM    main_data_of_3_columns d3

will be faster than selecting from a single large table.

Quassnoi
+2  A: 

I'd be damned if every one of those records ALWAYS has 170 fields of meaningful info... In my mind I'm seeing groups of info, and with some luck some of those groups aren't needed at all for some records. So, in highly technical relational-DB terms: you need to chop it up and key the appendix-table records to the master-table ones. You'll improve the architecture, save space, and save your hair and that of future developers.

jpinto3912
A: 

Some other problems arise when you have that many columns. One is that every database I know of has a limit on the amount of information that can be stored in one record (SQL Server, for instance, limits a row to 8,060 bytes). They will often let you create tables that exceed that limit as long as the fields are not required (at least I know SQL Server will do this). This leads to trouble when someone suddenly needs to exceed the limit for one record and you have to redesign urgently, because you can't get around a database limit.

Also, depending on the database and the situation (this needs testing in each individual case), the one large table may be slower to access than smaller related tables because of how the data has to be stored (I know this can be true in SQL Server; I'm not sure about other databases).

Further, if you break it up into groups that make sense, you may find that most of the time you don't need to access all the fields. Depending on what the columns are, you may also find related tables easier to maintain. If you have columns like cell phone, home phone and work phone, what happens when you need to add a second work phone (he works out of two offices)? In the current design it means you have to add a column. In a relational design, you just add a record (if your phone type definitions must be unique, you may need to add another one there too, but that is still adding records to tables, not changing structure). Changing structure may mean that code will break. Maybe even a lot of code.
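
A quick sketch of that phone example (hypothetical names):

CREATE TABLE phone_type (
    phone_type_id NUMBER(4)    NOT NULL PRIMARY KEY,
    description   VARCHAR2(30) NOT NULL UNIQUE  -- 'cell', 'home', 'work', ...
);

CREATE TABLE person_phone (
    person_id     NUMBER(10)   NOT NULL,
    phone_type_id NUMBER(4)    NOT NULL REFERENCES phone_type(phone_type_id),
    phone_number  VARCHAR2(20) NOT NULL,
    PRIMARY KEY (person_id, phone_number)
);

-- the second work phone is just another row, not an ALTER TABLE
INSERT INTO person_phone (person_id, phone_type_id, phone_number)
VALUES (42, 3, '555-0102');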

If the fields are things that you might need to combine together into one field, you may need to self-join this huge table multiple times to get the answers you need.

HLGEM
A: 

The pros and cons depend on how the schema was designed and whether it made sense to denormalize it into the 170-column entity you possess. If denormalizing proved beneficial for the majority of the queries against this schema, then it was a wise choice; if the net benefit of the denormalization is moot, then a redesign is needed.

If you subdivide the large table into smaller tables and replace the large table with a view, does it impact performance? That question needs to be answered before you decide to break the large table up.

There are pluses and minuses to normalizing a database schema. When the cost in query performance is too great, some denormalization is needed. But there is also a point where denormalizing ad infinitum defeats the purpose of having a database and you are left with a giant spreadsheet. The best solution is a healthy medium of the two, where the schema has both normalization and denormalization.
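
One way to run that test (a sketch; all names invented): split the table, then preserve the old 170-column interface with a view so existing queries keep working while you measure:

-- the view stands in for the old monolithic table
CREATE VIEW big_table AS
SELECT m.id, m.col2, m.col3, s.col4 /* ...and the remaining columns */
FROM   main_part m
JOIN   secondary_part s ON s.id = m.id;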

MichaelN