views: 2711
answers: 16

A former coworker insisted that a database with more tables with fewer columns each is better than one with fewer tables with more columns each. For example, rather than a customer table with name, address, city, state, zip, etc. columns, you would have a name table, an address table, a city table, etc.

He argued this design was more efficient and flexible. Perhaps it is more flexible, but I am not qualified to comment on its efficiency. Even if it is more efficient, I think those gains may be outweighed by the added complexity.

So, are there any significant benefits to more tables with fewer columns over fewer tables with more columns?
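
For concreteness, here is a minimal sketch of the two shapes being compared (the table names, column names, and types are made up for illustration, not taken from the coworker's actual design):

    -- Option 1: fewer tables, more columns (one wide customer table)
    CREATE TABLE customer_wide (
        customer_id INT PRIMARY KEY,
        name        VARCHAR(100),
        address     VARCHAR(200),
        city        VARCHAR(100),
        state       CHAR(2),
        zip         VARCHAR(10)
    );

    -- Option 2: more tables, fewer columns (each attribute split out)
    CREATE TABLE customer (
        customer_id INT PRIMARY KEY
    );

    CREATE TABLE customer_name (
        customer_id INT PRIMARY KEY REFERENCES customer (customer_id),
        name        VARCHAR(100)
    );

    CREATE TABLE customer_address (
        customer_id INT PRIMARY KEY REFERENCES customer (customer_id),
        address     VARCHAR(200)
    );
    -- ...and, in the extreme version, separate city, state, and zip tables as well.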

+2  A: 

It depends on your database flavor. MS SQL Server, for example, tends to prefer more, narrower tables. That's also the more 'normalized' approach. Other engines might prefer it the other way around; mainframes tend to fall into that category.

Joel Coehoorn
+7  A: 

This is like arguing a tractor is better than a digger.

It's not really about the tables, it's about the purposes they will serve.

Ed Guiness
+1  A: 

I would consider normalizing as the first step, so cities, counties, states, and countries would be better as separate tables... the power of the SQL language, together with today's DBMSes, allows you to group your data later if you need to view it in some other, non-normalized way.

When the system is being developed, you might consider denormalizing some parts if you see that as an improvement.
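
As a rough sketch of that idea (all table and column names here are hypothetical), a view can present the normalized tables as one flat, non-normalized row set for reading:

    -- Re-flatten the normalized tables into one read-friendly view
    CREATE VIEW customer_summary AS
    SELECT c.customer_id,
           c.name,
           a.street,
           ci.city_name,
           s.state_name
    FROM customer c
    JOIN address a  ON a.customer_id = c.customer_id
    JOIN city    ci ON ci.city_id    = a.city_id
    JOIN state   s  ON s.state_id    = ci.state_id;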

zappan
My 2 cents: I have to disagree; doing that kind of optimization during design is a classic case of premature optimization. Wait until you see that performance is a problem *before* you sacrifice a good design.
JosephStyons
+1  A: 

I think balance is in order in this case. If it makes sense to put a column in a table, then put it in the table; if it doesn't, then don't. Your coworker's approach would definitely help to normalize the database, but that might not be very useful if you have to join 50 tables together to get the information you need.

I guess what my answer would be is, use your best judgement.

Craig H
+5  A: 

It doesn't sound so much like a question about tables/columns as about normalization. In some situations, having a high degree of normalization ("more tables" in this case) is good and clean, but it typically takes a high number of JOINs to get relevant results. And with a large enough dataset, this can bog down performance.
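
To make the point concrete, here is a rough sketch (table and column names are hypothetical) of what fetching a single mailing label can look like once every attribute lives in its own table, compared with the single-table equivalent:

    -- Heavily normalized: every split-out attribute costs another JOIN.
    SELECT n.name, a.street, ci.city_name, s.state_code, z.zip_code
    FROM customer c
    JOIN customer_name n ON n.customer_id = c.customer_id
    JOIN address a       ON a.customer_id = c.customer_id
    JOIN city ci         ON ci.city_id    = a.city_id
    JOIN state s         ON s.state_id    = ci.state_id
    JOIN zip z           ON z.zip_id      = a.zip_id
    WHERE c.customer_id = 42;

    -- The same lookup against one wide table needs no joins at all.
    SELECT name, street, city, state, zip
    FROM customer_wide
    WHERE customer_id = 42;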

Jeff wrote a little about it regarding the design of StackOverflow. See also the post Jeff links to by Dare Obasanjo.

swilliams
In my experience, this is patently false. I've worked with queries that join dozens of tables, *each* containing 1 million+ rows, and as long as you're joining on primary keys, the results come back very quickly.
JosephStyons
What's 'quickly'? If you're running a website trying to serve thousands of pageviews a second, 'fast enough' has an entirely different meaning than in a single-user database where all you're concerned about is response time for the user.
Chris Upchurch
"as long as you're joining on primary keys, the results come back very quickly"Well, yeah. But, in my experience with more tables, the more likely it is for joins to happen on non-pk's, non-indexed columns, etc.
swilliams
Normalisation and the subsequent joining of tables usually help performance, since by definition you can be more selective and avoid table scans, the slowest method of selecting.
Ed Guiness
Poor design is usually the biggest factor in poor performance, not normalisation.
Ed Guiness
Aye, I had a real-time data processing app and the joins killed the queries. I de-normalised the data and all was good; it gets integrated back into the normalised database at the end of the day, when the number of requests dies down.
Quibblesome
A: 

There are advantages to having tables with fewer columns, but you also need to look at your scenario above and answer this question:

Will the customer be allowed to have more than one address? If not, then a separate table for address is not necessary. If so, then a separate table becomes helpful because you can easily add more addresses as needed down the road, whereas it is more difficult to add more columns to the table.
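
A minimal sketch of the one-to-many layout this implies (names and types are assumptions for illustration):

    CREATE TABLE customer (
        customer_id INT PRIMARY KEY,
        name        VARCHAR(100)
    );

    CREATE TABLE customer_address (
        address_id  INT PRIMARY KEY,
        customer_id INT NOT NULL REFERENCES customer (customer_id),
        street      VARCHAR(200),
        city        VARCHAR(100),
        state       CHAR(2),
        zip         VARCHAR(10)
    );

    INSERT INTO customer (customer_id, name) VALUES (1, 'Acme Corp');

    INSERT INTO customer_address (address_id, customer_id, street, city, state, zip)
    VALUES (1, 1, '100 Main St', 'Springfield', 'IL', '62701');

    -- A second address is just another row, not a schema change:
    INSERT INTO customer_address (address_id, customer_id, street, city, state, zip)
    VALUES (2, 1, '500 Warehouse Rd', 'Springfield', 'IL', '62701');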

Dillie-O
+1  A: 

There are many sides to this, but from an application efficiency perspective, more tables can be more efficient at times. If you have a few tables with a bunch of columns, every time the db has to do an operation it has a chance of taking a lock, and more data is made unavailable for the duration of the lock. If locks get escalated to pages and tables (well, hopefully not tables :) ), you can see how this can slow down the system.

Kevin
+1  A: 

There are huge benefits to queries using as few columns as possible. But the table itself can have a large number of columns. Jeff says something on this as well.

Basically, make sure that you don't ask for more than you need when doing a query - performance of queries is directly related to the number of columns you ask for.
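
As a small illustration (hypothetical table), ask only for the columns you need rather than dragging every column back for each row; on many engines the narrower query can also be satisfied from an index alone:

    -- Only the columns the caller actually needs:
    SELECT customer_id, name
    FROM customer
    WHERE state = 'IL';

    -- Pulls every column on every matching row, whether used or not:
    SELECT *
    FROM customer
    WHERE state = 'IL';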

ColinYounger
+2  A: 

The multi-table database is a lot more flexible if any of these one-to-one relationships may become one-to-many or many-to-many in the future. For example, if you need to store multiple addresses for some customers, it's a lot easier if you have a customer table and an address table. I can't really see a situation where you might need to duplicate some parts of an address but not others, so separate address, city, state, and zip tables may be a bit over the top.

Chris Upchurch
A: 

I think you have to look at the kind of data you're storing before you make that decision. Having an address table is great, but only if the likelihood of multiple people sharing the same address is high. If every person has a different address, keeping that data in a separate table just introduces unnecessary joins.

I don't see the benefit of having a city table unless cities in and of themselves are entities you care about in your application, or unless you want to limit the number of cities available to your users.

Bottom line: decisions like this have to take the application itself into consideration before you start shooting for efficiency. IMO.

Tundey
+4  A: 

I would argue in favor of more tables, but only up to a certain point. Using your example, separating your user's information into two tables, say USERS and ADDRESS, gives you the flexibility to have multiple addresses per user. One obvious application of this is a user who has separate billing and shipping addresses.

The argument in favor of having a separate CITY table would be that you only have to store each city's name once, then refer to it when you need it. That does reduce duplication, but in this example I think it's overkill. It may be more space efficient, but you'll pay the price in joins when you select data from your database.

Bill the Lizard
+1  A: 

Like everything else: it depends.

There is no hard and fast rule regarding column count vs table count.

If your customers need to have multiple addresses, then a separate table for that makes sense. If you have a really good reason to normalize the City column into its own table, then that can go, too, but I haven't seen that before because it's a free form field (usually).

A table-heavy, normalized design is efficient in terms of space and looks "textbook good", but it can get extremely complex. It looks nice until you have to do 12 joins to get a customer's name and address. These designs are not automatically fantastic in terms of the performance that matters most: queries.

Avoid complexity if possible. For example, if a customer can have only two addresses (not arbitrarily many), then it might make sense to just keep them all in a single table (CustomerID, Name, ShipToAddress, BillingAddress, ShipToCity, BillingCity, etc.).
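
A minimal sketch of that single-table version (column names follow the answer; the types are assumed):

    CREATE TABLE customer (
        CustomerID      INT PRIMARY KEY,
        Name            VARCHAR(100),
        ShipToAddress   VARCHAR(200),
        ShipToCity      VARCHAR(100),
        BillingAddress  VARCHAR(200),
        BillingCity     VARCHAR(100)
        -- ...ShipToState, BillingState, etc., repeated for each of the two addresses
    );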

Here's Jeff's post on the topic.

Michael Haren
+2  A: 

A fully normalized design (i.e., "more tables") is more flexible, easier to maintain, and avoids duplication of data, which means your data integrity is going to be a lot easier to enforce.

Those are powerful reasons to normalize. I would choose to normalize first, and then only denormalize specific tables after you saw that performance was becoming an issue.

My experience is that in the real world, you won't reach the point where denormalization is necessary, even with very large data sets.

JosephStyons
+2  A: 

Each table should only include columns that pertain to the entity that's uniquely identified by the primary key. If all the columns in the database are attributes of the same entity, then you'd only need one table with all the columns.

If any of the columns may be null, though, you would need to put each nullable column into its own table with a foreign key to the main table in order to normalize it. This is a common scenario, so for a cleaner design, you're likely to be adding more tables than columns to existing tables. Also, by moving these optional attributes to their own table, they would no longer need to allow nulls, and you avoid a slew of NULL-related issues.
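
A small sketch of that idea, using a hypothetical optional attribute (fax_number):

    -- The optional attribute lives in its own table; a row exists only for
    -- customers that actually have a fax number, so the column never needs NULL.
    CREATE TABLE customer (
        customer_id INT PRIMARY KEY,
        name        VARCHAR(100)
    );

    CREATE TABLE customer_fax (
        customer_id INT PRIMARY KEY REFERENCES customer (customer_id),
        fax_number  VARCHAR(20) NOT NULL
    );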

Mark Cidade
+15  A: 

I have a few fairly simple rules of thumb I follow when designing databases, which I think can be used to help make decisions like this....

  1. Favor normalization. Denormalization is a form of optimization, with all the requisite tradeoffs, and as such it should be approached with a YAGNI attitude.
  2. Make sure that client code referencing the database is decoupled enough from the schema that reworking it doesn't necessitate a major redesign of the client(s).
  3. Don't be afraid to denormalize when it provides a clear benefit to performance or query complexity.
  4. Use views or downstream tables to implement denormalization rather than denormalizing the core of the schema, when data volume and usage scenarios allow for it.

The usual result of these rules is that the initial design will favor tables over columns, with a focus on eliminating redundancy. As the project progresses and denormalization points are identified, the overall structure will evolve toward a balance that accepts limited redundancy and column proliferation in exchange for other valuable benefits.
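
As a sketch of rule 4 (all names here are hypothetical), a "downstream" table holds a denormalized copy fed from the normalized core, which itself stays untouched:

    CREATE TABLE customer_report (
        customer_id INT PRIMARY KEY,
        name        VARCHAR(100),
        city_name   VARCHAR(100),
        order_count INT
    );

    -- Periodically rebuilt from the normalized source tables:
    INSERT INTO customer_report (customer_id, name, city_name, order_count)
    SELECT c.customer_id, c.name, ci.city_name, COUNT(o.order_id)
    FROM customer c
    JOIN city ci       ON ci.city_id      = c.city_id
    LEFT JOIN orders o ON o.customer_id   = c.customer_id
    GROUP BY c.customer_id, c.name, ci.city_name;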

Chris Ammerman
What exactly is a 'downstream table'?
lhnz
I mean "downstream" in the context of a "data flow". Which essentially means you have a process which uses the normalized tables as a source, and transforms the data somehow, and then deposits the result somewhere else.
Chris Ammerman
+1  A: 

Hmm.

I think it's a wash and depends on your particular design model. Definitely factor entities that have more than a few fields out into their own table, or entities whose makeup will likely change as your application's requirements change (for instance, I'd factor out address anyway, since it has so many fields, but I'd especially do it if you thought there was any chance you'd need to handle foreign-country addresses, which can be of a different form; the same with phone numbers).

That said, once you've got it working, keep an eye on performance. If you've spun an entity out into a table that requires you to do large, expensive joins, maybe it becomes a better design decision to fold that table back into the original.

John Christensen