views:

598

answers:

8

I know it's probably not the right way to structure a database, but does a database perform faster if the data is put in one huge table instead of being broken up logically into separate tables?

I want to design and create the database properly, using keys to enforce referential integrity across tables, but when querying, is JOINing slower than reading the required data from one table? I want to make the database queries as fast as possible.

+3  A: 

It depends on the DBMS flavor and your actual data, of course. But generally, more, smaller (narrower) tables are faster than fewer, larger (wider) tables.

Joel Coehoorn
+13  A: 

So many other facets affect the answer to your question. What is the size of the table? Its width? How many rows? What is the usage pattern? Are there different usage patterns for different subsets of the columns in the table? (i.e., are two columns hit 1000 times per second, while the other 50 columns are only hit once or twice a day?) That scenario would be a prime candidate to split (partition) the table vertically: the two hot columns in one table, the rest in another.
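As a minimal sketch of that vertical split (the account table and its column names are invented for the example), the two frequently-read columns get a narrow table of their own, and the hot query never touches the wide one:

    -- Narrow table holding only the two frequently-read columns
    CREATE TABLE account_hot (
        account_id INT PRIMARY KEY,
        balance    DECIMAL(12,2) NOT NULL,
        status     CHAR(1) NOT NULL
    );

    -- Wide table holding the rarely-read columns, keyed to the same id
    CREATE TABLE account_detail (
        account_id      INT PRIMARY KEY REFERENCES account_hot (account_id),
        mailing_address VARCHAR(200),
        notes           VARCHAR(2000)
        -- remaining rarely-used columns go here
    );

    -- The query that runs 1000 times per second reads only the narrow table
    SELECT balance, status FROM account_hot WHERE account_id = 42;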

In general, normalize the schema to the maximum degree possible, then run performance testing with typical or predicted loads and usage patterns, and denormalize and partition to the point where the performance becomes acceptable, and no more...

Charles Bretana
+3  A: 

Access is a little slower when joins must be performed. How much slower depends greatly on the features offered by your particular DBMS, on how the physical database design exploits those features, and on the most frequent access patterns. There are some access patterns where storing a lot of data in one row wastes time, because the entire row is retrieved but only a little of it is used. It depends.

When data is stored in a single table and the normalization rules are deviated from, updates are typically slower. How important update speed is versus query speed depends on the particular way you use this database.

In general, a lot of newbie database designers tend to put more weight on speed issues than those issues deserve. If your data model is inflexible and incomprehensible, but you gain a 10% speed improvement, you have probably done more harm than good.

Walter Mitty
A: 

What you're asking about is denormalization - it can speed up reads if done in the right way, and if you can ensure that you're not introducing anomalies into your database as a result.
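A minimal sketch of that trade-off, using hypothetical orders/customers data: the denormalized table copies the customer's name onto every order so reads avoid a join, at the cost of redundant updates:

    -- Denormalized: customer_name is copied onto every order row
    CREATE TABLE orders_denorm (
        order_id      INT PRIMARY KEY,
        customer_id   INT NOT NULL,
        customer_name VARCHAR(100) NOT NULL,  -- redundant copy, must be kept in sync
        order_total   DECIMAL(10,2) NOT NULL
    );

    -- Reads are a single-table lookup, no join required
    SELECT customer_name, order_total FROM orders_denorm WHERE order_id = 1001;

    -- The anomaly risk: renaming a customer now means touching every one of their orders
    UPDATE orders_denorm SET customer_name = 'Acme Ltd' WHERE customer_id = 7;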

Patrick Harrington
+1  A: 

Are you building a "read-only" database like a data warehouse? If so, storing data "pre-joined" may make sense. For everyday OLTP databases you need to take into account the performance and ease of inserts, updates and deletes as well. Also, what about queries that only want the data that would have been in one or two of the smaller tables? Now they have to grind through a big fat table full of stuff they don't care about.

It's worth remembering that joining tables is bread-and-butter stuff to a decent DBMS - they are very good at it.

Tony Andrews
+1  A: 

It is often true that querying a single table is faster than querying multiple joined tables. But a normalized design allows you to query the data in multiple ways, with adequate performance across many types of queries.

If you denormalize the tables, you may improve performance of one specific query, while sacrificing performance of other queries against that data. And of course you'll have to manage referential integrity and redundancy manually.
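For comparison, a rough sketch of the normalized version of the same hypothetical orders/customers data: the report needs a join, but each fact is stored once and other queries against either table stay cheap:

    CREATE TABLE customers (
        customer_id   INT PRIMARY KEY,
        customer_name VARCHAR(100) NOT NULL
    );

    CREATE TABLE orders (
        order_id    INT PRIMARY KEY,
        customer_id INT NOT NULL REFERENCES customers (customer_id),
        order_date  DATE NOT NULL,
        order_total DECIMAL(10,2) NOT NULL
    );

    -- A simple equijoin on the key columns
    SELECT c.customer_name, o.order_total
    FROM   orders o
    JOIN   customers c ON c.customer_id = o.customer_id
    WHERE  o.order_date >= DATE '2009-01-01';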

Bill Karwin
A: 

Remember also that there is a hard limit to the amount of data that can be stored in one record (not knowing which database you have, I can't say what it is). Too many columns and you will hit that limit. Also, if you have columns like phone1, phone2, phone3, you need to normalize. If you would have to add a column whenever the number of items stored about a record changes (if you started needing 4 instead of 3 phone numbers, for instance), you need to normalize instead.
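A minimal sketch of that fix, with hypothetical person/person_phone tables: the child table holds one row per phone number, so going from 3 numbers to 4 needs no schema change:

    CREATE TABLE person (
        person_id INT PRIMARY KEY,
        full_name VARCHAR(100) NOT NULL
    );

    -- One row per phone number instead of phone1/phone2/phone3 columns
    CREATE TABLE person_phone (
        person_id  INT NOT NULL REFERENCES person (person_id),
        phone_type VARCHAR(10) NOT NULL,  -- e.g. 'home', 'work', 'mobile'
        phone_num  VARCHAR(20) NOT NULL,
        PRIMARY KEY (person_id, phone_type)
    );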

HLGEM
A: 

What's good for optimising SELECTs is often not so good for optimising INSERTs, UPDATEs and DELETEs, and so it is with this approach. Breaking the data out into properly normalised tables reduces the overhead of changing the data.

While it's true that in a data warehouse or decision support system we'd often store pre-joined data (as Tony says), it usually only happens in the context of a precomputed summary (e.g. a materialized view) and not for data at the atomic level of granularity. The reason for this is that pushing repeated longer character strings (e.g. "Supplier Name") into a dimension table reduces the total required storage space and the number of physical reads required to retrieve the data. The joins are usually equijoins, and these are performed at almost no cost for large data sets.
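A rough sketch of that dimension-table idea (the supplier_dim/purchase_fact names are invented for the example): the long supplier name string is stored once, the fact table carries only a small integer key, and the report recovers the name with a cheap equijoin at query time:

    CREATE TABLE supplier_dim (
        supplier_key  INT PRIMARY KEY,
        supplier_name VARCHAR(100) NOT NULL  -- the long repeated string, stored once
    );

    CREATE TABLE purchase_fact (
        purchase_id  INT PRIMARY KEY,
        supplier_key INT NOT NULL REFERENCES supplier_dim (supplier_key),
        amount       DECIMAL(12,2) NOT NULL
    );

    -- The equijoin brings the name back at report time
    SELECT d.supplier_name, SUM(f.amount) AS total_amount
    FROM   purchase_fact f
    JOIN   supplier_dim d ON d.supplier_key = f.supplier_key
    GROUP BY d.supplier_name;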

David Aldridge