views:

77

answers:

3

Here is my situation: I have about 50 different fields of data that I need to store for 1 record (none are the same or repeating). About 15 of the fields are ones that I commonly need to use in queries, while the remainder are used on occasion (~40% of the queries).

Should I setup 2 tables, one containing the common fields and the other with the less common fields and join them in a 1:1 relationship? Or should I just put them all in one table?

Is there any advantage, speed or otherwise, do doing one or the other?

+4  A: 

Assuming the rows of your table are logically individual entities and all the fields are relevant to the entity, then they should all be in the same table.

You may have a table, however, where each row is logically multiple entities; refactoring so that this is an entity-per-table is usually called normalization, or such a schema: normalized.

A normalized schema is usually considered "correct" but there are performance considerations when joining between tables.

oxbow_lakes
+3  A: 

The main thing is to minimize NULLs. If you the other 35 fields in your table are typically NULL, think about going with the separate table approach. If they're usually non-NULL, one table is probably fine.

echo
+5  A: 

Two tables means 40% of your queries have joins and 60% don't have joins.

You don't save storage overall.

40% of your queries are slower.

60% of your queries are faster for two reasons. 1) no join. 2) fewer physical data blocks.

Is this performance different "important"? Does this make things "better"?

Until you more clearly define the time spent on each query -- and run benchmark tests to measure the difference -- it doesn't matter.

You have to actually build and measure the performance before you can decide which is "better".

S.Lott
Is it best that a query returns bad data fast? A join at worst costs subseconds - which doesn't even register to an end user. Seriously, who votes for this garbage?
OMG Ponies
Good point rexem, but I don't see your answer/suggestion?
Darryl Hein
The answer is this: "Until you more clearly define the time spent on each query -- and run benchmark tests to measure the difference -- it doesn't matter." In short: You Must Benchmark.
S.Lott