views: 209, answers: 4

We have to create a few tables for the sole purpose of reporting in Oracle.

Option 1

Receivable Table

  • RefNo
  • Date
  • TrnType (e.g. Tax, Fee, Premium)
  • Amount

Option 2

Receivable Table

  • RefNo
  • Date
  • Tax
  • Fee
  • Premium

Note: For a given RefNo, all of the types (Tax, Fee and Premium) or a subset of them can exist.

What would be the optimal structure? (Tables will have over 100k records.)

+1  A: 

If your transaction types are explicitly defined and unlikely to be sparsely populated (i.e., most records will have values for all three), then the latter is more likely to be your best option. It also represents the data in a format that's closer to how you'd think about it in reality.

Even if the values can be sparse, "gut instinct" still makes me lean toward the column-based approach rather than the row-based one.

Adam Robinson
+8  A: 

Neither of those is actually the best (in terms of how DBAs think). The best would be (assuming RefNo is unique and therefore the primary key):

Receivables:
    RefNo
    Date
ReceivableDollarVals:
    RefNo
    Type
    Amount

If RefNo/Date is the primary key, add date to the second table as well.

This allows you to minimize storage space for those rows that don't have all three types (although the savings are minimal). Then you use WHERE clauses combining the two tables (or JOINs) to do your queries.
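
To make that concrete, here is a minimal sketch of the two tables as Oracle DDL, plus one join-based summary query. The column types are assumptions, the type column reuses the question's TrnType name, and Date is renamed RecDate because DATE is a reserved word in Oracle:

    -- Sketch of the normalized design described above (types assumed)
    CREATE TABLE Receivables (
        RefNo   NUMBER        PRIMARY KEY,
        RecDate DATE          NOT NULL
    );

    CREATE TABLE ReceivableDollarVals (
        RefNo   NUMBER        NOT NULL REFERENCES Receivables (RefNo),
        TrnType VARCHAR2(20)  NOT NULL,   -- e.g. 'Tax', 'Fee', 'Premium'
        Amount  NUMBER(12,2)  NOT NULL,
        PRIMARY KEY (RefNo, TrnType)
    );

    -- One way to get per-RefNo totals across all types
    SELECT r.RefNo, r.RecDate, SUM(v.Amount) AS TotalAmount
    FROM   Receivables r
    JOIN   ReceivableDollarVals v ON v.RefNo = r.RefNo
    GROUP  BY r.RefNo, r.RecDate;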

It also allows you to add other types at will without restructuring the database.

However, you need to keep in mind that third normal form is an ideal. It's quite acceptable to violate the rules to gain performance as long as you understand the implications.

100,000 records is actually quite small so, unless you think you're going to be adding more types in the near future, I'd go for your option 2 and use zeros for those values that don't exist. NULLs would most likely make your queries a little more complicated.
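
For comparison, a sketch of option 2 with zero defaults instead of NULLs might look like this (again the column types are assumptions, and Date is renamed to avoid the reserved word):

    -- Option 2: one column per transaction type, defaulting to zero
    CREATE TABLE Receivable (
        RefNo   NUMBER       PRIMARY KEY,
        RecDate DATE         NOT NULL,
        Tax     NUMBER(12,2) DEFAULT 0 NOT NULL,
        Fee     NUMBER(12,2) DEFAULT 0 NOT NULL,
        Premium NUMBER(12,2) DEFAULT 0 NOT NULL
    );

    -- Report summaries then need no joins
    SELECT SUM(Tax) AS TotalTax, SUM(Fee) AS TotalFee, SUM(Premium) AS TotalPremium
    FROM   Receivable;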

paxdiablo
+1. It also doesn't have the potential to give different dates for the same RefNo.
mghie
Storing the data in discrete fields rather than individual rows is not necessarily a violation of 3NF. Violating 3NF would mean that the values are not directly related to the primary key (RefNo). I'd be very surprised if this were the case.
Adam Robinson
Good point, @Adam, since 3NF is just "a column depends on the key, the whole key and nothing but the key". We still design our tables to avoid the layouts given in the question, but that's because we tend to have LOTS of 'columns' which may be NULL. Maybe I should call my approach 3.5NF :-)
paxdiablo
Clarification: as I've mentioned, this table is included just for reporting, so denormalization is acceptable (lots of joins would be required to get summaries without this type of table).
Eranga
@SecretWiz, your expected table size (both rows [100K] and columns [<10]) will not cause any serious degradation in storage, and the speed you will gain from a non-normalized table should be more than noticeable. I'd still go for option 2 in your particular case.
paxdiablo
@PAX: The advisability of that depends on a number of factors (how sparse it really is, etc.). In this scenario--while I'm obviously not entirely familiar--it doesn't really seem like you're going to have a majority of rows with only one or two of the values. This matters more with many fields.
Adam Robinson
+1  A: 

The real advantage of the fully normalized version in the first answer comes when the requirements change -- when someone changes the specs so you have to add types beyond the 3 you've identified.

Like discount, refund, whatever. Those changes do happen.

The normalized structure should let you do that more easily, without needing to change either table structure or most of the programs that use the data.

But the normalized structure does require more investment up front: every new transaction involves inserting into two tables, you need a check constraint to control the types, and so on.
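
As a rough illustration of that up-front cost, assuming the table and column names from the earlier normalized sketch (the reference number, date and amount are made-up values):

    -- Restrict the type column to the known transaction types
    ALTER TABLE ReceivableDollarVals
      ADD CONSTRAINT chk_rdv_trntype
      CHECK (TrnType IN ('Tax', 'Fee', 'Premium'));

    -- Every new transaction touches both tables
    INSERT INTO Receivables (RefNo, RecDate)
    VALUES (1001, DATE '2009-06-01');

    INSERT INTO ReceivableDollarVals (RefNo, TrnType, Amount)
    VALUES (1001, 'Premium', 125.00);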

Generally, you'll do better in the long term with the normalized structure. However, with a simple case like this, you can sometimes get away without normalizing and not have to pay the consequences (at least, nobody has to pay until you're long gone and it's somebody else's problem).

Professionally, reasonable levels of normalization should be your standard strategy and you should require yourself to have very good reasons for denormalizing.

Jim Hudson
A: 

If the table will have over 100k records, then option 2 is the better choice, because option 1 slows down access to the data.