Where's the tradeoff between normalization (SQL View) and performance/fiability (SQL Table)

views:

353

answers:

+3 Q:

Where's the tradeoff between normalization (SQL View) and performance/fiability (SQL Table)

I've got quite a long business process which eventually results into financial operations.

What matters in the end is quite exclusively these final operations, although I've got to keep a log of everything which led to it.

Since all the information contained into the final operations is available in other tables (used during the business process), it makes sense to use a view, but the view logic would be quite complicated (there are dozens of tables implicated), and I'm concerned that :

even with appropriate indexes, a table will probably be way faster (my table will eventually contain millions if items, and should be fully searchable on almost all its columns)
the view logic would be complicated, so I'm afraid it may complicate things in a few years if I want to evolve my business logic.

Because of those two reasons, I'm a bit tempted to write the data in a table at the end of my business process instead of relying on a view, but duplicating the data doesn't smells right (and it also looks a bit like premature optimization, but since it's such a central point in my design, I'd like to address the issue ASAP)

Have you ever faced such a choice? What did you decide?

Edit : creating a table would clearly lead to duplication in my situation, ie. the data written in the table exists somewhere else in the database and could be retrieved using only joins without any calculations.

+3 A:

I think you answered your question writing it down Brann.

This problem can be seen in this way: from one hand you have "real time data". You have fresh data and from them it's nice to create view to show "real time data" too.

But as time goes on, there are more data and logic changes. So it's good to have written down summaries of data you had some time ago. It's very pragmatic - you do not duplicate data, because you recalculate them and save into new table summary of them.

So when you think of it in this way, it's obvious that in this example new table will be better. As you write:

Faster access
Can have more complicated logic
Have archive data unchanged when logic changes

So when you meet this (or part) of this criteria as you requirement than its not choice - you go into tables.

When I would use view is only when showing fresh data out of other fresh data. In very, very simple problems. And when it gets more complicated - you always switch to new table.

So do not be afraid to go into it. Having one summary table with faster access is very pretty solution and it's a sign of well formed database.

Take care of the design of this table - so when business logic changes - you do not need to change everything from one stone in this table. And then everything will be OK!

tomaszs 2009-03-23 09:51:19

I don't fully understand why you state that I "don't duplicate the data" ; to me, there's clearly some duplication occurring here, since I'll get the data in various table and inserting into my final tables instead of relying on joins in a view to get it.

Brann 2009-03-23 10:29:33

For me duplication of data occurs when you take data like there are and plain copy them to elsewhere. But when you do some calculation it's not duplication anymore. The same like you have product list and sum of product cost in order table - it's not duplication, it's calculation.

tomaszs 2009-03-23 11:15:57

@tomaszs : maybe I didn't express my situation clearly enough. What I'm doing is clearly duplication, ie copying data available from multiple tables using complex joins to one single table.

Brann 2009-03-23 12:10:16

Ah, ok. I thought that you'll do some more calculations. But still - even if it is duplication, if it is complicated - use of another table seems responsible for me.

tomaszs 2009-03-23 12:26:27

I always decide to have better normalization. In your case , though the view may be complicated , it's better to have that than the new table which has to be kept in sync with all the data changing operation.Plus the view would always be current while your end of business day table population would be only current for few hours a day.

Also , you have a bigger problem if the data in this table goes out of sync for whatever reasons.

Learning 2009-03-23 09:52:03

My table will always be current as well (I will populate it at the end of every business operation, not by the end of the day). That being said, you're right, my dilemma is normalization vs. pragmatism.

Brann 2009-03-23 10:02:10

+2 A:

I'm for the new table in this situation. The view has many disadvantages - performance clearly, complexity, and logic lock in. However, IMHO the over-arching reason is that as the underlying data changes, so the value in your view will change also. In most instances this is a good thing, however, with financial operations isn't it better to have a fixed record of what occured.

MrTelly 2009-03-23 10:21:24

As MrTelly alluded to, are you sure that your end result table really is a duplication of the view data? Or, is it actually a record of the final action taken as a result of the items in the view data.

For a clearer example, let's say that every time my gas tank gets to half-empty I buy $10 of gas. I write this down in a log. One day I buy my gas and write it in my log then later find out that my fuel gauge was broken and I really had 3/4 a tank of gas. Should I now erase the $10 purchase from my log because the underlying data (the level of gas in my tank) has changed? Ok, maybe that's not a clearer example, but hopefully it gets the point across. Recording the results is a different thing from recording the events that led up to the result. This is especially true in financial application. Therefore, I don't know that you're breaking normalization at all with storing the final outcome in its own table.

Tom H. 2009-03-23 13:39:49

@Tom :I've updated my question to clarify this point.

Brann 2009-03-23 13:44:10

An indexed view is the way. But there are quite a few limitations to this approach, but it's generally favorable although it has some overhead issues if implemented incorrectly. With this approach you won't need to keep track of the changes that take place in your base tables and the data would accumulate itself nicely in that indexed view of yours. In theory.

Reference:

Improving Performance with SQL Server 2005 Indexed Views
Oracle: Materialized View Concepts and Architecture

Denis Valeev 2010-09-02 20:58:27

ansaurus

tags:

views:

answers:

Where's the tradeoff between normalization (SQL View) and performance/fiability (SQL Table)

related questions