I have a database that contains a history of product sales. For example, the following table:

CREATE TABLE SalesHistoryTable (
    OrderID   INT,            -- Order number, unique across all orders
    ProductID INT,            -- Product ID; can be used as a key to look up product info in another table
    Price     DECIMAL(10,2),  -- Price of the product per unit at the time of the order
    Quantity  INT,            -- Quantity of the product for the order
    Total     DECIMAL(12,2),  -- Total cost of the order for the product (Price * Quantity)
    Date      DATE,           -- Date of the order
    StoreID   INT,            -- The store that created the order
    PRIMARY KEY (OrderID));

The table will eventually have millions of transactions. From this, profiles can be created for products in different geographical regions (based on the StoreID). Creating these profiles can be very time-consuming as a database query. For example:

SELECT ProductID, StoreID,
       SUM(Total) AS Total,
       SUM(Quantity) AS QTY,
       SUM(Total) / SUM(Quantity) AS AvgPrice
FROM SalesHistoryTable
GROUP BY ProductID, StoreID;

The above query could be used to get information about products for any particular store. You could then determine which store has sold the most, which has made the most money, and which on average sells for the most/least. This would be very costly to run as a normal ad hoc query. What are some design decisions that would allow these types of queries to run faster, assuming storage size isn't an issue?

For example, I could create another table with duplicate information: StoreID (key), ProductID, TotalCost, QTY, AvgPrice. I would then provide a trigger so that when a new order is received, the entry for that store is updated in the new table. The cost for the update is almost nothing.
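A rough sketch of what that summary table and trigger might look like (MySQL-style syntax assumed; the table and trigger names are just illustrative, and AvgPrice is computed at read time as TotalCost / QTY rather than stored):

CREATE TABLE ProductStoreSummary (
    StoreID   INT,
    ProductID INT,
    TotalCost DECIMAL(14,2),
    QTY       INT,
    PRIMARY KEY (StoreID, ProductID));

-- Keep the running totals current as each order arrives.
CREATE TRIGGER trg_SalesHistory_AfterInsert
AFTER INSERT ON SalesHistoryTable
FOR EACH ROW
    INSERT INTO ProductStoreSummary (StoreID, ProductID, TotalCost, QTY)
    VALUES (NEW.StoreID, NEW.ProductID, NEW.Total, NEW.Quantity)
    ON DUPLICATE KEY UPDATE
        TotalCost = TotalCost + NEW.Total,
        QTY       = QTY + NEW.Quantity;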

What should be considered given the above scenario?

+1  A: 

I'd consider:

  • a data warehouse/OLAP solution
  • (as you said) run your data mining queries against a separate precomputed table/dataset
  • indexed/materialised views, which are almost the same as the previous point (see the sketch below)

There are some questions though:

  • do you expect real time data?
  • what is your write volume?
  • what DB engine?
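As a sketch of the indexed/materialised view option (SQL Server syntax assumed, since indexed views are engine-specific; in Oracle the rough equivalent is a materialized view):

-- The view must be schema-bound and include COUNT_BIG(*) to be indexable,
-- and the summed columns must be declared NOT NULL. AvgPrice is left out
-- because expressions over aggregates are not allowed in an indexed view;
-- it would be computed at query time as Total / QTY.
CREATE VIEW dbo.ProductStoreSummary
WITH SCHEMABINDING
AS
SELECT ProductID, StoreID,
       SUM(Total)    AS Total,
       SUM(Quantity) AS QTY,
       COUNT_BIG(*)  AS RowCnt
FROM dbo.SalesHistoryTable
GROUP BY ProductID, StoreID;
GO

-- The unique clustered index is what makes the view's result set persisted.
CREATE UNIQUE CLUSTERED INDEX IX_ProductStoreSummary
    ON dbo.ProductStoreSummary (ProductID, StoreID);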
gbn
+1: The data could be real-time, with the inherent latency delays of course. I suppose putting in batch jobs and making the data update once an hour or so could be an option as well, as Eric mentioned. The write volume would be on the order of >1000/day. However, I have access to data that goes back to 2006. I'm not sure yet since I have not created and imported the data, but I'm guessing there are over 1.5 million rows of information.
galford13x
+1  A: 

You may want to look into using materialized views, so that the expensive aggregation only runs periodically when the view is refreshed rather than on every query.
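For example, in Oracle a periodically refreshed materialized view could look something like this (a sketch; the name and hourly refresh interval are just illustrative):

-- Rebuild the summary once an hour; report queries read the stored
-- results instead of re-aggregating the base table every time.
CREATE MATERIALIZED VIEW ProductStoreSummary_MV
BUILD IMMEDIATE
REFRESH COMPLETE
START WITH SYSDATE NEXT SYSDATE + 1/24
AS
SELECT ProductID, StoreID,
       SUM(Total)    AS Total,
       SUM(Quantity) AS QTY,
       SUM(Total) / SUM(Quantity) AS AvgPrice
FROM SalesHistoryTable
GROUP BY ProductID, StoreID;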

ElectricDialect
+1: Thanks, I hadn't heard of materialized views. I will certainly look into them.
galford13x
A: 

"The cost for the update is almost nothing."

Except that all updates must now be serialized: every new order for the same store and product has to update the same summary row, and no matter what, the ancient law of physics still remains that no two things can be in the same place at the same time.

Erwin Smout
I think I see what you're saying, but I'm not sure how that applies. If there are 1000 sales every hour, that means 1000 inserts into the SalesHistoryTable and 1000 trigger firings that each result in two additions and a division plus a row update. That seems much cheaper than running the query 1000 times, right?
galford13x
Perhaps I should change my statement to, "The cost for the update is almost nothing compared to the query"? That might put it in better perspective.
galford13x
+2  A: 

This is normally something you would use a data warehouse for, but aside from that, using a trigger to update a second table is a perfectly viable option.

You could also have a second table that is populated by a batch job on a periodic basis (a more data-warehouse-like option). You could also use materialized views if your database supports them.
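A batch job along those lines could be as simple as periodically rebuilding a summary table (a sketch; ProductStoreProfile is a hypothetical table with the columns proposed in the question, and the job itself would be scheduled with cron, SQL Server Agent, or similar):

-- Rebuild the whole summary from scratch: one big GROUP BY per run
-- instead of one per report query.
TRUNCATE TABLE ProductStoreProfile;

INSERT INTO ProductStoreProfile (StoreID, ProductID, TotalCost, QTY, AvgPrice)
SELECT StoreID, ProductID,
       SUM(Total),
       SUM(Quantity),
       SUM(Total) / SUM(Quantity)
FROM SalesHistoryTable
GROUP BY StoreID, ProductID;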

Eric Petroelje
+1: Thanks I'll look into materialized views.
galford13x