views:

37

answers:

2

I am new to OLAP.

I understand the table structure and ETL process.

I don't understand when data is supposed to be deleted from the fact table. Say I'm creating a reporting application for events. each event has the duration it took to complete, the exit code and total bytes read. There are several dimensions, e.g. time and location.

Say I have 1 million new records ready for my fact table daily, A total of 1 GB. If my ETL process only adds data to my fact table it grows indefinitely. When should I delete data from my fact table? Should I divide the data into several fact tables (e.g. monthly tables)?

Is there any rule-of-thumb?

Thanks

A: 

History should never be deleted.

Period.

However, some people get nervous that 1Gb per day may turn into 1Tb every 3 years. This rarely actually matters, but some people still like to worry about the price of storage.

Your time spent designing a data purge can be more expensive than the storage you're attempting to save.

[I found 3 DBA's and 2 programmers debating ways to save a few hundred MB's. I said that I would drive them all down to Best Buy and purchase a 500Mb disk drive with the spare change on the floor of my car. The price of 5 consultants merely walking into the room to discuss it had already exceeded the price of the storage they were attempting to "save".]

The question of "can we summarize?" is entirely up to the users. Sometimes you can't usefully summarize, so you can't easily delete anything either.

Some folks will say that the business cycle is 20 years or something like that, and want details for the first 20 years (on 7Tb) and then summaries for time periods before that.

S.Lott
"History should never be deleted"http://www.amazon.com/Delete-Virtue-Forgetting-Digital-Age/dp/0691138613Delete looks at the surprising phenomenon of perfect remembering in the digital age, and reveals why we must reintroduce our capacity to forget. Digital technology empowers us as never before, yet it has unforeseen consequences as well. Potentially humiliating content on Facebook is enshrined in cyberspace for future employers to see. Google remembers everything we've searched for and when. The digital realm remembers what is sometimes better forgotten, and this has profound implicati
adolf garlic
Damn. Comment formatting sucks on this site.
adolf garlic
I don't think this OLAP is comparable with drunken college pictures on Facebook.
S.Lott
+1  A: 

Never. You can use partitioning to deal with old records and move partitions to different drives. If you partition fact tables by date (month quarter, year), then for all the practical purposes you mostly access few latest partitions most of the time.

Keep in mind that DW belongs to business users and not to IT. Do not limit (not try to assume) questions a business analyst may want to ask -- query the DW.

Damir Sudarevic