Does anyone have resources that give a list of things to consider when designing a ROLAP cube, as opposed to a MOLAP one? (I'm doing it in Pentaho, but I guess the principles are not dissimilar for other implementations.) For example, I'm thinking of things like:

  1. should extra transformational work be done at the ETL stage to reduce computational work when querying the cube?

  2. should all my dimension tables be in the same database as my cube?

+1  A: 

Hi,

I'm a Pentaho implementor in Indonesia. First, of course, you should try to aggregate all your measures grouped by the surrogate keys involved.

And in Mondrian, you can "cache" some computations using additional aggregate tables. You can design these in Pentaho Aggregate Designer, but after that you will need additional work in your data warehouse / ETL stage to keep them populated.
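
As a rough illustration of the first point (table and column names here are hypothetical; in practice Pentaho Aggregate Designer can generate this kind of table and the matching Mondrian schema entries for you), an aggregate table is essentially the measures pre-summed and grouped by a subset of the surrogate keys:

```python
import sqlite3

# Hypothetical star schema: fact_sales carries surrogate keys into the time
# and product dimensions. The aggregate table pre-computes the measures
# grouped by those keys so Mondrian can read it instead of the raw fact table.
conn = sqlite3.connect("warehouse.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS fact_sales (
    time_key INTEGER, product_key INTEGER,
    sale_date TEXT, sale_hour TEXT, amount REAL, qty REAL
);

CREATE TABLE IF NOT EXISTS agg_sales_time_product (
    time_key    INTEGER,
    product_key INTEGER,
    amount_sum  REAL,
    qty_sum     REAL,
    fact_count  INTEGER   -- Mondrian aggregate tables carry a row-count column
);

-- Rebuild the aggregate: measures grouped by the chosen surrogate keys.
DELETE FROM agg_sales_time_product;
INSERT INTO agg_sales_time_product
SELECT time_key, product_key, SUM(amount), SUM(qty), COUNT(*)
FROM fact_sales
GROUP BY time_key, product_key;
""")
conn.commit()
conn.close()
```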

Regards,

Feris

http://pentaho-en.phi-integration.com

Feris Thia
A: 

Thanks to Feris for the link and input, but in the end I went for this book:

http://www.amazon.com/Pentaho-Solutions-Business-Intelligence-Warehousing/dp/0470484322/ref=sr_1_1?ie=UTF8&s=books&qid=1258408259&sr=8-1

I had a good long look at the Mondrian site + docs, but the book seems more comprehensive.

davek
+1  A: 

First off, the designs are similar, but they are driven by different performance & scalability strategies.

Secondly, the ETL process is pretty much the same, except that you'll typically see a lot more data in a ROLAP cube than in a MOLAP cube because of the scalability features of relational databases. And you'll often see a ROLAP cube within a database (a warehouse, or even a transactional database) that does more than just support ROLAP.

Lastly, you'll typically generate aggregate tables if you've got much data volume. That aggregation can be done a lot of different ways, but I'd say it is not typically driven by your ETL process unless you lack the ability to manage a separate asynchronous process, or have data volumes that make it impractical to run periodic summary jobs.
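
For example (a minimal sketch with hypothetical table and column names), such a summary job can be scheduled on its own, entirely outside the ETL stream, and rebuild one completed period at a time:

```python
import sqlite3
from datetime import date, timedelta

# A summary job scheduled independently of ETL (e.g. nightly) that rebuilds
# one completed day in a daily aggregate. All names are hypothetical.
DDL = """
CREATE TABLE IF NOT EXISTS fact_sales (
    time_key INTEGER, product_key INTEGER,
    sale_date TEXT, sale_hour TEXT, amount REAL, qty REAL
);
CREATE TABLE IF NOT EXISTS agg_sales_daily (
    sale_date TEXT, product_key INTEGER, amount_sum REAL, fact_count INTEGER
);
"""

def rebuild_day(conn: sqlite3.Connection, day: str) -> None:
    """Replace one day's rows in the daily aggregate from the detail fact table."""
    conn.execute("DELETE FROM agg_sales_daily WHERE sale_date = ?", (day,))
    conn.execute(
        """INSERT INTO agg_sales_daily (sale_date, product_key, amount_sum, fact_count)
           SELECT sale_date, product_key, SUM(amount), COUNT(*)
           FROM fact_sales
           WHERE sale_date = ?
           GROUP BY sale_date, product_key""",
        (day,),
    )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect("warehouse.db")
    conn.executescript(DDL)
    rebuild_day(conn, (date.today() - timedelta(days=1)).isoformat())
    conn.close()
```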

KenFar
+1 thanks for the info. Pre-aggregation in my ETL is proving to be a must...
davek
In that case, a bit more info: it's pretty easy to create a generic aggregator class for this purpose. If you're running ETL every hour, you can use it to generate aggregates at the hourly level. If you have the ETL-generated hourly aggregates go into a staging table and then merge the results into the main table, that gives you aggregates up to the current hour. Then, if necessary, you can also have a separate process to roll that data further up to the daily level.
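
A rough sketch of that staging-and-merge flow (table and column names are hypothetical): the hourly run aggregates the hour just loaded into a staging table, then replaces that hour's rows in the main aggregate, so the aggregate stays current up to the latest hour.

```python
import sqlite3

# Staging-and-merge: aggregate the hour just loaded into a staging table,
# then swap that hour's rows into the main hourly aggregate.
# All table and column names are hypothetical.
DDL = """
CREATE TABLE IF NOT EXISTS fact_sales (
    time_key INTEGER, product_key INTEGER,
    sale_date TEXT, sale_hour TEXT, amount REAL, qty REAL
);
CREATE TABLE IF NOT EXISTS stg_agg_sales_hourly (
    sale_hour TEXT, product_key INTEGER, amount_sum REAL, fact_count INTEGER
);
CREATE TABLE IF NOT EXISTS agg_sales_hourly (
    sale_hour TEXT, product_key INTEGER, amount_sum REAL, fact_count INTEGER
);
"""

def merge_hour(conn: sqlite3.Connection, hour: str) -> None:
    """Aggregate one hour into staging, then merge it into the main aggregate."""
    conn.execute("DELETE FROM stg_agg_sales_hourly")
    conn.execute(
        """INSERT INTO stg_agg_sales_hourly
           SELECT sale_hour, product_key, SUM(amount), COUNT(*)
           FROM fact_sales
           WHERE sale_hour = ?
           GROUP BY sale_hour, product_key""",
        (hour,),
    )
    # Delete-then-insert acts as the merge, so re-running the same hour is safe.
    conn.execute("DELETE FROM agg_sales_hourly WHERE sale_hour = ?", (hour,))
    conn.execute("INSERT INTO agg_sales_hourly SELECT * FROM stg_agg_sales_hourly")
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect("warehouse.db")
    conn.executescript(DDL)
    merge_hour(conn, "2009-11-16 14:00")  # hypothetical hour label
    conn.close()
```

A separate process can then roll agg_sales_hourly up to a daily table in exactly the same way.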
KenFar