I'm still learning the ropes of OLAP, cubes, and SSAS, but I'm hitting a performance barrier and I'm not sure I understand what is happening.

So I have a simple cube, which defines two simple dimensions (type and area), a third Time dimension hierarchy (Year->Quarter->Month->Day->Hour->10-Minute), and one measure (a sum on a field called Count). The database tracks events: when they occurred, what type they are, and where they occurred. The fact table is a precalculated summary of events for each 10-minute interval.
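
To give a clearer picture, the tables are shaped roughly like this (table and column names are simplified, not the real schema):

    -- Pre-aggregated fact: one row per 10-minute interval, type, and area
    CREATE TABLE DimType (TypeKey INT PRIMARY KEY, TypeName VARCHAR(50));
    CREATE TABLE DimArea (AreaKey INT PRIMARY KEY, AreaName VARCHAR(50));

    CREATE TABLE FactEventCounts (
        IntervalStart DATETIME NOT NULL,  -- start of the 10-minute interval
        TypeKey       INT NOT NULL REFERENCES DimType(TypeKey),
        AreaKey       INT NOT NULL REFERENCES DimArea(AreaKey),
        [Count]       INT NOT NULL        -- number of events in that interval
    );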

So I set up my cube and I use the browser to view all my attributes at once: total counts per area per type over time, with drill-down from Year down to the 10-minute interval. Reports are similar in performance to the browser.

For the most part, it's snappy enough. But as I get deeper into the drill-tree, each level takes longer to view. Finally, at the minute level, it seems to take 20 minutes or so before it displays a mere 6 records. But then I realized that I could view the other minute-level drill-downs with no waiting, so it seems like the cube is calculating the entire table at that point, which is why it takes so long.

I don't understand. I would expect going to Quarters or Years to take the longest, since all the data has to be aggregated up. Going to the lowest level, filtered down heavily to around 180 cells (6 intervals, 10 types, 3 areas), seems like it should be fastest. Why is the cube processing the entire dataset instead of just the visible subset? Why is the highest level of aggregation so fast and the lowest level so slow?

Most importantly, is there anything I can do by configuration or design to improve it?

Some additional details that I just thought of which may matter: this is SSAS 2005, running on SQL Server 2005, using Visual Studio 2005 for BI design. The cube is set (as is the default) to full MOLAP, but is not partitioned. The fact table has 1,838,304 rows, so this isn't a crazy enterprise database, but it's no simple test db either. All the SQL Server components run on one server, which I access remotely from my workstation.

A: 

When you are looking at the minute level - are you talking about all events from 12:00 to 12:10 regardless of day?

I would think that if you need that to go faster (because obviously it would be scanning everything), you will need to make the two parts of your "time" dimension orthogonal - make a date dimension and a time dimension.
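
A rough sketch of what I mean (table and column names are just illustrative, not your actual schema):

    -- Split the single datetime hierarchy into a date dimension and a time-of-day dimension
    CREATE TABLE DimDate (
        DateKey  INT PRIMARY KEY,       -- e.g. 20090101
        [Year]   INT, [Quarter] INT, [Month] INT, [Day] INT
    );

    CREATE TABLE DimTimeOfDay (
        TimeKey       INT PRIMARY KEY,  -- e.g. 1230 for the 12:30 slot
        [Hour]        INT,
        TenMinuteSlot INT               -- 0..143, one slot per 10 minutes of the day
    );

    -- The fact row then carries both keys:
    -- FactEventCounts (DateKey, TimeKey, TypeKey, AreaKey, [Count])

That keeps each dimension small: 144 time-of-day members plus one row per calendar date, instead of 52,560 ten-minute members for every year in a single hierarchy.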

If you are getting 1/1/1900 12:00 to 1/1/1900 12:10, I'm not sure what it could be then...

Cade Roux
That's a good theory, and I checked it out to be sure, but that's not the case. Looking at the drilled-down cube for 2009-Q1-January-1-0 gives me a count for each 10-minute interval. I then used those dates in a SQL query against the original data and confirmed that the sum(count) is correct. What I'm looking at is the totals for each 10 minutes in that hour window, and it also shows the aggregate totals for each hour of that day, each day of that month, etc. But those were all there before, and now I can drill down "for free" to anywhere because the browser loaded the whole cube into memory.
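
Roughly the check I ran, with table and column names changed (the real schema differs):

    -- Sum the source rows for 2009-01-01, hour 0, per 10-minute interval
    SELECT IntervalStart, SUM([Count]) AS Total
    FROM   EventSummary
    WHERE  IntervalStart >= '2009-01-01 00:00'
      AND  IntervalStart <  '2009-01-01 01:00'
    GROUP BY IntervalStart
    ORDER BY IntervalStart;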
CodexArcanum
A: 

Did you verify the aggregations of your cube to ensure they were correct? An easy way to tell is if you get the same number of records no matter which drill-tree you go down.

Assuming this is not the case, what Cade suggests about making a Date dimension AND a Time dimension would be the most obvious approach, but it is one of the bigger no-no's in SSAS. See this article for more information: http://www.sqlservercentral.com/articles/T-SQL/70167/

Hope this helps.

ajdams
That article from SQL Server Central was almost universally panned in the ensuing discussion because 1) the CTE dimension handling was wastefully inefficient and 2) separate date and time dimensions are generally accepted as superior for almost all analysis usage.
Cade Roux
I was wondering about that, actually, since I was under the impression that analysts universally recommend a DateTimeTable for OLAP. It seemed about as fundamental to the field as a hashmap. Thanks for the dissenting opinion; it's nice to see all the thoughts on the matter.
CodexArcanum
@CodexArcanum I haven't seen a DateTime table. In Kimball, it's always a date dimension with a natural int key of the form YYYYMMDD and a time dimension (in appropriate grain) with a natural int key of HHMMSS. The ONLY case where a DateTime would be useful is for arbitrary ranges with date and time - everything from 8:38 am on Monday to 10:25 on Friday, say. These are relatively uncommon, and since it is basically continuous, it's possible that this is better handled as a datetime fact column, which becomes an additional filter in the query (which already specifies a date dimension criterion to get close).
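
For example (purely illustrative names):

    -- Date dimension criteria get close; the datetime fact column trims the edges
    SELECT SUM(f.[Count]) AS Total
    FROM   FactEventCounts f
           JOIN DimDate d ON d.DateKey = f.DateKey
    WHERE  d.DateKey BETWEEN 20090105 AND 20090109   -- Monday through Friday
      AND  f.EventDateTime >= '2009-01-05 08:38'
      AND  f.EventDateTime <= '2009-01-09 10:25';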
Cade Roux
A: 

I would also check to ensure that you are running the latest service pack for SQL Server 2005.

The RTM version had some SSAS perf issues.

Also check to ensure that you have correctly defined attribute relationships on your time dimension and other dims as well.

Not having these relationships defined will cause the SSAS storage engine to scan more data than necessary.
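
For example, a relationship like Month -> Quarter is only sound if every month key maps to exactly one quarter; a quick sanity check against the dimension table (illustrative names) is:

    -- Should return no rows if each month belongs to exactly one quarter
    SELECT MonthKey
    FROM   DimDate
    GROUP BY MonthKey
    HAVING COUNT(DISTINCT QuarterKey) > 1;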

More info: http://ms-olap.blogspot.com/2008/10/attribute-relationship-example.html

As stated above, splitting out the date and time will significantly decrease the cardinality of your date dimension, which should increase performance and allow a better analytic experience.

JasonHorner