views: 116
answers: 2

To preface this, I'm not familiar with OLAP at all, so if the terminology is off, feel free to offer corrections.

I'm reading about OLAP, and it seems to be all about trading space for speed: you precalculate (or calculate on demand) and store aggregations of your data, keyed by certain dimensions. I understand how this works for dimensions that have a discrete set of values, like { Male, Female } or { Jan, Feb, ... Dec } or { @US_STATES }. But what about dimensions that have completely arbitrary values like (0, 1.25, 3.14156, 70000.23, ...)?

Does using OLAP preclude aggregations in queries that hit the fact tables, or is it merely a way to bypass work that can be precalculated? In other words, do arbitrary aggregations on arbitrary values still need to be computed on the fly?
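To make that concrete, here's the contrast I have in mind (the table and column names are made up, just for illustration):

   -- Discrete dimensions: this rollup could be precalculated and stored
   SELECT state, month, SUM(amount)
     FROM sales
    GROUP BY state, month;

   -- Arbitrary values: does this have to scan the fact table every time?
   SELECT AVG(velocity)
     FROM experiments
    WHERE velocity BETWEEN 0.0 AND 10.0;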

Any other help regarding learning more about OLAP would be much appreciated. At first glance, both Google and SO seem to be a little dry (compared to other, more popular topics).

Edit: I was asked for examples of dimensions with arbitrary values:

  • VELOCITY of experiments: 1.256 m/s, -2.234 m/s, 33.78 m/s
  • VALUE of transactions: $120.56, $22.47, $9.47
+1  A: 

Hi

I have found this site to be handy: http://www.ssas-info.com/

Check out the webcasts section, where they walk you through different aspects, starting from what BI and warehousing are, through designing a cube, dimensions, calculations, aggregations, KPIs, perspectives, etc.

In OLAP, aggregations reduce query response time by providing pre-calculated values for the query to use. The flip side is an increase in storage space, since the aggregations are stored in addition to the base data.
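As a rough sketch (generic SQL with a made-up sales fact table - the OLAP engine manages this for you, but this is the idea):

   -- Built once and refreshed periodically; costs extra storage
   CREATE TABLE agg_sales_by_state_month AS
   SELECT state, month, SUM(amount) AS total_amount
     FROM sales
    GROUP BY state, month;

   -- Queries then read the small summary table
   -- instead of scanning the whole fact table
   SELECT total_amount
     FROM agg_sales_by_state_month
    WHERE state = 'CA' AND month = 'Jan';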

SQL Server Analysis Services has a Usage-Based Optimization Wizard, which helps with aggregation design by analyzing the queries submitted by clients (reporting clients such as SQL Server Reporting Services, Excel, or others) and refining the aggregation design accordingly.

I hope this helps.

cheers

Andriyev
+2  A: 

Your velocity and value column examples are usually not the sort of columns that you would query in an OLAP way - they are measures, the values you're trying to retrieve, and would presumably appear in the result set, either as individual rows or aggregated.

However, I said usually. In our OLAP schema, we have a good example of the kind of column you're thinking of: event_time (a date-time field, with granularity down to the second). In our data it is nearly unique - no two events happen during the same second - but since we have years of data in our table, that still means hundreds of millions of potentially distinct values, and when we run our OLAP queries, we almost always want to constrain based on time ranges.

The solution is to do what David Raznick has said - create a "bucketed" version of the value. So, in our table, in addition to the event_time column, we have an event_time_bucketed column - which is simply the date of the event, with the time part set to 00:00:00. This reduces the count of distinct values from hundreds of millions to a few thousand. Then, in every query that constrains on date, we constrain on both the bucketed column and the real one (since the bucketed column alone is not precise enough to give the real range), e.g.:

   WHERE event_time BETWEEN '2009.02.03 18:32:41' AND '2009.03.01 18:32:41'
     AND event_time_bucketed BETWEEN '2009.02.03' AND '2009.03.01'

In these cases, the end user never sees the event_time_bucketed column - it's just there for query optimization.
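If your database supports computed columns, the bucketed column can even maintain itself. A sketch in SQL Server syntax (the table name events is my assumption; adjust for your platform):

   -- Derived from event_time automatically; PERSISTED stores the value
   -- so it can be indexed like an ordinary column
   ALTER TABLE events
     ADD event_time_bucketed AS CAST(event_time AS DATE) PERSISTED;

   CREATE INDEX ix_events_time_bucketed ON events (event_time_bucketed);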

For floating-point values like the ones you mention, the bucketing strategy may require a bit more thought, since you want a method that yields a relatively even distribution of values and preserves contiguity. For example, if you have a classic bell-shaped distribution (with tails that could be very long), you'd define the range where the bulk of the population lives (say, within 1 or 2 standard deviations of the mean), divide it into uniform buckets, and add two more buckets for "everything smaller" and "everything bigger", as in the sketch below.
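For example (SQL Server syntax again; the experiments table, the velocity column, and the bucket edges are placeholders - derive the real edges from your data):

   -- One tail bucket on each side, plus uniform 5 m/s buckets
   -- across the bulk of the distribution (here, -10..10)
   ALTER TABLE experiments
     ADD velocity_bucket AS CASE
           WHEN velocity < -10.0 THEN -1           -- everything smaller
           WHEN velocity >  10.0 THEN 99           -- everything bigger
           ELSE FLOOR((velocity + 10.0) / 5.0)     -- uniform buckets 0..4
         END PERSISTED;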

Steve Broberg