views: 116
answers: 2

To preface this, I'm not familiar with OLAP at all, so if the terminology is off, feel free to offer corrections.

I'm reading about OLAP, and it seems to be all about trading space for speed: you precalculate (or calculate on demand) and store aggregations of your data, keyed by certain dimensions. I understand how this works for dimensions that have a discrete set of values, like { Male, Female } or { Jan, Feb, ... Dec } or { @US_STATES }. But what about dimensions that have completely arbitrary values like (0, 1.25, 3.14156, 70000.23, ...)?

Does using OLAP preclude aggregations in queries that hit the fact tables, or is it merely a way to bypass work that can be precalculated? In other words, do arbitrary aggregations on arbitrary values still need to be computed on the fly?
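To make that concrete, here's the contrast I have in mind (the table and column names are made up, just for illustration):

   -- Discrete dimensions: this rollup could be precalculated and stored
   SELECT state, month, SUM(amount)
     FROM sales
    GROUP BY state, month;

   -- Arbitrary values: does this have to scan the fact table every time?
   SELECT AVG(velocity)
     FROM experiments
    WHERE velocity BETWEEN 0.0 AND 10.0;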

Any other help regarding learning more about OLAP would be much appreciated. At first glance, both Google and SO seem to be a little dry (compared to other, more popular topics).

Edit: I was asked for examples of dimensions with arbitrary values:

  • VELOCITY of experiments: 1.256 m/s, -2.234 m/s, 33.78 m/s
  • VALUE of transactions: $120.56, $22.47, $9.47
+1  A: 

Hi

I have found this site to be handy: http://www.ssas-info.com/

Check out the webcasts section, where they walk you through different aspects, starting from what BI and warehousing are, through designing a cube, dimensions, calculations, aggregations, KPIs, perspectives, etc.

In OLAP, aggregations reduce query response time by providing pre-calculated values for the query to use. The flip side is an increase in storage space, since the aggregations are stored in addition to the base data.
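As a rough sketch (generic SQL with a made-up sales fact table - the OLAP engine manages this for you, but this is the idea):

   -- Built once and refreshed periodically; costs extra storage
   CREATE TABLE agg_sales_by_state_month AS
   SELECT state, month, SUM(amount) AS total_amount
     FROM sales
    GROUP BY state, month;

   -- Queries then read the small summary table
   -- instead of scanning the whole fact table
   SELECT total_amount
     FROM agg_sales_by_state_month
    WHERE state = 'CA' AND month = 'Jan';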

SQL Server Analysis Services has a Usage-Based Optimization Wizard, which helps with aggregation design by analyzing the queries submitted by clients (reporting clients such as SQL Server Reporting Services, Excel, or others) and refining the aggregation design accordingly.

I hope this helps.

cheers

Andriyev
+2  A: 

Your velocity and value column examples are usually not the sort of columns that you would query in an OLAP way - they are measures, the values you're trying to retrieve, and would presumably appear in the result set, either as individual rows or aggregated.

However, I said usually. In our OLAP schema, we have a good example of the kind of column you're thinking of: event_time (a date-time field, with granularity down to the second). In our data it is nearly unique - no two events happen during the same second - but since we have years of data in our table, that still means hundreds of millions of potentially distinct values, and when we run our OLAP queries, we almost always want to constrain based on time ranges.

The solution is to do what David Raznick has said - create a "bucketed" version of the value. So, in our table, in addition to the event_time column, we have an event_time_bucketed column - which is simply the date of the event, with the time part set to 00:00:00. This reduces the count of distinct values from hundreds of millions to a few thousand. Then, in every query that constrains on date, we constrain on both the bucketed column and the real one (since the bucketed column alone is not precise enough to give the real range), e.g.:

   WHERE event_time BETWEEN '2009.02.03 18:32:41' AND '2009.03.01 18:32:41'
     AND event_time_bucketed BETWEEN '2009.02.03' AND '2009.03.01'

In these cases, the end user never sees the event_time_bucketed column - it's just there for query optimization.
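If your database supports computed columns, the bucketed column can even maintain itself. A sketch in SQL Server syntax (the table name events is my assumption; adjust for your platform):

   -- Derived from event_time automatically; PERSISTED stores the value
   -- so it can be indexed like an ordinary column
   ALTER TABLE events
     ADD event_time_bucketed AS CAST(event_time AS DATE) PERSISTED;

   CREATE INDEX ix_events_time_bucketed ON events (event_time_bucketed);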

For floating-point values like the ones you mention, the bucketing strategy may require a bit more thought, since you want a method that yields a relatively even distribution of values and preserves contiguity. For example, if you have a classic bell-shaped distribution (with tails that could be very long), you'd define the range where the bulk of the population lives (say, within 1 or 2 standard deviations of the mean), divide it into uniform buckets, and add two more buckets for "everything smaller" and "everything bigger", as in the sketch below.
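For example (SQL Server syntax again; the experiments table, the velocity column, and the bucket edges are placeholders - derive the real edges from your data):

   -- One tail bucket on each side, plus uniform 5 m/s buckets
   -- across the bulk of the distribution (here, -10..10)
   ALTER TABLE experiments
     ADD velocity_bucket AS CASE
           WHEN velocity < -10.0 THEN -1           -- everything smaller
           WHEN velocity >  10.0 THEN 99           -- everything bigger
           ELSE FLOOR((velocity + 10.0) / 5.0)     -- uniform buckets 0..4
         END PERSISTED;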

Steve Broberg