ansaurus

Question

What data structures and algorithms are applied within data warehouse cubes?

Answer 1

A:

Generally, a data warehouse uses a relational database, but the tables aren't normalized like an operational relational database.

A data warehouse is subject oriented. Data warehouse subject tables usually have the following characteristics:

Many indexes.
No joins, except to look up tables.
Duplicated data, the subject table is highly denormalized.
Contains derived and aggregated information.

The database tables in a data warehouse are arranged in a star schema. A star schema is basically one subject table with an array of look up tables. The keys of the look up tables are foreign keys in the subject table. If you draw an entity relationship diagram of the subject table, the look up tables would surround the subject table like star points.

As far as the queries, that depends on the subject tables and the number of rows. Generally, expect queries to take a long time (many minutes, sometimes hours).

Here's a general article to get you started: Developing a Data Warehouse Architecture

Here's a high level overview of the design of a star schema: Designing the Star Schema Database

Gilbert Le Blanc 2010-05-23 11:55:10

Thanks Gilbert - can you go into more detail on how the duplicated, derived and/or aggregated data is stored and accessed? I.e. hash tables keyed on the dimensional keys? perhaps containing a collection of aggregates, etc.?

Jeff Meatball Yang 2010-05-23 15:29:11

Hi Jeff. Take a look at figure 10 of the star schema link. WalMart, to take a retailer, would have trillions of SalesFact rows. The queries against SalesFact would never end if it wasn't for the summary tables TimeDimension and StoreDimension. The summary tables answer most of the common queries.The actual data storage doesn't matter as much as designing the data warehouse so that most of the queries are answered by the summary tables.Now building and maintaining the summary tables, that takes a batch process running periodically.

Gilbert Le Blanc 2010-05-23 23:13:37

@Gilbert Le Blanc Summary facts are not stored in dimensions. Those are facts that are just aggregated at a different grain (and perhaps with different dimensions), but they are still stored in fact tables.

Cade Roux 2010-05-24 05:31:33

@Jeff Ok. You tell me what a dimension is.

Gilbert Le Blanc 2010-05-24 07:56:06

@Gilbert Le Blanc I have given an example of generating aggregate summary information in my answer.

Cade Roux 2010-05-24 14:52:42

Answer 2

+1 A:

In a star-schema database, facts are usually acquired and stored at the finest grain.

So let's take the SalesFact example from Figure 10 in http://www.ciobriefings.com/Publications/WhitePapers/DesigningtheStarSchemaDatabase/tabid/101/Default.aspx

Right now, the grain is Product, Time (at a day granularity), Store.

Let's say you want that rolled up by month, pre-aggregated (this particular example is very unlikely to need pre-aggregation, but if the sales were detailed by customer, by minute, pre-aggregation might be necessary).

Then you would have a SalesFactMonthly (or add a grain discrimination to the existing fact table since the dimensions are the same - sometimes in aggregation, you may actually lose dimensions just like you can lose grain, for instance if you only wanted by store and not by product).

ProductID
TimeID (only linking to DayOfMonth = 1)
StoredID
SalesDollars

And you would get this by doing:

INSERT INTO SalesFactMonthly (ProductID, TimeID, StoreID, SalesDollars)
SELECT sf.ProductID
    ,(SELECT TimeID FROM TimeDimension WHERE Year = td.Year AND Month = td.Month AND DayOfMonth = 1) -- One way to find the single month dimension row
    ,sf.StoreID
    ,SUM(sf.SalesDollars)
FROM SalesFact AS sf
INNER JOIN TimeDimension AS td
    ON td.TimeID = sf.TimeID
GROUP BY td.Year, td.Month

What happens in cubes is you basically have fine-grain stars and pre-aggregates together - but every implementation is proprietary - sometimes you might not even have the finest-grain data in the cube, so it can't be reported on. But every way you might want to slice the data needs to be stored at that grain, otherwise you can't produce analysis that way.

Cade Roux 2010-05-24 14:51:45

Answer 3

+2 A:

There is a fantastic book that describes many internal details of SSAS implementation, including storage and query mechanism details:

http://www.amazon.com/Microsoft-Server-Analysis-Services-Unleashed/dp/0672330016

Michael 2010-06-09 10:35:53

ansaurus

tags:

views:

answers:

What data structures and algorithms are applied within data warehouse cubes?

related questions