views:

211

answers:

1

I know a bit about database internals. I've actually implemented a small, simple relational database engine before, using ISAM structures on disk and BTree indexes and all that sort of thing. It was fun, and very educational. I know that I'm much more cognizant about carefully designing database schemas and writing queries now that I know a little bit more about how RDBMSs work under the hood.

But I don't know anything about multidimensional OLAP data models, and I've had a hard time finding any useful information on the internet.

How is the information stored on disk? What data structures comprise the cube? If a MOLAP model doesn't use tables, with columns and records, then... what? Especially in highly dimensional data, what kinds of data structures make the MOLAP model so efficient? Do MOLAP implementations use something analogous to RDBMS indexes?

Why are OLAP servers so much better at processing ad hoc queries? The same sorts of aggregations that might take hours to process in an ordinary relational database can be processed in milliseconds in an OLTP cube. What are the underlying mechanics of the model that make that possible?

+5  A: 

I've implemented a couple of systems that mimiced what OLAP cubes do, and here are a couple of things we did to get them to work.

1) The core data was held in a n dimensional array, all in memory, and all the keys were implemented via hierarchies of pointers to the underlying array. In this way we could have multiple different sets of keys for the same data. The data in the array was the equivalent of the fact table, often it would only have a couple of pieces of data, in one instance this was price and number sold.

2) The underlying array was often sparse, so once it was created we used to remove all the blank cells to save memory - lots of hard core pointer arithmetic but it worked.

3) As we had heirarchies of keys, we could write routines quite easily to drill down/up a hierarchy easily. For instance we would access year of data, by going through the months keys, which in turn mapped to days and/or weeks. At each level we would aggregate data as part of building the cube - made calculations much faster.

4) We didn't implement any kind of query language, but we did support drill down on all axis (upto 7 in our biggest cubes), and that was tied directly to the UI which the users liked.

5) We implemented core stuff in C++, but these days I reckon C# could be fast enough, but I'd worry about how to implement sparse arrays.

Hope that helps, sound interesting.

MrTelly