Help figuring out approaches to (near) real time multi dimensional data querying

views:

109

answers:

+1 Q:

Help figuring out approaches to (near) real time multi dimensional data querying

I have a system that involves numerous related tables. Think of a standard category/product/order/customer/orderitem scenario. Some tables are self referencing (like Categories). None of the tables are particularly large (around 100k rows with an estimated scale to around 1 million rows). There are a lot of dimensions to this data I need to consider, but must be queried in a near real time way. I also don't know which dimensions a particular user is interested in- it can be one or many criteria across numerous tables. Things can range from

Give me everything with a category of Jackets
Give me everything with a category of Jackets->Parkas having a red color purchased in the last month in New York
Give me everything which wasn't purchased in New York which costs over $100.

Currently we have a very long SP which uses a "cascading data" approach- we go table by table, filtering everything into a temp table using whatever criteria was specified for that table. For the next table, we join the current temp table to whatever table we're using and apply a new filter set into a new temp table. It works, but manageability and performance is slow. I need something better.

I need a new approach to this problem. It's clearly a need for OLAP, possibly using a star schema. Does this work in real time? Can it be configured to work in real time? Should I use indexed views to create a set of denormalized tables? Should I offload this outside of the database completely?

FYI We're using Sql Server.

It would probably be better to build a dynamic query in your code with all the joins you need, customized to each individual request. (properly parameterized for security of course).

You would use much of the same cascading logic you have now but you move it to to the code instead of the database. Then you only submit the exact query you need.

The performance would beat using all of the temp tables and you might get some caching benefit after a few queries were run.

Doug L. 2009-03-26 15:36:00

+2 A:

As you say, this is perfect for OLAP. With Sql Server 2005 and 2008 you can set up an almost real time solution. You should:

Create a denormalized star schema
Build an OLAP cube using that schema
Enable proactive caching to update the cube when the underlying data source changes.

It's not a trivial job, and you need the Enterprise version of Sql Server to use proactive caching. You also need some front-end tool (maybe excel would do) to consume the cube.

santiiiii 2009-03-26 16:17:42

Your dilemma sounds to me like "Is it better to achieve the same result by performing complex processing every time I need it, or should I do it once only for each new piece of data?".

David Aldridge 2009-03-27 01:49:48

ansaurus

tags:

views:

answers:

Help figuring out approaches to (near) real time multi dimensional data querying

related questions