views:

5803

answers:

6
+19  Q: 

Star-Schema Design

Is a Star-Schema design essential to a data warehouse? Or can you do data warehousing with another design pattern?

+4  A: 

The thing about star schemas is they are a natural model for the kinds of things most people want to do with a data warehouse. For instance it is easy to produce reports with different levels of granularity (month or day or year for example). It is also efficient to insert typical business data into a star schema, again a common and important feature of a data warehouse.

You certainly can use any kind of database you want but unless you know your business domain very well it is likely that your reports will not run as efficiently as they could if you had used a star schema.

Mike
It is basically object-oriented design in SQL ;)
Hamish Grubijan
+4  A: 

Star schemas are used to enable high speed access to large volumes of data. The high performance is enabled by reducing the amount of joins needed to satsify any query that may be made against the subject area. This is done by allowing data redundancy in dimension tables.

You have to remember that the star schema is a pattern for the top layer for the warehouse. All models also involve staging schemas at the bottom of the warehouse stack, and some also include a persistant transformed merged staging area where all source systems are merged into a 3NF modelled schema. The various subject areas sit above this.

Alternatives to star schemas at the top level include a variation, which is a snowflake schema. A new method that may bear out some investigation as well is Data Vault Modelling proposed by Dan Linstedt.

Mike McAllister
+3  A: 

Star schemas are a natural fit for the last layer of a data warehouse. How you get there is another question. As far as I know, there are two big camps, those of Bill Inmon and Ralph Kimball. You might want to look at the theories of these two guys if/when you decide to go with a star.

Also, some reporting tools really like the star schema setup. If you are locked into a specific reporting tool, that might drive what the reporting mart looks like in your warehouse.

Josh McAdams
+1 - Kimball vs. Inmon is one of the great religious wars. IMHO the presence of a religious divide of this sort is a clear indicator that neither argument is compelling. I've built systems with and without ODS layers - and had good reasons for the architectural decisions.
ConcernedOfTunbridgeWells
+30  A: 

Using star schemas for a data warehouse system gets you several benefits and in most cases it is appropriate to use them for the top layer. You may also have an operational data store (ODS) - a normalised structure that holds 'current state' and facilitates operations such as data conformation. However there are reasonable situations where this is not desirable. I've had occasion to build systems with and without ODS layers, and had specific reasons for the choice of architecture in each case.

Without going into the subtlties of data warehouse architecture or starting a Kimball vs. Inmon flame war the main benefits of a star schema are:

  • Most database management systems have facilities in the query optimiser to do 'Star Transformations' that use Bitmap Index structures or Index Intersection for fast predicate resolution. This means that selection from a star schema can be done without hitting the fact table (which is usually much bigger than the indexes) until the selection is resolved.

  • Partitioning a star schema is relatively straightforward as only the fact table needs to be partitioned (unless you have some biblically large dimensions). Partition elimination means that the query optimiser can ignore patitions that could not possibly participate in the query results, which saves on I/O.

  • Slowly changing dimensions are much easier to implement on a star schema than a snowflake.

  • The schema is easier to understand and tends to involve less joins than a snowflake or E-R schema. Your reporting team will love you for this

  • Star schemas are much easier to use and (more importantly) make perform well with ad-hoc query tools such as Business Objects or Report Builder. As a developer you have very little control over the SQL generated by these tools so you need to give the query optimiser as much help as possible. Star schemas give the query optimiser relatively little opportunity to get it wrong.

Typically your reporting layer would use star schemas unless you have a specific reason not to. If you have multiple source systems you may want to implement an Operational Data Store with a normalised or snowflake schema to accumulate the data. This is easier because an ODS typically does not do history. Historical state is tracked in star schemas where this is much easier to do than with normalised structures. A normalised or snowflaked Operational Data Store reflects 'current' state and does not hold a historical view over and above any that is inherent in the data.

ODS load processes are concerned with data scrubbing and conforming, which is easier to do with a normalised structure. Once you have clean data in an ODS, dimension and fact loads can track history (changes over time) with generic or relatively simple mechanisms relatively simply; this is much easier to do with a star schema, Many ETL tools (for example) provide built-in facilities for slowly changing dimensions and implementing a generic mechanism is relatively straightforward.

Layering the system in this way provdies a separation of responsibilities - business and data cleansing logic is dealt with in the ODS and the star schema loads deal with historical state.

ConcernedOfTunbridgeWells
A: 

It's possible to do without. However, you will make life hard for yourself -- your organization will want to use standard tools that live on top of DWs, and those tools will expect a star schema -- a lot of effort will be spent fitting a square peg in a round hole.

A lot of database-level optimizations assume that you have a star schema; you will spend a lot of time optimizing and restructuring to get the DB to do "the right thing" with your not-quite-star layout.

Make sure that the pros outweigh the cons..

(Does it sound like I've been there before?)

-D

SquareCog
+1  A: 

Star schema is a logical data model for relational databases that fits the regular data warehousing needs; if the relational environment is given, a star or a snowflake schema will be a good design pattern, hard-wired in lots of DW design methodologies.

There are however other than relational database engines too, and we can build efficient data warehouses using them. Multidimensional storage engines can be very fast for OLAP tasks (TM1 eg.); we can not apply star schema design in this case. Other examples requiring special logical modelling include XML databases or column-oriented databases (eg. the experimental C-store)).

csaba
"other than relational database engines"... Interesting. What design pattern do they use for the data? A star schema or some other kind of design?
S.Lott
Multidimensional (MOLAP) databases store their data in various multidimensional array structures. Conceptually, in my interpretation, when building a data warehouse we build a conceptual data model first (with dimensions and data cubes), then we map it to the logical level (tables and constraints), which is then implemented on phisical level (files on disks, handled by the DBMS). MOLAP engines however map the conceptual model directly to phisical level. As star schema is the logical model of relational dws, therefore it is omitted in a MOLAP environment.
csaba