views:

101

answers:

4

Could someone give me a good, practical definition of what a data warehouse is?

+1  A: 

A database optimized for retrieval, in general denormalized data, usually a star schema(but could be snowflake) and uses dimensional modeling (fact and dimension tables)

SQLMenace
http://en.wikipedia.org/wiki/Data_warehouse
Rusty
+2  A: 
mcha
+1  A: 

I'm surprised no one has posted Inmon's definition:

A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process

From the same page you can pick up Kimball's definition:

A copy of transaction data specifically structured for query and analysis

I think that, unfortunately, data warehousing is a wide-ranging field. There is a lot of variety with very few standard paradigms, specifically I'm thinking of Kimball's dimensional modelling. Inmon does not have as a specific a methodology as Kimball's and thus some 3NF models may or may not conform to his principles.

Because Inmon has broadened his scope for what warehousing is meant to accomplish, it can encompass unstructured data. However, analysis of unstructured data is very different than traditional analysis.

As applied to SQL Server, typically the largest Data Warehouses on SQL Server are modelled dimensionally, because this lends itself well to the non-distributed, non-massively parallel model. Massively parallel systems like Teradata generally perform a lot better with 3NF models. These are still table-based systems with the various tables connected with foreign key constraints (perhaps not enforced, but at least logical).

Of course, we are also seeing NoSQL data processing systems like Map/Reduce which are not really databases at all in the sense of normalized, denormalized or non/poorly-normalized relational databases which we have had for 40 years now.

Cade Roux
A: 

While this is not an academic definition, it might serve as a practical one. A data warehouse is a collection of datamarts and will combine datasets across the breadth of an organization.

A datamart will contain datasets specific to certain portions of the business. In the datamart you will find fact tables, measurable pieces of information, along with dimensions, attributes of your measurable pieces.

A true data warehouse will have conformed dimension tables that can be shared across datamarts.

An example...

Your company may build a datamart around sales. And another datamart around human resources. If the customer dimension table is shared across both these datamarts, it would be considered a conformed dimension. All three of these entities together would make up a data warehouse.

As someone else stated you can find more detailed information by searching for Ralph Kimball's Data Strategies.

Markus