data-warehouse

In a Data Warehouse scenario, is there any disadvantage to using WITH(NOLOCK)?

I have a Kimball-style DW (facts and dimensions in star models, with no late-arriving fact rows or columns, and no columns changing in dimensions except expiry as part of Type 2 slowly changing dimensions) with heavy daily processing to insert and update rows (on new dates) and monthly and daily reporting processes. The fact tables are partiti...

Primary Keys and Constraints

In my brand new data warehouse that is built (of course) from the OLTP database, I have dropped all the IDENTITY columns and changed them to INT columns. What are the best practices regarding the following especially since the warehouse is denormalized: Primary Key -> this may now be a composite key because several tables have come t...

Dimensional Modeling: should a fact table have a foreign key?

Can a fact table have no keys at all? Or, if it can, is it a good design? If a fact table does not have any dimensions, on what basis is it analyzed? What if a fact table has primary key(s) only and no foreign key(s)? ...

Are textual fields ever allowed in fact tables?

Are there any cases where I can have a textual field such as a description in a fact table? I currently have a fact table of meeting events (grain: row per meeting) with a number of dimensions such as date, client, location etc. I need to put the meeting subject in the fact table. Is this ok even though it is not a measure (I have not s...

What if a fact table/view is a template (meant to contain only table structure but no data)?

I noticed that the fact tables used in a cube were actually views. In fact, they were templates of the fact tables (I noticed in the script that "where 1=2" was used for the fact views). So, if the template is used, there won't be any data in the view at any cost (and I don't know if I can insert in the view because I don't have inse...

Data Warehouse: Modelling a future schedule

I'm creating a DW that will contain data on financial securities such as bonds and loans. These securities are associated with payment schedules. For example, a bond could pay quarterly, while a mortgage would usually pay monthly (sometimes biweekly). The payment schedule is created when the security is traded and, in the majority of case...

Good place to start learning data warehousing?

I am interested in learning more about data warehousing. I see terms like "dimension", "snowflake schema" and "star schema" thrown about. Where would one start in learning about this stuff? Are there good books or Internet resources? ETL is in this space too, right? ...

Open Source Metadata Management Tool

I'm not sure "Metadata Management" is the right term.... Basically, I have a client who asked for recommendations on "Metadata Management" tools with regard to a data warehousing project they have. I'm guessing the term has to do with creating something like a data dictionary, but I have relatively little experience in this area and am...

In SQL Server CDC with SSIS, which data should be stored for windowing (LSN or Date)?

I have implemented delta detection while loading a data warehouse from transaction systems, using an identity column or date-time column in the source transaction tables. When data needs to be extracted the next time, the maximum date-time value extracted last time is used in the filter of the extraction query to identify new or changed records. This w...
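
The date-time windowing described in this excerpt can be sketched roughly as follows. This is only an illustration in Python rather than SSIS or T-SQL: the `extract_delta` function, the `modified_at` field, and the in-memory rows are all hypothetical stand-ins for a real source table and extraction query, and the sketch does not cover the LSN-based alternative the question asks about.

```python
from datetime import datetime


def extract_delta(rows, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark.

    rows: iterable of dicts carrying a 'modified_at' datetime
          (a hypothetical stand-in for a source transaction table).
    last_watermark: the maximum 'modified_at' value seen in the previous run.
    """
    # Same idea as filtering the extraction query on modified_at > watermark.
    changed = [r for r in rows if r["modified_at"] > last_watermark]
    # If nothing changed, keep the old watermark so the next run starts there.
    new_watermark = max(
        (r["modified_at"] for r in changed), default=last_watermark
    )
    return changed, new_watermark


# Hypothetical source rows for illustration only.
source_rows = [
    {"id": 1, "modified_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "modified_at": datetime(2024, 1, 2, 9, 0)},
    {"id": 3, "modified_at": datetime(2024, 1, 3, 9, 0)},
]

delta, watermark = extract_delta(source_rows, datetime(2024, 1, 1, 12, 0))
```

The new watermark would be persisted after a successful load and passed in as `last_watermark` on the next run. A strict `>` comparison assumes no two changes share the exact timestamp of the stored watermark across runs.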

How to handle a fact table with more than 16 keys in MS SQL Server?

I have a fact table that has 17 keys. Normally I have been designating the primary key as all of my dimensional keys. MS SQL Server 2008 has a limitation of 16 columns in a primary key or unique constraint. Are there any workarounds? ...

Should I create a clustered index on a fact table? Never? Always?

In a data warehouse, are there disadvantages to creating clustered indexes on fact tables? (most of the time, it will be on the datetime column) Would you answer yes or no "by default..."? If I shouldn't create clustered indexes by default, then why? (I know the pros of clustered indexes, but what are some cons?) References http://...

What are the alternatives to SQL Server Analysis Services?

Are there any alternatives to SQL Server Analysis Services on the Windows x64 platform? I'm vaguely curious, because I haven't heard of any (though admittedly I haven't looked very hard). Basically, a product that allows multi-dimensional cubes and querying those to generate reports (though the generation and presentation of those repo...

Type II dimension joins

I have the following lookup table in OLTP: CREATE TABLE TransactionState ( TransactionStateId INT IDENTITY (1, 1) NOT NULL, TransactionStateName VarChar (100) ) When this comes into my OLAP, I change the structure as follows: CREATE TABLE TransactionState ( TransactionStateId INT NOT NULL, /* not an IDENTITY column i...

Tracking what the MERGE command and its OUTPUT did

I am modifying a Type 2 dimension using the following (long) SQL statement: INSERT INTO AtlasDataWarehouseReports.District ( Col01, Col02, Col03, Col04, Col05, Col06, Col07, Col08, Col09, Col10, StartDateTime, EndDateTime ) SELECT Col01, Col02, Col03, Col04, Col05, ...

Labor Day Vs. Thanksgiving

I am creating a calendar table for my warehouse. I will use this as a foreign key for all the date fields. The code shown below creates the table and populates it. I was able to figure out how to find Memorial Day (last Monday of May) and Labor Day (first Monday of September). SET NOCOUNT ON DROP Table dbo.Calendar GO Create Table d...
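
The holiday rules mentioned in this excerpt (last Monday of May, first Monday of September) can be sketched as below. This is a Python illustration rather than the T-SQL of the original script; the helper names are hypothetical, and the Thanksgiving rule (fourth Thursday of November, the US definition) is an assumption drawn from the question title, not from the truncated excerpt.

```python
from datetime import date, timedelta

# date.weekday() numbering: Monday=0 ... Sunday=6
MONDAY, THURSDAY = 0, 3


def nth_weekday(year, month, weekday, n):
    """Return the n-th (1-based) occurrence of weekday in the given month."""
    first = date(year, month, 1)
    # Days from the 1st of the month to the first matching weekday.
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def last_weekday(year, month, weekday):
    """Return the last occurrence of weekday in the given month."""
    # Find the last day of the month by stepping back from the next month.
    if month == 12:
        next_first = date(year + 1, 1, 1)
    else:
        next_first = date(year, month + 1, 1)
    last = next_first - timedelta(days=1)
    # Days to step back from the month's last day to the matching weekday.
    offset = (last.weekday() - weekday) % 7
    return last - timedelta(days=offset)


def memorial_day(year):
    return last_weekday(year, 5, MONDAY)      # last Monday of May


def labor_day(year):
    return nth_weekday(year, 9, MONDAY, 1)    # first Monday of September


def thanksgiving(year):
    return nth_weekday(year, 11, THURSDAY, 4)  # fourth Thursday of November
```

The same "first matching weekday plus whole weeks" arithmetic should carry over to a set-based query against the calendar table, for example by ranking the Thursdays within each November.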

Calendar table for Data Warehouse

For my data warehouse, I am creating a calendar table as follows: SET NOCOUNT ON DROP Table dbo.Calendar GO Create Table dbo.Calendar ( CalendarId Integer NOT NULL, DateValue Date NOT NULL, DayNumberOfWeek Integer NOT NULL, NameOfDay VarChar (10) NOT NULL, NameOfMonth VarChar (10) NOT NULL, WeekOfY...

How does OLAP address dimensions that are numeric ranges?

To preface this, I'm not familiar with OLAP at all, so if the terminology is off, feel free to offer corrections. I'm reading about OLAP and it seems to be all about trading space for speed, wherein you precalculate (or calculate on demand) and store aggregations about your data, keyed off by certain dimensions. I understand how this wo...

Talk to data warehouse-style tables with ActiveRecord?

As my Rails app matures, it's becoming increasingly apparent that it has a strong data warehouse flavour, lacking only a facts table to make everything explicit. On top of that, I just read Chapters 2 (Designing Beautiful APIs) and 3 (Mastering the Dynamic Toolkit) of Ruby Best Practices. Now I'm trying to figure out how best to design...

Linux Data Warehouse System for User Files?

I work at a large university and much of my department's backup requirements are provided by central network services. However, many of the users have collections of large files such as medical imaging scans, which exceed the central storage available to them. I am seeking to provide an improved backup solution for departmental resource...

Linked Measure Groups and Local Dimensions

Mulling over something I've been reading up on. According to Chris Webb, A linked measure group can only be used with dimensions from the same database as the source measure group. So I took this to mean as long as two cubes share a database, a linked measure group can be used with a dimension. So I created a new cube and ad...