tags:

views:

136

answers:

3

I was thinking of putting staging tables and stored procedures that update those tables into their own schema. Such that when importing data from SomeTable to the datawarehouse, I would run a Initial.StageSomeTable procedure which would insert the data into the Initial.SomeTable table. This way all the procs and tables dealing with the Initial staging are grouped together. Then I'd have a Validation schema for that stage of the ETL, etc.

This seems cleaner than trying to uniquely name all these very similar tables, since each table will have multiple instances of itself throughout the staging process.

Question: Is using a user schema to group tables/procs/views together an appropriate use of user schemas in MS SQL Server? Or are user schemas supposed to be used for security, such as grouping permissions together for objects?

+1  A: 

I think it's appropriate enough, it doesn't really matter, you could use another database if you liked which is actually what we do.

I'm not sure why you would want a validation schema though, what are you going to do there?

Paul Creasey
The columns in some initial tables are all varchar(2000), so I stage the data into properly typed(int, bool, etc.) Validation tables on a column per column basis, making sure the cast is successful, logging errors if not, and also check business rules that apply to single columns during the staging into the validation tables, and then after the staging I will run queries against the tables for more complex business rules that involve multiple columns. It is important to know what fails validation on a field per field basis because there will be some manual data cleaning and resubmission.
AaronLS
I'm picturing a situation where you might have 10,000 rows in staging, then you move them to validation and any rows which fail to validate could be left in staging for manual correction/deletion, before beginning the main processing, seems like a nice way to single out failing rows actually, im sold.
Paul Creasey
+1  A: 

Both the reasons you list (purpose/intent, security) are valid reasons to use schemas. Once you start using them, you should always specify schema when referencing an object (although I'm lazy and never specify dbo).

One trick we use is to have the same-named table in each of several schemas, combined with table partitioning (available in SQL 2005 and up). Load the data in first schema, then when it's validated "swap" the partition into dbo--after swapping the dbo partition into a "dumpster" schema copy of the table. Net Production downtime is measured in seconds, and it's all carefully wrapped in a declared transaction.

Philip Kelley
+1  A: 

This is actually a recommended practice. Take a look at the Microsoft Business Intelligence ETL Design Practices from the Project Real. You will find (download doc from the first link) that they use quite a few schemata to group and identify objects in the warehouse.

In addition to dbo and etl, they also use admin, audit, part, olap and a few more.

Damir Sudarevic