Hi.
We are working on a data warehouse for a bank and have pretty much followed the standard Kimball model: staging tables, a star schema, and an ETL process to pull the data through.
Kimball talks about using the staging area for import, cleaning, processing and everything else until you are ready to put the data into the star schema. In practice this typically means uploading data from the sources into a set of tables with little or no modification, then optionally taking it through intermediate tables until it is ready to go into the star schema. That's a lot of work for a single area; there is no single responsibility here.
Previous systems I have worked on have made a distinction between the different sets of tables, to the extent of having Upload tables (raw source system data, unmodified), Staging tables (intermediate processing, typed and cleansed) and Warehouse tables. You can put these in separate schemas and then apply different policies for archive, backup, security etc. One of the other guys has worked on a warehouse with a StagingInput and a StagingOutput, which is a similar story. The team as a whole has a lot of experience, both with data warehouses and otherwise.
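A minimal sketch of the Upload / Staging / Warehouse split, assuming Python's built-in sqlite3 with each layer as a separately attached database standing in for a schema. The table names (txn_raw, txn, dim_account, fact_txn), the functions and the sample rows are all hypothetical; a real bank warehouse would use its own RDBMS schemas, a proper error table and its ETL tool of choice.

```python
import sqlite3
from decimal import Decimal, InvalidOperation

# Hypothetical layers: upload (raw, unmodified), staging (typed and cleansed),
# warehouse (star schema). Each is its own attached database, standing in for a schema.
LAYERS = ("upload", "staging", "warehouse")


def create_layers() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    for layer in LAYERS:
        # Each ATTACH of ':memory:' creates a distinct, empty database.
        conn.execute(f"ATTACH DATABASE ':memory:' AS {layer}")
    # Upload: everything stored as text, exactly as received from the source.
    conn.execute("CREATE TABLE upload.txn_raw (account_no TEXT, txn_date TEXT, amount TEXT)")
    # Staging: typed and cleansed, but not yet dimensional.
    conn.execute(
        "CREATE TABLE staging.txn (account_no TEXT NOT NULL, "
        "txn_date TEXT NOT NULL, amount_cents INTEGER NOT NULL)"
    )
    # Warehouse: a minimal star schema with one dimension and one fact table.
    conn.execute(
        "CREATE TABLE warehouse.dim_account ("
        "account_key INTEGER PRIMARY KEY, account_no TEXT NOT NULL UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE warehouse.fact_txn (account_key INTEGER NOT NULL, "
        "txn_date TEXT NOT NULL, amount_cents INTEGER NOT NULL)"
    )
    return conn


def upload_rows(conn, rows):
    # Land source rows with no modification at all.
    conn.executemany("INSERT INTO upload.txn_raw VALUES (?, ?, ?)", rows)


def stage(conn) -> int:
    """Type and cleanse upload rows into staging; return the number rejected."""
    rejected = 0
    raw = conn.execute("SELECT account_no, txn_date, amount FROM upload.txn_raw").fetchall()
    for account_no, txn_date, amount in raw:
        try:
            value = Decimal(amount.strip())
        except InvalidOperation:
            value = None
        if value is None or not value.is_finite():
            rejected += 1  # a real system would write these to an error table
            continue
        conn.execute(
            "INSERT INTO staging.txn VALUES (?, ?, ?)",
            (account_no.strip(), txn_date.strip(), round(value * 100)),
        )
    return rejected


def load_warehouse(conn):
    # Add any new accounts to the dimension, then resolve surrogate keys for the facts.
    conn.execute(
        "INSERT OR IGNORE INTO warehouse.dim_account (account_no) "
        "SELECT DISTINCT account_no FROM staging.txn"
    )
    conn.execute(
        "INSERT INTO warehouse.fact_txn (account_key, txn_date, amount_cents) "
        "SELECT d.account_key, s.txn_date, s.amount_cents "
        "FROM staging.txn AS s JOIN warehouse.dim_account AS d "
        "ON d.account_no = s.account_no"
    )
    conn.commit()


conn = create_layers()
upload_rows(conn, [(" 001 ", "2023-01-05", "100.50"),
                   ("002", "2023-01-06", "abc"),
                   ("001", "2023-01-07", "-20")])
rejected = stage(conn)
load_warehouse(conn)
```

Because each layer lives in its own schema, the raw upload rows (including the untrimmed " 001 ") are still there untouched after the load, which is what makes it possible to give the upload layer a different archive or backup policy from the warehouse.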
However, despite all this, looking through Kimball and the web there seems to be absolutely nothing in writing about giving any kind of structure to the staging database. One would be forgiven for believing that Mr Kimball would have us all treat staging as one big, deep, dark, unstructured pool of data.
Whilst it is of course pretty obvious how to go about adding more structure to the staging area if we want to, it seems very odd that nothing has been written about it.
So, what is everyone else out there doing? Is staging just one big unstructured mess, or do folk have some interesting designs for it?