Hi,

I am working on a data warehousing project where several systems load data into a staging area for subsequent processing. Each table has a "loadId" column, which is a foreign key to the "loads" table. That table holds information such as the time of the load, the user account, etc.

Currently, the source system calls a stored procedure to get a new loadId, adds that loadId to each row it inserts, and then calls a final sproc to indicate that the load is finished.
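A minimal sketch of that three-step workflow, using Python's built-in sqlite3 module as a stand-in for SQL Server (the sprocs become plain Python functions). The table staging_orders, its columns, and the functions start_load and finish_load are hypothetical names chosen for illustration. It shows the part the question wants to get rid of: the source system has to hold on to load_id and stamp every row itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loads (
        loadId     INTEGER PRIMARY KEY AUTOINCREMENT,
        userName   TEXT NOT NULL,
        startedAt  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        finishedAt TEXT
    );
    CREATE TABLE staging_orders (          -- hypothetical staging table
        orderRef TEXT NOT NULL,
        amount   REAL NOT NULL,
        loadId   INTEGER NOT NULL REFERENCES loads(loadId)
    );
""")

def start_load(conn, user_name):
    """Stand-in for the 'get a new loadId' sproc."""
    cur = conn.execute("INSERT INTO loads (userName) VALUES (?)", (user_name,))
    conn.commit()
    return cur.lastrowid

def finish_load(conn, load_id):
    """Stand-in for the 'load is finished' sproc."""
    conn.execute(
        "UPDATE loads SET finishedAt = CURRENT_TIMESTAMP WHERE loadId = ?",
        (load_id,),
    )
    conn.commit()

# Source-system side: it must carry load_id around and add it to every row.
load_id = start_load(conn, "source_a")
rows = [("A-1", 10.0), ("A-2", 25.5)]
conn.executemany(
    "INSERT INTO staging_orders (orderRef, amount, loadId) VALUES (?, ?, ?)",
    [(ref, amount, load_id) for ref, amount in rows],
)
finish_load(conn, load_id)  # also commits the inserted rows
```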

My question is: is there any way to avoid having to pass the loadId back to the source system? For example, I was imagining that I could get some sort of connection ID from SQL Server and use it to look up the relevant loadId in the loads table. But I am not sure whether SQL Server has a variable that is unique to a connection.

Does anyone know?

Thanks,

A: 

A local temp table (one with a single pound sign, #temp) is unique to the session: dump the ID in there, then select from it.

BTW, this will only work if you use the same connection.

SQLMenace
I thought of this, but temp tables are dropped when they go out of scope, and they go out of scope when the stored procedure that created them finishes. So the data would be lost.
mr_miles
You need to create them before the proc calls, then call the procs. I don't know how your process works, so this might not work for you.
SQLMenace
A: 

I assume the source systems are writing and committing the inserts into your source tables, and that multiple loads are NOT running at the same time.

If so, have the source load call a stored proc, newLoadStarting(), before starting the load proc. This stored proc updates the load table (creating a new row and recording the start time).

Put a trigger on each table with a loadID column that gets max(loadID) from the load table and sets it as the current load ID on the inserted rows.

For completeness, you could add an endLoading() proc which sets an end date and deactivates that particular load.

If you are running multiple loads into the same tables at the same time, stop doing that; it's not very productive.

Markus
That's pretty much what I went for in the end, though by locking down the tables enough, I did away with the need for a loadStarting sproc.
mr_miles
Glad I could help
Markus
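A sketch of Markus's trigger approach, once more in SQLite. The table and column names (staging_orders, endedAt, isActive) and the Python functions new_load_starting and end_loading, standing in for newLoadStarting() and endLoading(), are illustrative assumptions. SQLite triggers fire once per row, whereas a SQL Server trigger fires once per statement and would read the inserted pseudo-table instead. As the answer says, stamping rows with MAX(loadId) is only safe if loads never overlap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loads (
        loadId    INTEGER PRIMARY KEY AUTOINCREMENT,
        startedAt TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        endedAt   TEXT,
        isActive  INTEGER NOT NULL DEFAULT 1
    );
    CREATE TABLE staging_orders (          -- hypothetical staging table
        orderRef TEXT NOT NULL,
        loadId   INTEGER REFERENCES loads(loadId)
    );
    -- Stamp each new row with the most recent load (assumes one load at a time).
    CREATE TRIGGER staging_orders_set_load
    AFTER INSERT ON staging_orders
    FOR EACH ROW WHEN NEW.loadId IS NULL
    BEGIN
        UPDATE staging_orders
        SET loadId = (SELECT MAX(loadId) FROM loads)
        WHERE rowid = NEW.rowid;
    END;
""")

def new_load_starting(conn):
    """Stand-in for newLoadStarting(): create a new load row with its start time."""
    conn.execute("INSERT INTO loads DEFAULT VALUES")
    conn.commit()

def end_loading(conn):
    """Stand-in for endLoading(): set the end date and deactivate the latest load."""
    conn.execute(
        "UPDATE loads SET endedAt = CURRENT_TIMESTAMP, isActive = 0 "
        "WHERE loadId = (SELECT MAX(loadId) FROM loads)"
    )
    conn.commit()

# The source system never sees or passes a loadId.
new_load_starting(conn)
conn.executemany("INSERT INTO staging_orders (orderRef) VALUES (?)", [("A-1",), ("A-2",)])
conn.commit()
end_loading(conn)

new_load_starting(conn)
conn.execute("INSERT INTO staging_orders (orderRef) VALUES (?)", ("B-1",))
conn.commit()
end_loading(conn)
```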
A: 

In the end, I went for the following solution "pattern", pretty similar to what Markus was suggesting:

  • I created a table with a loadId column, default null (plus some other audit info like createdDate and createdByUser);
  • I created a view on the table that hides the loadId and audit columns, and only shows rows where loadId is null;
  • The source systems load data into, and view data through, the view, not the table;
  • When they are done, the source system calls a "sp__loadFinished" procedure, which puts the right value in the loadId column and does some other logging (number of rows received, date called, etc). I generate this from a template as it is repetitive.

Because loadId now has a value for all those rows, they are no longer visible to the source system, and it can start another load if required.
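A sketch of this view-based pattern, again using SQLite as a stand-in. The names staging_orders, orders (the view the source system sees) and load_finished (standing in for sp__loadFinished) are hypothetical, and createdByUser is left out because SQLite has no login users. SQL Server can usually insert through a simple single-table view like this directly, provided the hidden columns are nullable or have defaults; SQLite views are read-only, so the sketch adds an INSTEAD OF INSERT trigger to route inserts to the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_orders (
        orderRef    TEXT NOT NULL,
        amount      REAL NOT NULL,
        createdDate TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        loadId      INTEGER DEFAULT NULL
    );
    CREATE TABLE loads (
        loadId     INTEGER PRIMARY KEY AUTOINCREMENT,
        tableName  TEXT NOT NULL,
        rowCount   INTEGER NOT NULL,
        finishedAt TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- What the source system sees: only pending rows, no loadId or audit columns.
    CREATE VIEW orders AS
        SELECT orderRef, amount FROM staging_orders WHERE loadId IS NULL;
    -- SQLite views are read-only, so send inserts on the view to the base table.
    CREATE TRIGGER orders_insert INSTEAD OF INSERT ON orders
    BEGIN
        INSERT INTO staging_orders (orderRef, amount) VALUES (NEW.orderRef, NEW.amount);
    END;
""")

def load_finished(conn):
    """Stand-in for sp__loadFinished: log the load and stamp all pending rows."""
    cur = conn.execute(
        "INSERT INTO loads (tableName, rowCount) VALUES (?, 0)", ("staging_orders",)
    )
    load_id = cur.lastrowid
    stamped = conn.execute(
        "UPDATE staging_orders SET loadId = ? WHERE loadId IS NULL", (load_id,)
    ).rowcount
    conn.execute("UPDATE loads SET rowCount = ? WHERE loadId = ?", (stamped, load_id))
    conn.commit()  # the log row and the stamping commit together
    return load_id

conn.executemany("INSERT INTO orders (orderRef, amount) VALUES (?, ?)",
                 [("A-1", 10.0), ("A-2", 25.5)])
visible_before = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
first = load_finished(conn)
visible_after = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

conn.execute("INSERT INTO orders (orderRef, amount) VALUES (?, ?)", ("A-3", 7.0))
second = load_finished(conn)
```

After load_finished runs, the stamped rows drop out of the view, so the source system sees an empty view and can start the next load without ever handling a loadId.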

I also arrange for each source system to have its own schema, which is the only thing it can see and is its default schema on logon. The view and the sproc are in this schema, but the underlying table is in a "staging" schema containing data across all the sources. I avoid name collisions through a naming convention.

Works like a charm, including the one case where a load can only be complete if two tables have been updated.

mr_miles
