views:

152

answers:

2

I have a set of tables that's actually a stubby tree. At the top, there's a Customer, and below that Invoices, and Invoice Detail records. (In actuality, there's about two dozen of these tables all referring to the Customer but the principle should apply with just three tables.)

What I'd like to do is to copy the Customer and all of the records belonging to that Customer without having to enumerate every single field in every record. Everything is foreign-key constrained to the thing above it, and most of the tables have auto-increment identity fields.

Below is a T-SQL script to set up a database. Yes it's messy, but it's complete.

CREATE TABLE [dbo].[Customer](
    [custID] [int] IDENTITY(1,1) NOT NULL,
    [name] [varchar](50) NOT NULL,
 CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED ( [custID] ASC)
 WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF,
 ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY] ) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Invoice](
    [invoiceNum] [int] IDENTITY(1,1) NOT NULL,
    [custID] [int] NOT NULL,
    [Description] [varchar](50) NOT NULL,
 CONSTRAINT [PK_Invoice] PRIMARY KEY CLUSTERED ( [invoiceNum] ASC )
 WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
 ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY] ) ON [PRIMARY]
GO
CREATE TABLE [dbo].[InvoiceDetail](
    [invoiceNum] [int] NOT NULL,
    [sequence] [smallint] NOT NULL,
    [description] [varchar](50) NOT NULL,
    [price] [decimal](10, 2) NOT NULL CONSTRAINT [DF_InvoiceDetail_price]  DEFAULT ((0.0)),
 CONSTRAINT [PK_InvoiceDetail] PRIMARY KEY CLUSTERED ( [invoiceNum] ASC, [sequence] ASC )
 WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
 ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY] ) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Invoice]  WITH CHECK ADD  CONSTRAINT [FK_Invoice_Customer] 
    FOREIGN KEY([custID])
    REFERENCES [dbo].[Customer] ([custID])
GO
ALTER TABLE [dbo].[Invoice] CHECK CONSTRAINT [FK_Invoice_Customer]
GO
ALTER TABLE [dbo].[InvoiceDetail]  WITH CHECK ADD  CONSTRAINT [FK_InvoiceDetail_Invoice] 
    FOREIGN KEY([invoiceNum])
    REFERENCES [dbo].[Invoice] ([invoiceNum])
GO
ALTER TABLE [dbo].[InvoiceDetail] CHECK CONSTRAINT [FK_InvoiceDetail_Invoice]

declare @id int;
declare @custid int;
insert into Customer values ('Bob');
set @custid = @@IDENTITY;
insert into Invoice values ( @custid, 'Little Purchase');
set @id = @@IDENTITY;
insert into InvoiceDetail values (@id, 1, 'Small Stuff', 1.98);
insert into InvoiceDetail values (@id, 2, 'More Small Stuff', 0.25);
insert into Invoice values ( @custid, 'Medium Purchase');
set @id = @@IDENTITY;
insert into InvoiceDetail values (@id, 1, 'Stuff', 11.95);
insert into InvoiceDetail values (@id, 2, 'More Stuff', 10.66);
insert into Customer values ('Sally');
set @custid = @@IDENTITY;
insert into Invoice values ( @custid, 'Big Purchase');
set @id = @@IDENTITY;
insert into InvoiceDetail values (@id, 1, 'BIG Stuff', 100.00);
insert into InvoiceDetail values (@id, 2, 'Larger Stuff', 99.95);

So what I want to do is make a copy of "Bob" in this database, and call it "Bob2" without all of the hassle of specifying each column for every table. I could, but in the Real World that's a LOT of columns.

The other problem being that I'd have to write an explicit loop to get each of the invoices. I need the identity from the previous Invoice insert in order to write the Invoice Detail.

I've got a working C# "copy" program, but I'd like to do this all in the database. The naive implementation is a transact sql stored procedure with loops and cursors everywhere.

Is there a clever way of avoiding one (if not both) of these issues?

A: 

"most of the tables have auto-increment identity fields"

There's part of the problem. Using IDENTITY as a PK makes these kind of operations both difficult and costly (from a computing standpoint). Even if you didn't use IDENTITY though, you're still going to need to generate new invoice numbers for the "new" customer, which means that you're going to need to either cycle through one at a time or come up with a set-based method of assigning new invoice numbers which can then be used to create the invoice detail rows.

I'm going to assume that you understand what you're getting into from a business perspective, but I still have to point out that you're also now creating data that isn't exactly "real". If you copy one of these customers, including all of their invoices and then you report on your sales for the year you're going to be double-counting sales.

With more information on the business problem that you're trying to solve, maybe another solution could be found.

Tom H.
There aren't really "Customers" and "Invoices". It'd be improper to post actual schema in a public forum. They do mimic a real database though.
clintp
+1  A: 

I had similar problem with much more tables involved. We can indeed avoid making cursor for each rows to be copied. The only cursor is for looping the list of the table names involved. We are also going to need dynamic SQL for this. The whole operation is extremely fast, compared to the traditional cursor loop solution.

The trick is to insert the relevant rows into the same tables; and then update its FK column to its parent. How we can collect the mass @@identity is by making use of 'output' keyword during the insertion, and save them into a temporary table #refTrack. Later we join #refTrack with the tables involved for updating the their FKs.

We know that:


create table #refTrack 
(
    tbl sysname,
    id int, 
    refId int
)

insert InvoiceDetail (refId, invoiceNum, sequence, description, price)
output 'InvoiceDetail', inserted.id, inserted.refId into #refTrack 
select invoiceNum, invoiceNum, sequence, description, price from InvoiceDetail 
where custID = 808 -- denormalized original Bob^s custID

will populate temporary #refTrack table with a list of newly created auto running numbers. Our job is just to make this insert query as dynamic.

The only drawback of this method is that we need consistencies, on each table we must have:

  1. Primary key of its own by the name of 'id'. In this case we need to rename: Customer.custID to become Customer.id; Invoice.invoiceNum to become Invoice.id; and a new column 'id int identity(1, 1) primary key' in InvoiceDetail.
  2. A denormalized 'custID' column. For tables listed with 'depth' > 1, the table will require the current front end application to populate this new helper column. An 'insert trigger' will make our work a bit more complicated.
  3. A column called 'refId', defined as: int null. This column is for making the relationship of rows belong to 'Bob2' as a copy of 'Bob'.

Steps taken:

A. List all table names into @tList table variable


declare @tList table
(
     tbl sysname primary key,
     fkTbl sysname,
     fkCol sysname,
     depth int
)
insert @tList select 'Customer', null, null, 0
insert @tList select 'Invoice', 'Customer', 'custID', 1
insert @tList select 'InvoiceDetail', 'Invoice', 'invoiceNum', 2

I'd love to go abstract as to just populating 'tbl' column during insertion above; and dynamically populate the rest of the columns by updating them with the result of recursive CTE of information_schema's views. However that could be beside the point. Let's assume we have a table with the list of table names involved, ordered by the way it should be populated.

B. Loop the @tList table in a cursor.


declare 
    @depth int,
    @tbl sysname,
    @fkTbl sysname,
    @fkCol sysname,
    @exec nvarchar(max),
    @insCols nvarchar(max),
    @selCols nvarchar(max),
    @where nvarchar(max),
    @newId int,
    @mainTbl sysname,
    @custId int 


select @custId = 808 -- original Bob^s custID to copy from

select @mainTbl = tbl from @tList where fkTbl is null

declare dbCursor cursor local forward_only read_only for  
    select tbl, fkTbl, fkCol, depth from @tlist order by depth
open dbCursor   
fetch next from dbCursor into @tbl, @fkTbl, @fkCol, @depth 
while @@fetch_status = 0   
begin   
    set @where = case when @depth = 0 then 'Id' else 'custId' end + ' = ' + 
     cast(@custId as nvarchar(20))
    set @insCols = dbo.FnGetColumns(@tbl) 
    set @selCols = replace
    (
     @insCols, 
     'refId', 
     'Id'
    )
    set @exec = 'insert ' + @tbl + ' (' + @insCols + ') ' + 
     'output ''' + @tbl + ''', inserted.id, inserted.refId into #refTrack ' +
     'select ' + @selCols + ' from ' + @tbl + ' where ' + @where

    print @exec
    exec(@exec)

    -- remap parent
    if isnull(@fkTbl, @mainTbl) != @mainTbl -- third level onwards
    begin
     set @exec = 'update ' + @tbl + ' set ' + @tbl + '.' + @fkCol + ' = rf.Id from ' + 
      @tbl + ' join #refTrack as rf on ' + @tbl + '.' + @fkCol + ' = rf.refId and rf.tbl = ''' + 
      @fkTbl + ''' where ' + @tbl + '.custId = ' + cast(@newId as nvarchar(20))

     print @exec
     exec(@exec)
    end

    if @depth = 0 select @newId = Id from #refTrack
    fetch next from dbCursor into @tbl, @fkTbl, @fkCol, @depth 
end   

close dbCursor
deallocate dbCursor

select * from @tList order by depth
select * from #refTrack

drop table #refTrack 

C. The content of FnGetColumns():


create function FnGetColumns(@tableName sysname) 
returns nvarchar(max)
as
begin
    declare @cols nvarchar(max)
    set @cols = ''
    select @cols = @cols + ', ' + column_name 
     from information_schema.columns 
     where table_name = @tableName
      and column_name <> 'id' -- non PK
    return substring(@cols, 3, len(@cols))
end

I am sure we can further improve these scripts to be far more dynamic. But for the sake of solving the problem, this would be the minimum requirement.

Cheers,

Ari.

Irawan Soetomo