I have a requirement to take a "snapshot" of a current database and clone it into the same database, with new Primary Keys.

The schema in question consists of about 10 tables, but a few of the tables will potentially contain hundreds of thousands to 1 million records that need to be duplicated.

What are my options here?

I'm afraid that writing a SPROC will require locking the database rows in question (for concurrency) for the entire duration of the operation, which would be quite annoying to other users. How long would such an operation take, assuming we optimize it to the full extent SQL Server allows? Is it going to be 30 seconds to 1 minute to perform this many inserts? I'm not able to lock the whole table(s) and do a bulk insert, because other users under other accounts are using the same tables independently.
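For concreteness, the kind of thing I imagine the SPROC doing is a batched copy, so each batch is its own short transaction and locks are held only briefly. All table, column, and account names below are invented for illustration, and I'm assuming an IDENTITY primary key that hands out the new keys automatically:

DECLARE @sourceAccount INT = 1;   -- invented: account whose data is copied
DECLARE @targetAccount INT = 2;   -- invented: account receiving the copy
DECLARE @batchSize INT = 10000;
DECLARE @lastId INT = 0;
DECLARE @maxId INT;

-- Capture the upper bound up front so rows inserted below are never revisited.
SELECT @maxId = MAX(OrderId) FROM Orders WHERE AccountId = @sourceAccount;

WHILE @lastId < @maxId
BEGIN
    -- Each batch commits on its own, so locks are released quickly.
    -- OrderId is an IDENTITY PK, so the copies get new primary keys.
    INSERT INTO Orders (AccountId, Col1, Col2)
    SELECT @targetAccount, Col1, Col2
    FROM Orders
    WHERE AccountId = @sourceAccount
      AND OrderId > @lastId
      AND OrderId <= @lastId + @batchSize;

    SET @lastId = @lastId + @batchSize;
END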

Depending on performance expectations, an alternative would be to dump the current data into an XML file and then asynchronously clone the DB from that file at leisure in the background. The obvious advantage is that the DB is only locked for the time it takes to do the XML dump, and the inserts can run in the background.
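The dump side could presumably use SQL Server's FOR XML, with the background job shredding the document back into rows via nodes(). Again, table and column names and types here are made up:

DECLARE @xmlDoc XML;

-- Dump: serialize the rows to copy; the source is only read for this one query.
SET @xmlDoc = (
    SELECT Col1, Col2
    FROM SomeTable
    FOR XML PATH('row'), ROOT('rows'), TYPE
);

-- Later, in the background: shred the XML back into new rows,
-- letting the IDENTITY primary key hand out the new keys.
INSERT INTO SomeTable (Col1, Col2)
SELECT r.value('(Col1)[1]', 'int'),
       r.value('(Col2)[1]', 'nvarchar(100)')
FROM @xmlDoc.nodes('/rows/row') AS x(r);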

If a good DBA can get the "clone" operation to execute start to finish in under 10 seconds, then it's probably not worth the complexity of the XML-dump/web-service solution. But if that's a lost cause, and inserting potentially millions of rows is likely to balloon in time, then I'd rather start with the XML approach right away.

Or maybe there's a better approach altogether?

Thanks a lot for any insights you can provide.

+1  A: 

I would suggest backing up the database and then restoring it as a new DB on your server. You can use that new DB as your source. I would definitely recommend against the XML dump idea.
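Something along these lines; the logical file names below are guesses, so check the real ones with RESTORE FILELISTONLY first:

-- Back up the live database (a full backup does not block other users).
BACKUP DATABASE MyDb TO DISK = 'C:\Backups\MyDb.bak';

-- Restore it under a new name alongside the original.
RESTORE DATABASE MyDb_Copy
FROM DISK = 'C:\Backups\MyDb.bak'
WITH MOVE 'MyDb' TO 'C:\Data\MyDb_Copy.mdf',
     MOVE 'MyDb_log' TO 'C:\Data\MyDb_Copy_log.ldf';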

ps
+1. Exactly what I was about to write...
Heinzi
I was thinking this too, but would that preserve the primary keys? (I think it does.) And the way the OP worded it, I'm not sure if different keys are a requirement or just an acceptable side effect.
Neil N
Sure. The whole point of a database backup is to be able to restore the DB exactly as it was.
Heinzi
DB backup is not the right approach. I'm trying to copy a SUBSET of the data, from one user account to another. This is a BUSINESS LOGIC function, so that users can have a draft and a working copy of the same dataset. I don't want to copy the whole DB, its schemas, sprocs, etc. I just want to copy some of the data over into a new account. Account here is simply a lookup table with a foreign key. There need to be different versions of the data; it's just that there's a large amount of data to copy. DB backup is definitely not what I want to accomplish.
Scott
Furthermore, there could be dozens of other accounts on the system that I DON'T want to copy data for. Just a relatively small subset of the data, for one user at a time, triggered by a user action.
Scott
A: 

Does it need to be in the exact same tables? You could make a set of "snapshots" tables where all these records go; you would only need a single INSERT + SELECT per table, like

insert into snapshots_source1 ([user], col1, col2, ..., colN)
select 'john', col1, col2, ..., colN from source1

and so on.

You can give the snapshots_* tables an IDENTITY column that will generate the 'new PK', and they can also preserve the old one if you wish.
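For example (the column types are invented, and I'm assuming source1's old key is an int named id):

CREATE TABLE snapshots_source1 (
    snapshot_id INT IDENTITY(1,1) PRIMARY KEY,  -- the 'new PK'
    original_id INT NOT NULL,                   -- preserves the old PK
    [user]      VARCHAR(50) NOT NULL,
    col1        INT NULL,
    col2        VARCHAR(100) NULL
);

INSERT INTO snapshots_source1 ([user], original_id, col1, col2)
SELECT 'john', id, col1, col2
FROM source1;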

This has (almost) no locking issues and looks a lot saner.

It does require a change in the code, but it shouldn't be too hard to make the app point to the snapshots tables when appropriate.

This also eases cleanup and maintenance.

---8<------8<------8<---outdated answer---8<---8<------8<------8<------8<---

Why don't you just take a live backup and do the data manipulation (key changing) on the destination clone?

Now, in general, this snapshot-with-new-primary-keys idea sounds suspect. If you want a replica, you have log shipping and the cluster service; if you want a copy of the data to generate a 'new app instance', a backup/restore/manipulate process should be enough.

You don't say how much space your DB occupies, but you can certainly back up 20 million rows (800 MB?) in about 10 seconds, depending on how fast your disk subsystem is...

Vinko Vrsalovic
Please see my comment above.
Scott