A little background:

I have a remote, standalone SQL Server database that is truncated at the end of every weekend. The data is hardly relational, not normalized at all, and pretty annoying to work with. On top of that, the schema cannot be modified at all, because the database is recreated by a third-party application. Before the database is destroyed each week, a backup is made of that week's data. On average, each database will have between 500,000 and 2,000,000 records.

My task is to create a historical version of this database that is a superset of all of these database backups. It should tie into our other databases, which contain related sets of information. I have already started on an application to perform this task, and I've gotten to the point where I'm able to match data with our other databases, but I'm wondering if there's any best practice for handling this kind of import.

How do I make sure that I have unique IDs in my historical version of this database? Are there any features in SQL Server that can do some of the heavy lifting for me?

Thanks for your time on this.


There's definitely a feature in SQL Server that can assist you, and it's called SSIS (SQL Server Integration Services). One of the main uses of SSIS is ETL (Extract, Transform, Load): extracting data from several diverse sources, transforming it into whatever you need to get it into your destination database (such as a data warehouse; any linking with existing data also happens here), and finally loading it into the destination DB.
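To make the idea concrete, here's a rough sketch of the same Extract/Transform/Load steps in plain T-SQL (all database, table, and column names here are made up for illustration); SSIS essentially wraps steps like these in a configurable, restartable package:

    -- Extract: pull the raw rows out of the restored weekly database.
    SELECT src.LegacyId, src.CustomerName, src.EventDate
    INTO   #Staging
    FROM   WeeklyBackupDb.dbo.RawEvents AS src;

    -- Transform: clean values before linking to existing reference data.
    UPDATE s
    SET    s.CustomerName = LTRIM(RTRIM(s.CustomerName))
    FROM   #Staging AS s;

    -- Load: append the cleaned rows to the historical (destination) database.
    INSERT INTO HistoryDb.dbo.Events (LegacyId, CustomerName, EventDate)
    SELECT LegacyId, CustomerName, EventDate
    FROM   #Staging;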

I think the best way to get started, if that's the route you choose, is to pick up a good book on SSIS and work through it. While reading, don't forget to play around with BIDS (Business Intelligence Development Studio, one of the SQL Server tools) to create some test packages. You'll also find plenty of "getting started" articles on the internet.

For your case in particular what I would do is:

  • Create a generic package that imports the data from a source DB (one of your weekly DBs) and inserts it into the destination DB. This package can be parameterized using Parent Package Configuration.
  • Create a main package that loops over all backups in a certain folder, restores them one by one, and calls the generic import package for each restore. After each successful import, the Control Flow would delete the previously restored DB (see the sketch below).
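For the restore step inside that loop, the T-SQL would look roughly like this (the file paths, database name, and logical file names are hypothetical; you can discover the actual logical names with RESTORE FILELISTONLY):

    -- Restore one weekly backup under a fixed working name.
    RESTORE DATABASE WeeklyBackupDb
    FROM DISK = N'C:\Backups\week_2010_14.bak'
    WITH MOVE N'WeeklyData' TO N'C:\Data\WeeklyBackupDb.mdf',
         MOVE N'WeeklyLog'  TO N'C:\Data\WeeklyBackupDb.ldf',
         REPLACE;

    -- ... the generic import package runs against WeeklyBackupDb here ...

    -- Drop the working copy before restoring the next backup.
    DROP DATABASE WeeklyBackupDb;

In SSIS itself, the loop would typically be a Foreach Loop container over the *.bak files, with Execute SQL Tasks doing the restore and drop.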

I think I've given you enough material to investigate for now :-)

Good luck, Valentino.

Valentino Vranken
And to add to what Valentino said: yes, create your own IDs and stage the tables, so you can update all the child tables once you have the new ID for each parent record (see the sketch below).
HLGEM
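To make HLGEM's suggestion concrete, here's a minimal staging sketch in T-SQL (all table and column names are hypothetical): let an IDENTITY column mint the new historical ID for each parent row while keeping the old weekly ID alongside it, then rewrite the child rows' foreign keys through that mapping:

    -- Stage the parent rows; IDENTITY generates the new unique IDs.
    CREATE TABLE HistoryDb.dbo.ParentStage (
        NewId INT IDENTITY(1,1) PRIMARY KEY,
        OldId INT NOT NULL,
        Name  NVARCHAR(100)
    );

    INSERT INTO HistoryDb.dbo.ParentStage (OldId, Name)
    SELECT ParentId, Name
    FROM   WeeklyBackupDb.dbo.Parent;

    -- Point each child row at its parent's new ID via the OldId mapping.
    INSERT INTO HistoryDb.dbo.Child (ParentId, Detail)
    SELECT ps.NewId, c.Detail
    FROM   WeeklyBackupDb.dbo.Child AS c
    JOIN   HistoryDb.dbo.ParentStage AS ps
           ON ps.OldId = c.ParentId;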