Let's say I have a database like this:

Users
-----
ID int PK Identity
Name varchar(max)

Sales
-----
UserID int FK
Description varchar(max)

And some data:

Users
1, "Daniel"
2, "Barnie"
3, "Frank"

Sales
2, "New computer"
2, "Rubber duck"
3, "Cabbage"

There are also several other tables that link to this primary key. Now there is a requirement to back up only certain users; for example, I only want to export the data and all the linked data for users 2 and 3.

Questions: 1) Is there a way to create a .bak file using only partial data? I don't want to back up the whole thing, just selected records. 2) If .bak files are not the best way, what else can be done? I have thought of generating a CSV file or an INSERT SQL script, but these lead to problems on import. The problem comes about when you have exported from two or more databases and you now have potential clashes in the primary key for the Users table. How do you get around this? I am also using FILESTREAM in some tables, so I have some data that cannot be pulled out into text format easily.

I'd also like to do all this programmatically. I'm using SQL Server 2008.

+1  A: 

A backup is meant to take a snapshot of a database so that it can be restored at a later time. If you do not care about certain records, then delete them and then do the backup.

Romain Hippeau
Part of the problem is we do care about the other records in there, so the rest of the data is important. We'd like to just archive a subset of the data away and possibly delete it, and then in the future there may be a chance it needs to come back. There may also be several of these databases, and it would be nice to farm off subsections of them all and pour them into a new super database.
DanDan
@DanDan your solution is to use BCP - http://msdn.microsoft.com/en-us/library/aa174646(SQL.80).aspx You can use queries to do your exports.
Romain Hippeau
+1  A: 

A relatively simple option would be to populate a table with the users you want to back up, create another database on the same server instance for the archived users, then do a SELECT INTO the new database (Users table first, obviously). Delete from the old database where values exist in the new database, back up the new database, and you are golden.
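
A rough T-SQL sketch of this approach, assuming the live database is called MainDb, the archive database is ArchiveDb, and the users to archive are listed in a hypothetical UsersToArchive table (all names are placeholders):

    -- Copy the selected users into the archive database (parent table first).
    SELECT u.*
    INTO ArchiveDb.dbo.Users
    FROM MainDb.dbo.Users AS u
    WHERE u.ID IN (SELECT UserID FROM MainDb.dbo.UsersToArchive);

    -- Copy the dependent rows.
    SELECT s.*
    INTO ArchiveDb.dbo.Sales
    FROM MainDb.dbo.Sales AS s
    WHERE s.UserID IN (SELECT UserID FROM MainDb.dbo.UsersToArchive);

    -- Remove the archived rows from the live database (children first).
    DELETE s
    FROM MainDb.dbo.Sales AS s
    WHERE EXISTS (SELECT 1 FROM ArchiveDb.dbo.Sales AS a WHERE a.UserID = s.UserID);

    DELETE u
    FROM MainDb.dbo.Users AS u
    WHERE EXISTS (SELECT 1 FROM ArchiveDb.dbo.Users AS a WHERE a.ID = u.ID);

    -- Back up only the archive database.
    BACKUP DATABASE ArchiveDb TO DISK = 'C:\Backups\ArchiveDb.bak';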

JNK
It is a nice idea but it sounds like high maintenance. I was hoping the SQL Server framework had a nicer solution!
DanDan
+1  A: 

What about this idea:

  • Create backup tables for each related table you want to back up.
  • With a simple query, populate these tables (you'll just select the data for the users who want the backup).
  • Add a simple trigger on each concerned table (insert or update) to keep your backup tables synchronized (a rough sketch is below).
  • Now you can export a backup from these new tables.
  • To restore the data, use INSERT IGNORE.

It's just an idea, let's criticize :)
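
A rough sketch of the sync trigger from the third bullet, assuming a backup table dbo.Users_Backup whose ID column is a plain int (not an IDENTITY), and a hypothetical dbo.UsersToBackup table listing the users who asked to be backed up:

    -- Keep dbo.Users_Backup in step with dbo.Users for the selected users.
    CREATE TRIGGER trg_Users_SyncBackup
    ON dbo.Users
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Drop any stale copies of the rows that were just changed...
        DELETE b
        FROM dbo.Users_Backup AS b
        JOIN inserted AS i ON i.ID = b.ID;

        -- ...and re-insert the current versions, only for users being backed up.
        INSERT INTO dbo.Users_Backup (ID, Name)
        SELECT i.ID, i.Name
        FROM inserted AS i
        JOIN dbo.UsersToBackup AS t ON t.UserID = i.ID;
    END;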

chahedous
It's a good idea but I'll run into problems when merging the backed up data back into the main database.
DanDan
It's not a real problem with an INSERT IGNORE.
chahedous
SQL Server doesn't have INSERT IGNORE. It still doesn't get around the real problem that these backups may have come from different sources; I may have two records with an ID of 3, and I want to insert both.
DanDan
+2  A: 

Questions: 1) Is there a way to create a .bak file using only partial data? I don't want to back up the whole thing, just selected records.

No. In SQL Server, the backup functionality will only back up an entire database.

2) If .bak files are not the best way, what else can be done?

I'd recommend setting up a second archive database on the same server as the original and using replication to sync only certain records. I would then back up only the archive database (or both, but on different schedules).

If replication isn't your flavor of vodka, then you could even do a triggered upsert or delete into this archive database.
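
A minimal sketch of what the triggered upsert could look like, assuming ArchiveDb.dbo.Users mirrors the live Users table and its ID column is a plain int (not an IDENTITY) so the original values can be copied across; all names are placeholders:

    -- On every insert/update in the live table, upsert the row into the archive.
    CREATE TRIGGER trg_Users_ArchiveUpsert
    ON dbo.Users
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        MERGE ArchiveDb.dbo.Users AS target
        USING inserted AS source
            ON target.ID = source.ID
        WHEN MATCHED THEN
            UPDATE SET target.Name = source.Name
        WHEN NOT MATCHED THEN
            INSERT (ID, Name) VALUES (source.ID, source.Name);
    END;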

I have thought of generating a CSV file or an INSERT SQL script, but these lead to problems on import. The problem comes about when you have exported from two or more databases and you now have potential clashes in the primary key for the Users table. How do you get around this? I am also using FILESTREAM in some tables, so I have some data that cannot be pulled out into text format easily.

Is this a multi-tenant situation? Regardless, for each database I would create a second archive database that would be used to backup the information that was actually needed. Thus, no two databases would feed into the same filtered archive database.

Thomas
Thank you, this does seem like the route I will be taking. Thanks for your information.
DanDan
+1  A: 

I would take a generalized archiving approach to this problem:

  • Create a central database with all the schemas for all tables you need to export

Since you want to maintain the FILESTREAM data, I don't see how .csv or bcp files could be used. Plus, this fits into the idea you mentioned of having one giant database to accumulate the information.

  • For each table, add a new column called DbName.

DbName will be the name of the database that the original record comes from. You can combine this column with the User ID to create a composite key. This will allow you to keep the identity fields in your tables and still be able to merge users with the same ID into one table.

  • Create a stored procedure which will load the necessary data into the new database and delete it at the same time (or at least mark it for deletion).

Presumably this stored procedure will be run as a SQL Agent job and you could have one in each database. The users to be removed could be referred to via a centralized table.
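
A rough sketch of what the central Users table and its load might look like under this scheme; SourceDb1 and the UsersToArchive list are placeholders:

    -- The source database name plus the original identity value form the
    -- composite primary key, so users with the same ID from different
    -- databases can coexist in the central archive.
    CREATE TABLE dbo.Users
    (
        DbName sysname      NOT NULL,  -- database the row was exported from
        UserID int          NOT NULL,  -- original identity value in that database
        Name   varchar(max) NULL,
        CONSTRAINT PK_Users PRIMARY KEY (DbName, UserID)
    );

    -- Example load for one source database (e.g. run from the SQL Agent job).
    INSERT INTO dbo.Users (DbName, UserID, Name)
    SELECT 'SourceDb1', u.ID, u.Name
    FROM SourceDb1.dbo.Users AS u
    WHERE u.ID IN (SELECT UserID FROM SourceDb1.dbo.UsersToArchive);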

8kb
Thanks for your input, the composite key idea is nice, it has given me a new idea to think about.
DanDan
+2  A: 

You can use partitioning to divide the data between different filegroups or different servers. You can then choose how you back up each partition by applying different backup schedules to the filegroups/servers.
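
A minimal sketch of the filegroup route, assuming an Archive filegroup is added to the database and the relevant tables or partitions are placed on it (names and paths are placeholders):

    -- Put the archive data on its own filegroup...
    ALTER DATABASE MainDb ADD FILEGROUP Archive;
    ALTER DATABASE MainDb
        ADD FILE (NAME = MainDb_Archive, FILENAME = 'C:\Data\MainDb_Archive.ndf')
        TO FILEGROUP Archive;

    -- ...then back up just that filegroup on its own schedule.
    BACKUP DATABASE MainDb
        FILEGROUP = 'Archive'
        TO DISK = 'C:\Backups\MainDb_Archive.bak';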

But on the whole, storage is very cheap these days. Unless you really know this is going to be expensive, I would just back up everything. The more complex the backup system, the more prone it is to failure, and the cost of saving a few gigs will not equal the cost of losing all the data!

mdma
(Not) backing up user-related data is often a legal question, not a question of storage size.
Doc Brown
@Doc - Thanks for that perspective, I don't work in that kind of sector, so that didn't occur to me. But if the data is worth having, isn't it worth backing up? Is there really legislation to prevent this?
mdma
@mdma: it depends on the data. Sometimes the problem is that the maximum allowed retention period for user-related data is quite different from the period you keep your backups.
Doc Brown
@Doc Brown: I'm with mdma here. If it's important enough to store, it's important enough to back up. If it's online and available for use now, it should be restorable for use at any point in the legal retention period. Any kind of partial backup like this is prone to errors and begs the question "did you back up everything you're legally obliged to?" BTW, I'm a DBA in a Secrecy Jurisdiction (Switzerland) in the banking sector.
gbn
@gbn: no objection - especially when you are working in the banking sector, I suspect you will only get problems for losing customer data, not for storing them too long ;-). I was just guessing about the motivation of the OP for his requirement.
Doc Brown
+1  A: 

Another possibility is to use SSIS with custom SELECT statements and raw data outputs. Importing and exporting in the native raw format is extremely fast, and you get exactly the records you want. Additionally, you could compress the files after export or run file commands to move them around.

Dr. Zim
Given, but you can use SSIS to restore it too. SSIS comes with MS SQL and is "free" if you own SQL Server standard (or better).
Dr. Zim