We have large SQL Server 2008 databases. Very often we have to run massive data imports into them that take a couple of hours, and during that time everyone else's reads and small writes slow down a ton.

I'm looking for a solution where maybe we set up one database server that is used for bulk writing, and then two other database servers that are set up for reads and maybe small writes. The goal is to keep small reads and writes fast while the bulk changes are running.

Does anyone have an idea of a good way to accomplish this using SQL Server 2008?

A: 

I'm not sure what you mean when you say everyone else's reads and writes slow down. Do they slow down when people read and write the same database the data is being imported into, or when they use different databases on the same server?

If it is the same database, you could always use the "with (nolock)" hint to keep reads working even while the table is locked for writes/inserts. However, be aware that those reads can be dirty reads. I am not sure how you can make the small writes faster while the table is locked, because a write is already in progress; you can keep each transaction small so that writes finish quickly and release their locks. The other option is to have a separate database for bulk inserts and another database for reading.
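
For example, a minimal sketch of a dirty read with the hint (the table and column names here are just placeholders):

    SELECT OrderID, OrderStatus
    FROM dbo.Orders WITH (NOLOCK)  -- reads past locks; may return uncommitted rows
    WHERE CustomerID = 42;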

msvcyc
A: 

The easiest way would be to slow down the rate at which the bulk writes occur and feed them in a few records at a time. The import itself will be slower, but it keeps things fast for the users. If the batches already take "a couple hours", you can perhaps spread them out even more.
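
A minimal sketch of that idea, assuming the rows have first been loaded into a staging table (all object names here are placeholders):

    -- Drain the staging table in small chunks, pausing between chunks
    -- so other sessions can get their locks in.
    DECLARE @moved INT = 1;
    WHILE @moved > 0
    BEGIN
        DELETE TOP (500) FROM dbo.ImportStaging
        OUTPUT deleted.Col1, deleted.Col2
        INTO dbo.TargetTable (Col1, Col2);

        SET @moved = @@ROWCOUNT;
        WAITFOR DELAY '00:00:02';  -- give readers a window between chunks
    END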

le dorfier
A: 

Why not use memcached to eliminate the reads? I've got the same situation where I work, and we've been using memcached on Windows with great results. I was surprised how trivial it was to get my code running with it, too. There are open-source wrapper libraries for virtually every mainstream language, and using it could result in 99% of your reads never touching the database (because you set the memcached values during the write operations).

Memcached is really just a giant hash table store (and can even be clustered, or run on any machine you like, since it uses sockets to read and store the hashes).

When reading, check the memcached value first: if it isn't null, return it; otherwise do your usual database read, cache the result, and return it. It can store just about anything, as long as each memcached key/value pair is less than 1 MB.

Ash
+4  A: 

Hi, Paul. There are two parts to your question.

First, why are writes slow?

When you say you have large databases, you may want to clarify that with some numbers. The Microsoft teams have demonstrated multi-terabyte loads in less than an hour, but of course they're using high-end gear and specialized data warehousing techniques. I've been involved with data warehousing teams that regularly loaded so much data overnight that the transaction log drives had to be over a terabyte just to handle the quick bursts - but even they weren't loading a terabyte per hour.

To find out why your writes are slow, compare your load methods to data warehousing techniques. For example, have you tried using staging tables? Table partitioning? Data and log files on different arrays? If you're not sure where to start, check out my Perfmon tutorial on measuring your system to find bottlenecks:

http://www.brentozar.com/archive/2006/12/dba-101-using-perfmon-for-sql-performance-tuning/
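
As one concrete example of the staging-table idea, SQL Server 2008 Enterprise Edition lets you bulk load into a staging table and then switch it into a partitioned table as a near-instant metadata operation. A minimal sketch, assuming dbo.Sales is partitioned, the staging table matches its schema and filegroup, and a check constraint restricts the staged rows to the target partition's range (all names and the partition number are placeholders):

    -- Bulk load dbo.SalesStaging first, with no impact on readers of dbo.Sales,
    -- then switch the loaded rows in almost instantly.
    ALTER TABLE dbo.SalesStaging
        SWITCH TO dbo.Sales PARTITION 12;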

Second, how do you scale out?

You asked how to set up multiple database servers so that one handles the bulk load while others handle reads and some writes. I would heavily, heavily caution against the multiple-servers-for-writes approach because it gets complicated very quickly, but using multiple servers for reads is not uncommon.

The easiest way to do it is with log shipping: every X minutes, the primary server takes a transaction log backup, and that log backup is then applied to a read-only reporting server. There are some catches with this - the data is a little behind, and the restore process has to kick all connections out of the database to apply the restore. This can be a perfectly acceptable solution for things like data warehouses, where the end users want to keep running their own reports while the new day's data loads: simply don't do transaction log restores while the data warehouse is loading, and the users can keep their connections the whole time.
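
Under the hood, each cycle looks roughly like this (real log shipping is normally configured through the wizard or SQL Agent jobs rather than run by hand; the paths and database name here are placeholders):

    -- On the primary: back up the transaction log.
    BACKUP LOG SalesDB
        TO DISK = N'\\fileshare\logship\SalesDB_1230.trn';

    -- On the reporting server: apply it, leaving the database readable
    -- between restores. This is the step that kicks connections out.
    RESTORE LOG SalesDB
        FROM DISK = N'\\fileshare\logship\SalesDB_1230.trn'
        WITH STANDBY = N'D:\standby\SalesDB_undo.dat';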

To help find out what solution is right for you, consider adding the following to your question:

  • The size of your database (GB/TB in size, # of millions of rows in the largest table that's having the writes)
  • The size of your server & storage (a box with 10 drives has different solutions available than a box hooked up to a SAN)
  • The method of loading data (single-record inserts, bulk load, table partitioning, etc.)
Brent Ozar
A: 

Hey Paul

This is just an idea: create a view over your "active" tables, then BCP the data into a "staging" table. When the load is done, alter the view to include (or point at) the "staging" tables.
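
A minimal sketch of the switch, assuming two identically structured tables behind the view (all names here are placeholders):

    -- Readers always query dbo.ActiveOrders; repoint it when the load finishes.
    CREATE VIEW dbo.ActiveOrders AS
        SELECT OrderID, Amount FROM dbo.Orders_Current;
    GO
    -- bcp the new data into dbo.Orders_Staging, then:
    ALTER VIEW dbo.ActiveOrders AS
        SELECT OrderID, Amount FROM dbo.Orders_Staging;
    GO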

uriDium
A: 

Boy, I don't think I would come back after being battered so much. I know the information requested is useful and necessary to help with the issue, and providing it probably gives back as much as possible toward identifying a solution, but you could have given him a little time to look at one suggestion and try it. Still, very good response time.