views:

367

answers:

4

We are trying to come up with an Active - Active DR strategy for our 6 TB data warehouse. Our datawarehouse has 40 DBs and everything has to be replicated on a real time bases.

Site 1 : Needs to handle all the ETL Site 2 : Will handle all the reporting queries.

  • Database Mirroring (Cannot afford to drop and create snapshots as we cannot Kill any connections)
  • Replication
  • Log shipping

Migrating to SQL Server 2008 is an option.

Which is the best way for performance and availability?

Regards, Nagy

A: 

You need to talk with your hardware vendor - especially the storage one to see if they provide some sort of hardware based replication. Looking at the volume of the data, I don't think software based solution will be optimal.

Here is how I handle it for 3 databases (11, 17 and 23 TB) right now.

  1. We are hosting the database in a EMC SAN.
  2. Every 12 hours the databases are cloned on different luns located on the same same SAN and then mounted on different servers. This is the backup in case the primary servers get hosed. These databases are generally 12 hours behind the primary databases. We use them for reporting where we can live with 12 hours old data.
  3. Every 24 hours, the clones in 2 are copied to a different SAN in a different building and mounted. This is a the secondary backup. In these databases we run the diagnostics, DBCC checks etc.
  4. In total we are running a total of 9 SQL Server Enterprise Edition (3 prod, 3 first line DR and 3 second line DR) instances.
  5. We decided to go this way, as we could live with upto 24 hours of lag in the data.

This is certainly doable, but it will require a fair bit of planning as well as investment in your part. For us the cost for 9 EE license was not much compared to the price of two SANs and the interconnect between them.

no_one
+2  A: 

Since you can't afford to drop active connections log shipping isn't an option either. You need to get exclusive access to the database to restore the log. Hardware support (SAN) will be a big help here. I'd almost like to see you ETL into one server, and then snap over making that the active server for reporting and use the other server for ETL. Thus you have a reporting server with no ETL process, and an ETL server with no reporting, but you swap which is which on a nightly? basis.

WaldenL
I like the idea. But our warehouse is a near real time datawahouse. Acceptable data lag for the business is half hour. Don't we have to drop the users when we are swapping from one server to another?
Nagendra Somasetty
"Near real time datawarehouse"... always loved that expression. :-) Anyway, yes. So that rules out the swap idea. My feeling is replication is just about your only option here. But what's the data look like when you're 1/2 through the Load part of ETL? Can't the user end up with a half-baked result?
WaldenL
LOL..I know. I am going bald maintaining this monster as its near real time. Most of data is transactional. I guess users will be ok with that. I guess Replication is my only option.
Nagendra Somasetty
Kinda defeating the purpose if you're almost realtime, yeah? You'd almost be better off optimising your OLTP design instead?
RobS
A: 

Peer to Peer transactional replication is probably the best option for you unless you want to go down the expensive SAN hardware replication path.

It's offer's near real-time so this should be good enough for reporting.

Nick Kavadias
With SAN hardware replication, don't you have to restart the sql service every time it syncs?
Nagendra Somasetty
If yes. Can you tell me who is the SAN vendor and the technology?
Nagendra Somasetty
Nick Kavadias
A: 

Pretty much SQL Server Replication, or some sort of customer solution using the SQL Service Broker are going to be your best bet. If your tables are static and all data changes are being done at one site then transactional replication may be your best bet. You'll need to large WAN pipe to handle the replication as transactional consistency is maintained even if multiple threads are used.

SQL Server 2008 has some improvements to Replication's performance as it allows multiple threads to the distributor so that may help you.

mrdenny