I need to implement a SQL Server replication solution. Very simple need for now. I just need to replicate one pretty simple table from 200 remote sites or so to one central server. The data is not really transactional in nature. I just need it moved up to the central server once a day. I can't decide if I should use push or pull, and I'm not sure if the distributor should live on the server side, or on all the clients.

The server and the remote sites all live on a fairly decent VPN. The central server runs SQL Server 2005, and it's not being pushed very hard at the moment. It just runs a few jobs here and there collecting data (which I want to get away from) and pushing reports/exports to various vendors once a day. The sites are a mix of SQL Server 2000 and 2005.

+1  A: 

If there is a need to adjust the replication down the road, having the central server initiate a pull will be much easier to administer than adjusting 200 sites to accomplish the same thing. It would also naturally manage the load, rather than needing some scheme to prevent, say, 100 remote sites all connecting at once.
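The load-management point can be illustrated with a minimal sketch, assuming the central server drives the pulls itself from a script rather than through replication agents. Everything here is hypothetical: `pull_site` is a placeholder for whatever actually copies one site's table, and `max_parallel` is an illustrative cap, not a recommended value.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def pull_site(site):
    """Hypothetical placeholder: copy one site's rows to the central
    server and return the number of rows moved."""
    return 0


def pull_all(sites, pull=pull_site, max_parallel=10):
    """Pull every site, never running more than max_parallel pulls at once.

    Returns (results, failures): row counts keyed by site, and the
    exception raised for each site that could not be pulled.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # Submit all sites; the pool size caps how many run concurrently.
        futures = {pool.submit(pull, site): site for site in sites}
        for future in as_completed(futures):
            site = futures[future]
            try:
                results[site] = future.result()
            except Exception as exc:  # one bad site must not stop the rest
                failures[site] = exc
    return results, failures
```

Because the central side owns the schedule and the concurrency cap, changing either means editing one script instead of touching 200 sites.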

wallyk
Thanks, WallyK. Interesting point. I guess I was assuming that replication was smart enough to handle those sorts of data bursts. The only reason I was considering a push was that it seems like it would delegate the workload to 200 workers instead of 1. If SQL Server doesn't handle that gracefully, you're right, that could get ugly.
BrainMan
A: 

Push subscriptions are the way to go here if you wish to centrally manage the data distribution of your application platform.

From what you have described you will need to make a choice between Snapshot Replication and Transactional Replication for your architecture.

How much data you are looking to push, and the schedule of your updates, will determine the most appropriate replication method for you to use. For example, if you are looking to update all subscriptions at the same time, then depending on how much data you need to push, Snapshot Replication may not be suitable and you may be better off using Transactional Replication, perhaps pushed at specific, predetermined intervals. Your network may even be able to support near real-time replication; conducting a small test in your environment will determine this for you. For example, set up the Publisher, a local Distributor and a handful of Subscribers at geographically different locations on your network in order to test network transfer times and replication latency.

Things to consider:

  • How much data is to be moved across the network? Size in KB and record volume.
  • Consider the physical location of your sites.
  • What is the suitability of your network? Speed, capacity etc.
  • You may wish to consider using a dedicated Distributor.
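As a rough way to put numbers on the first and third points, here is a back-of-the-envelope sketch. The function name and the figures in the usage note are illustrative assumptions, not measurements from any real site, and the estimate ignores protocol overhead, latency and compression.

```python
def transfer_seconds(rows, bytes_per_row, link_kbps):
    """Rough time in seconds to move `rows` rows of `bytes_per_row` bytes
    over a link of `link_kbps` kilobits per second (1 kbit = 1000 bits).

    Ignores protocol overhead, latency and compression.
    """
    total_bits = rows * bytes_per_row * 8
    return total_bits / (link_kbps * 1000)
```

For instance, a hypothetical 10,000 rows of 200 bytes each over a 512 kbps link gives `transfer_seconds(10_000, 200, 512)`, which is about 31 seconds per site before any overhead.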
John Sansom
+1  A: 

I'd recommend you do some scalability tests first. Replication is very verbose in terms of agent jobs and T-SQL connections for reading and writing data. With 200 publications you're talking about 200 publisher agents and 200 subscription agents, plus the distributor maintenance. Most sites complain about the maintenance problems of having 1 publisher and 1 subscriber... Say you manage to pull this off and operate it successfully: what is going to be your upgrade story? And how are you going to implement a schema change?

The largest replication deployment I have heard of (some years ago) had, I believe, 450 publishers and was implemented by an army of Microsoft field consultants sweating for months to bend the behemoth into shape. Your 200-site replication project is way more ambitious than you realize.

I suggest you explore some alternatives too. If you need a periodic table snapshot then SSIS can be a good match. If you need a continuous stream of changes then Service Broker can scale way way easier than replication.

Remus Rusanu
Wow, very scary, you have definitely got me thinking. The last project I did was similar: move a relatively small amount of data from 200 sites to a central server on a daily basis. However, that was a sort of 60-day promo thing. It's over now and went very well. In that case, I simply pushed out a T-SQL job to all 200 linked servers and scheduled them to run a couple of times a day. It all worked very well. I took that approach thinking that replication was the way I SHOULD have been going, but I didn't have the experience and chickened out! Maybe I should stick with my first approach!? Thanks, Remus.
BrainMan