views: 48
answers: 3

I am beginning to design a new laboratory test data management system with many (about 30) test stations.

The system must be able to collect data offline in the event of a network outage.

Each station would keep an up-to-date, read-only copy of test structures (specifications, entity types, business rules/workflows, etc.) but not test data. The actual test data would be stored locally if the server cannot be found.

There would need to be some sort of synchronization to prepare for a network outage: one synchronization would pull updated test structures, and another would push unsaved test data.

How do you recommend I achieve this? Any caveats?

Ideas / Thoughts:

  1. Install SQL Server on each machine and write scripts to synchronize the server and clients (seems expensive and overkill).
  2. Save the local copy of data in application-defined raw data files with synchronization scripts.
  3. Is there anything built into SQL Server that lets the clients collect data offline?
  4. Save all data locally, then click a "Transfer" button to push data to the network.

Environment: MS SQL Server running on Windows Server 2008

+2  A: 

My first thought would be merge replication with local (SQL Server Compact) copies of the database.

Joe Stefanelli
You know what they say about great minds... :)
bobs
A: 

It sounds like SQL Server replication is the solution you want to check out. Merge replication will allow the local test stations to receive a copy of the data from the server and send back data gathered during network outages.
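
As a rough sketch, the server-side setup for a merge publication might look like this (the database and table names are hypothetical, and you would still need to configure the Snapshot Agent and the station-side subscriptions):

    -- Enable the database for merge publication (hypothetical name).
    EXEC sp_replicationdboption
        @dbname = N'LabTests',
        @optname = N'merge publish',
        @value = N'true';

    -- Create the publication the test stations will subscribe to.
    EXEC sp_addmergepublication
        @publication = N'LabTestsPub';

    -- Publish the test data table as an article (hypothetical name).
    EXEC sp_addmergearticle
        @publication = N'LabTestsPub',
        @article = N'TestResults',
        @source_object = N'TestResults';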

bobs
Joe Stefanelli
+2  A: 

Use local SQL Express instances and push the data via Service Broker. See High Volume Contiguos Real Time Audit and ETL for an explanation of why this is better than Replication from several points of view, including price, performance, and reliability.
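
As a minimal sketch, the Service Broker plumbing amounts to a message type, a contract, and a queue plus service on each side. All names below are hypothetical; the central server would host [//Lab/ResultService] and each station its own service (say, [//Lab/StationService]) on a local queue:

    -- Shared objects, created on both the stations and the central server.
    CREATE MESSAGE TYPE [//Lab/TestResult]
        VALIDATION = WELL_FORMED_XML;

    CREATE CONTRACT [//Lab/SubmitResults]
        ([//Lab/TestResult] SENT BY INITIATOR);

    -- On the central server: the queue and service that receive results.
    CREATE QUEUE TestResultQueue;

    CREATE SERVICE [//Lab/ResultService]
        ON QUEUE TestResultQueue ([//Lab/SubmitResults]);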

With Replication you would need a higher edition than SQL Express on all the periphery nodes (since Express can only act as a subscriber in a replication topology). Using SSB to push the data allows SQL Express instances at the periphery and requires only one central, non-Express licensed server. This means you can easily deploy the solution on tens of workstations without worrying about SQL Server licensing costs.

Another advantage is performance and throughput. SSB was designed to handle hundreds and thousands of peers in communication, with a sophisticated hierarchical flow control that can handle retries for thousands of destinations in the presence of network failures (I know all this because I was a member of the SSB team). Replication, by comparison, uses the TDS protocol for data exchange, relies on SQL Agent jobs, and handles network outages in a simplistic manner that can burn many cycles on retries, making things worse exactly when they are already bad. SSB can handle very large deployments (I know of deployments of 1,500+ servers, and MySpace has gone public with its deployment of 500+ servers that relies on SSB to exchange data). Overall, at large scale SSB outperforms Replication by any measure.
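
On the receiving side, an activated stored procedure drains the central queue as messages arrive, so the server keeps up no matter how many stations are pushing. A hedged sketch, reusing the hypothetical names above (dbo.TestResults is an assumed table with an XML Payload column):

    -- Central server: drain one message and store it.
    CREATE PROCEDURE dbo.ProcessTestResults
    AS
    BEGIN
        DECLARE @handle UNIQUEIDENTIFIER, @body XML, @type SYSNAME;

        RECEIVE TOP (1)
            @handle = conversation_handle,
            @body   = CAST(message_body AS XML),
            @type   = message_type_name
        FROM TestResultQueue;

        IF @type = N'//Lab/TestResult'
            INSERT INTO dbo.TestResults (Payload) VALUES (@body);
        ELSE IF @type = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'
            END CONVERSATION @handle;
    END;
    GO

    -- Attach the procedure to the queue so it runs automatically.
    ALTER QUEUE TestResultQueue WITH ACTIVATION (
        STATUS = ON,
        PROCEDURE_NAME = dbo.ProcessTestResults,
        MAX_QUEUE_READERS = 4,
        EXECUTE AS OWNER);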

I would also argue that a solution based on messaging rather than table row copying is a better fit for the problem as described: pushing results to a central server.
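
For example, a station might push one result like this (dialog security and error handling are omitted, and the payload shape is an assumption):

    -- On a test station: open a dialog to the central service and send
    -- one test result as XML.
    DECLARE @dialog UNIQUEIDENTIFIER;

    BEGIN DIALOG CONVERSATION @dialog
        FROM SERVICE [//Lab/StationService]
        TO SERVICE '//Lab/ResultService'
        ON CONTRACT [//Lab/SubmitResults]
        WITH ENCRYPTION = OFF;

    SEND ON CONVERSATION @dialog
        MESSAGE TYPE [//Lab/TestResult]
        (N'<result station="12" test="T-100" value="42.7" />');

Because SEND is transactional and the message sits in the local transmission queue until delivery succeeds, results queued during a network outage flow to the central server automatically once connectivity returns.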

The one drawback (and it is not a minor one) is the steep learning curve required to deploy SSB initially. The know-how is out there but hard to find, and SSB lacks polished tools like Replication Monitor that make deploying a simple replication topology a straightforward job. On the other hand, SSB has a great advantage when upgrading deployments, as the 2005/2008/2008 R2 versions are fully compatible with one another.

Remus Rusanu