Hi guys, I have an n-tier WinForms client-server application running against a SQL Server DB. I want it to be able to run "offline" sometimes (not connected to the DB) and, on reconnect, reconcile the changes back to the main DB. Now I have a tough architecture decision to make: should I use database replication, or manage it myself using queues/scripts etc.? My application is quite complicated - I use a database with tables containing auto-increment keys and foreign key constraints between tables. Part of my data, such as pictures and documents, is not stored in the DB. I would very much like to hear your opinions and past experience! Thanks, Adi

A: 

I've never done anything like that before, but it looks to me that if you go that way you might get into serious problems...

Technically I don't think that it's really that hard to implement. Basically you will have to keep a copy of the database on each client and synchronise it with the server every time the client connects, but I guess you already got that far.

I would add a bit column and a datestamp on each table at the client so I could check which records have been changed off-line. On the server side, a datestamp column recording the last update to each row will do the trick.
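
For a hypothetical Customer table that could look something like this (every name here is just a placeholder, not anything from the question):

    -- Client-side copy: flag rows changed while off-line and note when.
    ALTER TABLE Customer ADD
        IsDirty        bit      NOT NULL DEFAULT 0,
        LocalUpdatedAt datetime NULL;

    -- Server-side copy: just record the last time each row was updated.
    ALTER TABLE Customer ADD
        ServerUpdatedAt datetime NOT NULL DEFAULT GETDATE();

    -- The application (or a trigger) sets IsDirty = 1 and LocalUpdatedAt = GETDATE()
    -- on every off-line change, so "what changed off-line" is simply:
    SELECT * FROM Customer WHERE IsDirty = 1;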

As for the auto-increment primary keys, I would lose them, because you will need to set the keys yourself to prevent creating two records with the same key (you might need to change them when synchronising).
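
If you do drop the auto-increment, one common way to "set them yourself" (GUIDs, mentioned in another answer, are the other obvious option) is a composite key of a per-site id plus a locally assigned number, so off-line inserts can never collide. A sketch with an invented Orders table:

    CREATE TABLE Orders (
        SiteId  int NOT NULL,   -- assigned once per client/site
        OrderId int NOT NULL,   -- assigned locally, e.g. MAX(OrderId) + 1 for this site
        -- ...the rest of the order columns...
        CONSTRAINT PK_Orders PRIMARY KEY (SiteId, OrderId)
    );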

This is the easy part... Now is where things will get messy... You need to take into account that this will bring you a lot of headaches. All sorts of undesired events will happen, for example:

- Two users change the same record off-line.
- One user changes a record on-line and another off-line.
- One user deletes a record on-line while another is working on it off-line.

The list of potential problems goes on and on. Before you start addressing them you must enumerate every single one and document with your clients how they expect the system to handle each case; otherwise, when they lose data (and this will happen no matter what you do), it will be your fault instead of theirs.

I recommend that you build a versioning system for every table in your database that can be changed off-line. Users will mess up their data, and it will be nice for them to be able to perform roll-backs.
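
One possible shape for that, sticking with the made-up Customer table from before (the history table is purely local and never synchronised, so an IDENTITY column is harmless here):

    -- Keep a snapshot of the old row values on every change.
    CREATE TABLE Customer_History (
        HistoryId  int IDENTITY(1,1) PRIMARY KEY,
        CustomerId int           NOT NULL,                    -- whatever key the main table ends up using
        ChangedAt  datetime      NOT NULL DEFAULT GETDATE(),
        ChangedBy  nvarchar(128) NOT NULL DEFAULT SUSER_SNAME(),
        Name       nvarchar(200) NULL,                        -- snapshot columns
        Address    nvarchar(500) NULL
    );

    CREATE TRIGGER trg_Customer_History ON Customer
    AFTER UPDATE, DELETE
    AS
        INSERT INTO Customer_History (CustomerId, Name, Address)
        SELECT CustomerId, Name, Address FROM deleted;

Rolling back is then just a matter of copying a chosen history row back over the live row.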

Sergio
+2  A: 

(Disclaimer: I'm assuming that you've already considered using .NET DataSets and discounted them, given that they're designed to help with just the problem domain that you're describing.)

I used to work for a company that developed a point-of-sale system for its nationwide chain of shops. The master database was stored at head office, while each shop had its own cut-down version of this database stored locally at that site. Effectively, each shop was off-line all the time, so it's not quite the situation that you're describing, however we had to deal with some of the synchronisation/replication issues that I imagine you will need to deal with.

Our data communications happened each night: shops would connect to head office at a pre-determined time, upload a package of data changes, and download a similar package of data changes that were to be applied to that shop's local database. We then had what you might call 'data sync engines' at both sites (head office & shops) which would process these data packets, folding the changes (inserts/updates/deletions) back into the relevant database.
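
In essence, a sync engine like that loads the incoming package into a staging table and folds it into the live table. A rough sketch of the apply step, with invented Product / Product_Staging tables (real code would wrap more checks around this - conflicts are covered below):

    -- ChangeType in the staging table: 'I' = insert, 'U' = update, 'D' = delete.
    BEGIN TRANSACTION;

    UPDATE p
    SET    p.Price = s.Price, p.Description = s.Description
    FROM   Product AS p
    JOIN   Product_Staging AS s ON s.ProductId = p.ProductId
    WHERE  s.ChangeType = 'U';

    INSERT INTO Product (ProductId, Price, Description)
    SELECT s.ProductId, s.Price, s.Description
    FROM   Product_Staging AS s
    WHERE  s.ChangeType = 'I'
      AND NOT EXISTS (SELECT 1 FROM Product AS p WHERE p.ProductId = s.ProductId);

    DELETE p
    FROM   Product AS p
    JOIN   Product_Staging AS s ON s.ProductId = p.ProductId
    WHERE  s.ChangeType = 'D';

    COMMIT TRANSACTION;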

When you perform basic data replication like this, there are a number of potential pitfalls as Sergio has mentioned. One is identity, namely how you derive a primary key that uniquely identifies a table row. Another is versioning, and how you handle conflicts between different versions of the same row.

In our case, we made things easy(-ier!) for ourselves by using GUIDs as primary keys rather than using auto-increment columns. Using GUIDs is not without its issues, but in our case it meant that we could assign a primary key to a new data row and not have to worry about anyone else using it.
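
In SQL Server terms that is just a uniqueidentifier key with a NEWID() default - a sketch with a hypothetical StockItem table (on SQL Server 2005 and later you can use NEWSEQUENTIALID() as the default instead to reduce index fragmentation):

    CREATE TABLE StockItem (
        StockItemId uniqueidentifier NOT NULL
            CONSTRAINT DF_StockItem_Id DEFAULT NEWID()
            CONSTRAINT PK_StockItem    PRIMARY KEY,
        Description nvarchar(200) NOT NULL,
        Price       money         NOT NULL
    );

    -- Any site, on-line or off-line, can insert rows without co-ordinating keys:
    INSERT INTO StockItem (Description, Price) VALUES (N'Widget', 9.99);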

I'm a bit hazy on how we handled the versioning problem (it's been a few years!), but from memory I think we had two timestamps on each table row: one of these recorded the date/time when the row was updated at head office; the other, when it was updated at the shop. Each row also had two 'version numbers' that indicated the version of the row at head office and at the shop. Data reconciliation involved comparing these timestamps and version numbers against each other, with the most recent change 'winning' (assuming the other party hadn't changed the row of course).
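
A rough sketch of how that "most recent change wins" rule might look at the head-office end (all table and column names are invented; StockItem_Incoming stands for the data uploaded from a shop):

    -- Accept the shop's change only if the shop edit is newer than head office's
    -- and head office has not changed the row since the shop last synchronised.
    UPDATE ho
    SET    ho.Price       = sh.Price,
           ho.HOUpdatedAt = GETDATE(),
           ho.HOVersion   = ho.HOVersion + 1,
           ho.ShopVersion = sh.ShopVersion
    FROM   StockItem          AS ho
    JOIN   StockItem_Incoming AS sh ON sh.StockItemId = ho.StockItemId
    WHERE  sh.ShopUpdatedAt > ho.HOUpdatedAt
      AND  ho.HOVersion     = sh.HOVersion;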

As Sergio points out, your biggest problem will be handling data reconciliation conflicts. In our case, this occurred when a shop and head office changed the same data item on the same day. We worked around this by always failing the change at the shop end, and writing a custom data reconciliation application at head office, which involved a user visually comparing and merging two conflicting versions of a data item. In theory I suppose you could automate the merging of different versions using some custom processing rules, but you would need to weigh up the cost of developing something like that versus the likelihood of conflicts arising. From memory, this never proved to be that big a problem for our system, despite there being a large number of shops (a few hundred) making changes to the same set of data. YMMV of course.
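
And the conflict case from the paragraph above - both ends have changed the row since the last sync, so the shop's change is failed and the pair of values is parked for the head-office reconciliation screen (again, every name here is invented):

    -- Head office has moved on since the shop last synchronised: don't apply
    -- the shop's change, park it for a human to compare and merge.
    INSERT INTO StockItem_Conflicts (StockItemId, ShopId, ShopPrice, HOPrice, DetectedAt)
    SELECT sh.StockItemId, sh.ShopId, sh.Price, ho.Price, GETDATE()
    FROM   StockItem          AS ho
    JOIN   StockItem_Incoming AS sh ON sh.StockItemId = ho.StockItemId
    WHERE  ho.HOVersion <> sh.HOVersion;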

You're right Steve - in the end we did effectively have 2 timestamps, and a means of trying to keep the store server's system time within a reasonable tolerance of the head office server time.
robsoft
A: 

I've done this several times now at different places (see Steve Rands' answer below) and I would strongly urge you NOT to use normal replication - especially if there are going to be several databases involved.

The reason I say this is that in my experience replication isn't smart enough to deal with the problems that can arise when you bring a remote site back online (or when you decide to add a new site to the overall network).

Replication is fine for this kind of thing if you only have 2 or 3 different databases but if you are talking about lots of different locations that can be online/offline at any time, and information can be added (or deleted or amended) at any of those locations, it won't take you long to get something into a confused state. It's not a very technically satisfying thing to say, but you will always be able to think of special cases where you wouldn't want the replication to do what it will, by design, want to do.

If you're only dealing with 2 databases then obviously the replication problems become much more straightforward and you will probably find that you can use merge replication for the job (though you have to watch your database design).

I've just bought a second-hand copy of the Apress SQL Server 2005 Replication Bible (not in the office so don't have the author to hand but it's a well-recommended, monster tome) - within the first couple of chapters I began to realise that replication is not a magic bullet solution if you're really changing data at two (or more) ends. :-)

robsoft
A: 

Thank you, guys. It was very helpful. I think I'll design my own sync mechanism, which will be script-based.

A: 

This is usually called the briefcase model; you can use Microsoft Synchronization Services for ADO.NET.

Osama ALASSIRY
A: 

You should look at the Microsoft Sync Framework.

Building an occasionally offline solution yourself from scratch is a complex undertaking. In my career I have seen many good development teams mess it up. I'm not saying that you would have problems building it yourself, but why not use something that already exists? And if you find it doesn't meet your needs, you will probably have a better understanding of how to code your own solution.

The trade-off is that you would have to learn the Sync Framework, but there are samples that you could probably leverage immediately.

David McDonald