I have a feeling that there must be client-server synchronization patterns out there, but I totally failed to find one on Google.

The situation is quite simple: the server is the central node that multiple clients connect to, and they all manipulate the same data. The data can be split into atoms; in case of conflict, whatever is on the server has priority (to avoid dragging the user into conflict resolution). Partial synchronization is preferred because of the potentially large amounts of data.

Are there any patterns / good practices for such a situation, or if you don't know of any, what would be your approach?

Below is how I currently think of solving it: in parallel to the data, a modification journal is kept, with every transaction timestamped. When a client connects, it receives all changes since its last check, in consolidated form (the server goes through the lists and removes additions that are followed by deletions, merges updates for each atom, etc.). Et voilà, we are up to date.
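To make the consolidation step concrete, here is a minimal sketch of how I imagine it. The `Change` record, its field names and the use of an integer `seq` in place of a timestamp are all my own assumptions for illustration, and it assumes the journal is well-formed (no update on an atom that does not exist).

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Change:
    seq: int          # position in the journal (assumed; a timestamp would also work)
    op: str           # "add", "update" or "delete"
    atom_id: str
    value: Any = None


def consolidate(journal: List[Change], since: int) -> List[Change]:
    """Collapse every journal entry newer than `since` to at most one per atom."""
    net: Dict[str, Change] = {}
    for ch in sorted(journal, key=lambda c: c.seq):
        if ch.seq <= since:
            continue  # the client has already seen this entry
        prev = net.get(ch.atom_id)
        if prev is None:
            net[ch.atom_id] = ch
        elif ch.op == "delete":
            if prev.op == "add":
                # Added and deleted inside the window: the client never needs to know.
                del net[ch.atom_id]
            else:
                net[ch.atom_id] = ch
        elif prev.op == "delete":
            # Deleted and re-added: from the client's point of view this is an update.
            net[ch.atom_id] = Change(ch.seq, "update", ch.atom_id, ch.value)
        else:
            # Merge updates: keep the earlier kind (add/update), take the latest value.
            net[ch.atom_id] = Change(ch.seq, prev.op, ch.atom_id, ch.value)
    return sorted(net.values(), key=lambda c: c.seq)
```

A client that last synced at entry 2 would call `consolidate(journal, since=2)` and apply the returned list in order.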

An alternative would be to keep a modification date for each record and, instead of actually deleting data, just mark records as deleted.
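A rough sketch of that alternative, just to show the shape of it. `Record`, `Store` and the explicit `now` argument are hypothetical names I made up; passing the time in explicitly only keeps the example deterministic.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Record:
    value: Any
    modified: float        # modification date of the record
    deleted: bool = False  # soft-delete marker instead of removing the row


class Store:
    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}

    def put(self, key: str, value: Any, now: float) -> None:
        self.records[key] = Record(value, now)

    def delete(self, key: str, now: float) -> None:
        # Keep the record so that clients can still learn about the deletion.
        rec = self.records[key]
        rec.value = None
        rec.deleted = True
        rec.modified = now

    def changes_since(self, last_sync: float) -> Dict[str, Record]:
        """Everything a client that last synced at `last_sync` has to fetch."""
        return {k: r for k, r in self.records.items() if r.modified > last_sync}
```

The obvious trade-off is that deleted records pile up until something purges them, and a purge must not happen before every client has seen the deletion.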

Any thoughts?

+3  A: 

The question is not crystal clear, but I'd look into optimistic locking if I were you. It can be implemented with a sequence number that the server returns for each record. When a client tries to save the record back, it will include the sequence number it received from the server. If the sequence number matches what's in the database at the time when the update is received, the update is allowed and the sequence number is incremented. If the sequence numbers don't match, the update is disallowed.
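A minimal in-memory sketch of that check, assuming a hypothetical `Server` class and `StaleUpdate` exception (in a real system the comparison and the increment would have to happen atomically in the database):

```python
from typing import Any, Dict, Tuple


class StaleUpdate(Exception):
    """Raised when the client's sequence number no longer matches the server's."""


class Server:
    def __init__(self) -> None:
        # key -> (sequence number, value)
        self._rows: Dict[str, Tuple[int, Any]] = {}

    def create(self, key: str, value: Any) -> None:
        self._rows[key] = (0, value)

    def read(self, key: str) -> Tuple[int, Any]:
        # The client keeps the sequence number and sends it back on save.
        return self._rows[key]

    def save(self, key: str, value: Any, seq: int) -> int:
        current_seq, _ = self._rows[key]
        if seq != current_seq:
            # Someone else updated the record in the meantime: disallow.
            raise StaleUpdate(f"{key}: client has {seq}, server has {current_seq}")
        self._rows[key] = (current_seq + 1, value)
        return current_seq + 1
```

A client whose save is rejected would re-read the record (taking the server's version, as the question wants) and retry if it still makes sense.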

erikkallen
Sequence numbers are your friend here. Think about persistent message queues.
Daniel Paull
+2  A: 

You should look at how distributed change management works. Look at how SVN, CVS and other repositories that manage deltas work.

You have several use cases.

  • Synchronize changes. Your change-log (or delta history) approach looks good for this. Clients send their deltas to the server; server consolidates and distributes the deltas to the clients. This is the typical case. Databases call this "transaction replication".

  • Client has lost synchronization. Either through a backup/restore or because of a bug. In this case, the client needs to get the current state from the server without going through the deltas. This is a copy from master to detail, deltas and performance be damned. It's a one-time thing; the client is broken; don't try to optimize this, just implement a reliable copy.

  • Client is suspicious. In this case, you need to compare client against server to determine whether the client is up-to-date or needs any deltas.

You should follow the database (and SVN) design pattern of sequentially numbering every change. That way a client can make a trivial request ("What revision should I have?") before attempting to synchronize. And even then, the query ("All deltas since 2149") is delightfully simple for the client and server to process.
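Something like the following sketch shows how small those two requests can be. `ChangeLog` and `sync` are hypothetical names, and revision numbers here simply count deltas starting from 1; that is an assumption of the example, not how SVN stores things.

```python
from typing import Any, List, Tuple


class ChangeLog:
    def __init__(self) -> None:
        self._deltas: List[Any] = []

    def append(self, delta: Any) -> int:
        """Record a delta and return the revision number it was given."""
        self._deltas.append(delta)
        return len(self._deltas)

    def head(self) -> int:
        """Answers "What revision should I have?"."""
        return len(self._deltas)

    def since(self, revision: int) -> List[Tuple[int, Any]]:
        """Answers "All deltas since <revision>" as (revision, delta) pairs."""
        return list(enumerate(self._deltas[revision:], start=revision + 1))


def sync(client_rev: int, log: ChangeLog) -> Tuple[int, List[Tuple[int, Any]]]:
    """Return the client's new revision and the deltas it has to apply."""
    if client_rev > log.head():
        # The client is ahead of the server: treat it as having lost synchronization.
        raise ValueError("client revision unknown to server; full copy needed")
    pending = log.since(client_rev)
    return log.head(), pending
```

A client at revision 2149 would call `sync(2149, log)`, apply the returned deltas in order, and remember the new revision number for next time.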

S.Lott
+1  A: 

What you really need is Operational Transform (OT). This can even cater for the conflicts in many cases.

This is still an active area of research, but there are implementations of various OT algorithms around. I've been involved in such research for a number of years now, so let me know if this route interests you and I'll be happy to put you on to relevant resources.

Daniel Paull
Daniel, a pointer to relevant resources would be appreciated.
Parand
I just re-read the Wikipedia article. It has come a long way and has many relevant references at the bottom of the page: http://en.wikipedia.org/wiki/Operational_transformation. I would have pointed you to the work of Chengzheng Sun; his work is referenced from Wikipedia. Hope that helps!
Daniel Paull
A: 

I am following a similar line of research and came across the pattern of master-master row-level synchronization. It might be worth looking into. I read that SQL Server can be configured to do this... I am afraid of the gotchas, however. Does anyone have any experience implementing this?

Cindi