I have a fairly simple sync problem. I have a table with about 10 columns that I want to keep in sync between a SQLite file on 3 different clients: an iPhone client, a browser client, and a Ruby on Rails client. So I need a simple syncing solution that will work for all 3, i.e. one I can easily implement in JavaScript, Objective-C, and Ruby, and that works with JSON over HTTP.

I have looked at various components of other syncing solutions, like the one in git, some of the tutorials that have come out of the Google Gears community, and a Rails plugin called acts_as_replica.

My naive approach would be to simply keep a last-synced timestamp in the database and create a changelog of all deletes as they are made. (I don't allow updates to entries in the table.) I can then retrieve all the new entries since the last timestamp, combine them with the deletes, and send a changelog as JSON over HTTP between the 3 clients.
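A minimal sketch of what I mean, in Ruby (the ChangeLog class, the field names, and the use of ISO 8601 strings for timestamps are all illustrative assumptions, not an existing schema):

    require 'json'
    require 'time'

    # Stands in for the SQLite table: entries are insert-only and
    # deletes are recorded in a changelog, as described above.
    class ChangeLog
      def initialize
        @entries   = {}   # id => attributes, including 'created_at'
        @deletions = []   # { 'id' => ..., 'deleted_at' => ... }
      end

      def insert(id, attrs)
        @entries[id] = attrs.merge('id' => id, 'created_at' => now)
      end

      def delete(id)
        @entries.delete(id)
        @deletions << { 'id' => id, 'deleted_at' => now }
      end

      # Everything that changed since the client's last sync, as JSON.
      # UTC ISO 8601 strings compare correctly as plain strings.
      def changes_since(last_synced_at)
        {
          'inserted'  => @entries.values.select { |e| e['created_at'] > last_synced_at },
          'deleted'   => @deletions.select { |d| d['deleted_at'] > last_synced_at },
          'synced_at' => now
        }.to_json
      end

      # Apply a changelog received from a peer; ||= skips ids that are
      # already present, so applying the same changelog twice is harmless.
      def apply(json)
        payload = JSON.parse(json)
        payload['inserted'].each { |e| @entries[e['id']] ||= e }
        payload['deleted'].each  { |d| @entries.delete(d['id']) }
      end

      private

      def now
        Time.now.utc.iso8601
      end
    end

Because entries are keyed by their id, replaying a changelog is a no-op, which seems like one way to avoid duplicate entries.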

Should I consider using a SHA-1 hash or a UUID for each entry, or is a last-synced timestamp sufficient? How do I make sure there are no duplicate entries? Is there a simpler algorithm I could follow?

+1  A: 

I am assuming changes are likely to be at the end. I don't know the character of your inserts and deletes, but here is my idea:

  • I would compute a SHA-1 fingerprint (or MD5, it doesn't matter in this case) for each day of the current month and for each month before that. Comparing these fingerprints is a fast way to see where the differences are. (I am leaving today unhashed; see the sketch after this list.)
  • If previous months have differences:
    • If the volume for a month is too big, we can split the month and simply generate daily fingerprints on the fly instead of comparing the whole month.
    • Otherwise we can treat a monthly change the same way we treat a daily change.
  • After finding out where the changes occur, the master copy would send a list of all unique IDs for that period. (Today's info is always sent.)
  • The slave then deletes what has to be deleted and compiles a list of IDs to be inserted.
  • The master sends only those records (in full).
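A sketch of that fingerprint comparison in Ruby (the record shape, an 'id' plus a 'created_at' time, and the helper names are assumptions for illustration):

    require 'date'
    require 'digest'

    # Fingerprint of a bucket: SHA-1 over its sorted ids, so two
    # replicas agree on a bucket iff they hold the same set of ids.
    def fingerprint(records)
      Digest::SHA1.hexdigest(records.map { |r| r['id'] }.sort.join(','))
    end

    # One bucket per day of the current month, one per earlier month;
    # today is deliberately left out, as described above.
    def buckets(records, today = Date.today)
      records.group_by do |r|
        date = r['created_at'].to_date
        if date >= Date.new(today.year, today.month, 1)
          date.iso8601              # daily bucket, current month
        else
          date.strftime('%Y-%m')    # monthly bucket, older data
        end
      end.reject { |key, _| key == today.iso8601 }
    end

    # Bucket keys whose fingerprints differ between the two replicas;
    # only these periods need full id lists exchanged.
    def stale_buckets(master, slave)
      master_fp = buckets(master).transform_values { |rs| fingerprint(rs) }
      slave_fp  = buckets(slave).transform_values  { |rs| fingerprint(rs) }
      (master_fp.keys | slave_fp.keys).select { |k| master_fp[k] != slave_fp[k] }
    end

The master would then send the full ID list (plus today's records) for each bucket that stale_buckets reports, and the slave deletes or requests records accordingly.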

The time categories (day, month) can be adjusted according to the data volume.

Of course, this is a naive and simple algorithm. If I were processing sensitive/critical data, I would look for a transactional algorithm.

muhuk