What is the most bandwidth efficient way to unidirectionally synchronise a list of data from one server to many clients?

I have a sizeable chunk of data (perhaps 20,000 50-byte records) which I need to periodically synchronise to a number of clients over the Internet (perhaps 10,000 clients). Records may be added, removed or updated only at the server end.

+1  A: 

Something similar to bittorrent? Or even using bittorrent. Or maybe invent a wrapper around bittorrent.

(Assuming you pay for bandwidth on your server and not the others ...)

benlumley
BT is good for the central host, but no good for the clients if updates are very frequent
Alnitak
It says unidirectional, which implies that updates come from the server out to the clients, and not the other way.
benlumley
A: 

Ok, so we've got some detail now - perhaps 10 GB of total (uncompressed) data every 3 days (20,000 × 50-byte records is about 1 MB per client, times 10,000 clients), so that's roughly 100 GB per month.

That's actually not really a sizeable chunk of data these days. Whose bandwidth are you trying to save - yours, or your clients'?

Does the data perhaps compress very readily? For raw binary data it's not uncommon to achieve 50% compression, and if the data happens to have a lot of repeated patterns within it then 80%+ is possible.
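As a rough illustration of why the compression ratio depends so heavily on repetition in the data (this is a toy sketch, not a claim about the asker's actual records):

```python
import os
import zlib

# Data with lots of repeated patterns compresses dramatically better
# than random (incompressible) data of the same size.
repetitive = b"ABCDEFGHIJ" * 100_000          # ~1 MB of repeating records
random_data = os.urandom(1_000_000)           # ~1 MB of random bytes

rep_ratio = len(zlib.compress(repetitive, 9)) / len(repetitive)
rand_ratio = len(zlib.compress(random_data, 9)) / len(random_data)

print(f"repetitive data compresses to {rep_ratio:.1%} of original")
print(f"random data compresses to {rand_ratio:.1%} of original")
```

Running something like this against a sample of the real records would quickly show whether plain gzip/zlib already gets most of the bandwidth saving for free.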

That said, if you really do need a system that can just transfer the changes, my thoughts are:

  1. make sure you've got a well defined primary key field - use that as your key to identify each record
  2. record a timestamp for each record to say when it last changed
  3. have each client tell you the timestamp of the last change it knows of, so you can calculate the deltas
  4. ensure that full downloads are possible too, in case clients get out of sync
Alnitak
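The four steps above could be sketched roughly like this (an in-memory toy with illustrative names - a real server would back this with a database and expose `delta_since` over the wire):

```python
from dataclasses import dataclass

@dataclass
class Record:
    key: int          # step 1: well-defined primary key
    payload: bytes    # ~50-byte record body
    updated_at: int   # step 2: timestamp of last change

class Server:
    def __init__(self):
        self.records = {}   # key -> Record
        self.deleted = {}   # key -> deletion timestamp (so removals propagate)
        self.clock = 0      # monotonic change counter standing in for a timestamp

    def upsert(self, key, payload):
        self.clock += 1
        self.records[key] = Record(key, payload, self.clock)
        self.deleted.pop(key, None)

    def delete(self, key):
        self.clock += 1
        if key in self.records:
            del self.records[key]
            self.deleted[key] = self.clock

    def delta_since(self, since):
        """Step 3: the client reports its last-known timestamp `since`;
        the server returns only what changed after that.
        Step 4: a client that is out of sync passes since=0 for a full download."""
        upserts = [r for r in self.records.values() if r.updated_at > since]
        deletes = [k for k, t in self.deleted.items() if t > since]
        return upserts, deletes, self.clock

# Example: first sync is a full download, later syncs only carry deltas.
s = Server()
s.upsert(1, b"a")
s.upsert(2, b"b")
ups, dels, t1 = s.delta_since(0)       # full download: records 1 and 2
s.delete(1)
s.upsert(2, b"c")
ups2, dels2, t2 = s.delta_since(t1)    # delta: updated 2, deleted 1
```

Note the separate tombstone map for deletions: with timestamps alone on live records, a client has no way to learn that a record it holds was removed.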
Wouldn't it be easiest to simply have the clients run rsync?
Potentially, yes, although getting the optimal block size could be tricky if the DB records aren't fixed length
Alnitak