I'm in the process of writing an application that needs to synchronize a file structure between a client and an (HTTP) server.
The file structure is essentially a list of file paths, where each path is a string associated with 1 or more data-block ids (256-bit references to the actual data blocks). A data block can be referenced by several files, so there's an n-m relation between paths and ids. Right now it is just a flat list of paths with their ids, but it can easily be converted to the tree structure the paths represent, if that's necessary for the synchronization.
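To make that concrete, here is a minimal sketch of the structure as I currently have it (the names are just illustrative, not my actual code):

```python
from typing import Dict, List

BlockId = bytes  # 32 bytes = 256 bits

# Flat form: each path maps to an ordered list of data-block ids.
# The same block id may appear under several paths (the n-m relation).
file_structure: Dict[str, List[BlockId]] = {
    "docs/readme.txt": [bytes.fromhex("ab" * 32)],
    "docs/readme-copy.txt": [bytes.fromhex("ab" * 32)],  # same block, different path
    "data/big-file.bin": [bytes.fromhex("01" * 32), bytes.fromhex("02" * 32)],
}

def to_tree(paths: Dict[str, List[BlockId]]) -> dict:
    """Convert the flat path list into the directory tree the paths represent."""
    root: dict = {}
    for path, blocks in paths.items():
        node = root
        parts = path.split("/")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = blocks
    return root
```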
I'm looking for a data structure that lets me sync this data efficiently, with two main goals:
- A change in one file should not force the client to send the entire file structure to the server, only a small subset of it.
- If many files are changed, these changes should be grouped together, e.g. so that 1000 changes don't result in 1000 requests to the server.
As you can see, the goals conflict somewhat, so I'm looking for something that strikes a good middle ground between them. The second goal can easily be achieved by grouping several changes into one HTTP request, but then the processing required by the server (parsing all the changes in that request) should be computationally cheap. A batched request along the lines of the sketch below is roughly what I have in mind.
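For illustration only (the endpoint and field names are placeholders, not an existing API), the client could collect everything changed since the last sync and post it in a single request:

```python
import json
from urllib import request

# All changes since the last successful sync, batched into one payload.
changes = [
    {"op": "put", "path": "docs/readme.txt", "blocks": ["ab" * 32]},
    {"op": "delete", "path": "old/unused.txt"},
    # ... potentially thousands of entries, still a single HTTP round trip
]

body = json.dumps({"base_revision": 41, "changes": changes}).encode()
req = request.Request(
    "https://example.com/sync/batch",  # hypothetical endpoint
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# The response could carry the new revision number plus any conflicting paths:
# with request.urlopen(req) as resp:
#     result = json.load(resp)
```

The open question is what the server should keep on its side so that applying such a batch (and later replaying it to other clients) stays cheap.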
I should also mention that there could be several clients synchronizing the same structure on the server. It must therefore be easy to detect the changes made by one client and then synchronize them to another client (i.e. it's not just an upload to the server).
I'm certainly not the first to do something like this, so I assume there are some smart solutions available. For instance, I guess both Dropbox and Subversion have similar requirements when they sync their metadata. Does anyone happen to know how they have implemented it?