views:

347

answers:

1

I'm trying to understand how the rsync algorithm works with respect to rolling checksums and blocks that match in a staggered fashion.

The wikipedia page seems to suggest that the sender and receiver both calculate and exchange rolling checksums for all possible blocks. But that would mean sending essentially one checksum per byte! I must be missing something. How does it work to be able to align blocks?

e.g. if S = 16 byte blocks, and the sender has this text for file A:

The quick brown fox jumps over the lazy dogs

and the receiver has this text for file B:

The quick brown fox jumped over the lazy dog

how would an rsync exchange work?

+3  A: 

The receiver computes and sends rolling checksums only for non overlapping blocks. The sender on the contrary computes it for every possible block (but keep the result local). Then for the sender, it's just a matter of checking if one of the non overlapping block (sent by the receiver) match with any (overlapping) local block.

Your example is too simple to see anything interesting, the two last blocks simply won't match and will be sent for merging.

With a more interesting example (uppercase is a block):

sender:

A B Cabc D

receiver:

A B C D

The receiver will send the MD5 and rolling hash for A, B, C and D. The sender will compute the rolling hash for every (overlapping) block, it will match for A, for B, for C and for D. Since abc isn't match it will send it with the information where to merge it.

tonfa
oh, I must have misread the bit about the receiver not sending all the checksums. Thanks.
Jason S