ansaurus

Question

How does the rsync algorithm correctly identify repeating blocks?

Answer 1

The rsync algorithm sends two checksums: one for each chunk, and a "rolling" checksum for the whole file. In your example, A will see a difference in the rolling checksum once it gets to the "doubled-up" block.

Dean Harding 2010-04-01 03:17:15

Sending a checksum for the whole file is a great idea, but I don't understand how A will see the difference once it gets to the doubled-up block. It seems to me that the difference can only be detected once the entire checksum for A has been computed, at which point we don't know which block is the repeated one.

Kai 2010-04-01 03:33:14

@Kai: Oops, I was just trying to rephrase that comment to make it clearer, and I lost it. The summary: from what I understand, it's a rolling *checksum*, not a hash; the checksum for one block depends on the checksum for the previous block.

Jefromi 2010-04-01 03:50:22

Ohh!!! The weak hash is a rolling checksum over the entire file, but its value is recorded at the end of each block. Now it makes sense. Thanks codeka and Jefromi, I wouldn't have understood without both of your explanations.

Kai 2010-04-01 03:53:21
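For readers who want to see the "rolling" part concretely, here is a minimal sketch of a weak rolling checksum in the style of the one described in the rsync technical report. Note that in that description the weak checksum covers a window of one block length and is slid forward one byte at a time, rather than accumulating over the whole file, and each weak hit is confirmed with a strong per-block checksum. The function names (`weak_checksum`, `roll`, `signatures`, `find_matches`), the 4-byte block size used in the usage note, and the use of MD5 as the strong checksum are assumptions made for illustration, not rsync's actual code.

```python
import hashlib

M = 1 << 16  # modulus for both halves of the weak checksum


def weak_checksum(window):
    """Compute the (a, b) pair from scratch for one window of bytes."""
    n = len(window)
    a = sum(window) % M
    # Earlier bytes get larger weights: n for the first byte, 1 for the last.
    b = sum((n - i) * x for i, x in enumerate(window)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def signatures(old, n):
    """Map weak checksum -> [(block index, strong digest)] for the old file.

    Hypothetical helper: MD5 stands in for the strong checksum.
    """
    table = {}
    for start in range(0, len(old) - n + 1, n):
        block = old[start:start + n]
        entry = (start // n, hashlib.md5(block).digest())
        table.setdefault(weak_checksum(block), []).append(entry)
    return table


def find_matches(new, table, n):
    """Return (offset in new, block index in old) for every matching window."""
    matches = []
    if len(new) < n:
        return matches
    a, b = weak_checksum(new[:n])
    offset = 0
    while True:
        # A weak hit is only a candidate; the strong digest confirms it.
        for block_index, strong in table.get((a, b), []):
            if hashlib.md5(new[offset:offset + n]).digest() == strong:
                matches.append((offset, block_index))
                break
        if offset + n >= len(new):
            break
        a, b = roll(a, b, new[offset], new[offset + n], n)
        offset += 1
    return matches
```

With 4-byte blocks taken from `b"ABCDEFGH"`, calling `find_matches(b"ABCDABCDEFGH", signatures(b"ABCDEFGH", 4), 4)` reports block 0 at offsets 0 and 4 and block 1 at offset 8. Under these assumptions a doubled-up block is simply matched twice, because every window position is checked independently against the block table.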
