I am looking for a specific hashcode that has the following properties. I dont know of any such hashcodes and I dont know if it is possible to do such a thing. Just wanted to put it out there and see what people say.
I have two databases (loosely used term - dont think of SQL or anything of that sort), one master and one backup. There is a need to keep the two databases in sync and to detect when the databases are not in sync. Instead of verifying every data in there, it would be preferable to keep some hashcode that can be verified. But the two databases dont necessarily share every modifications. Since changes from master to backup are batched, it is possible that certain modifications from the master to backup are collapsed.
ie: lets say current state of database has elements A->X, B->Y and C->Z. Now B gets modified such that B->Y1 and then later B->Y2. The only change that will get sent from master to backup is B->Y2. The intermediate B->Y1 is skipped.
Now instead of looping over every element in each database to verify that they match, we would prefer to keep a running hash code of the elements in both the locations and then just compare that. The hashcode would have to compute something like:
assuming previous hashcode of hm0:
hashcode hm1 = f(hm0, A->X, B->Y, C->Z)
When B changes now:
hashcode hm2 = f(hm1, B->Y1)
and then
hashcode hm3 = f(hm2, B->Y2)
So master will have hashcode of h3. Now backup will not receive the modification B->Y2, so if it computes a running hash code, it will be like this:
hashcode hb1 = f(hb0, A->X, B->Y, C->Z)
hashcode hb2 = f(hb1, B->Y2)
Now we want hb2 and hm3 to match, as the current state of the databases are the same. But most (if not all) hashcodes dont work this way.
So what we would want then is that we would want to "remove" the contribution of B->Y from the hash first and then 'add' the contribution of B->Y1 and then remove contribution of B->Y1 and add the contribution of B->Y2 into the hash code. So we want something like this:
Two functions, f, g: f modifies existing hashcode by adding contribution of a new element, while g modifies existing hashcode by removing contribution of an element.
On master:
hm1 = f(hm0, A->X, B->Y, C->Z)
when B gets modified to B->Y1:
hm2 = g(hm1, B->Y)
hm3 = f(hm2, B->Y1)
when B get modified to B->Y2:
hm4 = g(hm3, B->Y1)
hm5 = f(hm4, B->Y2)
hm5 is the new hashcode for the current state of the database (A->X, B->Y2, C->Z)
On backup:
hb1 = f(hb0, A->X, B->Y, C->Z)
when B gets modified to B->Y2:
hb2 = g(hb1, B->Y)
hb3 = f(hb2, B->Y2)
Now hm5 and hb3 should match, as the current state of both databases is the same.
So: are there such algorithms f and g? I hope I made the question clear.... Thanks.