Let's say I have a class A that inherits from class B in C#. Class B has a property called Checksum which, when read at runtime, calculates the checksum of all the properties on an instance of class A (the particular checksum algorithm used is not important; probably one from the BCL).

Importantly, the checksum algorithm must ignore the Checksum property itself, otherwise it will fail when validated later (as the checksum value will have changed).

So, as far as I can see it, there are two options:

1) Iterate over all the public properties of the object using reflection, concatenate into a string and checksum that.

2) Pretend that the object is simply a bunch of contiguous memory addresses, treat that as a byte array and checksum that.

1 sounds slow. 2 sounds difficult, as I am not sure how you'd get it to ignore the string that represents the checksum itself, or how references to other objects are handled.
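
To make option 1 concrete, here is a minimal sketch of what I have in mind. The member names Id and Name are made up for illustration, MD5 is just one possible BCL algorithm, and the "name=value" text format is my own assumption. Values are converted with ToString, so references to other objects are only as good as their ToString output.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;

public class B
{
    // Recomputed on every read; reflection makes this the slow path.
    public string Checksum
    {
        get
        {
            // Sort by name so the result does not depend on reflection order,
            // and skip the Checksum property itself.
            var values = GetType()
                .GetProperties(BindingFlags.Public | BindingFlags.Instance)
                .Where(p => p.CanRead && p.GetIndexParameters().Length == 0 && p.Name != "Checksum")
                .OrderBy(p => p.Name, StringComparer.Ordinal)
                .Select(p => p.Name + "=" +
                    Convert.ToString(p.GetValue(this, null), CultureInfo.InvariantCulture));

            byte[] bytes = Encoding.UTF8.GetBytes(string.Join(";", values));
            using (var md5 = MD5.Create())
            {
                return BitConverter.ToString(md5.ComputeHash(bytes));
            }
        }
    }
}

// Hypothetical derived class; Id and Name are illustrative only.
public class A : B
{
    public int Id { get; set; }
    public string Name { get; set; }
}
```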

Does anyone have any better ideas than 1, which sounds like the better of these two solutions?

+2  A: 

You can mark the checksum as NonSerialized (the attribute applies to fields, so it goes on the property's backing field), serialize the instance of the class to a byte array, and then compute the checksum of that. This way the checksum is ignored during serialization.
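
A rough sketch of the idea follows. Note that it swaps in DataContractSerializer with [IgnoreDataMember] rather than binary serialization with NonSerialized, since that serializer is available on both .NET Framework and current .NET; the principle is the same. The members Id and Name and the choice of MD5 are assumptions for illustration only.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Security.Cryptography;

public class B
{
    // Excluded from serialization, so it never feeds into its own value.
    [IgnoreDataMember]
    public string Checksum
    {
        get
        {
            // Serialize using the runtime type so the members of A are included.
            var serializer = new DataContractSerializer(GetType());
            using (var stream = new MemoryStream())
            using (var md5 = MD5.Create())
            {
                serializer.WriteObject(stream, this);
                return BitConverter.ToString(md5.ComputeHash(stream.ToArray()));
            }
        }
    }
}

// Hypothetical derived class; Id and Name are illustrative only.
public class A : B
{
    public int Id { get; set; }
    public string Name { get; set; }
}
```

Without [DataContract] attributes the serializer picks up public read/write properties and public fields, so anything else that should be covered by the checksum would need to be exposed that way or annotated explicitly.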

Giorgi
This sounds like a slightly more elegant implementation of (1) above, but essentially the same. I understand that serialisation will use reflection somewhere internally and is relatively slow. If no other "novel" solutions are proposed, then I'd see this as being the way to do it.
Colin Desmond
@Colin You can make it faster: http://codebetter.com/blogs/gregyoung/archive/2008/08/24/fast-serialization.aspx
Giorgi
@Giorgi: If I understand your link correctly, this "fast serializer" requires that you implement manual serialization methods for each type. If the OP only wants a checksum, wouldn't it be simpler to just write a custom checksum method for each type? (Nice link though!)
nikie
+1  A: 

Option 3 would be to create a method on the fly that calculates the checksum of all properties, e.g. by using Reflection.Emit. This is slow only on the first call, because the generated method can be cached. If you know which types have to be checksummed, you could also use code generation to create checksum methods for them at compile time.
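
As a rough sketch of the generate-once-and-cache idea, the following uses expression trees instead of hand-written Reflection.Emit IL (the tree is compiled to a delegate at runtime, which is easier to get right). ChecksumCompiler, the "|" separator, the MD5 choice and the members Id and Name are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Globalization;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;

public static class ChecksumCompiler
{
    // One compiled delegate per type; reflection runs only on the first call.
    private static readonly ConcurrentDictionary<Type, Func<object, string>> Cache =
        new ConcurrentDictionary<Type, Func<object, string>>();

    public static string Compute(object instance)
    {
        Func<object, string> describe = Cache.GetOrAdd(instance.GetType(), Build);
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(describe(instance)));
            return BitConverter.ToString(hash);
        }
    }

    // Builds: input => string.Join("|", new[] { Convert.ToString(((T)input).P1, culture), ... })
    private static Func<object, string> Build(Type type)
    {
        ParameterExpression input = Expression.Parameter(typeof(object), "input");
        Expression typed = Expression.Convert(input, type);
        Expression culture = Expression.Constant(CultureInfo.InvariantCulture, typeof(IFormatProvider));
        MethodInfo toText = typeof(Convert).GetMethod(
            "ToString", new[] { typeof(object), typeof(IFormatProvider) });
        MethodInfo join = typeof(string).GetMethod(
            "Join", new[] { typeof(string), typeof(string[]) });

        // Skip the Checksum property itself and fix the order by name.
        Expression[] values = type
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0 && p.Name != "Checksum")
            .OrderBy(p => p.Name, StringComparer.Ordinal)
            .Select(p => (Expression)Expression.Call(
                toText, Expression.Convert(Expression.Property(typed, p), typeof(object)), culture))
            .ToArray();

        Expression body = Expression.Call(
            join, Expression.Constant("|"), Expression.NewArrayInit(typeof(string), values));
        return Expression.Lambda<Func<object, string>>(body, input).Compile();
    }
}

// Hypothetical type; Id and Name are illustrative only.
public class A
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Checksum { get { return ChecksumCompiler.Compute(this); } }
}
```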

nikie
+2  A: 

Why does it have to be a property? If it were a method, GetChecksum() then you would not have to have any special logic so that it does not include itself in the checksum calculation. Now, what you have created is pretty much exactly the same as what the existing GetHashCode() method is for — just provide an implementation of this instead.

Typically one would code GetHashCode() for each class explicitly, although a quick web search will reveal approaches that use reflection to provide a generic (though slower) mechanism. Usually one would take each field one wants to include in the hashcode, convert it to an integer and multiply it by a fixed number, such that different objects with different field values give different hashcodes that are well spread across the integer range.

As an example, Resharper generates GetHashCode() methods that look like this:

public override int GetHashCode()
{
    unchecked
    {
        int result = a;
        result = (result * 397) ^ (b != null ? b.GetHashCode() : 0);
        result = (result * 397) ^ c.GetHashCode();
        return result;
    }
}

Where a is an int, b is a string and c is a long. At each step the interim value (result) is multiplied by 397 and then XORed with the next component's hashcode (in C#, ^ is exclusive-or, not exponentiation). The unchecked means that if the integer overflows (which is likely) then we discard the overflow and wrap around. This should give a reasonable coverage of the integer space in most cases, though I would recommend testing the coverage, as a poor hashcode can have serious consequences on the performance of your system.

Care should be taken to handle zeroes of any field so that you do not multiply by zero and end up with a large number of objects that all have a zero hash-code.
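
One common way to guard against this, shown here only as a sketch, is to start from a non-zero seed and add each component rather than multiplying by it. The class name Sample and the constants 17 and 31 are illustrative choices, not something the approach depends on.

```csharp
public sealed class Sample
{
    private readonly int a;
    private readonly string b;
    private readonly long c;

    public Sample(int a, string b, long c)
    {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    public override bool Equals(object obj)
    {
        var other = obj as Sample;
        return other != null && a == other.a && b == other.b && c == other.c;
    }

    public override int GetHashCode()
    {
        unchecked
        {
            // The non-zero seed keeps an all-zero object from hashing to 0.
            int result = 17;
            result = result * 31 + a;
            result = result * 31 + (b != null ? b.GetHashCode() : 0);
            result = result * 31 + c.GetHashCode();
            return result;
        }
    }
}
```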

Paul Ruane
Interesting approach. The reason for the property is that the Checksum value is to be persisted into a database using Linq2SQL or EF4, so having it as a property makes that mapping very simple. For our purposes (some level of safety), we'd need to use a "well known" algorithm such as MD5 to calculate the checksum.
Colin Desmond
OK, in which case you should not use the hash-code as this is not guaranteed to be the same across processes or subsequent runs of the process. The serialisation approach would probably better suit you.
Paul Ruane