views:

1428

answers:

5

Delphi 2009 added the GetHashCode function to TObject. GetHashCode returns an Integer which is used for hashing in TDictionary.

If you want an object to work well in TDictionary, you need to override GetHashCode appropriately such that, in general, different objects return different integer hash codes.

But what do you do for objects containing double fields? How do you turn those double values into a integers for GetHashCode?

The way it's usually done in Java, say, is to use a method like Double.doubleToLongBits or Float.floatToIntBits. The latter has documentation that describes it as follows: "Returns a representation of the specified floating-point value according to the IEEE 754 floating-point "single format" bit layout." This involves some bitwise operations with different masks for the different bits of a floating point value.

Is there a function that does this in Delphi?

+2  A: 

If you want to map a double to an integer, you can use a variant record:

type
  TVarRec = record
    case Integer of
      0: ( FInt : Integer; )
      1: ( FDouble : Double; )
  end;


function Convert(const ADouble: Double): Integer;
var
  arec : TVarRec;
begin
  arec.FDouble := ADouble;
  Result := arec.FInt;
end;

Beware that this does a bitwise copy without interpretation of the values.

Another (kind of dirty trick, is using absolute variables:

function Convert(const ADouble: Double): Integer;
var
  tempDouble : Double;
  tempInt    : Integer absolute tempDouble; // tempInt is at the same memory position as tempDouble.
begin
  tempDouble := ADouble;
  Result := tempInt;
end;
Gamecat
Thanks Gamecat. Those methods seem to work well for some double numbers, but you get a lot of double numbers that give the same integer figures. For example, it seems that all whole numbers give an integer value of zero. Might it be possible to improve on this somehow to reduce the chances of hash codes being identical? Or is this just because I'm testing with regular patterns of numbers?
MB
That won't work because sizeof(double) = 8, whereas sizeof(integer) = 4. You could map two integers onto the double if you wanted, though...
Mason Wheeler
+3  A: 

I'd suggest the following improvement over the Gamecat code:

type
  TVarRec = record
    case Integer of
      0: ( FInt1, FInt2 : Integer; )
      1: ( FDouble : Double; )
  end;

function Convert(const ADouble: Double): Integer;
var
  arec : TVarRec;
begin
  arec.FDouble := ADouble;
  Result := arec.FInt1 xor arec.FInt2;
end;

This takes into account all the bits of the Double value.

(comments do not work well with code)

Kcats
Excellent - that seems to work great, thanks :)
MB
Thanks for the improvement ;-).
Gamecat
A: 

There's really no need to do something like this, because the default value of GetHashCode already returns a number that's guaranteed to be unique for each object: the object's memory address. Furthermore, the default hash value isn't going to change if you change the data your object contains.

Let's say you have an object that contains a Double with a value of 3.5, and you hash it and put it into a dictionary, and you get a hash code of 12345678. You also have something else holding a reference to it, and that Double field gets changed and now it's got a value of 5.21. Next time you attempt to calculate its hash value, your hash code is now 23456789, and your lookup will fail.

Unless you can guarantee that this will never happen, and you have a really good reason not to use the memory address, your best bet is to just leave GetHashCode as it is. (If it ain't broke, don't fix it.)

Mason Wheeler
Indeed I think it would cause trouble if you used mutable objects as keys in a hashtable, and then modified them while they were in the hashtable. But I think you need to override GetHashCode if you override Equals. That's how it works in Java anyway. Is there anything different about Delphi in that respect (I'm not all that clued up when it comes to Delphi)? As I understand it, hashtables (and presumably TDictionary) typically rely on both GetHashCode and Equals to locate a particular item.
MB
It relies on both of them, but it relies on them independently. There's no need to change one when you change the other. See TDictionary<TKey,TValue>.ContainsValue and TDictionary<TKey,TValue>.GetBucketIndex for the details of how the values are used.
Mason Wheeler
Hmm yes I think you're right there. GetBucketIndex uses FComparer, internal to TDictionary, but by default it looks like that does an identity comparison.So, whilst in Java "equal objects must have equal hash codes" is the rule that means you have to override hashCode a lot, it appears that that's not the case in Delphi... That's good :)
MB
Yeah. You can look up the default functions in System.pas, and you'll see that they have no dependencies on one another. Equal objects have equal hash codes by default in Delphi, but that's not a requirement.
Mason Wheeler
Actually... I just wrote a couple of simple test cases using TDictionary. I made an object containing one field - an integer. If two of these objects contain the same integer I want them to be considered equal, so I overrided Equals to specify that. But I also want to use these objects as keys in TDictionary. I want to be able to access the value for any particular key by having an equal object (equal according to my Equals, not memory address). Anyway, unless I messed up somehow, my tests showed that you actually *need* to override GetHashCode as well as Equals if you want this to work.
MB
Ah. Well if that's what you want to do, then yes, you'll need to override both of them. Just make sure you make your key objects immutable, or at least the properties/fields that are used to calculate the hash code and the equality.
Mason Wheeler
Will do. Appreciate the input Mason.
MB
A: 

I guess the Java thing can be implemented in Delphi like this:

type
  TVarRec = record
    case Integer of
      0: ( FInt1: Integer; )
      1: ( FSingle: Single; )
  end;

function GetHashCode(Value: Double): Integer;
var
  arec: TVarRec;
begin
  arec.FSingle := Value;
  Result := arec.FInt1;
end;

The idea behind is to reduce the precision of the Double value to match the binary size of an Integer (Sizeof(Single) = Sizeof(Integer)). If your values can be represented in Single precision without collision, this will give a good hash value.

Edit: As the typecast won't compile in my D2009, I adapted the variant record solution.

Uwe Raabe
Not a bad idea, but wouldn't it cause trouble if Value is greater than the maximum size that Single allows? Also if you have lots of doubles that round to the same Integer value you'll get a lot of duplicated hashcodes.
MB
Well, I don't know about your values, but MaxSingle=3.4e+38, which is pretty much. The probality of collisions is not that high, because the values are not rounded to Integers, but cast to Integers. A Single cast to an Integer has the same bit represantation, but the Integer value has no meaning at all.
Uwe Raabe
A: 

Use CRC32 on the Double data because xor is evil.

program Project1;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TVarRec = record
    case Integer of
      0: ( FInt1, FInt2 : Integer; );
      1: ( FDouble : Double; );
  end;

function Convert(const ADouble: Double): Integer;
var
  arec : TVarRec;
begin
  arec.FDouble := ADouble;
  Result := arec.FInt1 xor arec.FInt2;
end;

var
  FDoubleVar1, FDoubleVar2: TVarRec;
  HashCode1, HashCode2: Integer;
begin
  // Make a Double
  FDoubleVar1.FInt1 := $DEADC0DE;
  FDoubleVar1.FInt2 := $0C0DEF00;

  // Make another Double
  FDoubleVar2.FInt1 := $0C0DEF00;
  FDoubleVar2.FInt2 := $DEADC0DE;

  WriteLn('1rst Double   : ', FDoubleVar1.FDouble);
  WriteLn('2nd Double    : ', FDoubleVar2.FDouble);

  HashCode1 := Convert(FDoubleVar1.FDouble);
  HashCode2 := Convert(FDoubleVar2.FDouble);

  WriteLn('1rst HashCode : ', HashCode1);
  WriteLn('2nd HashCode  : ', HashCode2);

  if HashCode1 = HashCode2 then
  begin
    WriteLn('Warning: Same HashCode!');
  end;
  ReadLn;
end.
pani
Any reasoning here?
Smasher
I added an example above where different Doubles give the same hashcode because of xor.
pani
Hash tables will always lead to collisions, especially for doubles - a much larger set than the set of all possible hash values.
Smasher