views:

1026

answers:

4

I am in need of the fastest hash function possible in Delphi 2009 that will create hashed values from a Unicode string that will distribute fairly randomly into buckets.

I originally started with Gabr's HashOf function from GpStringHash:

function HashOf(const key: string): cardinal;
asm
  xor edx,edx     { result := 0 }
  and eax,eax     { test if 0 }
  jz @End         { skip if nil }
  mov ecx,[eax-4] { ecx := string length }
  jecxz @End      { skip if length = 0 }
@loop:            { repeat }
  rol edx,2       { edx := (edx shl 2) or (edx shr 30)... }
  xor dl,[eax]    { ... xor Ord(key[eax]) }
  inc eax         { inc(eax) }
  loop @loop      { until ecx = 0 }
@End:
  mov eax,edx     { result := eax }
end; { HashOf }

But I found that this did not produce good numbers from Unicode strings. I noted that Gabr's routines have not been updated to Delphi 2009.

Then I discovered HashNameMBCS in SysUtils of Delphi 2009 and translated it to this simple function (where "string" is a Delphi 2009 Unicode string):

function HashOf(const key: string): cardinal;
var
  I: integer;
begin
  Result := 0;
  for I := 1 to length(key) do
  begin
    Result := (Result shl 5) or (Result shr 27);
    Result := Result xor Cardinal(key[I]);
  end;
end; { HashOf }

I thought this was pretty good until I looked at the CPU window and saw the assembler code it generated:

Process.pas.1649: Result := 0;
0048DEA8 33DB             xor ebx,ebx
Process.pas.1650: for I := 1 to length(key) do begin
0048DEAA 8BC6             mov eax,esi
0048DEAC E89734F7FF       call $00401348
0048DEB1 85C0             test eax,eax
0048DEB3 7E1C             jle $0048ded1
0048DEB5 BA01000000       mov edx,$00000001
Process.pas.1651: Result := (Result shl 5) or (Result shr 27);
0048DEBA 8BCB             mov ecx,ebx
0048DEBC C1E105           shl ecx,$05
0048DEBF C1EB1B           shr ebx,$1b
0048DEC2 0BCB             or ecx,ebx
0048DEC4 8BD9             mov ebx,ecx
Process.pas.1652: Result := Result xor Cardinal(key[I]);
0048DEC6 0FB74C56FE       movzx ecx,[esi+edx*2-$02]
0048DECB 33D9             xor ebx,ecx
Process.pas.1653: end;
0048DECD 42               inc edx
Process.pas.1650: for I := 1 to length(key) do begin
0048DECE 48               dec eax
0048DECF 75E9             jnz $0048deba
Process.pas.1654: end; { HashOf }
0048DED1 8BC3             mov eax,ebx

This seems to contain quite a bit more assembler code than Gabr's code.

Speed is of the essence. Is there anything I can do to improve either the pascal code I wrote or the assembler that my code generated?


Followup.

I finally went with the HashOf function based on SysUtils.HashNameMBCS. It seems to give a good hash distribution for Unicode strings, and appears to be quite fast.

Yes, there is a lot of assembler code generated, but the Delphi code that generates it is so simple and uses only bit-shift operations, so it's hard to believe it wouldn't be fast.

+6  A: 

ASM output is not a good indication of algorithm speed. Also, from what I can see, the two pieces of code are doing almost the identical work. The biggest difference seem to be the memory access strategy and the first is using roll-left instead of the equivalent set of instructions (shl | shr -- most higher-level programming languages leave out the "roll" operators). The latter may pipeline better than the former.

ASM optimization is black magic and sometimes more instructions execute faster than fewer.

To be sure, benchmark both and pick the winner. If you like the output of the second but the first is faster, plug the second's values into the first.

rol edx,5 { edx := (edx shl 5) or (edx shr 27)... }

Note that different machines will run the code in different ways, so if speed is REALLY of the essence then benchmark it on the hardware that you plan to run the final application on. I'm willing to bet that over megabytes of data the difference will be a matter of milliseconds -- which is far less than the operating system is taking away from you.


PS. I'm not convinced this algorithm creates even distribution, something you explicitly called out (have you run the histograms?). You may look at porting this hash function to Delphi. It may not be as fast as the above algorithm but it appears to be quite fast and also gives good distribution. Again, we're probably talking on the order of milliseconds of difference over megabytes of data.

Talljoe
I can't agree with this enough. On modern processors, trying to hand-optimize assembler is very nearly if not actually a thing of the past.
Lee
I do appreciate your ideas. I don't really intend to try to go crazy optimizing the assembler code. But I would like to eliminate obvious overhead. One run of my program can call the hash function hundreds of millions of times as it's used for almost everything
lkessler
@lkessler, There isn't much overhead to eliminate here. You'll probably find greater optimizations figuring out places to cache the value than you will squeeze out of a couple of microseconds of execution in the hash function. When you profile your application and see that most of your time is being spent in the hash method there are two options -- optimize the hash function (not much further to go) or figure out how to call it less. Your best bet right now is the latter.
Talljoe
I found this: http://landman-code.blogspot.com/2008/06/superfasthash-from-paul-hsieh.html
lkessler
+4  A: 

There has been a bit of discussion in the Delphi/BASM forum that may be of interest to you. Have a look at the following:

http://forums.embarcadero.com/thread.jspa?threadID=13902&tstart=0

PhiS
+5  A: 

We held a nice little contest a while back, improving on a hash called "MurmurHash"; Quoting Wikipedia :

It is noted for being exceptionally fast, often two to four times faster than comparable algorithms such as FNV, Jenkins' lookup3 and Hsieh's SuperFastHash, with excellent distribution, avalanche behavior and overall collision resistance.

You can download the submissions for that contest here.

One thing we learned was, that sometimes optimizations don't improve results on every CPU. My contribution was tweaked to run good on AMD, but performed not-so-good on Intel. The other way around happened too (Intel optimizations running sub-optimal on AMD).

So, as Talljoe said : measure your optimizations, as they might actually be detrimental to your performance!

As a side-note: I don't agree with Lee; Delphi is a nice compiler and all, but sometimes I see it generating code that just isn't optimal (even when compiling with all optimizations turned on). For example, I regularly see it clearing registers that had already been cleared just two or three statements before. Or EAX is put into EBX, only to have it shifted and put back into EAX. That sort of thing. I'm just guessing here, but hand-optimizing that sort of code will surely help in tight spots.

Above all though; First analyze your bottleneck, then see if a better algorithm or datastructure can be used, then try to optimize the pascal code (like: reduce memory-allocations, avoid reference counting, finalization, try/finally, try/except blocks, etc), and then, only as a final resort, optimize the assembly code.

PatrickvL
+4  A: 

I'v written two assembly "optimized" functions in Delphi, or more implemented known fast hash algorithms in both fine-tuned pascal and Borland Assembler. The first was a implementation of SuperFastHash, and the second was a MurmurHash2 implementation triggered by a request from Tommi Prami on my blog to translate my c# version to a pascal implementation. This spawned a discussion continued on the Embarcadero Discussion BASM Forums, that in the end resulted in about 20 implementations (check the latest benchmark suite) which ultimately showed that it would be difficult to select the best implementation due to the big differences in cycle times per instruction between Intel and AMD.

So, try one of those, but remember, getting the fastest every time would probably mean changing the algorithm to a simpler one which would hurt your distribution. Fine-tunning a implementation takes lots of time and better create a good validation and benchmarking suite to make check your implementations.

Davy Landman
Davy: It's nice to hear from the person who did the work. I noted your implementation in my comment to talljoe's answer, and the discussion was pointed out by PhiS.It looks like the SuperFastHash has a lot of code, especially when you compare it to the six lines of pascal in the HashOf function of my question. I'm wondering what would make SuperFastHash faster than HashOf, and if it is faster, then by how much?
lkessler
@lkessler: your questions all point to what has been mentioned in every answer, create a benchmarking program to simulate your expected usage of the hash function, measure speed and distribution both and you might find the reason why SuperFastHash/MurmurHash2 are probably slower than HashOf. For small strings (10 chars) I would *expect* HashOf to be faster, for larger strings the other functions have unrolled loops to take advantage off.
Davy Landman