tags:

views:

52

answers:

3

I have been struggling with a very simple problem... I am working with a 4 dimensional cube using AVL trees... now the problem is a performance related one... basically I have to compare billions of 64 bit numbers... (because my key is 64bit comprising the 4 dimensions of 16bits each)...

The problem is not that I can't compare 2 64-bit numbers, but rather I need to do it in as few as possible clock cycles.

Unfortunately my AVL Tree template has a signature of "int CompareKeyNode(key k, handle h)"

Under the hood I have two __int64 numbers lhs & rhs, unfortunately the contract of this method is: 1. lhs==rhs return 0 2. lhs > rhs return 1 3. lhs < rhs return -1

Now if the above method expected an __int64 number I could simply do a (METHOD A) return lhs - rhs;

Unfortunately it only needs a 32 bit integer, so I have to resort to something akin to ((METHOD B)) ie. return lhs == rhs ? 0 : (lhs < rhs ? -1 : 1)

The problem to me is that to load my data using (METHOD A) takes 60 secs whilst (METHOD B) takes 117 seconds.

Unfortunately (METHOD A) is incorrect as it loses precision on the return for casting the lhs-rhs statement.

The processing time is crucial to me and I am sure there must be a simple efficient answer which is eluding me right now...

Does anybody have an idea on how to solve this? Or perhaps any suggestions?

I am working in VC++ (un-managed) but surely .NET/Java must have solved this problem in there JITing/VM stage... otherwise they would be taking a huge performance hit.

PS: Have tried memcmp() but not good enough...

A: 

In assembly, subtract them.
zero flag set = 0
carry flag set = -1
else 1

FastAl
Unfortunately it's in C++ and not assembly... but good answer...
MarineHero
It may actually be worth inlining assembly if speed is such a concern - C++ allows this. e.g., ***http://www.codeproject.com/KB/cpp/edujini_inline_asm.aspx ***If course you may have to learn assembly. Then you may also find it's not really that much faster after all that work!
FastAl
@MarineHero: Is there some reason you can't add a smattering of inline assembly just for this?
Anon.
Is there a common case - e.g., majority are equal, just a few -1 or +1? Make sure that one's first. You probably can't get faster than comparing for equality! May try rearranging other things for efficiency, divide-and-conquer multithreading, etc. All of course dirty fixes.
FastAl
FastAI: The following assembler 101 code has my data now loading in 72 seconds! Tx!static int compare_key_node(key k, handle h){ _asm { mov eax,dword ptr [h] mov ecx,dword ptr [k] sub ecx,dword ptr [eax+10h] mov edx,dword ptr [ebp+0Ch] sbb edx,dword ptr [eax+14h] mov eax,ecx jz eql jc nga mov eax,1 jmp xxx eql: mov eax,0 jmp xxx nga: mov eax, -1 xxx: }}
MarineHero
A: 

You could try (lhs<rhs) - (rhs<lhs). No guarantee that it'll give an improvement (at all), but it's at least possible that it might...

Jerry Coffin
Now why did I not think of that... Works, but still worse than what I have... takes 130 sec versus my current 117.
MarineHero
@MarineHero: I guess that doesn't surprise me a whole lot -- most compilers don't put as much work into optimizing "strange" code, which that definitely is...
Jerry Coffin
A: 

Really all I can see is that you should either overload or rewrite the function to accept __int64. You're always gonna get a performance loss with inline logic like that. Why can't you overload your class to accept __int64?

chaz
Sure... however... I can't do so for the following reasons:1. I am using the same template based AVL Tree in different places2. I would not like to impose the 64bit overhead to into the AVL class as that is more crucial than my key comparison.
MarineHero