views:

1112

answers:

4

I need an algorithm to store a key/value pair, where the key is an Int64. I'm currently using a sorted IntList (same as a TStringList, but stores int64s). This gives me O(log n) for search, Insert and delete operations. Since I don't ever need the items sorted, this is a little inefficient. I need some kind of hashtable for O(1) operations. The problem is that most implementations I can find assume the key is a string. Now I could obviously convert the Int64 key to a string, but this does seem wasteful. Any ideas?

I do not know the number of items before they are entered to the data structure.

I also should add that I have implemented the same component in .net, using Dictionary, and it's adding the items that is so much faster in the .net version. Once the data structure is setup, traversals and retrievals are not that bad in comparison, but it's insertion that is killing me.

+2  A: 

You could build a hash-table, where the hash-value is a simple modulo of the Int64 you're adding to the hash.

Any good hash-table implementation will have the generation of the hash-index (by hashing the key) separate from the rest of the logic.

Some implementations are summed up here : http://stackoverflow.com/questions/179162/hashtable-implementation-for-delphi-5

PatrickvL
+1  A: 

Some thoughts, not a full blown solution.

Unless there is definite proof that the search itself is the bottleneck (don't use your "feeling" to detect bottlenecks, use a code profiler) I would stick with the IntList... If the time spent in the actual search/insert/delete does not amount for at least 20% of the total processor time, don't even bother.

If you still want a hashtable, then ...

Do not convert to a string. The conversion would allocate a new string from the heap, which is much more costly than doing the search itself. Use the int64 modulo some cleverly chosen prime number as the hash key.

Hashtables will give you O(1) only if they are large enough. Otherwise, you will get a large amount of records that share the same hash key. Make it too short, you'll waste your time searching (linearly !) through the linked list. Make it too large, and you waste memory.

Keep in mind that hash tables require some form of linked list to keep all records sharing the same key. This linked list must be implemented either by adding a "next" pointer in the payload objects (which breaks encapsulation - the object does not have to know it is stored in a hash table) or allocating a small helper object. This allocation is likely to be much more costly than the O(log) of the sorted list.

Stephan Leclercq
What I have done is implemented the same thing in Delphi Prism/Oxygene. I used Dictionary<key, value>, and it's the insertion which is so much faster in this version.
Steve
I do agree that it is "much" faster but how much faster is your whole application? Even if you increase 1000-fold the speed of the insertion, if the un-optimized insertion used only 1% of the total CPU time, you've gained nothing.
Stephan Leclercq
+2  A: 

You can compute a hash value directly from the int64 value, but for that you need to find a hash function which distributes the different int64 values evenly, so that you get little to no collisions. This of course depends on the values of those keys. If you don't know the number of items you most probably also don't know how these int64 values are distributed, so coming up with a good hash function will be hard to impossible.

Assuming your keys are not multiples of something (like addresses, which will be multiples of 4, 8, 16 and so on) you could speed things up a little by using a list of several of those IntList objects, and compute first an index into this array of lists. Using the mod operator and a prime number would be an easy way to calculate the list index. As always this is a trade-off between speed and memory consumption.

You might also google for a good implementation of sparse arrays. IIRC the EZDSL library by Julian Bucknall has one.

mghie
EzDsl is easy and stable, and well debugged. If you don't have Delphi 2009 or 2010, this is the best way to go.
Warren P
+1  A: 

Delphi 2009 and later has added Generics.

So starting Delphi 2009, you can implement your key/value pair in a similar manner as you do in .NET using a TDICTIONARY.

And TDICTIONARY in Delphi uses a hash table table and has O(1) operations.

lkessler