I have a structure with 3 identifier fields and one value field, and I have a list of these objects. To give an analogy, the identifier fields are like a primary key: together, these 3 fields uniquely identify an object.

struct Entry
{
   int a1;    // identifier fields
   int a2;
   int a3;
   int value; // associated value
};

I will have a list of, say, 1000 objects of this type. I need to look up specific combinations of these identity fields by passing values of a1, a2 and a3 to a lookup function, which checks whether any object with those specific values of a1, a2 and a3 is present and returns its value. What is the most effective way to implement this to achieve the best lookup time?

One solution I could think of is to have a 3-dimensional matrix of length, say, 1000 and populate the values in it. This has a lookup time of O(1). But the disadvantages are: 1. I need to know the length of the array in advance. 2. For a higher number of identity fields (say 20), I would need a 20-dimensional matrix, which would be overkill on memory. For my actual implementation, I have 23 identity fields.

Can you suggest a good way to store this data that would give me the best lookup time?

+4  A: 

Create a key class that contains all the identity fields, define an appropriate equality function and hash function for it, and then use a hash map from the key class to its associated value. This gives you a time complexity of O(1) per lookup in the expected case, and it only requires space proportional to the number of key combinations actually observed (typically about twice that number, although you can adjust the constant for the time/space tradeoff you desire), rather than space proportional to all possible key combinations.
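
For instance, a minimal sketch of that approach in C++ (the Key type and the use of Boost here are illustrative, not the only way to do it):

#include <cstddef>
#include <boost/unordered_map.hpp>
#include <boost/functional/hash.hpp>

// Illustrative key type holding the identity fields.
struct Key {
    int a1, a2, a3;
};

bool operator==(const Key &lhs, const Key &rhs) {
    return lhs.a1 == rhs.a1 && lhs.a2 == rhs.a2 && lhs.a3 == rhs.a3;
}

// Found via ADL by boost::hash<Key>, which boost::unordered_map uses by default.
std::size_t hash_value(const Key &k) {
    std::size_t seed = 0;
    boost::hash_combine(seed, k.a1);
    boost::hash_combine(seed, k.a2);
    boost::hash_combine(seed, k.a3);
    return seed;
}

boost::unordered_map<Key, int> table;  // maps (a1, a2, a3) -> value

A lookup is then just table.find(k) for a Key k built from the three fields, which is O(1) in the expected case.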

Michael Aaron Safyan
Keep in mind that it's only O(1) if you have a perfect hash (one item per bucket). This usually requires foreknowledge of the key properties.
paxdiablo
@paxdiablo, it's still O(1), even without a perfect hash... collisions do occur, believe it or not, even with ordinary hash table implementations... and if you use something like boost::hash_combine to construct the hash, then it shouldn't be a problem coming up with a fairly spread out hash function.
Michael Aaron Safyan
It is _not_ O(1). Perfect hashes, by definition, map each key to a unique bucket. That would be O(1), since the number of operations is identical regardless of the data set size. In *any* implementation where the hash isn't perfect, it's O(n) for simple lists in each bucket, although it can be optimised to less if each bucket is yet another hash. But it will _never_ reach the mythical O(1).
paxdiablo
We're going to have to agree to disagree there, I'm afraid. That's no different from storing everything in an unsorted list and saying you're bounded by 1000 array-element comparisons, therefore constant time. The fact remains that the complexity depends not on 1 (constant time) but on the number of elements. As n approaches infinity (or some arbitrarily large number), the divisor is irrelevant for complexity analysis; it's still dependent on n.
paxdiablo
@paxdiablo, I believe you are confusing the worst-case performance and the average-case performance. While it is true that the worst-case performance is O(n) -- assuming all hashes collide to the same index -- it is O(1) in the expected case.
Michael Aaron Safyan
For this algorithm, there are 23 keys (a1, a2, ..., a23) and the length of the list would be upward of 50K. It is an optimization problem using dynamic programming, and I am implementing a depth-first search. A hash table seems to be the popular solution to this. Assuming I come up with a good hash function, there will still be collisions leading to more than one item per bucket; in that case, should I sort the contents of each bucket? Can I use a simple hash function like sumOf(keys)? Any suggestions about the hash function for this scenario?
Amrish
@Amrish, what programming language are you using? Most programming languages already provide a hash map / hashtable implementation, and most also provide some mechanism of creating good hashes.
Michael Aaron Safyan
I am using C++ for my implementation
Amrish
@Amrish, in that case, you might want to consider using boost::unordered_map as the hash map implementation and use boost::hash_combine to construct your hashes.
Michael Aaron Safyan
A: 

Use a hash table (map). Construct the key as the string "a1-a2-a3", and store the data as H(key) = data.
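
A rough sketch of that idea (the make_string_key helper is just illustrative):

#include <sstream>
#include <string>
#include <boost/unordered_map.hpp>

// Build a "a1-a2-a3" style string key, e.g. make_string_key(1, 2, 3) == "1-2-3".
std::string make_string_key(int a1, int a2, int a3) {
    std::ostringstream oss;
    oss << a1 << '-' << a2 << '-' << a3;
    return oss.str();
}

boost::unordered_map<std::string, int> table;  // H(key) = data
// table[make_string_key(1, 2, 3)] = 42;       // store
// int v = table[make_string_key(1, 2, 3)];    // look up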

sza
A: 

I would simply sort the array by key, and use a binary search.

(untested)

int compare_entry(ENTRY *k1, ENTRY *k2) {
    int d = k1->a1 - k2->a1;
    if (d == 0) {
        d = k1->a2 - k2->a2;
        if (d == 0) {
            d = k1->a3 - k2->a3;
        }
    }
    return d; // >0 if k1 > k2, 0 if k1 == k2, <0 if k1 < k2
}

// Derived from Wikipedia (leftmost binary search)
int find(ENTRY *list, int size, ENTRY *value) {
    int low = 0;
    int high = size - 1;
    while (low < high) {
        int mid = low + (high - low) / 2;  // avoids overflow of (low + high)
        int cmp = compare_entry(&list[mid], value);
        if (cmp < 0) {
            low = mid + 1;
        } else {
            high = mid;
        }
    }
    // 'low' is now the leftmost candidate position; check it (this also covers the last element).
    if (size > 0 && compare_entry(&list[low], value) == 0) {
        return low;  // found item at index 'low'
    }
    return -1;       // not found
}
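
To use this, the list must first be sorted with the same ordering; a rough sketch of that step (the compare_entry_void wrapper name is just illustrative):

#include <stdlib.h>

// qsort-compatible wrapper around compare_entry.
int compare_entry_void(const void *p1, const void *p2) {
    return compare_entry((ENTRY *)p1, (ENTRY *)p2);
}

// Sort once up front, then call find() as often as needed:
//   qsort(list, size, sizeof(ENTRY), compare_entry_void);
//   ENTRY key = { a1, a2, a3, 0 };
//   int idx = find(list, size, &key);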

In the absolute worst case, you run through this loop, what, 10 times (for 1000 entries), and end up actually doing all of the comparisons in the key comparison each time. So that's, what, 85 integer math operations (additions, subtractions, and 1 shift per iteration)?

If your a1-a3 each range from 0-99, then you can make your key a1 * 10000 + a2 * 100 + a3, do a single compare, and the worst case is 63 integer math operations. And your entire array fits within cache on most any modern processor. And it's memory efficient.
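
A sketch of that packing (assuming each field really stays within 0-99):

// Pack three small key fields (each assumed to be in 0..99) into one int,
// so entries can be sorted and binary-searched with a single integer compare.
int make_packed_key(int a1, int a2, int a3) {
    return a1 * 10000 + a2 * 100 + a3;
}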

You can burn memory with a perfect hash or some other sparse matrix. Even with a perfect hash, I bet the hash calculation itself is competitive with this time, considering multiplication is expensive. This hits the memory bus harder, obviously.

Will Hartung