tags:

views:

287

answers:

4

Hi,

Once again, talking about my upcoming university project... I had a class today, where we could ask stuff about the project but I still haven't decided on the best way to do this.

Basically, I have a bunch of users (some struct with a couple of members) which must be quickly searched by name and by SSN. Since I will also need to use theses users on a Graph (for other operations), I will be working with pointers.

Now, I though about using two hash tables. One where the key is the name and another where the key is the SSN. But I don't like the idea of having two Hash Tables, simply with different keys and pointing to the same place.

It crossed my mind using a Hash Table with two keys but I don't even know if that is possible and I believe it's not. I just can't think of a way to do it, maybe there is one, or maybe not.

Besides these two solutions, I can't think of any other alternative... I may have to go with the two Hash Tables.

Do you guys suggest any other alternative?

+1  A: 

I think that two Hashtables are ok. Consider binary search trees also, they can be more compact but O(log n) search and harder to implement.

"Hash Table with two keys" never heard of it...

Andrey
Me neither, I just though it could exist lol... I considered binary search trees but like you said, they are O(log n), where hash tables is way faster (in the best case of course). Also, binary search trees could be an option if I was able to implement a balanced one, but I'm having trouble with that and I'll just skip it.
Nazgulled
of course you should use balanced tree. you shouldn't skip it, it is useful knowledge/skill. just google "avl tree" or "red-black tree", there are a lot of tutorials.
Andrey
I should skip it because I have deadlines and I don't have time to be learning new stuff at the moment. I'm not gonna fail this class just because I want to be more knowledgeable or skillful. I have time for that, to finish the project? Not so much...
Nazgulled
Balanced trees are not as compact as open-addressed hash tables.
Seun Osewa
+1  A: 

I don't think there is a way to build a single hash table which supports two keys.

If you want both SSN-lookup and name-lookup to be really fast, then you need two hash tables. You have to remember to add to both of them, or remove from both of them.

Otherwise, you can make the more frequent one (e.g. SSN-lookup) as a hash-based lookup, and the other one as brute-force lookup from the hash table.

ArunSaha
Naah... If I use brute-force lookup, I'll probably have a very low grade. The point of this project it to use the "best" data structures (that we learned in the previous semester) for each case. As long as we properly justify them, we can use anything. As I see it, Hash Tables are the answer (for my project of course).
Nazgulled
@Nazgulled: Thanks for clarifying the context. It seems minimizing Time-complexity is your goal. However, depending on the context, minimizing Space-complexity is a noble goal too! :-) Make sure you identify what is your goal.
ArunSaha
+2  A: 

I'd go with two hash tables. Think of it as two indexes into a database. The database is your users, and you provide two indexes: one ssn index and one name index.

Hans W
That's what I think but what's bugging me is that I will always waste 2 pointers for the same user structure, one in each hash table. Maybe there's no way around it...
Nazgulled
The extra space for keeping two indexes shouldn't be an issue unless you are planning to store massive amounts of users. The issue is the added complexity of keeping the indexes up to date when adding, removing, renaming users or whatever you plan to do. If you think this could easily be handled, then I'd say this is the way to go.
Hans W
+1  A: 
  1. Two hash tables like you said. The advantage is that lookup will be very fast for RANDOM data or even real-life data. The disadvantage is that you don't know what your professors will throw at it (or do you?) and they might force the worst case.

  2. Balanced search trees. I recommend treaps: http://en.wikipedia.org/wiki/Treap - they are, in my opinion, the easiest to implement.

  3. Sort your users and binary search. Also O(log N) per search, and even easier to implement than a treap.

  4. A combination of hashes + sorted users / search trees, if you can afford the memory. This will make it O(1) best case and O(log N) worst case. If H[i] = list of objects that hashed to i, keep a count for each i that tells you how many objects are in that list. If that count is too big, use the sorted users list / search tree instead.

IVlad
They are preparing a data sample for we test our program yes, but we don't yet have access to it. But how can they force the best/worst case? Doesn't that depend on the hash function?
Nazgulled
It does depend on the hash function, but if they have access to your hash function they might be able to come up with test data that will trigger its worst case behavior. I don't think they'll actually go to those lengths, but it is possible. It's enough to find two different names that map to the same place in your hash table, then give you a list of 100 000 users, all having those two different names. Then searching is going to be really slow. For best running time, your best option is #4 on my list. The only disadvantage there is the memory used, but it does avoid worst case behavior.
IVlad
I don't think they will bother that far lol... Thanks for the tip though :)
Nazgulled
Well, surely you'll impress your professor if you get it to O(log N) worst case :). In fact, you can build the hash table first, then only keep a tree / sorted list if your hash table is not balanced enough. That way, you only use more memory in the worst case.
IVlad
We'll see, one thing at a time... :)
Nazgulled