Two hash tables, hash table with double key or different solution?

views:

287

answers:

+1 Q:

Two hash tables, hash table with double key or different solution?

Hi,

Once again, talking about my upcoming university project... I had a class today, where we could ask stuff about the project but I still haven't decided on the best way to do this.

Basically, I have a bunch of users (some struct with a couple of members) which must be quickly searched by name and by SSN. Since I will also need to use theses users on a Graph (for other operations), I will be working with pointers.

Now, I though about using two hash tables. One where the key is the name and another where the key is the SSN. But I don't like the idea of having two Hash Tables, simply with different keys and pointing to the same place.

It crossed my mind using a Hash Table with two keys but I don't even know if that is possible and I believe it's not. I just can't think of a way to do it, maybe there is one, or maybe not.

Besides these two solutions, I can't think of any other alternative... I may have to go with the two Hash Tables.

Do you guys suggest any other alternative?

+1 A:

I think that two Hashtables are ok. Consider binary search trees also, they can be more compact but O(log n) search and harder to implement.

"Hash Table with two keys" never heard of it...

Andrey 2010-03-02 19:08:22

Me neither, I just though it could exist lol... I considered binary search trees but like you said, they are O(log n), where hash tables is way faster (in the best case of course). Also, binary search trees could be an option if I was able to implement a balanced one, but I'm having trouble with that and I'll just skip it.

Nazgulled 2010-03-02 19:11:11

of course you should use balanced tree. you shouldn't skip it, it is useful knowledge/skill. just google "avl tree" or "red-black tree", there are a lot of tutorials.

Andrey 2010-03-02 19:16:07

I should skip it because I have deadlines and I don't have time to be learning new stuff at the moment. I'm not gonna fail this class just because I want to be more knowledgeable or skillful. I have time for that, to finish the project? Not so much...

Nazgulled 2010-03-03 12:40:41

Balanced trees are not as compact as open-addressed hash tables.

Seun Osewa 2010-09-02 02:12:53

+1 A:

I don't think there is a way to build a single hash table which supports two keys.

If you want both SSN-lookup and name-lookup to be really fast, then you need two hash tables. You have to remember to add to both of them, or remove from both of them.

Otherwise, you can make the more frequent one (e.g. SSN-lookup) as a hash-based lookup, and the other one as brute-force lookup from the hash table.

ArunSaha 2010-03-02 19:12:57

Naah... If I use brute-force lookup, I'll probably have a very low grade. The point of this project it to use the "best" data structures (that we learned in the previous semester) for each case. As long as we properly justify them, we can use anything. As I see it, Hash Tables are the answer (for my project of course).

Nazgulled 2010-03-02 19:16:33

@Nazgulled: Thanks for clarifying the context. It seems minimizing Time-complexity is your goal. However, depending on the context, minimizing Space-complexity is a noble goal too! :-) Make sure you identify what is your goal.

ArunSaha 2010-03-02 19:32:53

+2 A:

I'd go with two hash tables. Think of it as two indexes into a database. The database is your users, and you provide two indexes: one ssn index and one name index.

Hans W 2010-03-02 19:14:25

That's what I think but what's bugging me is that I will always waste 2 pointers for the same user structure, one in each hash table. Maybe there's no way around it...

Nazgulled 2010-03-02 19:20:30

The extra space for keeping two indexes shouldn't be an issue unless you are planning to store massive amounts of users. The issue is the added complexity of keeping the indexes up to date when adding, removing, renaming users or whatever you plan to do. If you think this could easily be handled, then I'd say this is the way to go.

Hans W 2010-03-02 19:25:06

+1 A:

Two hash tables like you said. The advantage is that lookup will be very fast for RANDOM data or even real-life data. The disadvantage is that you don't know what your professors will throw at it (or do you?) and they might force the worst case.
Balanced search trees. I recommend treaps: http://en.wikipedia.org/wiki/Treap - they are, in my opinion, the easiest to implement.
Sort your users and binary search. Also O(log N) per search, and even easier to implement than a treap.
A combination of hashes + sorted users / search trees, if you can afford the memory. This will make it O(1) best case and O(log N) worst case. If H[i] = list of objects that hashed to i, keep a count for each i that tells you how many objects are in that list. If that count is too big, use the sorted users list / search tree instead.

IVlad 2010-03-02 19:20:42

They are preparing a data sample for we test our program yes, but we don't yet have access to it. But how can they force the best/worst case? Doesn't that depend on the hash function?

Nazgulled 2010-03-02 19:25:28

It does depend on the hash function, but if they have access to your hash function they might be able to come up with test data that will trigger its worst case behavior. I don't think they'll actually go to those lengths, but it is possible. It's enough to find two different names that map to the same place in your hash table, then give you a list of 100 000 users, all having those two different names. Then searching is going to be really slow. For best running time, your best option is #4 on my list. The only disadvantage there is the memory used, but it does avoid worst case behavior.

IVlad 2010-03-02 19:34:17

I don't think they will bother that far lol... Thanks for the tip though :)

Nazgulled 2010-03-02 20:11:52

Well, surely you'll impress your professor if you get it to O(log N) worst case :). In fact, you can build the hash table first, then only keep a tree / sorted list if your hash table is not balanced enough. That way, you only use more memory in the worst case.

IVlad 2010-03-02 20:17:19

We'll see, one thing at a time... :)

Nazgulled 2010-03-03 12:41:34

ansaurus

tags:

views:

answers:

Two hash tables, hash table with double key or different solution?

related questions