views:

84

answers:

6

Hi. I have a class like this:

class Foo
{
   long long Id;
   string x;
   string y;
   // other member variables and functions
};

I would like to store this in a hash_set (or hash_map), but use the Id member variable as the key for inserting and searching. I'm not sure how I can do this. I thought of the following ways, but none of them are really good:

1) I can write a custom hash function that will hash the object using the Id, but then I can't use the find() method on hash_set to lookup the item by Id (long long) since it will require a Foo object to be passed-in.

2) I can duplicate the Id and create a hash_map instead of a hash_set but I have 100 million instances of these objects so I'd rather not duplicate the Id field.

3) I can move the Id field outside of Foo and then do hash_map, but it would be kind of messy since the Id is used internally by the class and it would be better to keep it with Foo.

Any ideas? What I'm looking for is a way to store Foo objects but be able to search for them in the hash_set using a long long (by Id).

Thanks!

A: 

You can use any variable or combination of variables you want to create the key/hash value for hash_set/hash_map because C++ is like that. One suggestion I will make, however, is that whatever method you chose to create the value remain fixed. That is, you do not allow someone to insert the object into a collection and then change the value of a variable in which the key value depends.

Overcoming some of the limitations you mention:

  1. One poor method I've seen people attempt to use to overcome this is with a constructor that accepts an int variable and populates the 'id' attribute. This is both good and bad in that it saves time for this one small case but opens up a world of problems with implicit conversions. You'll need to make it an explicit constructor if you go this route.

  2. Create a function which creates an ID attribute. Really, 100 million ints aren't going to hurt you as bad as you think if this is really a problem. I'd worry first about just getting it working before looking at any optimizations. At least with a function that creates the Id, you know that you'll be able to figure out the key you want by pushing the variables you're looking for. Additionally, it'll be easy to refactor code.

  3. If they're connected I wouldn't separate them. Invariably you or someone who comes after you won't understand the design decision. They'll screw it up and they'll blame you for a poor implementation (and rightly so, too!)


wheaties
A: 

This may not be a particularly elegant solution, but it works: define an operator < (and operator == as well, as CashCow suggested) for your class that orders objects based on the Id field, then when you want to do a find, pass in a dummy object that contains the Id you're looking for.

casablanca
Works great for a standard set or map, not so great for hash-based containers. The `operator <`, I mean.
Mark Ransom
Why not? I thought hash-based containers still depend on `operator <` to ultimately ensure that two objects are equal.
casablanca
Both `hash_map` and `unordered_map` appear to take a comparison functor for equality, not less than.
Mark Ransom
It seems `unordered_map` uses `equal_to<Key>` for comparison, but at least VC++'s `hash_map` uses `hash_compare<Key, less<Key>>`.
casablanca
Just overload the equality operator too. And for the hashing function itself just return the id.
CashCow
@CashCow: That makes sense. :)
casablanca
A: 

1) I can write a custom hash function that will hash the object using the Id, but then I can't use the find() method on hash_set to lookup the item by Id (long long) since it will require a Foo object to be passed-in.

This is a common problem with the standard map and set containers. Add a constructor or static member that creates a comparison object, an object with only the key member valid.

Mark Ransom
A: 

Option 2 is your best bet among the three you gave. If you're up to it, you could write a custom hash_set (even just a wrapper around the regular one) to provide what you want. In that case, I'd prefer option 1. You could easily add a find function that takes just the key value and adapts it to use the internal find function properly giving you all benefits.

JoshD
A: 

I don't have much experience with it, but boost::multiindex is built on top of boost::hash by default, and has explicit support for looking up an object in a hash using only a key found in the object.

swestrup
A: 

If you don't find an actual solution for the hash_set/hash_map containers, you may want to consider using the uthash hash table implementation, which supports your use case directly. It's C though and it's based on macros but it's one of the most popular hash table implementations that exist.

Jakub Wieczorek