tags:

views:

1012

answers:

6

I'm looking for insight into the heads of HashSet designers. As far as I am aware, my question applies to both Java and C# HashSets, making me think there must be some good reason for it, though I can't think of any myself.

After I have inserted an item into a HashSet, why is it impossible to retrieve that item without enumeration, hardly an efficient operation? Especially since a HashSet is explicitly built in a way which supports efficient retrieval.

It would often be useful to me to have Remove(x) and Contains(x) return the actual item that is being removed or contained. This is not necessarily the item I pass into the Remove(x) or Contains(x) function. Sure, I guess I could achieve the same effect through a HashMap but why waste all that space and effort when it should be perfectly possible to do this with a set?

I can appreciate that there may be some design concerns that adding this functionality would allows uses of HashSet which are not consistent with their role or future role in the framework, but if this is so, what are these design concerns?

Edit

To answer some more questions, here are more details:

I am using an immutable reference type with overridden hashcode, equals, etc to emulate a value type in C#. Let's say the type has members A, B, and C. Hashcode, equals, etc depend only on A and B. Given some A and B I want to be able to retrieve that equivalent item from a hashset and get it's C. I won't be able to use HashSet for this it appears, but I would at least like to know if there is any good reason for this. Pseudo code follows:

public sealed class X{
 object A;
 object B;
 object extra;

 public int HashCode(){
  return A.hashCode() + B.hashCode();
 }

 public bool Equals(X obj){
  return obj.A == A && obj.B == B;
 }
}

hashset.insert(new X(1,2, extra1));
hashset.contains(new X(1,2)); //returns true, but I can't retrieve extra
+2  A: 

If you change an object after it has been inserted, it's hash may have changed (this is especially likely if hashCode() has been overridden). If the hash changes, a lookup of it in the set will fail, as you will be attempting to lookup an object that is hashed at a different location than it is stored in.

Also, you need to make sure you have overridden hashCode and equals in your object if you want to lookup equal objects that are different instances.

Note that this is all for Java - I am assuming C# has something similar, but as it has been several years since I used C#, I will let others speak to it's capabilities.

aperkins
This is true and all, but it is perfectly possible at the moment to break HashSet (and all components with this invariant) by hanging onto a reference after you add it to the set. You can use that reference to mutate and invalidate the invariant as you wish. The API already relies on the user to fulfill this contract, returning the object reference would not change this.
Soonil
A: 

Set objects in those languages were mostly designed as set of value, not for mutable objects. They check that object put in them are unique by using equals. That is why contains and remove returns boolean, not the object: they check for or remove the value you pass to them.

And actually, if you do a contains(X) on a set, and expect to get a different object Y, that would means X and Y are equals (ie X.equals(Y) => true), but somewhat different, which seems wrong.

penpen
Sets are unique according to the comparison method you specify. Simply because I wish to use comparison method A in my set does not mean comparison method B has no value do me when I consider the same objects. Addressing your comments on mutability, see my answer to aperkins.
Soonil
+3  A: 

How were you proposing to retrieve the item from the hash set? A set is by definition not ordered in any way and therefore, there is no index with which to use to retrieve the object in question.

Sets, as a concept, are used to test inclusion, i.e. whether or not the element in question is in the hash data set. If you're looking to retrieve a value from a data source using a key value or index, I would suggest looking into either a Map or a List.

EDIT: Additional answer based on the Edit to the original question

Soonil, based on your new information, it looks like you might be interested in implementing your data as a Java Enum, something similar to this:

 public enum SoonilsDataType {
      A, B, C;

      // Just an example of what's possible
      public static SoonilsDataType getCompositeValue(SoonilsDataType item1,
           SoonilsDataType item2) {
           if (item1.equals(A) && 
                     item2.equals(B)) {
                return C;
           }
      }
 }

Enum's automatically inherit values() which returns the list of all values in the enum's "set", which you can use to test inclusion against in the same way as the Set. Also, because its a full class, you can define new static methods to do the composite logic (like I was trying to allude to in the example code). The only thing about the Enum is that you can't add new instances at runtime, which may not be what you want (though if the set's data size isn't going to grow at runtime, the Enum is what you want).

Peter Nix
See my comment to penpen for your answer.
Soonil
@Soonil If I understand your comment to penpen, you're using not only object equality (for existence in the Set) but also a identifier (or some other numerical index) of your own devising as a secondary equals and you want this identifier to also be used as a index into the hash set?If this is correct, you seem to be working orthogonal to the nature of hash set and I once again suggest you look into a Map (probably HashMap) as your data source.
Peter Nix
Sure Peter, I've added some details to the end of my question, let me know if that helps.
Soonil
Peter, TBH, I don't entirely understand what you are getting at with the enums, but I think our understanding has diverged somewhere :) +1 for the effort though!
Soonil
+3  A: 

I imagine the designers of the Set interface and HashSet class wanted to ensure that the remove(Object) method defined on the Collection interface was also applicable to Set; this method returns a boolean denoting whether the object was successfully removed. If the designers wanted to provide functionality whereby remove(Object) returned the "equal" object already in the Set this would mean a different method signature.

Also, given that the object being removed is logically equal to the object passed to remove(Object) it is arguable about the value added in returning the contained object. However, I have had this problem myself before and have used a Map to solve the problem.

Note that in Java, a HashSet uses a HashMap internally and so there isn't additional storage overhead in using a HashMap instead.

Adamski
You are correct about Java :) alas, C# does not use a HashMap to implement a HashSet, and there are some space/time benefits I would like to keep if possible.
Soonil
+1  A: 

Looks to me like you're actually looking for a Map<X,Y>, where Y is the type of extra1.


(rant below)

The equals and hashCode methods define meaningful object equality. The HashSet class assumes that if two objects are equal as defined by Object.equals(Object) there is no difference between these two objects.

I'd go as far as to say that if the object extra is meaningful, your design is not ideal.

Robert Munteanu
Agreed with your rant :) The map would work, but I've decided against the extra overhead and will probably roll my own set (since this isn't production code, heh) +1
Soonil
A: 

I was given an interesting suggestion as to a way to use a Map, by having my own objects define themselves as KeyValuePairs. While a good concept, unfortunately KeyValuePair is not an interface (why not?) and is a struct, which shoots that plan out of the air. In the end I will roll my own Set, as my constraints allow me this option.

Soonil