tags:

views:

1759

answers:

7

Old question

My understanding is that C# has in some sense HashSet and set types. I understand what HashSet is. But why set is a separate word? Why not every set is HashSet<Object>?

New question

Why does C# has no generic Set type, similar to Dictionary type? From my point of view, I would like to have a set with standard lookup/addition/deletion performance. I wouldn't care much whether it is realized with hashes or something else. So why not make a set class that would actually be implemented as a HashSet in this version of C# but perhaps somewhat different in a future version?

Or why not at least interface ISet?

Answer

Learned thanks to everyone who answered below: ICollection implements a lot of what you'd expect from ISet. From my point of view, though, ICollection implements IEnumerable while sets don't have to be enumerable --- example: set of real numbers between 1 and 2 (even more, sets can be generated dynamically). I agree this is a minor rant, as 'normal programmers' rarely need uncountable sets.

Ok, I think I get it. HashSet was absolutely meant to be called Set but the word Set is reserved in some sense. More specifically, creators of .NET architecture wanted to have a consistent set (sic!) of classes for different languages. This means that every name of the standard class must not coincide with any keyword in the .NET languages. The word Set, however, is used in VB.NET which is actually case-insensitive (is it?) so unfortunately there is no room for maneuvre there.

Mystery solved :)

Epilogue

The new answer by Alex Y. links to the MSDN page which describes the upcoming .NET 4.0 interface ISet which behaves pretty much as I thought it should and is implemented by HashedSet. Happy end.

A: 

I'm pretty sure there's no Set<T> class in the BCL, at least in .NET 3.5 (and not .NET 4.0 either it seems). What would you expect is the need for such a class, anyway?

HashSet<T> is itself just an ordinary set data structure that uses hash codes (the GetHashCode method of an object) to compare elements. This is simply an efficient way of implementing a set type. (Other methods for checking equality would likely have lower performance.)

Noldorin
The HashSet uses the GetHashCode to locate elements inside it's buckets. It still compares the elements using Equals, in case two elements have the same HashCode and end up in the same bucket.
Noam Gal
@Noam: Indeed, that is the case. My point was just that GetHashCode is the primary method of comparison. Equals is pretty much used as a "back-up".
Noldorin
+4  A: 

There is no Set<T>. This BCL team Blog post has lot's of details on HashSet including a not entirely conclusive discussion on including hash in the name. I suspect not everyone on the BCL team liked the decision to use the name HashSet<T>.

ScottS
They say it's because Set is a contextual keyword of get/set pair -- but isn't that lowercase set?
ilya n.
Upvote- a link to the BCL team blog would of been my answer, but I've just seen the question and you've already answered it.
RichardOD
+1  A: 

Ah right I understand your question now
Not sure I can 100% see the need for an ISet<T>.
I guess the question is which do you see as essential behaviour for a set?
Is it Add,Remove, Contains etc. If so then ICollection<T> already provides an interface for that.
If it's set operations such as Union, Intersect, etc then is that something you'd consider generic enough to abstract out to a contract style enforcement?

I have to say I don't know the right answer to this one - I think it's open to debate and I suspect the BCL team may end up putting something like this in a future version but that's up to them. I personally don't see it as massive missing piece of functionality

Original Post

The BCL doesn't have a Set collection at all, at least not as far as I know.
There a few 3rd party Set libs out there like Iesi.Collections
HashSet<T> was introduced in .NET 3.5 to create a fast set collection i.e where you want a collection with no duplicates. It also has typical set operations such as Union and Join. Check out this link from BCL team on HashSet

You'd typically use it where previously you had to use List<T> and check for duplicates when adding.
Adding items to a HashSet<T> can also be significantly faster than List

Some further details:
Another nice feature of HashSet is that it doesn't throw an exception if you try and add a duplicate it just fails to add the duplicate entry which saves you having to put lots of try.catch blocks around every add - nice :)

zebrabox
BTW - anyone know how to make <T> appear in posts? I type it in my original test but it disappears when I post!
zebrabox
Use `<T>` (the quotes that escape code)
ilya n.
Thanks! Also figured out if mark it as a code block that works nicely too
zebrabox
@zebrabox: I know!!! I know!! :) Search for the word "backticks" in http://stackoverflow.com/editing-help and you'll find an how to...
Arjan Einbu
OK, but why not have interface ISet with all those properties and HashSet that implements it? This is how Dictionary works.
ilya n.
@Arjan Einbu - Thanks, will do!
zebrabox
+2  A: 

set is a C# language keyword that has been around since version 1.0. Is is used to define the value-assigning part of a property (and get is used to implement the value-reading part of a property). In this context you should understand the word 'set' as a verb, as in setting a value.

HashSet<T> is a particular implmentation of the mathematical concept of a Set. It was first introduced in .NET 3.5. This blog post by the BCL Team explains more about the reasoning behind it, as well as some clues to why the name is HashSet<T> and not just Set<T>: http://blogs.msdn.com/bclteam/archive/2006/11/09/introducing-hashset-t-kim-hamilton.aspx.

In the case of HashSet<T> you should understand the word 'set' as a noun.

Mark Seemann
I'm not able to solve the clues :) The best I can get is that Set is somehow reserved because of VB.
ilya n.
Because I know about set as a verb, but I think C# is case-sensitive so why would it be mixed up with Set?
ilya n.
Set is indeed a keyword in VB.NET. Although C# is case-sensitive, the CLS isn't, so there's a danger that a case-insensitive language (such as VB.NET) would have trouble telling the keyword and the type apart. Even if the compiler can figure it out, developers might become confused. Thus, the Framework Design Guidelines recommend avoiding certain keywords as type names.`HashSet<T>` was developed to be used from any .NET-based language and not only C#.
Mark Seemann
+9  A: 

(Your original question about set has been answered. IIRC, "set" is the word with the most different meanings in the English language... obviously this has an impact in computing too.)

I think it's fine to have HashSet<T> with that name, but I'd certainly welcome an ISet<T> interface. Given that HashSet<T> only arrived in .NET 3.5 (which in itself was surprising) I suspect we may eventually get a more complete collection of set-based types. In particular, the equivalent of Java's LinkedHashSet, which maintains insertion order, would be useful in some cases.

To be fair, the ICollection<T> interface actually covers most of what you'd want in ISet<T>, so maybe that isn't required. However, you could argue that the core purpose of a set (which is mostly about containment, and only tangentially about being able to iterate over the elements) isn't quite the same as a collection. It's tricky. In fact, a truly mathematical set may not be iterable or countable - for instance, you could have "the set of real numbers between 1 and 2." If you had an arbitrary-precision numeric type, the count would be infinite and iterating over it wouldn't make any sense.

Likewise the idea of "adding" to a set doesn't always make sense. Mutability is a tricky business when naming collections :(

EDIT: Okay, responding to the comment: the keyword set is in no way a legacy to do with Visual Basic. It's the operation which sets the value of a property, vs get which retrieves the operation. This has nothing to do with the idea of a set as an operation.

Imagine that instead the keywords were actually fetch and assign, e.g.

// Not real code!
public int Foo
{
    fetch
    {
        return fooField;
    } 
    assign
    {
        fooField = value;
    } 
}

Is the purpose clear there? Now the real equivalent of that in C# is just

public int Foo
{
    get
    {
        return fooField;
    } 
    set
    {
        fooField = value;
    } 
}

So if you write:

x = y.Foo;

that will use the get part of the property. If you write:

y.Foo = x;

that will use the set part.

Is that any clearer?

Jon Skeet
In fact I still don't get the original thing about set.
ilya n.
From the linked post the closest I get is that there is some legacy Set word from Visual Basic.
ilya n.
Sorry, I was unclear. I am familiar with the set/get keyword pair, but I know they are lowercase, and I believe C# is case-sensitive, so I didn't think that to be relevant to the Set class. But Set being a reserved keyword from VB.NET explains it (I didn't know C# and VB are related until now, but now I guess they are)
ilya n.
I knew about ICollection, but I guess for me as a mathematician collections and sets from the start were very different. Now as I think about it I agree ICollection is similar to how people actually use sets in programming. However, taking an example from math, sets have more interesting examples. For example, consider the set [0; 1] \union [2; 3) \union [7; +\infty]. This set is easy to program as an ordered set but it can never be programmed as a collection.
ilya n.
Correction: it can never be programmed as a list, but I suspect that ICollection includes method similar to ToArray(). It could be thought of as a collection which has no way to be iterated through, but I think this is a new idea compared to what people are used to.
ilya n.
I think I missed the part of your answer where you say the same thing as I do about sets being not necessarily countable. Apologies for repeating it as my invention.
ilya n.
On mutability and addition -- sets are very dangerous in this respect since real_set.Add(5) as well as real_set.Plus(5) could be understood as the whole set shifted by 5 (that is, the new set whose elements are exactly x+5 for x being an elements of some_set)
ilya n.
Yup, basically you need to be very clear about what concept of "set" you're talking about. I think I'd always use something like "offset" for the latter operation you're talking about, but I agree that it's a cause of concern.
Jon Skeet
In fact, translate(Vector v) would be the math standard... of course it might not play well with i18n.
ilya n.
Yup, translate would work for vectors.
Jon Skeet
+1  A: 

Set is a reserved keyword in VB.NET (it's the equivalent to set in C#). VB.NET can use classes/methods/etc with the same name as keywords but they have to be written between square brackets, which it's ugly:

Imports Wintellect.PowerCollections 'PowerCollections contains a class called Set'
Public Class Test
    Private _myValue As Integer  

    Public Property MyValue() As Integer
        Get
            Return _myValue
        End Get
        Set ' Set as keyword'
            _myValue = value
        End Set
    End Property

    Public Function X As [Set](Of Integer)
        Dim a As New [Set](Of Integer) ' Set as class'
        Return a
    End Function

End Class
ggf31416
Ok, that does explain it.
ilya n.
No, sorry, I don't get it. How are C# and VB/.NET related?
ilya n.
Ok, I think I get it.
ilya n.
VB.NET and C# are the main languages of .NET, they are almost equivalent in features, they compile to almost the same IL and Microsoft promised to make them closer in the future.VB.NET community is quite large but you don't see as many questions about VB.NET because many of them can understand C# without problems.
ggf31416
+2  A: 

The only reason for this seems lack of resources to implement this ideally in .NET 3.5.

.NET 4.0 will include ISet, as well as its new implementation in addition to HashSet - SortedSet. Check out the provided links to MSDN library - they're already available in .NET 4.0 beta1.

Alex Yakunin
Amazing how you found it, thanks! I wish I could accept two answers.
ilya n.