tags:

views:

73

answers:

5

For example consider:

var hset = new HashSet<int>();
// Fill the hset.

var enumerable = hset as IEnumerable<int>;
bool enumerable.Contains(0);

Does LINQ use the fact that the HashSet has an efficient implementation of Contains, or does itsimply operate on the enumerator as one would expect?

The reason I'm asking is that the component I'm currently working on has a number of properties that are IEnumerable<T>, but the previously developer always converts whatever data structure he is using to create the enumerable object to an array before assigning it to the property. I'm not sure if this is good practice or a complete waste of time.

+7  A: 

There are some optimizations in place, and Contains is one of them.

These days when we have the Microsoft public symbol server, we can just take a look at the code when in doubt. This is the implementation of Enumerable.Contains in .NET Framework 4:

public static bool Contains<TSource>(this IEnumerable<TSource> source, TSource value) 
{ 
    ICollection<TSource> collection = source as ICollection<TSource>;
    if (collection != null) return collection.Contains(value); 
    return Contains<TSource>(source, value, null);
}

The method casts the source to ICollection<T>, and on success, uses it's Contains method. Since HashSet<T> implements ICollection<T>, the actual method used will be HashSet<T>.Contains. This is good because it is an O(1) operation compared to an arrays O(N).

In other words, converting to an array first will be hurtful for performance: The copy operation first takes time, then the actual lookup will be less effective, O(N), because the Contains method will need to look through all the elements of the array.

In general, when skimming through Enumerable.cs, this pattern is generally used: Most methods tries to use the ICollection version of the methods, when there will be a benefit from doing it.

driis
+1, that's correct.
Darin Dimitrov
Or you can read the documentation to find out the same thing.
Ray Henry
Thanks, that's really useful to know!
B Tyler
@Ray, reading the documentation takes more time than reading the source code (at least for me). It is also less reliable.
Darin Dimitrov
@Darin, true, if you know how to get the source. Although for me I was able to do a Google search and find the answer in about 15 seconds.
Ray Henry
@Ray correct. But I agree with Darin, reading 3 lines of code to answer a question instead of reading a full page of documentation, is much, much, faster to process for my brain.
driis
You don't need to get the source. There's Red Gate Reflector. Super reliable. Allows you to navigate also.
Darin Dimitrov
Yeah but I'm not at my dev machine.
Ray Henry
@Ray, ah that explains it.
Darin Dimitrov
And, navigating to the type using ReSharper (CTRL+T) will also get you the source code from the symbol server, if it's available.
driis
A: 

It depends. Certain methods do check for particular interfaces before resorting to a "least common denominator" implementation (one that only uses the IEnumerable<T> interface).

For example, Count does check for both ICollection and ICollection<T> in order to leverage a (presumably) O(1) count before resorting to just counting all elements one-by-one.

It seems from driis's answer that Contains is the same way, checking for ICollection<T> (which HashSet<T> implements).

Now, it's unclear to me what you mean here:

[The previous developer] always converts whatever data structure he is using to create the enumerable object to an array before assigning it to the property.

If you mean that your collections are actually copied to arrays in order to be exposed as IEnumerable<T>, then you are definitely not getting the HashSet<T> class's Contains method; you're getting the array's (since a T[] array does implement ICollection<T>, though this is really not going to be any better than the naive approach).

Dan Tao
+1  A: 

The Linq extension method Contains has a shortcut for enumerables that implement ICollection<>. Since HashSet<> implements ICollection<>, it will do an efficient lookup.

Documented in MSDN

If the type of source implements ICollection(Of T), the Contains method in that implementation is invoked to obtain the result. Otherwise, this method determines whether source contains the specified element.

Verifiable using Reflector

public static bool Contains<TSource>(this IEnumerable<TSource> source, TSource value)
{
    ICollection<TSource> is2 = source as ICollection<TSource>;
    if (is2 != null)
    {
        return is2.Contains(value);
    }
    return source.Contains<TSource>(value, null);
}
Greg
A: 

The linq Contains extension method does check the type and do the relevant thing. If the enumerable implements ICollection<T>, its ICollection<T>.Contains() method will be called. If not, the enumerable will be enumerated using a foreach until the specified item is found.

Since HashSet<T> implements ICollection<T>, the Contains method will be called.

adrianbanks
A: 

If the type of source implements ICollection, the Contains method in that implementation is invoked to obtain the result. Otherwise, this method determines whether source contains the specified element.

http://msdn.microsoft.com/en-us/library/bb352880.aspx

Ray Henry