ansaurus

Question

Do LINQ queries make use of the underyling data type?

Answer 1

+7 A:

There are some optimizations in place, and Contains is one of them.

These days when we have the Microsoft public symbol server, we can just take a look at the code when in doubt. This is the implementation of Enumerable.Contains in .NET Framework 4:

public static bool Contains<TSource>(this IEnumerable<TSource> source, TSource value) 
{ 
    ICollection<TSource> collection = source as ICollection<TSource>;
    if (collection != null) return collection.Contains(value); 
    return Contains<TSource>(source, value, null);
}

The method casts the source to ICollection<T>, and on success, uses it's Contains method. Since HashSet<T> implements ICollection<T>, the actual method used will be HashSet<T>.Contains. This is good because it is an O(1) operation compared to an arrays O(N).

In other words, converting to an array first will be hurtful for performance: The copy operation first takes time, then the actual lookup will be less effective, O(N), because the Contains method will need to look through all the elements of the array.

In general, when skimming through Enumerable.cs, this pattern is generally used: Most methods tries to use the ICollection version of the methods, when there will be a benefit from doing it.

driis 2010-10-12 17:34:50

+1, that's correct.

Darin Dimitrov 2010-10-12 17:36:17

Or you can read the documentation to find out the same thing.

Ray Henry 2010-10-12 17:38:39

Thanks, that's really useful to know!

B Tyler 2010-10-12 17:39:07

@Ray, reading the documentation takes more time than reading the source code (at least for me). It is also less reliable.

Darin Dimitrov 2010-10-12 17:39:21

@Darin, true, if you know how to get the source. Although for me I was able to do a Google search and find the answer in about 15 seconds.

Ray Henry 2010-10-12 17:42:45

@Ray correct. But I agree with Darin, reading 3 lines of code to answer a question instead of reading a full page of documentation, is much, much, faster to process for my brain.

driis 2010-10-12 17:43:38

You don't need to get the source. There's Red Gate Reflector. Super reliable. Allows you to navigate also.

Darin Dimitrov 2010-10-12 17:43:39

Yeah but I'm not at my dev machine.

Ray Henry 2010-10-12 17:45:22

@Ray, ah that explains it.

Darin Dimitrov 2010-10-12 17:46:23

And, navigating to the type using ReSharper (CTRL+T) will also get you the source code from the symbol server, if it's available.

driis 2010-10-12 17:46:32

Answer 2

A:

It depends. Certain methods do check for particular interfaces before resorting to a "least common denominator" implementation (one that only uses the IEnumerable<T> interface).

For example, Count does check for both ICollection and ICollection<T> in order to leverage a (presumably) O(1) count before resorting to just counting all elements one-by-one.

It seems from driis's answer that Contains is the same way, checking for ICollection<T> (which HashSet<T> implements).

Now, it's unclear to me what you mean here:

[The previous developer] always converts whatever data structure he is using to create the enumerable object to an array before assigning it to the property.

If you mean that your collections are actually copied to arrays in order to be exposed as IEnumerable<T>, then you are definitely not getting the HashSet<T> class's Contains method; you're getting the array's (since a T[] array does implement ICollection<T>, though this is really not going to be any better than the naive approach).

Dan Tao 2010-10-12 17:34:52

Answer 3

+1 A:

The Linq extension method Contains has a shortcut for enumerables that implement ICollection<>. Since HashSet<> implements ICollection<>, it will do an efficient lookup.

Documented in MSDN

If the type of source implements ICollection(Of T), the Contains method in that implementation is invoked to obtain the result. Otherwise, this method determines whether source contains the specified element.

Verifiable using Reflector

public static bool Contains<TSource>(this IEnumerable<TSource> source, TSource value)
{
    ICollection<TSource> is2 = source as ICollection<TSource>;
    if (is2 != null)
    {
        return is2.Contains(value);
    }
    return source.Contains<TSource>(value, null);
}

Greg 2010-10-12 17:35:25

Answer 4

A:

The linq Contains extension method does check the type and do the relevant thing. If the enumerable implements ICollection<T>, its ICollection<T>.Contains() method will be called. If not, the enumerable will be enumerated using a foreach until the specified item is found.

Since HashSet<T> implements ICollection<T>, the Contains method will be called.

adrianbanks 2010-10-12 17:36:38

Answer 5

A:

If the type of source implements ICollection, the Contains method in that implementation is invoked to obtain the result. Otherwise, this method determines whether source contains the specified element.

http://msdn.microsoft.com/en-us/library/bb352880.aspx

Ray Henry 2010-10-12 17:37:31

ansaurus

tags:

views:

answers:

Do LINQ queries make use of the underyling data type?

related questions