More than about LINQ to [insert your favorite provider here], this question is about searching or filtering in-memory collections.

I know LINQ (or searching/filtering extension methods) works on objects implementing IEnumerable or IEnumerable<T>. The question is: because of the nature of enumeration, is the complexity of every query at least O(n)?

For example:

var result = list.FirstOrDefault(o => o.something > n);

In this case, every algorithm will take at least O(n) unless list is ordered with respect to 'something', in which case the search should take O(log(n)): it should be a binary search. However, if I understand correctly, this query will be resolved through enumeration, so it should take O(n), even if the list was previously ordered.

  • Is there something I can do to solve a query in O(log(n))?
  • If I want performance, should I use Array.Sort and Array.BinarySearch?
+2  A: 

Yes, it has to be, because the only way of accessing any member of an IEnumerable is by enumerating it one element at a time, which means O(n).

It seems like a classic case in which the language designers decided to trade performance for generality.

Sklivvz
Thanks for the answer. It's what I thought. But... are there no ways to circumvent this? Maybe with parallelization.
Pablo Marambio
@Marambio: take a look at PLINQ. It attempts to parallelize most of LINQ.
sixlettervariables
Well... thanks! That should be an answer.
Pablo Marambio
> trade performance for generality. Bingo!
Lucas
+5  A: 

Even with parallelisation, it's still O(n). The constant factor would be different (depending on your number of cores) but as n varied the total time would still vary linearly.

Of course, you could write your own implementations of the various LINQ operators over your own data types, but they'd only be appropriate in very specific situations - you'd have to know for sure that the predicate only operated on the optimised aspects of the data. For instance, if you've got a list of people that's ordered by age, it's not going to help you with a query which tries to find someone with a particular name :)

To examine the predicate, you'd have to use expression trees instead of delegates, and life would become a lot harder.

I suspect I'd normally add new methods which make it obvious that you're using the indexed/ordered/whatever nature of the data type, and which will always work appropriately. You couldn't easily invoke those extra methods from query expressions, of course, but you can still use LINQ with dot notation.
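As a rough sketch of such an extra method, assuming the list is already sorted ascending by the key in question: the name FirstGreaterThan and the class SortedListExtensions are made up for illustration and are not part of LINQ. It does a binary search for the first element whose key exceeds a threshold, so it runs in O(log n) instead of enumerating.

```csharp
using System;
using System.Collections.Generic;

public static class SortedListExtensions
{
    // Hypothetical helper, not part of LINQ.
    // Returns the first element whose key is strictly greater than `threshold`,
    // or default(T) if there is none.
    // Assumes `sorted` is already ordered ascending by `keySelector`.
    public static T FirstGreaterThan<T, TKey>(
        this IReadOnlyList<T> sorted, Func<T, TKey> keySelector, TKey threshold)
        where TKey : IComparable<TKey>
    {
        int lo = 0, hi = sorted.Count; // the search window is [lo, hi)
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (keySelector(sorted[mid]).CompareTo(threshold) > 0)
                hi = mid;      // sorted[mid] qualifies; look for an earlier match
            else
                lo = mid + 1;  // sorted[mid] is too small; discard the left half
        }
        return lo < sorted.Count ? sorted[lo] : default(T);
    }
}
```

With this in scope, the query from the question could be written as list.FirstGreaterThan(o => o.something, n). The result is only meaningful if the ordering assumption holds; nothing in the method checks it.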

Jon Skeet
> Even with parallelisation, it's still O(n). Good point
Lucas
+2  A: 

Yes, the generic case is always O(n), as Sklivvz said.

However, many LINQ methods special-case sources that implement a richer interface than IEnumerable<T>, e.g. ICollection<T>. (I've seen this for Enumerable.Contains at least.)

In practice this means that Enumerable.Contains calls the fast HashSet<T>.Contains if, for example, the IEnumerable<T> is actually a HashSet<T>.

IEnumerable<int> mySet = new HashSet<int>();

// Enumerable.Contains calls the fast HashSet<int>.Contains because HashSet<int> implements ICollection<int>.
if (mySet.Contains(10)) { /* code */ }

You can use Reflector to check exactly how the LINQ methods are defined; that is how I figured this out.

Oh, and LINQ also has the methods Enumerable.ToDictionary (maps each key to a single value) and Enumerable.ToLookup (maps each key to multiple values). The dictionary or lookup table can be created once and used many times, which can speed up some LINQ-intensive code by orders of magnitude.
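A minimal sketch of that build-once, query-many pattern. The Person type and the PersonIndex helper are hypothetical, invented only for this example; ToLookup and ToDictionary are the real LINQ methods.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration only.
public sealed class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class PersonIndex
{
    // Building the lookup is O(n); each later lookup by age is O(1) on average.
    // Several people may share an age, so ToLookup (key -> many values) fits.
    public static ILookup<int, Person> ByAge(IEnumerable<Person> people)
    {
        return people.ToLookup(p => p.Age);
    }

    // Names are assumed unique here; ToDictionary throws an ArgumentException
    // if two elements produce the same key.
    public static Dictionary<string, Person> ByName(IEnumerable<Person> people)
    {
        return people.ToDictionary(p => p.Name);
    }
}
```

After var byAge = PersonIndex.ByAge(people), the expression byAge[30] yields everyone aged 30, or an empty sequence if there is nobody of that age. This only answers equality queries; a range query such as "older than n" would still need an ordered structure.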

Tobi
How would that work when filtering by property as per the question?
Sklivvz
Then you could use ToDictionary or ToLookup, mapping that property to the key of the dictionary and the object itself to the value of the dictionary. (Both ToDictionary and ToLookup take delegates to specify what should be key and what should be value.)
Tobi
Of course this would only speed stuff up when you do enough searches on that particular property on a result set which doesn't change. I think though that the filtering/searching for a property was just an example, and fast searching for the objects themselves would be included in the question too :)
Tobi
A: 

If you are looking for a parallel implementation of LINQ, Microsoft Research is working on both PLINQ and the Task Parallel Library. You can read up on them and find examples at the Parallel Computing Development Center.
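For a feel of what that looks like, here is a small sketch of the question's query written with PLINQ as it ships in System.Linq (AsParallel, AsOrdered). The ParallelSearch class and FirstAbove method are hypothetical names used only for illustration. The total work is still O(n); it is just spread over several cores.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ParallelSearch
{
    // Hypothetical helper: returns the first value greater than n in source order,
    // or 0 (default(int)) if there is none.
    public static int FirstAbove(IEnumerable<int> source, int n)
    {
        // AsOrdered preserves source order, so "first" means the same thing
        // as in the sequential FirstOrDefault.
        return source.AsParallel().AsOrdered().FirstOrDefault(x => x > n);
    }
}
```

Without AsOrdered, PLINQ is free to return any matching element, which may be acceptable when any match will do.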

sixlettervariables