tags:
views: 143
answers: 2

Can anyone please explain why the third query below is orders of magnitude slower than the others when it oughtn't to take any longer than doing the first two in sequence?

var data = Enumerable.Range(0, 10000).Select(x => new { Index = x, Value = x + " is the magic number"}).ToList();
var test1 = data.Select(x => new { Original = x, Match = data.Single(y => y.Value == x.Value) }).Take(1).Dump();
var test2 = data.Select(x => new { Original = x, Match = data.Single(z => z.Index == x.Index) }).Take(1).Dump();
var test3 = data.Select(x => new { Original = x, Match = data.Single(z => z.Index == data.Single(y => y.Value == x.Value).Index) }).Take(1).Dump();

EDIT: I've added a .ToList() to the original data generation because I don't want any repeated generation of the data clouding the issue.

I'm just trying to understand why this code is so slow, by the way, not looking for a faster alternative, unless it sheds some light on the matter. I would have thought that if LINQ is lazily evaluated and I'm only looking for the first item (Take(1)), then test3's:

data.Select(x => new { Original = x, Match = data.Single(z => z.Index == data.Single(y => y.Value == x.Value).Index) }).Take(1);

could reduce to:

data.Select(x => new { Original = x, Match = data.Single(z => z.Index == 0) }).Take(1)

in O(N) as the first item in data is successfully matched after one full scan of the data by the inner Single(), leaving one more sweep of the data by the remaining Single(). So still all O(N).

It's evidently being processed in a more long-winded way, but I don't really understand how or why.

Test3 takes a couple of seconds to run by the way, so I think we can safely assume that if your answer features the number 10^16 you've made a mistake somewhere along the line.
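To put numbers on the laziness question, here is a rough sketch in Python rather than C#, with a hand-rolled single() standing in for LINQ's Single (the helper, variable names, and counts are illustrative only, not the real operator):

```python
# Count how many elements are inspected for the FIRST outer element
# only (mimicking Take(1)). N is kept small; the shape of the growth
# is what matters, not the absolute numbers.

N = 1000
data = [(i, f"{i} is the magic number") for i in range(N)]  # (Index, Value)

inspections = 0

def single(seq, pred):
    """Stand-in for LINQ's Single: it must scan the WHOLE sequence
    to verify that exactly one element matches."""
    global inspections
    matches = []
    for item in seq:
        inspections += 1
        if pred(item):
            matches.append(item)
    if len(matches) != 1:
        raise ValueError("not exactly one match")
    return matches[0]

x = data[0]  # Take(1): only the first outer element is ever produced

# test1-style: one single() per outer element -> N inspections
inspections = 0
single(data, lambda y: y[1] == x[1])
test1_cost = inspections  # N

# test3-style: the inner single() is re-evaluated for EVERY element
# the outer single() visits -> N * (N + 1) inspections
inspections = 0
single(data, lambda z: z[0] == single(data, lambda y: y[1] == x[1])[0])
test3_cost = inspections  # N * (N + 1): quadratic even with Take(1)

print(test1_cost, test3_cost)
```

Even with Take(1) forcing only one outer element, test3 still pays a full scan of data for every element the outer single visits, which is where the quadratic factor comes from.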

+7  A: 

The first two "tests" are identical, and both slow. The third adds another entire level of slowness.

The first two LINQ statements here are quadratic in nature. Since your "Match" element potentially requires iterating through the entire "data" sequence in order to find the match, as you progress through the range, the length of time for that element will get progressively longer. The 10000th element, for example, will force the engine to iterate through all 10000 elements of the original sequence to find the match, making this an O(N^2) operation.

The "test3" operation takes this to an entirely new level of pain, since it's "squaring" the O(N^2) operation in the second single - forcing it to do another quadratic operation on top of the first one - which is going to be a huge number of operations.

Each time you do data.Single(...) with the match, you're doing an O(N^2) operation - the third test basically becomes O(N^4), which will be orders of magnitude slower.

Reed Copsey
+1 I like your answer better than mine :)
Kelsey
How is iterating the entire sequence quadratic? And why is the third test squaring it when, working from the inside out, it should be the same as doing test1 and plugging the result into test2?
@stovroz: When you call data.Single, you're (potentially) doing an entire enumeration of "data" for every single element of data. This means each time you return one new item from Select(), you potentially have to iterate through ALL of data in order to find the "match". In the third case, you're doing this nested inside of the (already) quadratic element, which basically does an N^2 operation inside the N^2 operation...
Reed Copsey
+1 Excellent explanation of `O` notation. :)
jrista
Hi Reed Copsey. I assume that data.Single is _definitely_ doing an entire enumeration of data, as it needs to make sure there's only one. Anyway, so test3's data.Single(y => y.Value == x.Value).Index gets us a value for Index in O(N), reducing the requirement to test2's data.Single(z => z.Index == Index), which is also O(N). And I'm only asking to Take(1) of the N, so what's going wrong that the whole thing isn't O(N) as it should be?
@stovroz: Take test1: When you iterate, it's O(N). However, each element must do an additional data.Single [O(N)] call, which means you're O(N^2). When you do test3, you're doing another O(N) inside of the inner nest, which makes it "squared" again, which causes 10^16 total iterations - hence, it's slow as hell.
Reed Copsey
What do you mean each element must do an additional data.Single call? I'm only taking the first one and it's supposed to be lazily evaluated.
@stovroz: But, in test3, you're not doing the take(1), so you end up going quadratic on the first element...
Reed Copsey
I am doing Take(1) in test3.
@stovroz: But not in the inner query. You're doing a query internal to the query (since you've got a nested Single()), and that one happens EVERY time the first (outer) single is processed.
Reed Copsey
Test1 and Test2 are definitely not O(N^2), so there's no way test3 is O(N^4). It must be that test1 and test2 are O(N) and test3 is O(N^2), but I still don't see why Linq can't make a better job of it.
@Reed Copsey: stovroz is absolutely right. A simple test can show that test1 and test2 are O(N) and test3 is O(N^2). The problem is somewhere else.
Yury Tarabanko
@Yury: That's why test 3 is running so much slower. It would be O(N^4) (without the .Take(1), which makes only the first element iterate). However, it's O(N^2) on 10k elements, which makes it MUCH slower.
Reed Copsey
@Reed Copsey: I see. You are right. Have just compared it with "mannual" code. The results are approximately the same. :) +1
Yury Tarabanko
A: 

Fixed.

var data = Enumerable.Range(0, 10000)
  .Select(x => new { Index = x, Value = x + " is the magic number"})
  .ToList();

var forward = data.ToLookup(x => x.Index); 
var backward = data.ToLookup(x => x.Value);

var test1 = data.Select(x => new { Original = x,
  Match = backward[x.Value].Single()
} ).Take(1).Dump();
var test2 = data.Select(x => new { Original = x,
  Match = forward[x.Index].Single()
} ).Take(1).Dump();
var test3 = data.Select(x => new { Original = x,
  Match = forward[backward[x.Value].Single().Index].Single()
} ).Take(1).Dump(); 
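For readers outside .NET, the same indexing idea can be sketched in Python, with plain dicts standing in for ToLookup (a loose sketch: dicts assume the keys are unique, whereas ToLookup also tolerates duplicates):

```python
# Build each index once (O(N)); every match is then an O(1) lookup
# instead of an O(N) scan with Single. Dicts assume unique keys,
# where ToLookup would also handle duplicate keys.

data = [{"Index": i, "Value": f"{i} is the magic number"}
        for i in range(10_000)]

forward = {d["Index"]: d for d in data}   # ~ data.ToLookup(x => x.Index)
backward = {d["Value"]: d for d in data}  # ~ data.ToLookup(x => x.Value)

test1 = [{"Original": x, "Match": backward[x["Value"]]} for x in data[:1]]
test2 = [{"Original": x, "Match": forward[x["Index"]]} for x in data[:1]]
test3 = [{"Original": x,
          "Match": forward[backward[x["Value"]]["Index"]]}
         for x in data[:1]]
```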

In the original code,

  • data.ToList() generates 10,000 instances (10^4).
  • data.Select( data.Single() ).ToList() generates 100,000,000 instances (10^8).
  • data.Select( data.Single( data.Single() ) ).ToList() generates 100,000,000,000,000,000 instances (10^16).

Single and First are different: Single throws if more than one match is found, so it must fully enumerate its source to rule out a second match.
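That full enumeration is easy to demonstrate by counting; below is a Python sketch with hand-rolled single and first helpers standing in for the LINQ operators (names and counters are illustrative only):

```python
# single() must scan everything to rule out a second match;
# first() can stop at the first hit.

data = list(range(10_000))
scans = 0

def single(seq, pred):
    """Like LINQ's Single: full scan, error unless exactly one match."""
    global scans
    matches = []
    for item in seq:
        scans += 1
        if pred(item):
            matches.append(item)
    if len(matches) != 1:
        raise ValueError("not exactly one match")
    return matches[0]

def first(seq, pred):
    """Like LINQ's First: returns as soon as a match is found."""
    global scans
    for item in seq:
        scans += 1
        if pred(item):
            return item
    raise ValueError("no match")

scans = 0
single(data, lambda v: v == 0)
single_scans = scans  # 10000: the whole list, even though item 0 matched

scans = 0
first(data, lambda v: v == 0)
first_scans = scans   # 1: stopped immediately
```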

David B
If the original code is generating 10^16 instances of anything in a couple of seconds, I'm selling my PC to NASA.
well, the take(1) before the dump cuts it to 10^12.
David B
10^12 is still far too high.
*yawn*, you count 'em then.
David B