When constructing LINQ expressions (in my case, LINQ to Objects), there are many ways to accomplish the same thing, and some are much, much more efficient than others.

  • Is there a good way to "tune" or optimize these expressions?
  • What fundamental metrics do folks employ and how do you gather them?
  • Is there a way to get at "total iterations" count or some other metric, where you could "know" that lower means better?

EDIT

Thanks Richard/Jon for your answers.

What I really want, it seems, is a way to get a simple operation count ("OCount") for a LINQ expression, though I am not sure the hooks exist in LINQ to allow it. Suppose I have a target level of performance for specific machine hardware (an SLA). Ideally, I would add a unit test confirming that the typical data moved through that query would process within the allotted time (from the SLA). The problem is that the test would run on the build server, developers' machines, etc., which probably bear little resemblance to the hardware the SLA targets.

So the idea is that I would determine an acceptable maximum OCount for the expression, knowing that if the OCount is less than X, the expression will certainly perform acceptably under the SLA on the target "typical" hardware. If the OCount exceeds this threshold, the build/unit test would generate a warning. Ideally, I would like to have something like this (pseudocode-ish):

var results = [big linq expression run against test dataset];
Assert.Less(results.OCount, MAXALLOWABLE_OCOUNT);

where results.OCount would simply give me the total iterations (n) necessary to produce the result set.

Why would I like this??

Well, with even a moderately sized LINQ expression, a small change or addition can have a huge effect on performance by increasing the overall operation count. The application code would still pass all of its unit tests, since it still produces the correct result, but it would work miserably slowly when deployed.
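As a hedged illustration of that effect (the data and names here are made up): a one-line change that silently turns an O(n + m) query into an O(n * m) one.

using System.Collections.Generic;
using System.Linq;

var customerIds = Enumerable.Range(0, 10_000).ToList();                   // placeholder data
var activeIds   = Enumerable.Range(0, 5_000).Select(i => i * 2).ToList();

// O(n * m): List<int>.Contains scans the whole list for every element.
var slow = customerIds.Where(id => activeIds.Contains(id)).ToList();

// O(n + m): building a HashSet<int> makes each Contains check O(1).
var activeSet = new HashSet<int>(activeIds);
var fast = customerIds.Where(id => activeSet.Contains(id)).ToList();

Both queries produce identical results; only the iteration count differs, which is exactly what an OCount-style metric would catch.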

The other reason is for simple learning. If you do something and the OCount goes up or down by an order of magnitude, then you learn something.

EDIT #2

I'll throw in a potential answer as well. It is not mine; it comes from Cameron MacFarland, from another question of mine that spawned this one. I think the answer to that one could work here, in a unit test environment like the one I described in the first edit to this question.

The essence of it is to create the test datasets in the unit test fixture, feed them into the LINQ expression in the way outlined in that answer, then add up the iteration counts and compare them against the maximum allowable count (a rough sketch follows below).

See Cameron's answer here
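A minimal sketch of that idea, assuming NUnit and LINQ to Objects. The IterationCounter type is a hypothetical helper, not part of LINQ; it simply counts every element the query pulls from the wrapped source.

using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;

// Hypothetical helper: wraps a source sequence and tallies every
// element that a query actually pulls through it.
public class IterationCounter
{
    public int Count { get; private set; }

    public IEnumerable<T> Wrap<T>(IEnumerable<T> source)
    {
        foreach (T item in source)
        {
            Count++;            // one "operation" per element yielded
            yield return item;
        }
    }
}

[TestFixture]
public class QueryBudgetTests
{
    private const int MaxAllowableOCount = 50_000;  // tuned against the SLA

    [Test]
    public void QueryStaysWithinIterationBudget()
    {
        var counter = new IterationCounter();
        var data = counter.Wrap(Enumerable.Range(0, 10_000)); // stand-in test dataset

        // Run the query to completion so every iteration is counted.
        var results = data.Where(x => x % 3 == 0)
                          .OrderBy(x => x)
                          .ToList();

        Assert.Less(counter.Count, MaxAllowableOCount);
    }
}

Note that this only counts pulls from the wrapped source; to charge for the work done by intermediate operators such as OrderBy, you would wrap each stage of the pipeline the same way.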

+3  A: 
  1. Get an SLA (or other definition) describing the required overall performance.

  2. Measure the application's performance, and how far below requirements it is (if it is within requirements, stop and do something useful). A minimal timing sketch follows after this list.

  3. Use a profiler to get a detailed performance breakdown, identify the parts of the system most able to be improved (making a small improvement to hot code is likely to be better than a big improvement to rarely called code).

  4. Make the change, re-run the unit/functional tests (no point doing the wrong thing fast).

  5. Go to 1.
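For step 2, a minimal timing sketch, assuming a plain LINQ to Objects query over placeholder data (the query itself is illustrative):

using System;
using System.Diagnostics;
using System.Linq;

var data = Enumerable.Range(0, 1_000_000).ToList();  // placeholder dataset

var sw = Stopwatch.StartNew();
var results = data.Where(n => n % 3 == 0)
                  .OrderBy(n => -n)
                  .ToList();                         // ToList() forces full evaluation
sw.Stop();

Console.WriteLine($"Query took {sw.ElapsedMilliseconds} ms for {results.Count} results");

Forcing full evaluation with ToList() matters here; without it, deferred execution means the Stopwatch would measure only the cost of building the query, not of running it.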

If, in step 3, you find that a LINQ expression is a performance problem, then start thinking about needing an answer to this question. The answer will depend entirely on which LINQ provider you are using and the details of its use in your case. There is no general answer.

Richard
+4  A: 

You basically need to work out the complexity function. This depends on the operator, but doesn't tend to be very well documented, unfortunately.

(For the general principle I agree with Richard's answer - this is just LINQ to Objects stuff.)

If you have specific operators you're interested in, it would be worth asking about them, but off the top of my head:

  • Select = O(n)
  • Where = O(n)
  • Join = O(inner + outer + matches) (i.e. it's no cheaper than inner + outer, but could be as bad as inner * outer depending on the results)
  • GroupJoin = same as Join, but buffered by outer element instead of streaming
  • OrderBy = O(n log n)
  • SelectMany = O(n + results)
  • Count = O(1) or O(n) depending on whether the source implements ICollection&lt;T&gt;
  • Count(predicate) = O(n)
  • Max/Min = O(n)
  • All/Any = O(n) (with possible early out)
  • Distinct = O(n)
  • Skip/Take = O(n)
  • SkipWhile/TakeWhile = O(n)

The exact characteristics depend on whether the operator buffers or streams too.
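As a small worked example of how these costs combine (the dataset is made up): the two queries below return identical results, but filtering before sorting shrinks the n in OrderBy's O(n log n) term.

using System.Linq;

var people = Enumerable.Range(0, 100_000)
                       .Select(i => new { Age = i % 90, Name = $"P{i}" })
                       .ToList();

// Sorts all 100,000 elements, then filters: O(n log n) over the full set.
var slow = people.OrderBy(p => p.Name)
                 .Where(p => p.Age > 80)
                 .ToList();

// Filters first (O(n)), then sorts only the ~10% that match.
var fast = people.Where(p => p.Age > 80)
                 .OrderBy(p => p.Name)
                 .ToList();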

Jon Skeet
A: 

Adding onto Jon, who is adding onto Richard:

Another issue to consider is whether or not you process all of the results of a LINQ query. In certain circumstances, particularly in UI code, you only end up processing a subset of the results the query returns. In those situations it's important to know which LINQ operators support lazy evaluation, that is, the ability to return a subset of the results without processing the entire collection.

For instance, calling MoveNext() on the iterators of the following LINQ operations will process one result at a time:

  • Select
  • Where

But the following must process every element in the collection before returning a single item (demonstrated in the sketch below):

  • OrderBy
  • Except (processes the other collection entirely)
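A minimal sketch of the difference, assuming LINQ to Objects; the Numbers generator is just an instrumented stand-in for a real data source.

using System;
using System.Collections.Generic;
using System.Linq;

class StreamingDemo
{
    // Logs each element it yields, so we can see how many elements
    // each query actually pulls from the source.
    static IEnumerable<int> Numbers(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Console.WriteLine($"produced {i}");
            yield return i;
        }
    }

    static void Main()
    {
        // Streams: Where yields matches as it finds them, so First()
        // stops after elements 0..6 have been produced.
        int firstMatch = Numbers(1000).Where(n => n > 5).First();

        // Buffers: OrderBy must consume all 1000 elements before it
        // can hand back even the first result.
        int largestFirst = Numbers(1000).OrderBy(n => -n).First();
    }
}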
JaredPar