tags:

views:

88

answers:

1

This code takes timings for n invocations of two methods that both do nothing (MyMethod1, MyMethod2) but are decorated with different attributes (MyAspect1, MyAspect2).

MyAspect1 contains no logic apart from letting MyMethod1 run (Proceed()).

MyAspect2 does the same for MyMethod2, but also creates a Task, supplying an anonymous delegate to run, which is meant to equate to near zero work for the CPU (trimming a string).

I'd expect timings for iterations over MyMethod1 and MyMethod2 to be roughly the same.

The actual timings are below; they seem counter intuitive, with MyMethod1 starting off significantly slower (counter to what I'd expect), and eventually with it losing the race significantly from 10000 iterations onwards. Can anyone help me explain this?

Test results (average of 3 runs per iteration value):

No. of iterations  |          Ticks 
                      Method 1  |  Method 2
       5                 25282        6999
       100               22128        8176 
       1000              22982        23720
       10000             25995        265621
       1000000           994359       25125076

---------------------------------------------------

public class TestClass2
{ 
    public void MyTest()
    {
        const int iterations = 100;
        var stopWatch = new Stopwatch();
        stopWatch.Start();
        for (int x = 0; x < iterations; x++)
        {
            MyMethod1();
        }
        stopWatch.Stop();
        Console.WriteLine("Method1 calls duration: " + stopWatch.ElapsedTicks);            
        stopWatch.Reset();
        stopWatch.Start();
        for (int x = 0; x < iterations; x++)
        {
             MyMethod2();
        }
        stopWatch.Stop();
        Console.WriteLine("Method2 calls duration: " + stopWatch.ElapsedTicks);                        
    }

    [MyAspect1]
    private void MyMethod1() {}
    [MyAspect2]
    private void MyMethod2() {}
}

[Serializable]
public class MyAspect1 : AsynchronousMetricsAspect
{
    public override void OnInvoke(MethodInterceptionArgs args)
    {
        args.Proceed();
    }
}

//NOTE: this type is the same as MyAspect1, but adds a Task creation
[Serializable]
public class MyAspect2 : AsynchronousMetricsAspect
{
    public override void OnInvoke(MethodInterceptionArgs args)
    {
        args.Proceed();
        Task.Factory.StartNew(() => "bleh".Trim()); //invoke the logging method asynchronously           
    }
}

Further info: Test is being invoked from ReSharper NUnit VS plugin and the attributes are PostSharp MethodInterceptionAspects.

Edit: Updated figures for when the methods are pre-invoked before the tests, to ensure any JIT compilation has occurred. 

Test results (ballpark):

No. of iterations  |          Ticks 
                      Method 1  |  Method 2
       5                 22          437
       100               37          2204 
       1000              192         24476
       10000             7931        286403
       100000            115451      2862439
       1000000           695950      29049021
       10000000          8347867     275777590
+4  A: 

Well, to start with I'd try to determine whether this is anything to do with PostSharp or not. I would try to call the same methods without any AOP being involved at all.

Here are my suspicions though:

  • The 25000 ticks (a pretty short length of time, mind you) is due to JITting bits of PostSharp infrastructure. They're therefore a constant cost, which doesn't have to be paid again by the second method. You can verify this by switching round the order of your tests. However, given the numbers, I suspect there's another constant cost incurred by setting up task factories etc... which would explain why 100 iterations of method 2 doesn't take much longer than 5 iterations.

  • Creating a task is not free, and I wouldn't expect it to be. Just the fact that you're creating an object means there's work going on... and then there's the scheduling and coordination across tasks and threads. Parallel Extensions is good, but it can't work miracles. I don't know how PFX will spread the work out, but it's possible that some of that extra work being generated (the tasks are short, but they still have to be executed) ends up on the same core as the main thread. You're creating a lot of tasks very quickly.

  • It's possible that the difference between the two methods is enough to change whether they can be inlined or not. This is obfuscated somewhat by the use of PostSharp, but I wouldn't be entirely surprised to find that that's the case.

  • The scaling isn't quite linear in the case of method 2, but it's not far off. If you take a formula of 6000 + 23 * N as a quick and dirty approximation, you would expect values of 6115, 8300, 29000, 236000 and 23006000 - which aren't very far off.

  • I strongly suspect that the huge figure for the first method in the last test is due to garbage collection and the heap growing. You're creating plenty of objects here, and I suspect it's doing a full GC and expanding the various generations appropriately - method 2 in the same test then takes advantage of that larger heap.

Those are really just guesses, but they provide some kind of explanation for everything you're seeing :)

Jon Skeet
Thanks Jon. Didn't expect an answer from the man himself! I'm currently trying to decipher DotTrace output. Your answer sheds a good deal of light for me, thanks.
Ben Aston
@Ben: Beware of assuming that any of this answer is actually accurate. The ideas may well help you to diagnose further, but none of them should be taken as correct :) (The simplest one to check is my suspicion about the JIT compiler being responsible for the first method taking longer for small numbers of iterations though.)
Jon Skeet