+5  A: 

I believe that the problem lies in your time metrics, what are you using to measure the elapsed time?

Just for the record, here are my results:

DoOne() -> 295 ms
DoTwo() -> 291 ms

And the code:

        Stopwatch sw = new Stopwatch();

        sw.Start();
        {
            DoOne();
        }
        sw.Stop();

        Console.WriteLine("DoOne() -> {0} ms", sw.ElapsedMilliseconds);

        sw.Reset();

        sw.Start();
        {
            DoTwo();
        }
        sw.Stop();

        Console.WriteLine("DoTwo() -> {0} ms", sw.ElapsedMilliseconds);
arul
Stopwatch SW = new Stopwatch(); SW.Start(); DoOne(); SW.Stop(); Console.WriteLine(" DoOne took {0} seconds.", ((float)SW.ElapsedTicks) / Stopwatch.Frequency);
Boaz
Are you calling SW.Reset() between the calls ?
arul
Mine are:DoOne -> 271 msDoTwo -> 284 ms
Prankster
Mine are: DoOne -> 190 ms DoTwo ->209 ms
Brian ONeil
DoOne() -> 220 msDoTwo() -> 240 ms
Winston Smith
Please update us on the result of Arul's question.
mxmissile
@arul: Did you measure these timings with the release build from the console (no debugger attached)?
0xA3
Full program is now posted + compilation preferences. Of course, program was run without a debugger attached.
Boaz
+2  A: 

I've ran this using Jon Skeet's Benchmark Helper and I am not seeing the results you are, the execution time is approximately the same between the two methods.

cfeduke
+4  A: 

I am seeing some significant penalty for the interface version but nowhere near the magnitude penalty you are seeing.

Can you post a small, complete program that demonstrates the behaviour along with exactly how you are compiling it and exactly what version of the framework you are using?

Eric Lippert
I would be interested to know why there is a penalty for the interface version (even if I the penalty is not as big as mentioned in the question; for me it is about factor(?) 5-7 slower).
0xA3
Program posted - see above.
Boaz
+25  A: 

A note to everyone out there who is trying to benchmark stuff like this.

Do not forget that the code is not jitted until the first time it runs. That means that the first time you run a method, the cost of running that method could be dominated by the time spent loading the IL, analyzing the IL, and jitting it into machine code, particularly if it is a trivial method.

If what you're trying to do is compare the "marginal" runtime cost of two methods, it's a good idea to run both of them twice and consider only the second runs for comparison purposes.

Eric Lippert
Important to have this mentioned, but it does not really explain what we see here (the timings that can be measured show that there is no significant time spent for jitting).
0xA3
+3  A: 

My tests show the interface version to be about 3x slower when compiled in release mode. When compiled in debug mode they're almost neck-and-neck.

--------------------------------------------------------
 DoOne Release (ms) |  92 |  91 |  91 |  92 |  92 |  92
 DoTwo Release (ms) | 313 | 313 | 316 | 352 | 320 | 318
--------------------------------------------------------
 DoOne Debug (ms)   | 535 | 534 | 548 | 536 | 534 | 537
 DoTwo Debug (ms)   | 566 | 570 | 569 | 565 | 568 | 571
--------------------------------------------------------

EDIT

In my tests I used a slightly modified version of the DoTwo method so that it was directly comparable to DoOne. (This change didn't make any discernible difference to the performance.)

private static void DoTwo()
{
    IList<int> A = new List<int>();
    for (int i = 0; i < 200; i++) A.Add(i);
    int s = 0;
    for (int j = 0; j < 100000; j++)
    {
       for (int c = 0; c < A.Count; c++) s += A[c];
    }
}

The only difference between the IL generated for DoOne and (modified) DoTwo is that the callvirt instructions for Add, get_Item and get_Count use IList and ICollection rather than List itself.

I'm guessing that the runtime has to do more work to find the actual method implementation when the callvirt is through an interface (and that the JIT compiler/optimiser can do a better job with the non-interface calls than the interface calls when compiling in release mode).

Can anybody confirm this?

LukeH
I get about the same ratio between DoOne and DoTwo (36ms to 200ms) for the release version. These are averaged values of 20 calls for each method. So the reason does not seem to be incorrect timing metrics as suggested by arul (even if the penalty is not as high as measured by the OP)
0xA3
+8  A: 

Profiling one on one:

Testing with Snippet compiler.

using your code results:

0.043s vs 0.116s

eliminating Temporary L

0.043s vs 0.116s - ininfluent

by caching A.count in cmax on both Methods

0.041s vs 0.076s

     IList<int> A = new List<int>();
     for (int i = 0; i < 200; i++) A.Add(i);

     int s = 0;
     for (int j = 0; j < 100000; j++)
     {
        for (int c = 0,cmax=A.Count;c< cmax;  c++) s += A[c];
     }

Now I will try to slow down DoOne, first try, casting to IList before add:

for (int i = 0; i < 200; i++) ((IList<int>)A).Add(i);

0,041s 0,076s - so add is ininfluent

so it remains only one place where the slowdown can happen : s += A[c]; so I try this:

s += ((IList<int>)A)[c];

0.075s 0.075s - TADaaan!

so seems that accessing Count or a index element is slower on the interfaced version:

EDIT: Just for fun take a look at this:

 for (int c = 0,cmax=A.Count;c< cmax;  c++) s += ((List<int>)A)[c];

0.041s 0.050s

so is not a cast problem, but a reflection one!

kentaromiura
@kentaromiura, Once you eliminate the temporary L in Boaz's original code, the only difference between the IL generated for DoOne and DoTwo is that the callvirts for Add, get_Item and get_Count use IList and ICollection rather than List itself. I'm guessing that the runtime has to do more work to find the actual method implementation when the callvirt is through an interface.
LukeH
Yes, as I state eliminating L is ininfluent.Slowdown comes from accessing indexed value, probably because under the hood .net use reflection to get the actual value, I can safely state this without looking at IL code generated(actually I'm not able to do so ;) ) because 100000 cast from IList<int>to List<int> actually slows down the code by only 0.009s, so the other time must come from the .net runtime trying to understand to what it has to cast at runtime ;)
kentaromiura
@Luke: I think you should post your comment as an answer because that is the actual answer to the question. It seems in the release version the direct calls to List.get_Item and List.Count can be optimized by the JIT compiler (and this optimization does not happen in the debug built)
0xA3
@divo: Is exactly what I state:so seems that accessing Count or a index element is slower on the interfaced version
kentaromiura
@divo, As suggested I've updated my existing answer with this info. But even if the hypothesis is correct it doesn't explain why Boaz is seeing 100x slower performance for the interface version.
LukeH
Thanks kentaromiura . See the answer I added bellow.
Boaz
+6  A: 

First I want to thank all for their answers. It was really essential in the path figuring our what was going on. Special thanks goes to @kentaromiura which found the key needed to get to the bottom of things.

The source of the slow down of using List<T> via an IList<T> interface is the lack of the ability of the JIT complier to inline the Item property get function. The use of virtual tables caused by accessing the list through it's IList interface prevents that from happening.

As a proof , I have written the following code:

      public class VC
      {
         virtual public int f() { return 2; }
         virtual public int Count { get { return 200; } }

      }

      public class C
      {
         //[MethodImpl( MethodImplOptions.NoInlining)]
          public int f() { return 2; }
          public int Count 
          {
            // [MethodImpl(MethodImplOptions.NoInlining)] 
            get { return 200; } 
          }

      }

and modified the DoOne and DoTwo classes to the following:

      private static void DoOne()
      {
         C c = new C();
         int s = 0;
         for (int j = 0; j < 100000; j++)
         {
            for (int i = 0; i < c.Count; i++) s += c.f();
         }

      }
      private static void DoTwo()
      {
         VC c = new VC();
         int s = 0;
         for (int j = 0; j < 100000; j++)
         {
            for (int i = 0; i < c.Count; i++) s += c.f();
         }

      }

Sure enough the function times are now very similar to before:

 DoOne took 0.01273598 seconds.
 DoTwo took 8.524558 seconds.

Now, if you remove the comments before the MethodImpl in the C class (forcing the JIT not to inline) - the timing becomes:

DoOne took 8.734635 seconds.
DoTwo took 8.887354 seconds.

Voila - the methods take almost the same time. You can stil see that method DoOne is still slightly fast , which is consistent which the extra overhead of a virtual function.

Boaz
But that still doesn't really explain why you were seeing a slowdown of 100x (or even 800x), does it? When others experienced a slowdown DoTwo was only somewhere between 3x and 8x slower than DoOne.
LukeH
"I could imagine that the extra overhead a virtual call vs. a normal results in double the execution time."No, try to cache Count in cmax and you will see that the overhead is ~0, so the extra overhead is the access to Count property Another way to prove this is to put [MethodImpl(MethodImplOptions.NoInlining)] before the get {return 200;}
kentaromiura
@kentarmiura - you are absolutely correct. Stupid of me. I updated the post (choosing not to cache the Count result to make it clearer).
Boaz
@Luke - the extreme slowdowns have something to do with my specific installation of Parallels Dekstop (yes, I know timing on a VM is not the best of idea). Since then I checked on other Macs (en Parallels) and the results are consistent with what the rest of the world reports. I'm still investigating what is making mine slow - but at least it is consistently slow, as the results show.
Boaz