After I upgraded my projects to .NET 4.0 (with VS2010), I realized that they run slower than they did under .NET 2.0 (VS2008). So I decided to benchmark a simple console application in both VS2008 and VS2010 with various Target Frameworks:

using System;
using System.Diagnostics;
using System.Reflection;

namespace RuntimePerfTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(Assembly.GetCallingAssembly().ImageRuntimeVersion);
            Stopwatch sw = new Stopwatch();

            while (true)
            {
                sw.Reset();
                sw.Start();

                for (int i = 0; i < 1000000000; i++)
                {

                }

                TimeSpan elapsed = sw.Elapsed;
                Console.WriteLine(elapsed);
            }
        }
    }
}

Here are the results:

  • VS2008
    • Target Framework 2.0: ~0.25 seconds
    • Target Framework 3.0: ~0.25 seconds
    • Target Framework 3.5: ~0.25 seconds
  • VS2010
    • Target Framework 2.0: ~3.8 seconds
    • Target Framework 3.0: ~3.8 seconds
    • Target Framework 3.5: ~1.51 seconds
    • Target Framework 3.5 Client Profile: ~3.8 seconds
    • Target Framework 4.0: ~1.01 seconds
    • Target Framework 4.0 Client Profile: ~1.01 seconds

My initial conclusion is that programs compiled with VS2008 run faster than programs compiled with VS2010.

Can anyone explain these performance differences between VS2008 and VS2010, and between the different Target Frameworks within VS2010 itself?

+21  A: 

I think I've got it.

If you're running on a 64 bit machine, make sure the build is set to "Any CPU" rather than "x86". Doing that fixed the issue on my machine.

The default for new projects changed in VS2010 from "Any CPU" to "x86" - I believe this was to make Edit and Continue work by default on 64 bit machines (as it only supports x86).

Running an x86 process on a 64 bit machine is obviously somewhat suboptimal.
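To confirm which mode a given build actually runs in, the process can print its own bitness at startup. This is a minimal sketch, not part of the original benchmark: the class name BitnessCheck is made up for illustration, and Environment.Is64BitProcess and Environment.Is64BitOperatingSystem assume .NET 4.0 or later (IntPtr.Size also works on earlier versions).

```csharp
using System;

namespace RuntimePerfTest
{
    // Hypothetical helper program; the name is an assumption for illustration.
    static class BitnessCheck
    {
        static void Main()
        {
            // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
            Console.WriteLine("Pointer size: {0} bytes", IntPtr.Size);

            // These two properties are available from .NET 4.0 onwards.
            Console.WriteLine("64-bit process: {0}", Environment.Is64BitProcess);
            Console.WriteLine("64-bit OS:      {0}", Environment.Is64BitOperatingSystem);
        }
    }
}
```

On a 64-bit machine, an "x86" build should report a 4-byte pointer size, while an "Any CPU" build should report 8 bytes.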

EDIT: As per Dustin's comments, running x86 rather than x64 can have performance advantages in terms of more efficient use of memory (shorter references).

I also corresponded with Dustin about this by email, and he included these reasons:

FWIW, the default target platform wasn’t changed to support ENC. We had already shipped ENC broken on x64 for 2 releases. So by itself, ENC wasn’t really a compelling reason to switch. The primary reasons we switched (in no particular order) were:

  • IntelliTrace is not supported on x64. So, one of the coolest new features won’t work on x64 Windows for Any CPU projects.

  • x64 EXEs run slower on x64 Windows than x86 EXEs do. So, the idea of x86 debug, x64 release would mean that “optimized” builds in Release would actually perform worse.

  • Customer complaints when deploying an application and finding that it doesn’t work, even though it worked on their machine. These were often around P/Invoke, but there are many other assumptions that can be made in an application that can break when run with different bitness.

The above reasons, coupled with the fact that Any CPU brings no benefits (i.e. you can’t actually take advantage of the expanded address space because the EXE may still run on x86), were the reason that the default was switched.

Rick Byers has an excellent post on this topic here.

Jon Skeet
Might the difference be the platform target? csc.exe uses AnyCPU as the default, whereas VS 2010 has changed the default to x86.
0xA3
@0xA3: Isn't that basically what my answer says?
Jon Skeet
Yes, it is, but your answer didn't say that yet when I wrote my comment ;-) I wish there were an instant refresh when someone updates their answer.
0xA3
After changing to Any CPU, I got ~0.25 seconds in all target frameworks, just like in VS2008, that's cool! It still doesn't explain the difference in performance on the x86 platform with various Target Frameworks.
DxCK
Indeed, one of the motivations for the change of the default platform target has been Edit and Continue, as well as P/Invoke and COM Interop scenarios. The change, however, does not affect class libraries. Here the default is still AnyCPU. See https://connect.microsoft.com/VisualStudio/feedback/details/455333/platform-target-is-defaulting-to-x86-rather-than-any-cpu for a more detailed explanation.
0xA3
FWIW, running an x86 process on a 64-bit machine is not suboptimal. In general, running an x86 process on a 64-bit machine is actually *faster* than running an x64 process. See my response below for why I believe the benchmark is flawed.
Dustin Campbell
I'm not sure exactly how the JIT works, but I find x64 binaries written in C/C++ faster on a 64-bit system. (I tried writing some genetic algorithms, and the time difference between x86 and x64 multithreaded builds was about 10%.)
Yossarian
+5  A: 

I believe your benchmark is flawed. The IL code from VS 2008 and VS 2010 for your sample program is identical in release mode (VS 2008 targeting .NET 2.0 and VS 2010 targeting .NET 4.0 with default settings). Therefore you should not see a difference in timings between VS 2008 and VS 2010. Both compilers emit the following code:

.method private hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       69 (0x45)
  .maxstack  2
  .locals init ([0] class [System]System.Diagnostics.Stopwatch sw,
           [1] int32 i,
           [2] valuetype [mscorlib]System.TimeSpan elapsed)
  IL_0000:  call       class [mscorlib]System.Reflection.Assembly [mscorlib]System.Reflection.Assembly::GetCallingAssembly()
  IL_0005:  callvirt   instance string [mscorlib]System.Reflection.Assembly::get_ImageRuntimeVersion()
  IL_000a:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_000f:  newobj     instance void [System]System.Diagnostics.Stopwatch::.ctor()
  IL_0014:  stloc.0
  IL_0015:  ldloc.0
  IL_0016:  callvirt   instance void [System]System.Diagnostics.Stopwatch::Reset()
  IL_001b:  ldloc.0
  IL_001c:  callvirt   instance void [System]System.Diagnostics.Stopwatch::Start()
  IL_0021:  ldc.i4.0
  IL_0022:  stloc.1
  IL_0023:  br.s       IL_0029
  IL_0025:  ldloc.1
  IL_0026:  ldc.i4.1
  IL_0027:  add
  IL_0028:  stloc.1
  IL_0029:  ldloc.1
  IL_002a:  ldc.i4     0x3b9aca00
  IL_002f:  blt.s      IL_0025
  IL_0031:  ldloc.0
  IL_0032:  callvirt   instance valuetype [mscorlib]System.TimeSpan [System]System.Diagnostics.Stopwatch::get_Elapsed()
  IL_0037:  stloc.2
  IL_0038:  ldloc.2
  IL_0039:  box        [mscorlib]System.TimeSpan
  IL_003e:  call       void [mscorlib]System.Console::WriteLine(object)
  IL_0043:  br.s       IL_0015
} // end of method Program::Main

One thing that might be different is the platform target. VS 2010 uses x86 as the default platform target whereas VS 2008 uses AnyCPU. If you are on a 64-bit system this will result in different JIT compilers being used for the VS 2008 vs. VS 2010 builds. This might lead to different results as the JIT compilers are developed separately.

0xA3
+4  A: 

I agree that the benchmark is flawed.

  • It is too short.
  • As pointed out earlier, the different JITs for x86/x64 are likely optimizing the loop differently.
  • It really only tests stack variables, which are likely JITted to fast register access. A more real-world benchmark should at least access memory across the address space.

Most of the additional time is likely taken by the WoW layer in the x86 cases. However, the inherent inefficiencies of an x64 process would very likely outweigh the overhead of the WoW layer in a longer benchmark that actually touches memory. In fact, if the benchmark were to access memory (by creating and accessing objects on the heap), you'd see the WoW layer's pointer optimization benefits.
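A benchmark along those lines might look roughly like the following. This is only a sketch under stated assumptions, not a measured result: the Node type, the HeapBenchmark class, the BuildAndSum method and the node count are all made up for illustration. It allocates many small objects on the heap and then follows the references between them, so reference size and memory access both come into play.

```csharp
using System;
using System.Diagnostics;

namespace RuntimePerfTest
{
    // Hypothetical node type, used only to create many small heap objects.
    class Node
    {
        public int Value;
        public Node Next;
    }

    static class HeapBenchmark
    {
        // Builds a linked list of 'count' nodes, then walks it and sums the values.
        static long BuildAndSum(int count)
        {
            Node head = null;
            for (int i = 0; i < count; i++)
            {
                Node n = new Node();
                n.Value = i;
                n.Next = head;
                head = n;
            }

            long sum = 0;
            for (Node cur = head; cur != null; cur = cur.Next)
            {
                sum += cur.Value;
            }
            return sum;
        }

        static void Main()
        {
            // Shows whether this run is a 32-bit (4) or 64-bit (8) process.
            Console.WriteLine("Pointer size: {0} bytes", IntPtr.Size);

            // Several runs, so JIT compilation of the first call does not dominate.
            for (int run = 0; run < 5; run++)
            {
                Stopwatch sw = Stopwatch.StartNew();
                long sum = BuildAndSum(5000000); // arbitrary size chosen for the sketch
                sw.Stop();

                // Printing the sum makes the result observable, so the work is not dead code.
                Console.WriteLine("{0} (sum = {1})", sw.Elapsed, sum);
            }
        }
    }
}
```

Building this once as x86 and once as Any CPU (or x64) and comparing the printed timings on the same machine would give a fairer comparison than the empty loop, though garbage collection adds noise to the timings.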

Dustin Campbell
A: 

We have the same problem. After converting a WPF project from .NET 3.5 (VS2008) to .NET 4 (VS2010), the GUI is much less responsive (almost a 1 second delay for every click).

After some investigation, we figured out that it is because Visual Studio 2010 uses far more resources, and everything is slower when we debug from VS2010. When we run the built project as an .exe, it runs fast again.

petval