ansaurus

Question

Peculiar result relating to struct size and performance

Answer 1

+1 A:

I can't reproduce your results. On my box, the "ref" version has basically the same performance for Big and Small, within tolerance.

(Running Release mode without the debugger attached, with 10 or 100 times as many iterations just to try to get a nice long run.)

Have you tried running your version for lots of iterations? Is it possible that while the tests are running, your CPU is gradually increasing its clock speed (as it spots that it's having to work hard)?

Jon Skeet 2010-09-27 19:01:48

Same here: the operator version is as fast as the ref version. I think he is running the Release build in the debugger. If I do that, I get results similar to jalexiou's.

jdv 2010-09-27 19:20:12

I have added a long calculation at the beginning of the code to "warm up" the CPUs. Thanks for the suggestion, because I now get the same timing... very interesting.

jalexiou 2010-09-27 19:47:20
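
A warm-up of the kind described in this exchange might look roughly like the sketch below. This is not the poster's actual code: the `CpuWarmUp` class, the `Spin` method, and the two-second duration are all assumptions made for illustration. The idea is simply to keep the CPU busy for a while before any timing starts.

```csharp
using System;
using System.Diagnostics;

static class CpuWarmUp
{
    // Spin on throwaway arithmetic for roughly the given duration so the CPU
    // has a chance to leave any low-power state before timing starts.
    public static double Spin(TimeSpan duration)
    {
        var sw = Stopwatch.StartNew();
        double sink = 0.0;
        long i = 0;
        while (sw.Elapsed < duration)
        {
            sink += Math.Sqrt(i++);
        }
        return sink; // returned so the loop is not trivially dead code
    }

    static void Main()
    {
        // Two seconds is an arbitrary choice for this sketch.
        double sink = Spin(TimeSpan.FromSeconds(2));
        Console.WriteLine("Warm-up done (sink={0})", sink);
        // ... start the timed benchmark here ...
    }
}
```

Whether this matters depends on the machine's power settings, so it is worth comparing runs with and without the warm-up.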

Answer 2

+3 A:

There appear to be a couple of flaws in your benchmark.

Use Stopwatch instead of the PerformanceTimer type. I'm not familiar with the latter and it appears to be a 3rd party component. It's particularly troubling that it's measuring time in EllapsedSeconds instead of EllapsedMilliseconds.
You should run each test twice and count only the second run, to eliminate potential JIT costs.
Marshal.SizeOf does not produce the actual size of the struct, just its marshalling size.

After switching to Stopwatch I see the benchmark performing as expected by producing nearly equal times for both types in the static ref case.
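
On the size point in the list above, a small sketch can show the marshalled size and the managed size disagreeing. The `Flag` struct is hypothetical, chosen because a `bool` field is marshalled differently from how it is laid out in managed memory. `Unsafe.SizeOf<T>()` is assumed to be available (it ships with recent .NET versions and as a package for older ones). For structs made only of doubles, like the ones in the question, the two numbers would normally agree.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical struct, not taken from the question.
struct Flag
{
    public bool Value;
}

static class SizeDemo
{
    // Size of the unmanaged (marshalled) representation.
    public static int MarshalledSize()
    {
        return Marshal.SizeOf<Flag>();
    }

    // Size of the struct as laid out in managed memory.
    public static int ManagedSize()
    {
        return Unsafe.SizeOf<Flag>();
    }

    static void Main()
    {
        Console.WriteLine("Marshal.SizeOf: {0} bytes", MarshalledSize());
        Console.WriteLine("Unsafe.SizeOf:  {0} bytes", ManagedSize());
    }
}
```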

JaredPar 2010-09-27 19:06:39

PerformanceTimer is taken from <http://www.codeproject.com/csharp/highperformancetimercshar.asp> and uses the following functions:

    [DllImport("Kernel32.dll")]
    static extern bool QueryPerformanceCounter(out long lpPerformanceCount);

    [DllImport("Kernel32.dll")]
    static extern bool QueryPerformanceFrequency(out long lpFrequency);

jalexiou 2010-09-27 19:21:40

@jalexiou: Yes, but I'd recommend using Stopwatch in modern code. The article predates Stopwatch (it's 8 years old!) - it's cleaner to use the built-in timer. I doubt that that caused any problems, but it's worth knowing about.

Jon Skeet 2010-09-27 20:02:27

There are two things that I don't like about `Stopwatch`. a) The resolution is in integer milliseconds (it should be in micro- or nanoseconds). b) The need to Reset() before each evaluation cycle makes it a hassle sometimes. The performance counter is always running and I am just polling the current `tics`. My current resolution limit is 0.33ns (freq=3.0 GHz), but realistically I could only measure maybe 10-15ns.

jalexiou 2010-09-28 14:43:11
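
For reference on the resolution point, `Stopwatch` also exposes raw ticks: `ElapsedTicks`, `Stopwatch.Frequency`, and the static `Stopwatch.GetTimestamp()` allow the same polling style as the performance counter, and `Elapsed.TotalMilliseconds` is fractional. The sketch below is only an illustration; the `StopwatchResolution` class and `TicksToNanoseconds` helper are made-up names.

```csharp
using System;
using System.Diagnostics;

static class StopwatchResolution
{
    // Convert a raw tick delta to nanoseconds using the counter frequency.
    public static double TicksToNanoseconds(long ticks)
    {
        return ticks * 1e9 / Stopwatch.Frequency;
    }

    static void Main()
    {
        Console.WriteLine("High resolution: {0}", Stopwatch.IsHighResolution);
        Console.WriteLine("Ticks per second: {0}", Stopwatch.Frequency);

        // Polling style: read the free-running counter, no Reset() needed.
        long start = Stopwatch.GetTimestamp();
        double sink = 0;
        for (int i = 0; i < 1000; i++) sink += Math.Sqrt(i);
        long end = Stopwatch.GetTimestamp();
        Console.WriteLine("Elapsed: {0:F0} ns (sink={1})",
            TicksToNanoseconds(end - start), sink);

        // Instance style: Restart() resets and starts in one call.
        var sw = Stopwatch.StartNew();
        sw.Restart();
        Console.WriteLine("Fractional ms: {0}", sw.Elapsed.TotalMilliseconds);
    }
}
```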

Answer 3

A:

Agreed with Jared, this is a benchmarking error.

The essence of the discrepancy you're seeing is a result of not running a 'warmup' on the tests. A warmup ensures that all types and methods have been loaded by the CLR. You should place a for loop around the main test and always run the benchmarks several times... watch for the change after the first set in the following results:

    Size of Small is 8 bytes
    Size of Big is 80 bytes

    5,000,000.00 Iterations
    Operator Results
      Small=523.00000       Big=1953.00000  Slower=x3.73
    StaticRef Results
      Small=2042.00000      Big=2125.00000  Slower=x1.04
      Small=x0.26   Big=x0.92

    5,000,000.00 Iterations
    Operator Results
      Small=2464.00000      Big=3510.00000  Slower=x1.42
    StaticRef Results
      Small=3578.00000      Big=3647.00000  Slower=x1.02
      Small=x0.69   Big=x0.96

    5,000,000.00 Iterations
    Operator Results
      Small=3921.00000      Big=4817.00000  Slower=x1.23
    StaticRef Results
      Small=4880.00000      Big=4944.00000  Slower=x1.01
      Small=x0.80   Big=x0.97
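
The "for loop around the main test" can be sketched as below. The `Rounds` class and its `Work` method are placeholders standing in for the Operator and StaticRef tests, which are not reproduced here; the point is only that the same workload is timed several times so the first round can be compared with the later ones.

```csharp
using System;
using System.Diagnostics;

static class Rounds
{
    static double sink; // keeps the workload result alive

    // Placeholder workload; stands in for the real benchmark body.
    public static double Work(int iterations)
    {
        double sum = 0;
        for (int i = 0; i < iterations; i++) sum += i * 0.5;
        return sum;
    }

    // Time the same workload several times; returns elapsed ms per round.
    public static double[] TimeRounds(int rounds, int iterations)
    {
        var results = new double[rounds];
        for (int r = 0; r < rounds; r++)
        {
            var sw = Stopwatch.StartNew();
            sink = Work(iterations);
            sw.Stop();
            results[r] = sw.Elapsed.TotalMilliseconds;
        }
        return results;
    }

    static void Main()
    {
        double[] times = TimeRounds(3, 5000000);
        for (int r = 0; r < times.Length; r++)
            Console.WriteLine("Round {0}: {1:F3} ms", r + 1, times[r]);
    }
}
```

Treating the first round as a warm-up and reporting only the later rounds is one common way to use this.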

csharptest.net 2010-09-27 19:31:10

Answer 4

A:

I have a couple of suggestions.

Use the Stopwatch class. It uses the exact same Win32 APIs, but is already coded for you.
Increase the iteration count so that your benchmarks take at least 1s (or more) to run; otherwise anomalies could crop up and dominate the time.
Consider the effects of the vshost.exe process. You will get different results for both the Debug and Release builds depending on whether you run the application standalone or through the Visual Studio host process.

When I ran your code I saw similar results for the pass-by-ref test in all test scenarios. What really stuck out for me was just how much faster the smaller struct was in a Release build standalone (i.e. not through vshost.exe).

Release build standalone:

    Size of Small is 8 bytes
    Size of Big is 80 bytes
    50,000,000.00 Iterations
    Operator Results
      Small=0.57173 Big=25.58988    Slower=x44.76

    StaticRef Results
      Small=26.06602        Big=26.68569    Slower=x1.02
      Small=x0.02   Big=x0.96

Release build through vshost:

    Size of Small is 8 bytes
    Size of Big is 80 bytes
    50,000,000.00 Iterations
    Operator Results
      Small=4.56601 Big=35.33387    Slower=x7.74

    StaticRef Results
      Small=37.94317        Big=39.64959    Slower=x1.04
      Small=x0.12   Big=x0.89
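
The second and third suggestions can be sketched in code. This is only an illustration under assumptions: the `Calibrate` class, `IterationsFor`, and `SampleWorkload` are hypothetical names, and the doubling strategy is one simple way to find an iteration count that runs for at least a second. `Debugger.IsAttached` only detects an attached debugger; it is used here as a rough stand-in for "not running standalone".

```csharp
using System;
using System.Diagnostics;

static class Calibrate
{
    static double sink; // keeps the sample workload from being optimized away

    // Hypothetical stand-in for one of the benchmark loops.
    public static void SampleWorkload(long iterations)
    {
        double sum = 0;
        for (long i = 0; i < iterations; i++) sum += Math.Sqrt(i);
        sink = sum;
    }

    // Double the iteration count until a single run lasts at least `minimum`.
    public static long IterationsFor(TimeSpan minimum, Action<long> workload)
    {
        long iterations = 1000;
        while (true)
        {
            var sw = Stopwatch.StartNew();
            workload(iterations);
            sw.Stop();
            if (sw.Elapsed >= minimum) return iterations;
            iterations *= 2;
        }
    }

    static void Main()
    {
        if (Debugger.IsAttached)
            Console.WriteLine("Warning: debugger attached; timings will not be representative.");

        long n = IterationsFor(TimeSpan.FromSeconds(1), SampleWorkload);
        Console.WriteLine("Use at least {0:N0} iterations", n);
    }
}
```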

Brian Gideon 2010-09-27 19:35:56

Answer 5

A:

Thanks everyone for their input. Here are some final thoughts.

QueryPerformanceCounter yields the same results as Stopwatch, so that is a non-issue.

Final results:

1. For the small struct, the operator and by-ref versions perform the same
2. For the big struct, by-ref is about 14x faster than the operator
3. The big struct is about 20x slower than the small struct for operators (as expected)
4. The big struct is about 50% slower than the small struct with by-ref (still interesting)

So the final question is: what mechanism makes the Big struct slower with by-ref, since no stack copying should occur?

Results from the Release executable, run outside Visual Studio:

    Size of Small is 8 bytes
    Size of Big is 80 bytes
    5,000,000.00 Iterations
    Warming up the CPU's

    Using QueryPerformanceCounter
    Operator Results
      Small=0.03545 Big=0.71519     Slower=x20.18

    StaticRef Results
      Small=0.03526 Big=0.05194     Slower=x1.47
      Small=x1.01   Big=x13.77
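
For readers without the question's code at hand, the two calling styles being compared can be sketched as follows. The `Quad` struct is hypothetical and smaller than the 80-byte Big struct from the thread; it only illustrates the difference between an operator, which takes its operands and returns its result by value, and a static method that takes `ref` operands and writes the result through `out`.

```csharp
using System;

// Hypothetical 32-byte struct (four doubles); the thread's Big is 80 bytes.
struct Quad
{
    public double A, B, C, D;

    // Operator form: operands and result are all passed by value.
    public static Quad operator +(Quad x, Quad y)
    {
        Quad r;
        r.A = x.A + y.A;
        r.B = x.B + y.B;
        r.C = x.C + y.C;
        r.D = x.D + y.D;
        return r;
    }

    // By-ref form: operands are passed by reference and the result is
    // written directly into the caller's variable.
    public static void Add(ref Quad x, ref Quad y, out Quad result)
    {
        result.A = x.A + y.A;
        result.B = x.B + y.B;
        result.C = x.C + y.C;
        result.D = x.D + y.D;
    }
}

static class QuadDemo
{
    static void Main()
    {
        Quad a = new Quad { A = 1, B = 2, C = 3, D = 4 };
        Quad b = new Quad { A = 10, B = 20, C = 30, D = 40 };

        Quad viaOperator = a + b;
        Quad viaRef;
        Quad.Add(ref a, ref b, out viaRef);

        Console.WriteLine("{0} {1}", viaOperator.A, viaRef.D);
    }
}
```

Both forms compute the same values; the benchmark in this thread is about how much the by-value copies cost as the struct grows.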

jalexiou 2010-09-27 20:02:05
