views:

185

answers:

3

As a learning experience I recently tried implementing Quicksort with 3 way partitioning in C#.

Apart from needing to add an extra range check on the left/right variables before the recursive call, it appears to work quite well.
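For context, my partitioning follows the standard Dutch-national-flag scheme. A simplified sketch (not my exact code; identifiers are illustrative):

```csharp
static class ThreeWayQuickSort
{
    // 3-way (Dutch national flag) partition: after the loop,
    // a[left..lt-1] < pivot, a[lt..gt] == pivot, a[gt+1..right] > pivot.
    public static void Sort(int[] a, int left, int right)
    {
        if (left >= right) return;   // the range check before recursing

        int pivot = a[left];
        int lt = left, gt = right, i = left;
        while (i <= gt)
        {
            if (a[i] < pivot)      Swap(a, lt++, i++);
            else if (a[i] > pivot) Swap(a, i, gt--);
            else                   i++;
        }
        Sort(a, left, lt - 1);       // recurse into the < region
        Sort(a, gt + 1, right);      // recurse into the > region
    }

    static void Swap(int[] a, int x, int y)
    {
        int t = a[x]; a[x] = a[y]; a[y] = t;
    }
}
```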

I knew beforehand that the framework provides a built-in Quicksort implementation in List<>.Sort (via Array.Sort). So I tried some basic profiling to compare performance. Results: The built-in List<>.Sort method, operating on the same lists, performs about 10 times faster than my own manual implementation.
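By "basic profiling" I mean something along these lines (illustrative, not my exact harness — a proper benchmark would warm up the JIT and average many runs):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Sort identical copies of the same random data and time each sort.
var rng = new Random(42);
int[] data = Enumerable.Range(0, 100_000).Select(_ => rng.Next()).ToArray();

int[] a = (int[])data.Clone();
var sw = Stopwatch.StartNew();
Array.Sort(a);                               // built-in sort
sw.Stop();
Console.WriteLine($"Array.Sort:  {sw.ElapsedMilliseconds} ms");

var list = new List<int>(data);
sw.Restart();
list.Sort();                                 // List<>.Sort (delegates to Array.Sort)
sw.Stop();
Console.WriteLine($"List<>.Sort: {sw.ElapsedMilliseconds} ms");
```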

Using Reflector, I found that the actual sorting in List<>.Sort is implemented in external native code, not IL (in a function named TrySZSort()).

Looking at my own Quicksort implementation, I would expect that replacing the recursive calls with iteration might give some improvement. Disabling array bounds checking (if possible) could also help. Maybe this would get somewhere closer to the built-in implementation, but I'm not confident.

So my question: is it realistic to expect that an optimised algorithm written in .NET IL and JIT-compiled to native code can compete with the performance of an externally implemented algorithm?

Once again, I realise Quicksort is provided as part of the framework, this was just a learning experience for me. However there are also many algorithms (CRC32 comes to mind) that are not provided, but still could be of much value to many applications. Here's a related question regarding implementing CRC32 in .NET and performance issues.

So if you need to implement such an algorithm in .NET, what are the major performance considerations to understand, so that your algorithm can at least approach the performance of external code?

[Update]

I have improved execution speed to within about 10% of the built-in Array.Sort by changing the algorithm to operate on a simple array of int instead of a List<int>. In Reflector, I can see this avoids a callvirt instruction on every get or set on the list. I thought this might improve things, but I'm surprised by how much.
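To illustrate the difference: the List<int> indexer is a method call (emitted as callvirt in IL), while array element access compiles to direct ldelem/stelem opcodes. A minimal example of the two access patterns (illustrative names):

```csharp
using System;
using System.Collections.Generic;

// The List<int> indexer compiles to something like:
//   callvirt instance !0 class List`1<int32>::get_Item(int32)
// whereas int[] element access is a plain ldelem.i4 / stelem.i4.
void SwapList(List<int> a, int i, int j)
{
    int t = a[i];   // callvirt get_Item
    a[i] = a[j];    // callvirt get_Item, then callvirt set_Item
    a[j] = t;       // callvirt set_Item
}

void SwapArray(int[] a, int i, int j)
{
    int t = a[i];   // ldelem.i4
    a[i] = a[j];    // ldelem.i4, then stelem.i4
    a[j] = t;       // stelem.i4
}
```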

+6  A: 

By using non-recursive code and, especially, by using "unsafe" blocks and pointer arithmetic (where applicable), you could actually see a 5x or 10x performance gain with an algorithm written in C#. As always with performance (and even more so in a managed environment), you never quite know until you try it and benchmark it.
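For example, here is a sketch of the pointer-arithmetic style I mean (illustrative only; requires compiling with /unsafe):

```csharp
static class PointerExample
{
    // Accessing elements through a pinned pointer skips the
    // per-element bounds check the JIT would otherwise emit.
    public static unsafe void Reverse(int[] a)
    {
        fixed (int* p = a)           // pin the array for the GC
        {
            int* lo = p;
            int* hi = p + a.Length - 1;
            while (lo < hi)
            {
                int t = *lo;
                *lo++ = *hi;
                *hi-- = t;
            }
        }
    }
}
```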

Now, generally speaking, you should mostly write stuff in C#, then optimize it, optimize it some more, and if it's still not good enough, identify the exact critical piece of code and port it to C++ (while being careful about limiting the number of managed/native call boundaries).

Ludovic
Interesting points on unsafe mode. I try to avoid it by default, but in some cases pointer arithmetic might be the only option. I agree completely about the optimization approach; I also try not to get caught up in micro-optimizations, since improving the overall design may give much bigger improvements.
Ash
+1, but could you elaborate on how an unchecked block would increase performance here? Because unchecked is the default, and I don't think it has any effect on array bounds checks.
JulianR
Note that porting to C++ won't always give a perf benefit, as there's a penalty for managed-to-unmanaged calls that go through P/Invoke, because of calling-convention mismatch and the fact that both cdecl and stdcall pass arguments on the stack rather than in registers. Most `extern` methods in the BCL use a "fast" calling convention that relies on the native code implementing the method knowing implementation details of the CLR, so they don't pay that price.
Pavel Minaev
Good points Pavel. That's why I said to be careful about native/managed boundaries, but I didn't explain it. Julian, I don't think "unchecked" is the default, as you get array out-of-bounds exceptions if you use incorrect indexes.
Ludovic
Arrays are always bounds checked, unless the compiler can deduce the check is unnecessary or you use pointer arithmetic. Unchecked blocks have no effect here, AFAIK. What I mean by "unchecked by default" is that overflows of primitive types do not generate exceptions by default unless you wrap the code in a `checked` block or use the compiler option; and when that compiler option is on, `unchecked` makes a block behave like the default. Unless I'm mistaken, `unchecked` has no role in performance optimization.
JulianR
Ah yes, I got the "unchecked" thing wrong again (it's not the first time I've thought it does something else :)). Thanks!
Ludovic
+3  A: 

Just out of curiosity, as despite my 9 years of experience with .NET, I still constantly make this mistake: Did you compile your code in Release mode with optimizations on? Debug code performs significantly worse than optimized release code.

Assuming you DID compile in Release mode, there shouldn't be a huge difference in performance if you implement the algorithm similarly (i.e. iterative vs. iterative or recursive vs. recursive). If you wish to see the .NET implementation and figure out how it works, you can download the SSCLI, the Shared Source Common Language Infrastructure. This is Microsoft's publicly available, ECMA-standard-compliant CLI implementation. It is not 100% of the .NET Framework we all know and love, but it is a significant portion of it, and it can provide a lot of information that Reflector can't, including internal implementations. All types of code are available, including C#, C++, and even some assembler in a couple of cases.

jrista
Yes, I compiled in Release mode, but I didn't explicitly check for any optimizations; I assumed setting Release was enough. I'll have to look at this, so thanks for the tip. Thanks also for the link to the SSCLI; it will be interesting to see how Sort is implemented.
Ash
+1  A: 

Make sure you're comparing apples and apples.

When sorting, it's possible for the Compare function to be dominant, and that might differ between the implementations.

Assuming the Compare function in both cases is fast enough not to be an issue, then the time can be dominated by stuff like array bounds checking, which can easily make a big difference.
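For instance (an illustrative sketch, not taken from the question's code): sorting an int[] with the parameterless overload lets the runtime use direct primitive comparisons, while passing a Comparison<int> delegate forces a delegate invocation for every comparison — so the two "Quicksorts" aren't doing the same work per element.

```csharp
using System;

int[] a = { 5, 3, 8, 1 };
int[] b = (int[])a.Clone();

Array.Sort(a);                               // primitive fast path
Array.Sort(b, (x, y) => x.CompareTo(y));     // delegate call per comparison
```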

Mike Dunlavey
Good tip. In this case I'm currently just using basic Int32 comparisons. I did find that sorting a simple int array is much faster than a List<int>; there is a callvirt for every get and set on the List<int>, and removing this by using an array at least halved the execution time.
Ash