As a learning experience, I recently tried implementing Quicksort with 3-way partitioning in C#.
Apart from needing to add an extra range check on the left/right variables before the recursive call, it appears to work quite well.
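For context, here is a minimal sketch of the 3-way (Dutch national flag) partitioning approach; this is not my exact code, the names QuickSort3 and Swap are just illustrative, and it operates on a plain int[] for simplicity:

```csharp
// Sketch of Quicksort with 3-way (Dutch national flag) partitioning.
// Elements equal to the pivot end up in the middle and are never
// touched by the recursive calls.
static void QuickSort3(int[] a, int left, int right)
{
    if (left >= right)
        return;

    int pivot = a[left];
    int lt = left;       // a[left..lt-1]   < pivot
    int gt = right;      // a[gt+1..right]  > pivot
    int i = left + 1;    // a[lt..i-1]     == pivot

    while (i <= gt)
    {
        if (a[i] < pivot)
            Swap(a, lt++, i++);
        else if (a[i] > pivot)
            Swap(a, i, gt--);
        else
            i++;
    }

    QuickSort3(a, left, lt - 1);
    QuickSort3(a, gt + 1, right);
}

static void Swap(int[] a, int i, int j)
{
    int t = a[i];
    a[i] = a[j];
    a[j] = t;
}
```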
I knew beforehand that the framework provides a built-in Quicksort implementation in List<>.Sort (via Array.Sort), so I tried some basic profiling to compare performance. Results: the built-in List<>.Sort method, operating on the same lists, runs about 10 times faster than my own implementation.
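The timing was along these lines (a rough Stopwatch harness rather than a proper benchmark; MyQuickSort is a hypothetical stand-in for my routine):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class Benchmark
{
    static void Main()
    {
        Random rng = new Random(42);
        int[] data = new int[1000000];
        for (int i = 0; i < data.Length; i++)
            data[i] = rng.Next();

        // Both sorts see identical copies of the same random data.
        List<int> builtIn = new List<int>(data);
        List<int> mine = new List<int>(data);

        Stopwatch sw = Stopwatch.StartNew();
        builtIn.Sort();
        Console.WriteLine("List<>.Sort: {0} ms", sw.ElapsedMilliseconds);

        sw.Reset();
        sw.Start();
        MyQuickSort(mine, 0, mine.Count - 1);
        Console.WriteLine("Manual sort: {0} ms", sw.ElapsedMilliseconds);
    }

    // Placeholder for my implementation.
    static void MyQuickSort(List<int> a, int left, int right) { /* ... */ }
}
```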
Using Reflector, I found that the actual sorting in List<>.Sort is implemented in external native code, not in IL (in a function named TrySZSort()).
Looking at my own Quicksort implementation, I would expect that replacing the recursive calls with iteration might give some improvement (see the sketch below), and that disabling array bounds checking (if possible) could also help. Maybe this would get some way closer to the built-in implementation, but I'm not confident.
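By iteration I mean something like the following sketch: recursion replaced with an explicit stack of ranges, using a simple Lomuto partition. QuickSortIterative is just an illustrative name, and real implementations would typically also fall back to insertion sort for small ranges.

```csharp
// Requires: using System.Collections.Generic;
// Iterative quicksort sketch: recursion replaced by an explicit stack
// of (left, right) ranges that still need partitioning.
static void QuickSortIterative(int[] a)
{
    Stack<KeyValuePair<int, int>> ranges = new Stack<KeyValuePair<int, int>>();
    ranges.Push(new KeyValuePair<int, int>(0, a.Length - 1));

    while (ranges.Count > 0)
    {
        KeyValuePair<int, int> range = ranges.Pop();
        int left = range.Key;
        int right = range.Value;
        if (left >= right)
            continue;

        // Lomuto partition around a[right].
        int pivot = a[right];
        int store = left;
        for (int i = left; i < right; i++)
        {
            if (a[i] < pivot)
            {
                int t = a[i]; a[i] = a[store]; a[store] = t;
                store++;
            }
        }
        int t2 = a[store]; a[store] = a[right]; a[right] = t2;

        // Push both sub-ranges instead of recursing into them.
        ranges.Push(new KeyValuePair<int, int>(left, store - 1));
        ranges.Push(new KeyValuePair<int, int>(store + 1, right));
    }
}
```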
So my question: is it realistic to expect that the performance of an optimised algorithm written in .NET IL and jitted to native code can compete with the performance of an externally implemented algorithm?
Once again, I realise Quicksort is provided as part of the framework; this was just a learning experience for me. However, there are many algorithms (CRC32 comes to mind) that are not provided but could still be of much value to many applications. Here's a related question regarding implementing CRC32 in .NET and the performance issues involved.
So if you need to implement such an algorithm in .NET, what are the major performance considerations to understand, so that your algorithm can at least approach the performance of external code?
[Update]
I have improved execution speed to within about 10% of the built-in Array.Sort by changing the algorithm to operate on a plain int[] instead of a List<int>. In Reflector, I can see that this avoids a callvirt instruction on every get or set on the list. I thought this might improve things, but I'm surprised by how much.
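To illustrate what Reflector shows (Swap is just an example of a hot inner-loop routine, and the IL comments are my own reading):

```csharp
// On a List<int>, every element access goes through the indexer,
// which compiles to a callvirt to get_Item / set_Item:
static void Swap(List<int> a, int i, int j)
{
    int t = a[i];    // callvirt get_Item
    a[i] = a[j];     // callvirt get_Item, then callvirt set_Item
    a[j] = t;        // callvirt set_Item
}

// On an int[], the same access compiles to direct ldelem.i4 / stelem.i4
// instructions, which the JIT turns into plain loads and stores:
static void Swap(int[] a, int i, int j)
{
    int t = a[i];    // ldelem.i4
    a[i] = a[j];     // ldelem.i4, stelem.i4
    a[j] = t;        // stelem.i4
}
```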