I am considering porting a small portion of the code in a C# project of mine to C/ASM for performance reasons. (This section of code uses many bitwise operations and is one of the few places where native code might give a real performance increase.) I then plan to simply call the native function in the separate DLL via P/Invoke.

The only data passed between the managed and native code will be purely primitive types (bool, int, long, 1D arrays, etc.). So my question is: is there any significant overhead in using P/Invoke when only primitive types are involved? I am aware that there is substantial overhead with more complex types, since they need to be marshalled (pinned/copied), but perhaps in my situation it will be relatively efficient, possibly even comparable to calling the function from within the native DLL itself?

If someone could clarify this for me, explaining the size of the performance gains/hits and the reasons behind them, it would be much appreciated. An alternative way to accomplish the whole task would also be welcome, though since C# lacks support for inline assembly/CIL, I don't believe there is one.
You could generate a compiled, optimized version of your .NET assembly by running ngen on the end user's computer (as part of the install process).
In my experience, well-structured C# (e.g., keeping allocations outside of loops) will perform very well.
I seem to recall hearing that there's an overhead of at least 30 machine instructions for each P/Invoke call. But ignore the theory: profile your options and choose the fastest.
I would personally set up a test harness with a simple expression written in both C# and unmanaged C++, then profile the app to see what kind of performance delta you're working with.
Something else to consider is that you'd be introducing a maintenance issue with the app, especially if junior-level developers are expected to maintain the code. Make sure you know what you're gaining and what you're losing with respect to performance as well as code clarity and maintainability.
As an aside, JIT'd C# code should have performance on par with C++ with respect to arithmetic operations.
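For what it's worth, the native half of such a harness could be sketched roughly like this in C. The names `mix_bits` and `ns_per_call` are made up for illustration, and the bitwise expression is just a stand-in for whatever your real hot path does. It times the expression when called entirely inside native code, which gives a baseline to compare against the same loop written in C# and against calls made through P/Invoke.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the "simple expression" under test:
 * rotate x left by k bits (k taken modulo 32), then XOR with x. */
uint32_t mix_bits(uint32_t x, uint32_t k)
{
    k &= 31u;
    return ((x << k) | (x >> ((32u - k) & 31u))) ^ x;
}

/* Average cost of one call in nanoseconds, measured with clock(),
 * i.e. CPU time as reported by the C runtime. Resolution is coarse,
 * so use a large iteration count. */
double ns_per_call(uint32_t iterations)
{
    /* volatile keeps the compiler from discarding the loop body */
    volatile uint32_t sink = 0;
    clock_t start, end;
    uint32_t i;

    if (iterations == 0)
        return 0.0;

    start = clock();
    for (i = 0; i < iterations; ++i)
        sink ^= mix_bits(i, i);
    end = clock();

    return (double)(end - start) * 1e9 / (double)CLOCKS_PER_SEC
           / (double)iterations;
}
```

On the C# side you would time the equivalent loop with `Stopwatch`, once with the expression inlined in C# and once calling the exported function per iteration. The gap between those numbers and this baseline is roughly your per-call interop cost.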
From MSDN (http://msdn.microsoft.com/en-us/library/aa712982.aspx):
"PInvoke has an overhead of between 10 and 30 x86 instructions per call. In addition to this fixed cost, marshaling creates additional overhead. There is no marshaling cost between blittable types that have the same representation in managed and unmanaged code. For example, there is no cost to translate between int and Int32."
So, it's reasonably cheap, but as always you should measure carefully to be sure you are benefitting from it, and bear in mind any maintenance overhead. As an aside, I would recommend C++/CLI ("managed" C++) over P/Invoke for any complex interop, especially if you're comfortable with C++.
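To make the blittable-types point concrete, here is a minimal sketch of what the native side might look like. Everything in it is an assumption for illustration: the function name `CountSetBits`, the DLL name `bitops` in the comment, and the bit-counting workload itself. The idea is to stick to fixed-width types that have the same layout as C#'s `int` and `long`, and to take the whole array in one call so the fixed per-call cost is paid once rather than once per element.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(_WIN32)
#define EXPORT __declspec(dllexport)
#else
#define EXPORT __attribute__((visibility("default")))
#endif

/* A matching C# declaration might look like this (hypothetical names;
 * Cdecl is spelled out because DllImport defaults to StdCall on
 * Windows, while a plain C function is cdecl):
 *
 *   [DllImport("bitops", CallingConvention = CallingConvention.Cdecl)]
 *   static extern long CountSetBits(int[] values, int length);
 *
 * int32_t matches C# int and int64_t matches C# long.
 */
EXPORT int64_t CountSetBits(const int32_t *values, int32_t length)
{
    int64_t total = 0;
    int32_t i;

    if (values == NULL || length <= 0)
        return 0;

    for (i = 0; i < length; ++i) {
        uint32_t v = (uint32_t)values[i];
        /* Kernighan's trick: each step clears the lowest set bit */
        while (v != 0) {
            v &= v - 1;
            ++total;
        }
    }
    return total;
}
```

As far as I know, an `int[]` argument like this is pinned for the duration of the call rather than copied, but that's worth confirming with a profiler on your own workload.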