views: 960
answers: 13
I'm currently writing a C# application that does a lot of digital signal processing, which involves a lot of small fine-tuned memory xfer operations. I wrote these routines using unsafe pointers and they seem to perform much better than I first thought. However, I want the app to be as fast as possible.

Would I get any performance benefit from rewriting these routines in C or C++, or should I stick to unsafe pointers? I'd like to know what unsafe pointers bring to the table in terms of performance, compared to C/C++.

EDIT: I'm not doing anything special inside these routines, just the normal DSP stuff: cache friendly data transfers from one array to the other with a lot of multiplications, additions, bit shiftings etc. in the way. I'd expect the C/C++ routines to look pretty much the same (if not identical) as their C# counterparts.
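For illustration, a routine of the kind described might look like this when ported to C++. The function name, the Q15 fixed-point format, and the buffers are assumptions for the sketch, not taken from the asker's code:

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical example of the kind of routine in question: a fixed-point
// gain applied while copying one buffer to another. The unsafe-C# version
// would look almost identical, which is the asker's point.
void apply_gain_q15(const int16_t* src, int16_t* dst, size_t n, int16_t gain_q15)
{
    for (size_t i = 0; i < n; ++i) {
        // 16x16 -> 32-bit multiply, then shift back down to Q15.
        dst[i] = static_cast<int16_t>((static_cast<int32_t>(src[i]) * gain_q15) >> 15);
    }
}
```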

EDIT: Thanks a lot to everyone for all the clever answers. What I've learned is that I won't get any significant boost in performance just by doing a direct port, unless some sort of SSE optimization takes place. Assuming that all modern C/C++ compilers can take advantage of it I'm looking forward to give it a try. If someone is interested in the results just let me know and I'll post them somewhere. (May take a while though).

A: 

I would suggest that if you have any algorithms in your DSP code that need to be optimised then you should really be writing them in assembly, not C or C++.

In general, with modern processors and hardware, there aren't that many scenarios that require or warrant the effort involved in optimisation. Have you actually identified any performance issues? If not then it's probably best to stick with what you have. Unsafe C# is unlikely to be significantly slower than C/C++ in most cases of simple arithmetic.

Have you considered C++/CLI? You could have the best of both worlds then. It would even allow you to use inline assembler if required.

Stu Mackellar
Why jump straight to assembly? C and C++ have the advantage of being portable.
postfuturist
It depends on the level of optimization required. I knew someone who was so slick at optimizing 56k / 96k code that the manufacturers (Motorola) used to call _him_ for help. That was all assembler; a compiler would get in the way.
Peter K.
I've found MSVC to be optimal at instruction scheduling and register coloring; it needs a fair bit of handholding if you want to use software pipelining, and I often have to resort to compiler intrinsics to get it to use just the hardware op I have in mind, but neither requires dropping to asm.
Crashworks
I would _never_ jump straight to ASM for anything. Your best bet is to write in C, optimise in C, then take an ASM dump from the compiler and profile it. Tweak it where you can. No human can possibly schedule all the processing units and registers in a modern CPU.
Adam Hawes
Sometimes starting with asm is a help, not because of the actual instructions, but because you tend to think about your data differently in asm. If you have asm experience, though, as Crashworks says you can often convince the compiler to do things the right way. Having the knowledge of asm is what gets you the extra boost.
Nosredna
+6  A: 

I think you should write your DSP routines either in C++ (managed or unmanaged) or in C#, using a solid design but without trying to optimize everything from the start, and then you should profile your code and find the bottlenecks and try to optimize those away.

Trying to produce "optimal" code from the start is going to distract you from writing working code in the first place. Remember that 80% of your optimization is only going to affect 20% of your code, as in a lot of cases only 10% of your code is responsible for 90% of your CPU time. (YMMV, as it depends on the type of application.)

When I was trying to optimize our use of alpha blending in our graphics toolkit, I was trying to use SIMD the "bare metal" way first: inline assembler. Soon I found out that it's better to use the SIMD intrinsics over pure assembly, since the compiler is able to optimize readable C++ with intrinsics further by rearranging the individual opcodes and maximize the use of the different processing units in the CPU.

Don't underestimate the power of your compiler!
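As a minimal sketch of what "intrinsics instead of inline asm" means here (SSE2, x86/x64 only; the function and its saturating-mix purpose are made up for illustration):

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>
#include <cstddef>

// Saturating addition of two 16-bit sample buffers, 8 samples per iteration.
// Because these are intrinsics rather than an inline-asm block, the compiler
// remains free to reorder and interleave the ops across iterations.
// Assumes n is a multiple of 8.
void add_saturated_q15(const int16_t* a, const int16_t* b, int16_t* out, size_t n)
{
    for (size_t i = 0; i < n; i += 8) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        // PADDSW: packed signed saturating add, a classic DSP mix operation.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), _mm_adds_epi16(va, vb));
    }
}
```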

Dave Van den Eynde
+9  A: 

Another way to optimize DSP code is to make it cache friendly. If you have a lot of filters to apply to your signal you should apply all the filters to each point, i.e. your innermost loop should be over the filters and not over data, e.g.:

for each n do t'[n] = h(g(f(t[n])))

This way you will thrash the cache a lot less and will most likely gain a good speed increase.
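A small C++ sketch of that fused loop, with three made-up per-sample stages standing in for the filters:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-sample stages standing in for f, g, h above.
static inline float f(float x) { return x * 0.5f; }
static inline float g(float x) { return x + 1.0f; }
static inline float h(float x) { return x * x; }

// Fused version: all stages applied in one pass over the data, instead of
// three separate passes that each stream the whole buffer through the cache.
void process_fused(const std::vector<float>& in, std::vector<float>& out)
{
    out.resize(in.size());
    for (size_t n = 0; n < in.size(); ++n)
        out[n] = h(g(f(in[n])));   // one load and one store per sample
}
```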

Andreas Magnusson
This is in fact a good suggestion. In fact, this could be called "good locality", and works not only for DSP code but for a lot of applications.
Dave Van den Eynde
+4  A: 

Would I get any performance benefit from rewriting these routines in C/C++ or should I stick to unsafe pointers?

In theory it wouldn't matter - a perfect compiler will optimize the code, whether C or C++, into the best possible assembler.

In practice, however, C is almost always faster, especially for pointer type algorithms - It's as close as you can get to machine code without coding in assembly.

C++ doesn't bring anything to the table in terms of performance - it is built as an object oriented version of C, with a lot more capability and ease of use for the programmer. While for some things it will perform better because a given application will benefit from an object oriented point of view, it wasn't meant to perform better - it was meant to provide another level of abstraction so that programming complex applications was easier.

So, no, you will likely not see a performance increase by switching to C++.

However, it is likely more important for you to find out, than it is to avoid spending time on it - I think it would be a worthwhile activity to port it over and analyze it. It is quite possible that if your processor has certain instructions for C++ or Java usage, and the compiler knows about them, it may be able to take advantage of features unavailable in C. Unlikely, but possible.

However, DSP processors are notoriously complex beasts, and the closer you get to assembly, the better performance you can get (ie, the more hand-tuned your code is). C is much closer to assembly than C++.

Adam Davis
I think I am correct in saying that switching from C to C++ will not gain you any performance, though switching from almost anything else to C++ probably will (in ref. to 3rd last paragraph, which is not clear).
Zooba
+3  A: 

First let me answer the question about "safe" vs "unsafe": You said in your post "I want the app to be as fast as possible" and that means you don't want to mess with "safe" or "managed" pointers (don't even mention garbage collection).

Regarding your choice of languages: C/C++ lets you work with the underlying data much much more easily without any of the overhead associated with the fancy containers that everyone is using these days. Yes it is nice to be cuddled by containers that prevent you from seg-faulting... but the higher-level of abstraction associated with containers RUINS your performance.

At my job our code has to run fast. An example is our polyphase resamplers, which play with pointers, masking operations and fixed-point DSP filtering... none of these clever tricks are really possible without low-level control of the memory and bit manipulations ==> so I say stick with C/C++.

If you really want to be smart write all your DSP code in low level C. And then intermingle it with the more safe containers/managed pointers... when it gets to speed you need to take off the training wheels... they slow you down too much.

(FYI, regarding taking the training wheels off: you need to test your C DSP code extra carefully offline to make sure its pointer usage is good... otherwise it will seg fault.)

EDIT: p.s. "seg faulting" is a LUXURY for all you PC/x86 developers. When you are writing embedded code... a seg fault just means your processor will go into the weeds and only be recovered by power cycling ;).

Trevor Boyd Smith
+2  A: 

In order to know how you would get a performance gain, it's good to know the portions of code that could cause bottlenecks.

Since you're talking about small memory transfers, I assume all data will fit in the CPU's cache. In that case, the only gain you can achieve would be by making better use of the CPU's intrinsic instructions. Typically, the compiler most familiar with those is a C compiler. So here, I think you may improve performance by porting.

Another bottleneck will be on the path between CPU and memory: cache misses due to the large number of memory transfers in your application. The biggest gain will then lie in minimizing cache misses, which depends on the platform you use and on the layout of your data (is it local or spread out through memory?).

But since you're already using unsafe pointers, you have that bit under your own control, so my guess is: on that aspect, you won't benefit much from a port to C (or C++).

Concluding: you may want to port small portions of your application into C.

xtofl
+14  A: 

I've actually done pretty much exactly what you're asking, only in an image processing area. I started off with C# unsafe pointers, then moved into C++/CLI and now I code everything in C++. And in fact, from there I changed from pointers in C++ to SSE processor instructions, so I've gone all the way. Haven't reached assembler yet, although I don't know if I need to, I saw an article on CodeProject that showed SSE can be as fast as inline assembler, I can find it if you want me to.

What happened as I went along was my algorithm went from around 1.5-2 frames per second in C# with unsafe pointers, to 40 frames per second now. C# and C++/CLI were definitely slower than C++, even with pointers, I haven't been able to get above 10 frames per second with those languages. As soon as I switched to C++, I got something like 15-20 frames per second instantly. A few more clever changes and SSE got me up to 40 frames per second. So yes, it is worth going down if you want speed in my experience. There is a clear performance gain.

Ray Hidayat
I can second Ray's experience. I moved some image filtering code from C++/CLI to pure C++ for about a 2 to 1 speed improvement.
Steve Fallows
I will happily third it, particularly if you are using Math.functions at all. I got a 10x speedup moving from Math::Log() (C++/CLI) to log() in math.h. Also remember #pragma managed(off) if you want to mix code in C++/CLI - it works fine.
Zooba
+1  A: 

Seeing that you're writing in unsafe code already, I presume it would be relatively easy to convert that to a C dll and call them from within C#. Do this after you have identified the slowest parts of your program and then replace them with C.
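The native side of that approach can be sketched as a plain C-linkage export; the name `dsp_dot` and its signature are hypothetical:

```cpp
#include <cstddef>

// A hot routine compiled into a shared library / DLL so that C# can bind it
// via P/Invoke. extern "C" keeps the symbol name unmangled.
extern "C" double dsp_dot(const double* a, const double* b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];   // simple dot product as a stand-in workload
    return acc;
}
```

On the C# side this would be bound with something roughly like `[DllImport("dsp")] static extern double dsp_dot(double[] a, double[] b, UIntPtr n);` (again a sketch, assuming the library is named `dsp`).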

Hao Wooi Lim
+1  A: 

Your question is largely philosophical. The answer is this: don't optimize until you profile.

You ask whether you'll gain improvement. Okay, you will gain improvement by N percent. If that's enough (like you need code that executes 200 times in 20 milliseconds on some embedded system) you're fine. But what if it's not enough?

You have to measure first and then find whether some parts of the code could be rewritten in the same language but faster. Maybe you can redesign data structures to avoid unnecessary computations. Maybe you can skip some memory reallocation. Maybe something is done with quadratic complexity when it could be done with linear complexity. You won't see it until you've measured it. This is usually much less of a waste of time than just rewriting everything in another language.
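A small example of the kind of same-language win being described, using a moving-average filter as a stand-in: the naive version re-sums the window at every sample (O(n·k)); a running sum makes it O(n), with one up-front allocation instead of many:

```cpp
#include <cstddef>
#include <vector>

// Moving average with a sliding running sum: O(n) instead of the naive
// O(n*k) of re-summing the window at every output sample.
std::vector<float> moving_average(const std::vector<float>& x, size_t k)
{
    std::vector<float> y;
    if (k == 0 || x.size() < k) return y;
    y.reserve(x.size() - k + 1);        // one allocation up front
    float sum = 0.0f;
    for (size_t i = 0; i < k; ++i) sum += x[i];
    y.push_back(sum / k);
    for (size_t i = k; i < x.size(); ++i) {
        sum += x[i] - x[i - k];         // slide the window in O(1)
        y.push_back(sum / k);
    }
    return y;
}
```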

sharptooth
+1  A: 

C# has no support for SSE (yet; there is a Mono project for SSE operations). Therefore, C/C++ with SSE would definitely be faster.

You must, however, be careful with managed-to-native and native-to-managed transitions, as they are quite expensive. Stay in either world as long as possible.

SealedSun
+1  A: 

Do you really want the app to be as fast as possible or simply fast enough? That tells you what you should do next.

MSN
+1  A: 

If you're insistent on sticking with your hand-roll, without hand-optimising in assembler or similar, the C# should be fine. Unfortunately, this is the kind of question that can only really be answered experimentally. You're already in unmanaged pointer space, so my gut feel is that a direct port to C++ would not see a significant difference in speed.

I should say, though, that I had a similar issue recently, and we ended up throwing away the hand-roll after trying the Intel Integrated Performance Primitives library. The performance improvements we saw there were very impressive.

metao
A very interesting answer, thanks a lot.
Trap
+1  A: 

Mono 2.2 now has SIMD support; with this you can have the best of both worlds: managed code and raw speed.

You might also want to have a look at "Using SSE in c#, is it possible?"

Rex Logan