+5  A: 

First, have you tried profiling things to see if you could optimize your memory usage? A good place to start is with the CLR profiler (works with all CLRs up to 3.5).

Rewriting everything in C++ is an incredibly drastic change just for the sake of a small performance hit -- this is like fixing a paper cut by amputating your hand.

John Feminella
Thank you for the reply. C# was probably not the right choice here, but we had to have this up and running by day X, and it was decided that there was no way we could write it in C++ quickly enough. But now that that deadline is met and we're running, we have a lot more breathing room to make changes. So re-writing it project by project in C++ is one option that we're considering. We see some 100ms delays that are definitely caused by GCs, and we're considering this re-write only if we can't eliminate them through profiling, pre-allocating, etc.
Michael Covelli
+4  A: 

Are you certain that those 100ms delays are due to the GC? I would make VERY sure that the GC really is your problem before you spend a lot of time, effort, and money rewriting the thing in C++. Combining managed code with unmanaged code also presents its own problems, as you have to deal with marshalling between those two contexts. That will add its own performance drain, and your net gain could quite likely end up being zero in the end.

I would profile your C# application and narrow down exactly where your 100ms delays are coming from. This tool might be helpful:

How To: Use CLR Profiler

A word on the GC

Another word about the .NET GC (or really any GC, for that matter). This is not said nearly often enough, but it is a critical factor in successfully writing code with a GC:

Having a Garbage Collector does not mean you don't have to think about memory management!

Writing optimal code that plays nicely with the GC takes less effort and hassle than writing C++ code that plays nicely with an unmanaged heap...but you still have to understand the GC and write code that cooperates with it. You can't completely ignore memory management; you have to worry about it less, but you still have to think about it. Writing GC-friendly code is a critically important factor in achieving performant code that does not create memory management problems.

The following article should also be helpful, as it outlines the fundamental behavior of the .NET GC (valid through .NET 3.5; it's quite likely that this article is no longer completely valid for .NET 4.0, as there have been some critical changes to its GC...for one, it no longer has to block .NET threads while collection occurs):

Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework

jrista
Thank you for your reply. Yes, unfortunately, every time there's a delay, the GC count confirms that there was a collection. We've never had a delay without a collection also occurring. I hadn't heard about that feature in .NET 4.0 (not blocking .NET threads while a GC occurs). If that's true, that could really save us. Do you have a reference on it? Thanks!
Michael Covelli
Here's one: http://geekswithblogs.net/sdorman/archive/2008/11/07/clr-4.0-garbage-collection-changes.aspx. This is great, this might really help us out. I haven't tried the .NET 4.0 beta yet. Has anyone else tried to compare the GC performance in real world tests?
Michael Covelli
I believe .NET 4.0 is out of beta...isn't the VS2010 launch event happening in the next couple of weeks? I've been using the new 2010/4.0 stuff for months in a trial/experimental capacity, and it's pretty rock solid. I haven't done anything low-level with the new GC, but I haven't had any issues with it either.
jrista
Just out of curiosity...are the collections that are pausing your application Gen2 collections, or are they Gen0/1 collections? I would be very surprised if Gen0/1 collections were pausing your application; however, I would not be so surprised if a Gen2 collection did. If you are getting a lot of Gen2 collections, that might be a problem that could be resolved with some optimization.
jrista
It's mostly the Gen2 collections. Thanks for your help, it sounds like the .NET 4 gc along with profiling and optimizations is the way to go rather than C++.
Michael Covelli
Good to hear. :) It is really interesting that you are getting so many Gen2 collections...those should be really rare. Hopefully profiling the code will help you identify your bottleneck.
jrista
+1  A: 

If 100 ms is an issue, I assume your code is mission critical. Mixing managed and unmanaged code will incur interop overhead for every call between the managed AppDomain and unmanaged space.

The GC is very well optimized, so before doing that, try to profile your code and refactor it. If you are concerned about the GC, try adjusting thread priorities, minimize object creation, and cache data wherever possible. Turn on the "Optimize code" setting in your project properties too.

Fadrian Sudaman
Thanks for your reply. Do you have any data on how much the interop overhead actually is? And which method of doing it is best?
Michael Covelli
Generally the overhead is negligible if you are just calling methods; see my answer here: http://stackoverflow.com/questions/2309383/performance-of-passing-data-between-net-and-com-assemblies/2310298#2310298. The overhead will become significant if you start marshalling data between managed and unmanaged types, such as strings and arrays. See this Microsoft link for more info: msdn.microsoft.com/en-us/library/ms998551.aspx
Fadrian Sudaman
What about calling code in a .dll written in C++ using the DllImport with the calling convention set to Cdecl? If it only returns values and structs and IntPtrs?
Michael Covelli
Returning structs may have more overhead than IntPtr or normal values. Just to share: I read in a recent post that a basic interop call (without marshalling overhead considered) takes about 10-30 instructions, which is nothing on a modern CPU that processes millions of instructions per second.
Fadrian Sudaman
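To make the comments above concrete, here is a sketch of what a P/Invoke-friendly native surface might look like: `extern "C"` exports using only blittable types (fixed-size integers, doubles, POD structs), so the marshalling layer has almost nothing to do per call. The function and struct names here are hypothetical, not from the thread.

```cpp
#include <cstdint>

// Hypothetical native API designed to keep interop overhead low:
// extern "C" exports and only blittable types, so the CLR can pass
// arguments and return values without per-call marshalling work.
extern "C" {

// Plain-old-data struct: identical layout on both sides of the boundary.
struct Quote {
    double bid;
    double ask;
    int64_t timestamp;
};

// Returning a primitive is the cheapest kind of interop call.
double add_prices(double a, double b) {
    return a + b;
}

// Returning a POD struct by value is still blittable, though slightly
// more expensive than returning a primitive or an IntPtr-sized pointer.
Quote make_quote(double bid, double ask, int64_t ts) {
    return Quote{bid, ask, ts};
}

} // extern "C"
```

On the C# side these would be bound with `[DllImport("native.dll", CallingConvention = CallingConvention.Cdecl)]`; because every type involved is blittable, the runtime does not need to copy or convert anything when crossing the boundary.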
+3  A: 

The CLR GC does not suspend threads running unmanaged code during a collection. If the native code calls into managed code, or returns to managed code then it may be affected by a collection (like any other managed code).

Logan Capaldo
Thanks for your reply. I suspected that that was the case, but I've found some references that say conflicting things. Do you have a reference to confirm that?
Michael Covelli
@Michael: The references you've found might be related to unmanaged COM interop stuff, which won't block GC, IIRC. But regular unmanaged calls will.
John Feminella
@Michael http://msdn.microsoft.com/en-us/magazine/bb985011.aspx If you think about it, there actually isn't a safe way to suspend a thread running native code to do a GC anyway. The CLR has no way of knowing whether the native code is holding a resource required to perform the GC, resulting in deadlock (this could happen, for instance, when you start involving the hosting APIs).
Logan Capaldo
Good point, it really can't freeze those threads.
Michael Covelli
+1  A: 

One thought was to re-write it in C++, project by project. But if you combine C# with unmanaged C++, will the threads in the C++ projects also be frozen by garbage collections?

Not if the C++ code is running on different threads. The C++ heap and the managed heap are different things.

On the other hand, if your C++ code is doing a lot of new/delete, you will still see allocation stalls in the C++ code as the heap becomes fragmented. And these stalls are likely to be much worse than what you see in C# code, because there is no GC: when the heap needs to be cleaned up, it just happens inside the call to new or delete.

If you really have a tight performance requirement, then you need to plan on not doing any memory allocation from the general heap inside your time critical code. In practice that means this will be more like C code than C++ code, or using special memory pools and placement new.
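As a rough illustration of the "memory pools and placement new" idea, here is a minimal fixed-capacity pool. All names are made up for the sketch, and a production pool would also need thread safety and a policy for exhaustion:

```cpp
#include <cstddef>
#include <new>
#include <utility>

// A fixed-capacity object pool: every byte is reserved up front, so
// acquire/release in the time-critical path never calls into the
// general-purpose heap (no malloc, no free, no fragmentation stalls).
template <typename T, std::size_t Capacity>
class FixedPool {
public:
    FixedPool() : top_(Capacity) {
        for (std::size_t i = 0; i < Capacity; ++i)
            free_[i] = i;                      // all slots start out free
    }

    // Construct a T in a pre-reserved slot using placement new.
    // Returns nullptr when the pool is exhausted.
    template <typename... Args>
    T* acquire(Args&&... args) {
        if (top_ == 0) return nullptr;
        std::size_t slot = free_[--top_];
        return new (storage_[slot]) T(std::forward<Args>(args)...);
    }

    // Destroy the object and hand its slot back to the free list.
    void release(T* obj) {
        obj->~T();
        std::size_t slot = static_cast<std::size_t>(
            (reinterpret_cast<unsigned char*>(obj) - &storage_[0][0]) / sizeof(T));
        free_[top_++] = slot;
    }

private:
    alignas(T) unsigned char storage_[Capacity][sizeof(T)];  // raw slots
    std::size_t free_[Capacity];  // stack of free slot indices
    std::size_t top_;             // number of free slots remaining
};
```

Allocate the pool once at startup; inside the time-critical loop, each acquire and release is then a couple of array operations plus the constructor/destructor, with predictable cost from call to call.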

John Knoeller
Thanks for your reply. Do you have any references that confirm that the C++ threads won't, in fact, be frozen by GCs? I've found some conflicting references, and I'm just trying to confirm one way or the other.
Michael Covelli
@Michael: Sorry, no references. Just general knowledge. I've done realtime coding in C++ for the last decade. The Windows C/C++ heap doesn't move active objects or delay freeing of objects -- thus no GC. But that doesn't mean that the cost of a new/delete call can't vary enormously from call to call; the only way to avoid the uncertainty is to get all of your memory allocation taken care of _before_ you go into your time critical code.
John Knoeller
Thanks for your help. You're right that moving to C++ won't fix things automatically since the new/delete time can still vary a lot. Best to optimize the C# and move all the new calls out of time critical areas first before considering a full re-write.
Michael Covelli
+5  A: 

I work as a .NET developer at a trading firm where, like you, we care about 100 ms delays. Garbage collection can indeed become a significant issue when dependable minimal latency is required.

That said, I don't think migrating to C++ is going to be a smart move, mainly due to how time consuming it would be. Garbage collection occurs after a certain amount of memory has been allocated on the heap over time. You can substantially mitigate this issue by minimizing the amount of heap allocation your code creates.

I'd recommend trying to spot methods in your application that are responsible for significant amounts of allocation. Anywhere objects are constructed is going to be a candidate for modification. A classic approach to fighting garbage collection is utilizing resource pools: instead of creating a new object every time a method is called, maintain a pool of already-constructed objects, borrowing from the pool on every method call and returning the object to the pool once the method has completed.

Another no-brainer involves hunting down any ArrayList, HashTable, or similar non-generic collections in your code that box/unbox value types, leading to totally unnecessary heap allocation. Replace these with List<T>, Dictionary<TKey, TValue>, and so on wherever possible (here I am specifically referring to collections of value types such as int, double, long, etc.). Likewise, look out for any methods you may be calling which box value type arguments (or return boxed value types).

These are just a couple of relatively small steps you can take to reduce your garbage collection count, but they can make a big difference. With enough effort it can even be possible to completely (or at least nearly) eliminate all generation 2 garbage collections during the continuous operations phase (everything except for startup and shutdown) of your application. And I think you'll find that generation 2 collections are the real heavy-hitters.

Here's a paper outlining one company's efforts to minimize latency in a .NET application through resource pooling, in addition to a couple of other methods, with great success:

Rapid Addition leverages Microsoft .NET 3.5 Framework to build ultra-low latency FIX and FAST processing

So to reiterate: I would strongly recommend investigating ways to modify your code so as to cut down on garbage collection over converting to an entirely different language.

Dan Tao
It sounds like resource pooling and profiling and possibly upgrading to .NET 4.0 for the Background GC is a good way to go instead of C++. Thanks for your help! The White Paper that you linked to is great. Do you know of anything that mentions more specifics (along the lines of the one function call they mention in the paper that does boxing and unboxing behind the scenes)? That is, is there a list of best practices for avoiding GCs and for resource pooling? Perhaps a complete list of functions not to call?
Michael Covelli
@Michael: I wish there were. Unfortunately Rapid Addition has not made their standard "do not call" list public, nor would I expect Microsoft to publicize such a list (especially since one would expect it to change over time as implementations of the listed methods become modified). Your best bet, as others have said, is to profile your code and find for yourself those places where unexpected levels of memory allocation may be occurring. In case you aren't already, I also strongly recommend using perfmon to monitor your GC count while debugging your app. It can be quite illuminating.
Dan Tao
+1  A: 

.NET 4.0 has what's called Background Garbage Collection, which is different from the Concurrent Garbage Collection of earlier versions, and the latter may be what is causing your issue. Jason Olson talks about it with Carl Franklin and Richard Campbell on .NET Rocks Episode #517. You can view the transcript here; it's on page 5.

I'm not completely sure if just upgrading to the 4.0 Framework will solve your problem, but I imagine it would be well worth your time looking into it before rewriting everything in C++.

Aaron Daniels
This is great, thank you. It looks like .NET 4.0 and some profiling is probably the way to go instead of a re-write.
Michael Covelli