Update: Microsoft have now reproduced the bug and are working on a fix.
Whilst evaluating the viability of the .NET platform for low latency software development, we have discovered a serious bug in the .NET 4 concurrent workstation garbage collector that can cause applications to hang for up to several minutes at a time.
On three of our machines, the following simple C# program causes the GC to leak memory until none remains and a single mammoth GC cycle kicks in, stalling the program for several minutes (!) while 11 GB of heap is recycled:
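class Program
{
    static void Main(string[] args)
    {
        var q = new System.Collections.Generic.Queue<System.Object>();
        while (true)
        {
            // Each Enqueue boxes the int 0, allocating a fresh heap object.
            q.Enqueue(0);
            // Keep at most 1,000,000 live elements; older ones become garbage.
            if (q.Count > 1000000)
                q.Dequeue();
        }
    }
}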
You need to compile for x64 on a 64-bit Windows OS with .NET 4 and run with the default (concurrent workstation) GC using the default (interactive) latency setting.
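To check that a given run really matches these conditions, a small console program along the following lines can print the relevant settings. This is only a sketch: the class name GcSetupReport is ours, and we are assuming that a latency mode of Interactive on a non-server GC corresponds to the concurrent workstation collector, as the .NET 4 documentation describes.

```csharp
using System;
using System.Runtime;

static class GcSetupReport
{
    static void Main()
    {
        // Runtime and platform details worth quoting in a bug report.
        Console.WriteLine("CLR version:    {0}", Environment.Version);
        Console.WriteLine("OS version:     {0}", Environment.OSVersion);
        Console.WriteLine("64-bit OS:      {0}", Environment.Is64BitOperatingSystem);
        Console.WriteLine("64-bit process: {0}", Environment.Is64BitProcess);
        Console.WriteLine("Processors:     {0}", Environment.ProcessorCount);

        // False here means the workstation GC is in use.
        Console.WriteLine("Server GC:      {0}", GCSettings.IsServerGC);

        // Assumption: Interactive is reported when concurrent GC is enabled,
        // Batch when it has been disabled in the configuration file.
        Console.WriteLine("Latency mode:   {0}", GCSettings.LatencyMode);
    }
}
```

Build it with the same platform target and configuration file as the repro, otherwise the reported values may not reflect what the repro actually ran with.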
Here's what the Task Manager looks like when running this program on this machine:
Note that 11 GB of heap has been leaked here, when this program requires no more than 100 MB of memory.
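If you would rather log the stalls and memory growth than watch the Task Manager, a variant of the repro like the one below may help. It is a rough sketch rather than a proper measurement harness: the class name StallMonitor, the batch size and the one-second threshold are arbitrary choices of ours, and timing a whole batch only gives an upper bound on any single pause.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class StallMonitor
{
    // Hypothetical values chosen for illustration only.
    const int BatchSize = 1000000;
    const long StallThresholdMs = 1000;

    static void Main()
    {
        var q = new Queue<object>();
        var timer = new Stopwatch();
        var process = Process.GetCurrentProcess();

        while (true)
        {
            timer.Restart();
            for (int i = 0; i < BatchSize; ++i)
            {
                // Same allocation pattern as the original repro.
                q.Enqueue(0);
                if (q.Count > 1000000)
                    q.Dequeue();
            }
            timer.Stop();

            // Report only batches that took suspiciously long.
            if (timer.ElapsedMilliseconds > StallThresholdMs)
            {
                process.Refresh(); // Update the cached memory counters.
                Console.WriteLine(
                    "{0:T} batch took {1} ms, private bytes {2} MB, managed heap {3} MB, gen2 collections {4}",
                    DateTime.Now,
                    timer.ElapsedMilliseconds,
                    process.PrivateMemorySize64 / (1024 * 1024),
                    GC.GetTotalMemory(false) / (1024 * 1024),
                    GC.CollectionCount(2));
            }
        }
    }
}
```

The extra bookkeeping changes the allocation pattern slightly, so we cannot promise that this variant triggers the bug under exactly the same conditions as the original program.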
We have now accumulated around a dozen repros of this bug, written in F# as well as C#, and it appears to be related to a bug in the GC write barrier when most of gen0 survives. However, Microsoft have not yet been able to reproduce it. Can you? If so, please describe your setup as accurately as possible so we can try to narrow down exactly what conditions are required for this bug to manifest.