Here's an interesting article that I found on the web.

It talks about how this firm is able to parse a huge amount of financial data in a managed environment, essentially by reusing objects and avoiding immutable types such as string. They then go on to show that their program doesn't do any GC during the continuous operation phase.

This is pretty impressive, and I'd like to know if anyone else here has some more detailed guidelines as to how to do this. For one, I'm wondering how the heck you can avoid using string, when blatantly some of the data inside the messages are strings, and whatever client application is looking at the messages will want to be passed those strings? Also, what do you allocate in the startup phase? How will you know it's enough? Is it simply a matter of claiming a big chunk of memory and keeping a reference to it so that GC doesn't kick in? What about whatever client application is using the messages? Does it also need to be written according to these stringent standards?

Also, would I need a special tool to look at the memory? I've been using SciTech memory profiler thus far.

+2  A: 

From what I understood, the article doesn't say they don't use strings; it says they don't use immutable strings. The problem with immutable strings is that when you're parsing, most of the strings generated are just throw-away strings.

I'm guessing they're using some sort of pre-allocation combined with free lists of mutable strings.
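Purely as a sketch of that guess (all the type names here are invented, not from the article), a free list of pre-allocated mutable string buffers might look something like this:

using System;
using System.Collections.Generic;

//Hypothetical mutable stand-in for string: a reusable char buffer
//that is overwritten in place instead of being reallocated.
public class MutableString
{
    private readonly char[] _chars;
    private int _length;

    public MutableString(int capacity)
    {
        _chars = new char[capacity];
    }

    public int Length { get { return _length; } }

    public void Set(char[] source, int offset, int count)
    {
        Array.Copy(source, offset, _chars, 0, count);
        _length = count;
    }
}

//All buffers are allocated up front; parsing borrows one, fills it,
//and returns it, so steady-state operation allocates nothing.
public class MutableStringPool
{
    private readonly Stack<MutableString> _free = new Stack<MutableString>();

    public MutableStringPool(int count, int capacity)
    {
        for (int i = 0; i < count; i++)
            _free.Push(new MutableString(capacity));
    }

    public MutableString Acquire()
    {
        return _free.Pop(); //throws if the pool was sized too small
    }

    public void Release(MutableString s)
    {
        _free.Push(s);
    }
}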

Pedro Rodrigues
+7  A: 

I found the paper you linked to rather deficient:

  • It assumes, and wants you to assume, that garbage collection is the ultimate latency killer. They have not explained why they think so, nor have they explained in what way their system is not basically a custom-made garbage collector in disguise.
  • It talks about the amount of memory cleaned up in garbage collection, which is irrelevant: the time taken to garbage collect depends more on the number of objects, irrespective of their size.
  • The table of “results” at the bottom provides no comparison to a system that uses .NET’s garbage collector.

Of course, this doesn’t mean they’re lying and it’s nothing to do with garbage collection, but it basically means that the paper is just trying to sound impressive without actually divulging anything useful that you could use to build your own.

Timwi
Do you think this paper is a red herring? I had an inclination to think so when I saw their reason for using .NET ("MSFT manages the hardware changes"), which isn't really that big a benefit.
Carlos
Managing hardware changes can be a big benefit with the sort of throughput they are talking about. At that level, people will want to recompile - and even rewrite - with new compiler optimisations for a new architecture, something that optimised JITting should do for you.
Jon Hanna
Most JITs don't do sufficient optimization to compete with static compilation using profile-guided optimization. The reason for using .NET is that it is much cheaper to produce managed code. Doing something like this is not very complicated: you allocate all your resources up front and then don't run the GC. Many implement this type of architecture using object pools.
Steve
+1  A: 

I worked for a while with a CEP product called StreamBase. One of their engineers told me that they were migrating their C++ code to Java because they were getting better performance, fewer bugs and better portability on the JVM by pretty much avoiding GC altogether. I imagine the arguments apply to the CLR as well.

It seemed counter-intuitive, but their product was blazingly fast.

Here's some information from their site:

StreamBase avoids garbage collection in two ways: Not using objects, and only using the minimum set of objects we need.

First, we avoid using objects by using Java primitive types (boolean, byte, int, double, and long) to represent our data for processing. Each StreamBase data type is represented by one or more primitive types. By only manipulating the primitive types, we can store data efficiently in stack- or array-allocated regions of memory. We can then use techniques like parallel arrays or method calling to pass data around efficiently.

Second, when we do use objects, we are careful about their creation and destruction. We tend to pool objects rather than releasing them for garbage collection. We try to manage object lifecycle such that objects are either caught by the garbage collector in the young generation, or kept around forever.

Finally, we test this internally using a benchmarking harness that measures per-tuple garbage collection. In order to achieve our high speeds, we try to eliminate all per-tuple garbage collection, generally with good success.
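As a rough illustration of the parallel-array technique (this is a sketch, not StreamBase's actual code):

//A stream of (id, price) "tuples" stored as parallel arrays of
//primitives: no per-tuple object is ever allocated, so nothing is
//ever handed to the garbage collector.
public class TupleStore
{
    private readonly long[] _ids;
    private readonly double[] _prices;

    public TupleStore(int capacity)
    {
        _ids = new long[capacity];
        _prices = new double[capacity];
    }

    //Writing a tuple mutates pre-allocated slots in place.
    public void Set(int slot, long id, double price)
    {
        _ids[slot] = id;
        _prices[slot] = price;
    }

    public double GetPrice(int slot)
    {
        return _prices[slot];
    }
}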

Drew Noakes
Honestly, I would hate to work on that code base from the sound of it. No object model, no code structure, wow. That’s just horrible. If they so badly wanted to avoid the GC, then why switch to Java in the first place?
Konrad Rudolph
Like I said, it's counter-intuitive. They had a great product though with great performance, developed by some smart people. I guess they had their reasons. It's not that they didn't have an object model though, nor code structure. It's just that they reuse objects wherever possible and when GC is required, they make sure the object is in Gen0 (good practice anyway). I'm no C++ guru, but I think I'd rather program C# than C++, even with the constraints they set for themselves.
Drew Noakes
@Drew: Absolutely. C++ has little advantage here and C# has the huge advantages of memory safety and .NET interop.
Jon Harrop
A: 

99% of the time, you will be wasting your boss's money when you try to achieve this. The article describes an extreme scenario where they need the last drop of performance. As you can read in the article, there are large parts of the .NET framework that can't be used when trying to be GC-free. Some of the most basic parts of the BCL use memory allocations (or 'produce garbage', as the paper calls it). You will need to find a way around those methods.

And even when you need an absolutely blazingly fast application, you'd better first try to build an application/architecture that can scale out (use multiple machines) before trying to walk the no-GC route. The sole reason for them to use the no-GC route is that they need absolute low latency. IMO, when you need absolute speed but don't care about the absolute minimum response time, it will be hard to justify a no-GC architecture. Besides this, if you try to build a GC-free client application (such as a Windows Forms or WPF app), forget it: those presentation frameworks create new objects constantly.

But if you really want this, it is actually quite simple. Here is a simple how to:

  • Find out which parts of the .NET API can't be used (you can write a tool that analyzes the .NET assemblies using an introspection engine).
  • Write a program that verifies the code you or your developers write to ensure they don't allocate directly or use 'forbidden' .NET methods, using the list created in the previous point (FxCop is a great tool for this).
  • Create object pools that you initialize at startup time. The rest of the program can then reuse existing objects, so it never has to do any new operations.
  • If you need to manipulate strings, use byte arrays for this and store the byte arrays in a pool (WCF uses this technique as well; see the sketch after this list). You will have to create an API that allows manipulating those byte arrays.
  • And last but not least, profile, profile, profile.
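A rough sketch of the byte-array pooling idea (all names here are invented; newer versions of .NET ship System.Buffers.ArrayPool<T> for exactly this):

using System.Collections.Concurrent;

//Pre-allocated byte buffers that are rented and returned instead of
//being allocated per message.
public class ByteBufferPool
{
    private readonly ConcurrentBag<byte[]> _buffers = new ConcurrentBag<byte[]>();
    private readonly int _size;

    public ByteBufferPool(int count, int size)
    {
        _size = size;
        for (int i = 0; i < count; i++)
            _buffers.Add(new byte[size]);
    }

    public byte[] Rent()
    {
        byte[] buffer;
        if (_buffers.TryTake(out buffer))
            return buffer;
        //A strict no-GC design would fail fast here instead of allocating.
        return new byte[_size];
    }

    public void Return(byte[] buffer)
    {
        _buffers.Add(buffer);
    }
}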

Good luck

Steven
+4  A: 

One thing to note from the beginning is where they say "Conventional wisdom has been developing low latency messaging technology required the use of unmanaged C++ or assembly language". In particular, they are talking about a sort of case where people would often dismiss a .NET (or Java) solution out of hand. For that matter, a relatively naïve C++ solution probably wouldn't make the grade either.

Another thing to consider here is that they haven't so much gotten rid of the GC as replaced it - there's code there managing object lifetime, but it's their own code.

There are several different ways one could do this instead. Here's one. Say I need to create and destroy several Foo objects as my application runs. Foo creation is parameterised by an int, so the normal code would be:

public class Foo
{
    private readonly int _bar;
    public Foo(int bar)
    {
        _bar = bar;
    }
    /* other code that makes this class actually interesting. */
}

public class UsesFoo
{
    public void FooUsedHere(int param)
    {
        Foo baz = new Foo(param);
        //Do something here
        //baz falls out of scope and is liable to GC collection
    }
}

A much different approach is:

public class Foo
{
    private const int MOST_POSSIBLY_NEEDED = 1024; //example capacity - tune to your workload
    private static readonly Foo[] FOO_STORE = new Foo[MOST_POSSIBLY_NEEDED];
    private static Foo FREE;
    static Foo()
    {
        //Pre-allocate every Foo up front and chain them into a free list.
        FOO_STORE[MOST_POSSIBLY_NEEDED - 1] = new Foo();
        int idx = MOST_POSSIBLY_NEEDED - 1;
        while(idx != 0)
        {
            Foo newFoo = FOO_STORE[--idx] = new Foo();
            newFoo._next = FOO_STORE[idx + 1];
        }
        //The last Foo's _next stays null, terminating the free list.
        FREE = FOO_STORE[0];
    }
    private Foo _next;
    //Note _bar is no longer readonly. We lose the advantages
    //as a cost of reusing objects. Even if Foo acts immutable
    //it isn't really.
    private int _bar;
    public static Foo GetFoo(int bar)
    {
        //Assumes the pool is never exhausted; see below for growth.
        Foo ret = FREE;
        FREE = ret._next;
        ret._bar = bar;
        return ret;
    }
    public void Release()
    {
        _next = FREE;
        FREE = this;
    }
    /* other code that makes this class actually interesting. */
}

public class UsesFoo
{
    public void FooUsedHere(int param)
    {
        Foo baz = Foo.GetFoo(param);
        //Do something here
        baz.Release();
    }
}

Further complication can be added if you are multithreaded (though for really high performance in a non-interactive environment, you may want to have either one thread, or separate stores of Foo objects per thread), or if you cannot predict MOST_POSSIBLY_NEEDED in advance (the simplest answer is to create a new Foo() as needed but never release it for GC, which is easily done in the above code by creating a new Foo when FREE is null).
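For instance, GetFoo above could be reworked along these lines to grow on demand (a sketch; the rest of the class is unchanged):

    public static Foo GetFoo(int bar)
    {
        Foo ret = FREE;
        if (ret == null)
        {
            //Free list exhausted: allocate a new Foo rather than fail.
            //It joins the pool when Release() is eventually called on it.
            ret = new Foo();
        }
        else
        {
            FREE = ret._next;
        }
        ret._bar = bar;
        return ret;
    }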

If we allow for unsafe code, we can gain even more by making Foo a struct (so the array holds all the data in one contiguous area of memory), making _next a pointer to Foo, and having GetFoo() return a pointer.
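A safer variant of the same idea chains the free list with array indices instead of raw pointers, which keeps the data contiguous without needing unsafe code or pinning (a sketch with invented names; assumes the capacity is never exceeded):

public struct FooStruct
{
    public int Bar;
    internal int Next; //index of the next free slot; -1 ends the list
}

public static class FooStructPool
{
    private const int MOST_POSSIBLY_NEEDED = 1024; //example capacity
    private static readonly FooStruct[] STORE = new FooStruct[MOST_POSSIBLY_NEEDED];
    private static int FREE;

    static FooStructPool()
    {
        //All slots exist from the start; chain them by index.
        for (int i = 0; i < MOST_POSSIBLY_NEEDED - 1; i++)
            STORE[i].Next = i + 1;
        STORE[MOST_POSSIBLY_NEEDED - 1].Next = -1;
        FREE = 0;
    }

    public static int Acquire(int bar)
    {
        int idx = FREE;
        FREE = STORE[idx].Next;
        STORE[idx].Bar = bar;
        return idx;
    }

    public static void Release(int idx)
    {
        STORE[idx].Next = FREE;
        FREE = idx;
    }
}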

Whether this is what these people are actually doing I of course cannot say, but the above does prevent the GC from activating. It will only be faster under very high throughput conditions; if not, letting the GC do its stuff is probably better (the GC really does help you, despite 90% of questions about it treating it as a Big Bad).

There are other approaches that similarly avoid GC. In C++ the new and delete operators can be overloaded, which allows the default creation and destruction behaviour to change, and discussions of how and why one might do so may interest you.

A practical take-away applies when objects either hold expensive resources other than memory (e.g. connections to databases) or "learn" as they continue to be used (e.g. XmlNameTables). In those cases pooling objects is useful (ADO.NET connections do so behind the scenes by default), and a simple Queue is the way to go, as the extra overhead in terms of memory doesn't matter. You can also abandon objects on lock contention (you're looking to gain performance, and lock contention will hurt it more than abandoning the object would), though I doubt that would work in their case. A sketch of such a pool follows.
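This is only an illustration (ExpensiveResource is a made-up stand-in for something like a connection); the abandon-on-contention idea appears in both Take and Return:

using System.Collections.Generic;
using System.Threading;

public class ExpensiveResource
{
    /* expensive construction or learned state here */
}

public class ResourcePool
{
    private readonly Queue<ExpensiveResource> _pool = new Queue<ExpensiveResource>();
    private readonly object _lock = new object();

    public ExpensiveResource Take()
    {
        //Under contention, skip the pool rather than wait for the lock.
        if (Monitor.TryEnter(_lock))
        {
            try
            {
                if (_pool.Count != 0)
                    return _pool.Dequeue();
            }
            finally
            {
                Monitor.Exit(_lock);
            }
        }
        return new ExpensiveResource();
    }

    public void Return(ExpensiveResource item)
    {
        //If contended, abandon the object and let the GC collect it.
        if (Monitor.TryEnter(_lock))
        {
            try
            {
                _pool.Enqueue(item);
            }
            finally
            {
                Monitor.Exit(_lock);
            }
        }
    }
}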

Jon Hanna
+1 for "letting GC do its stuff is probably better".
Steven
Heck yes, while there are times when stuff like this is genuinely useful, most techniques for usurping the GC fall into the category of "interesting, now never ever do it", and most attempts to do so in practice fall into the category of "you had a problem, you did something, now you have two problems". I've only once had cause to do anything other than let the GC do its thing in real code, and that one time was very local to one spot where the memory-use patterns of the application briefly went completely different to its normal operation.
Jon Hanna