views:

563

answers:

2

If I am hosting a long running application such as a web server within a Common Lisp image, what strategy should I use to manage the garbage collector?

I'm assuming that, by default, the garbage collector is entitled to spend long periods of time sorting out the heap, at times I can't predict. This may impact a particular browser request in ways I don't want.

Is there a method in Common Lisp to control this? Perhaps by encouraging it to work in a 'little-and-often' manner?

+1  A: 

Garbage collection has come a long way since the early days, and a lot of work has been done into avoiding the unpredictable long wait. For modern implementations, I think those are a thing of the past.

However, the details of garbage collection do vary by the implementation. There aren't that many high-quality Lisp implementations out there, so you should have no difficulty consulting their documentation on garbage collection.

David Thornley
+19  A: 

Several Lisp implementations have excellent Garbage Collectors. A special problem is that Lisp applications often have a high allocation rate of small objects (conses, ...).

There are a few things to consider.

  • Precise vs. Conservative GC. I'm not a huge fan of conservative GCs (Boehm etc.) for Lisp. A problem of conservative GCs is that they don't find all garbage. This can be a problem for long running programs and lead to fragmentation and not reclaimed unused memory. Precise GCs are using the tag information of Lisp data and can identify every data type of every object. Conservative GCs were invented for programming language implementations that don't use tagged data (C++, ...).

  • copying GC, compacting GC. To work against memory fragmentation in long running Lisps, a GC that compacts and localizes objects can be useful. A problem sometimes comes up when hashtables need to be rehashed (because the location changes). A copying GC may need more memory (because there is a from and a to space of memory). But when the GC copies the objects from one memory space into another, it automatically makes it more compact. More advanced GCs (like on the Lisp Machine) can also sort objects and allocate objects of the same type near each other - assuming that this will speed up accessing objects.

  • ephemeral GC. This means that there is a first GC stage that exclusively runs in main memory and gets some support from a memory management unit to identify changed memory regions. Scanning main memory is fast than scanning virtual memory and scanning only changed memory regions reduces the amount of work further. When lots of objects are allocated and quickly turning into garbage, this gives very short GC pauses.

  • generational GC. Usually GCs nowadays are generational. There is more than one generation and objects that survive a few GCs are promoted to an older generation. Usually only the first generation is GCed very often.

  • Tuning. GCs of, say, LispWorks and Allegro CL have a lot of tuning knobs. Especially for long-running applications it makes sense to read the manual and for example tune the number of generations, their sizes and other things.

  • virtual memory. GC over virtual memory is usually very slow. Avoid that if possible - add more RAM to the machines.

  • manual memory management. For example the CL-HTTP web server does some manual memory management using resources. These are pools of pre-allocated objects that can be reinitialized very quickly. The Lisp Machines were using this a lot. Typical use of them is in read buffers for streams. Instead of creating new strings by every read operation, it is useful to use reusable buffers.

  • stack allocation. Some Common Lisp allow stack allocation of some data - leaving the block then automatically frees the memory. This assumes then that the memory is no longer referenced on leaving a block.

  • concurrent GC. None of the usual Common Lisp implementations has a concurrent GC AND support for concurrent Lisp threads. Some implementations have concurrent Lisp threads, but the GC will stop them all while it is working.

  • profiling the GC. If you are not sure where allocation happens and what the GC does, you need to find that out using profiling information.

If your Lisp has a precise generational GC which runs in main memory it is hard to get problems with long pauses. Clozure CL (a free Common Lisp implementation) for example has a very good GC implementation. You want to avoid memory fragmentation and garbage collections in virtual memory. If necessary use a 64bit Lisp implementation with more main memory.

Pointers:

You can see from the documentation that LispWorks and Allegro CL have lots of knobs for tuning GC.

Common Lisp has a few functions that deal with the implementation environment. (ROOM) is a function that gives an overview of memory usage. (ROOM t) gives more detail (here LispWorks):

CL-USER 2 > (room t)
 Generation 0:  Total Size 1823K, Allocated 1090K, Free 725K 
          Segment 2008AAB8: Total Size 507K, Allocated 361K, Free 142K
                    minimum free space 64K, 
                      Awaiting promotion = 0K, sweeps before promotion =10
          Segment 217E7050: Total Size 1315K, Allocated 729K, Free 582K
                    minimum free space 0K, 
                      Awaiting promotion = 0K, sweeps before promotion =2
 Generation 1:  Total Size 1397K, Allocated 513K, Free 871K 
          Segment 20CB9A50: Total Size 68K, Allocated 48K, Free 15K
                    minimum free space 0K, 
                      Awaiting promotion = 0K, sweeps before promotion =4
          Segment 216D7050: Total Size 1088K, Allocated 331K, Free 752K
                    minimum free space 0K, 
                      Awaiting promotion = 0K, sweeps before promotion =4
          Segment 2004E4F8: Total Size 241K, Allocated 133K, Free 103K
                    minimum free space 0K, static
 Generation 2:  Total Size 2884K, Allocated 1290K, Free 1585K 
          Segment 21417050: Total Size 2816K, Allocated 1227K, Free 1584K
                    minimum free space 0K, 
                      Awaiting promotion = 0K, sweeps before promotion =4
          Segment 20DA5DA0: Total Size 68K, Allocated 62K, Free 1K
                    minimum free space 117K, 
                      Awaiting promotion = 0K, sweeps before promotion =4
 Generation 3:  Total Size 19373K, Allocated 19232K, Free 128K 
          Segment 20109A50: Total Size 11968K, Allocated 11963K, Free 0K
                    minimum free space 3K, 
                      Awaiting promotion = 0K, sweeps before promotion =10
          Segment 20DB6E18: Total Size 6528K, Allocated 6396K, Free 128K
                    minimum free space 0K, 
                      Awaiting promotion = 0K, sweeps before promotion =10
          Segment 20CCAAC8: Total Size 876K, Allocated 872K, Free 0K
                    minimum free space 0K, 
                      Awaiting promotion = 0K, sweeps before promotion =10

Total Size 25792K, Allocated 22127K, Free 3310K
Rainer Joswig
I don't think your statement of tag usage is entirely correct.A precise GC will indeed decide how to clean up by looking at an object's tag (as will the imprecise), but first it has to know whether a register's content is an object pointer or an immediate value. This latter operation is what separates precise from imprecise GCs.
skypher
Take an array of different things: immediate objects (numbers, characters, ...) and non-immediate objects (the array holds a pointer to them: arrays, structures, strings, ...). How does the GC knows that one thing is a number and the other one is a pointer (to a structure)? Conservative GC was developed for languages like C and C++ where there are no tagged objects. The GC heuristically determines what could be a pointer and what not.
Rainer Joswig
It may have been developed for untagged languages like C, but it's used even in some Lisp environments today, e.g., CMUCL and SBCL use a conservative GC on "register poor" architectures like x86: http://sbcl-internals.cliki.net/GENCGC
Ken