views:

60

answers:

2

What are the effects of paging on garbage collection?

+3  A: 

The effects of paging on garbage collection are pretty much the same as on anything else: it allows access to lots of memory, but hurts performance when it happens.

The more pressing question is: what is the effect of garbage collection on paging?

Garbage collection can cause areas of memory to be read from and written to that would not otherwise be touched at that point in time. Reducing the degree to which garbage collection causes paging is therefore advantageous. This is one of the advantages that a generational compacting collector offers. It leads to more short-lived objects sharing a page, being collected from that page, and the memory being made available to other objects. It also keeps long-lived objects in pages where related objects are more likely to be as well (long-lived objects will often be related to other long-lived objects, because one long-lived object is keeping the others alive). This not only reduces the amount of paging necessary to perform the collection, but can also help reduce the amount of paging necessary for the rest of the application.
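
To make the compaction point concrete, here is a toy model (not any real collector): it counts how many pages hold live objects before and after the survivors are slid together. The 4096-byte page size, the 64-byte object size, and the helper names `pages_touched` and `compact` are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # bytes; a common page size, assumed here
OBJ_SIZE = 64     # bytes; hypothetical fixed object size


def pages_touched(addresses, size):
    """Return the set of page numbers covered by objects of `size` bytes."""
    pages = set()
    for addr in addresses:
        first = addr // PAGE_SIZE
        last = (addr + size - 1) // PAGE_SIZE
        pages.update(range(first, last + 1))
    return pages


def compact(addresses, size):
    """Slide the survivors to the start of the heap, preserving their order."""
    return [i * size for i, _ in enumerate(sorted(addresses))]


# Hypothetical heap after a collection: one surviving object per page.
live = [i * PAGE_SIZE for i in range(100)]

before = len(pages_touched(live, OBJ_SIZE))
after = len(pages_touched(compact(live, OBJ_SIZE), OBJ_SIZE))
print(before, after)  # 100 2
```

In this contrived layout the same 100 survivors go from occupying 100 pages to 2, so anything that later walks them has far fewer pages to bring in.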

Jon Hanna
There is one more thing. The garbage collector may have to look through a _lot_ of memory, which then needs to be swapped in before it can be examined. The impact may feel worse because of that.
Thorbjørn Ravn Andersen
@Thorbjørn yes, I was thinking of that in "much the same as upon anything", but the amount of memory GC can have to scan through is unusually large compared to most other code, so it's well worth listing it separately.
Jon Hanna
A: 

First, a bit of terminology. In some areas, e.g. Linux-related discussions, paging is a feature of the operating system whereby executable code need not be permanently in RAM. Executable code comes from an executable file, and the kernel loads it from disk on demand, as the CPU walks through the instructions in the program. When memory is tight, the kernel may decide to simply "forget" a page of code, because it knows that it can always reload it from the executable file if that code needs to be executed again.

The kernel also implements another feature, called swapping, which is about something similar but for data. Data is not obtained from the executable file. Hence, the kernel cannot simply forget a page of data; it has to save it somewhere, in a dedicated area called a "swap file" or "swap partition". This makes swapping more expensive than paging: the kernel must write out the data page before reusing the corresponding RAM, whereas a code page can simply be reused directly. In practice, the kernel pages quite a lot before considering swapping.
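
The cost difference between the two can be written down as a tiny model. This is a deliberate simplification, not how any kernel is implemented; the `Page` class and the `evict` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Page:
    file_backed: bool   # True for code mapped from the executable file
    dirty: bool = False  # True if modified since it was loaded


def evict(page):
    """Return the number of disk writes needed before the RAM can be reused."""
    if page.file_backed and not page.dirty:
        return 0  # "forget" it: it can be re-read from the file later
    return 1      # the contents must be written out first (to swap, for data)


code_page = Page(file_backed=True)
data_page = Page(file_backed=False, dirty=True)
print(evict(code_page), evict(data_page))  # 0 1
```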

Paging is thus orthogonal to garbage collection. Swapping, however, is not. The general rule of thumb is that swapping and GC do not mix well. Most GCs work by regularly inspecting data, and if that data has been sent to the swap partition, it has to be reloaded from there. Since data only ends up in swap when memory is tight, reloading it means that some other data has to be sent to the swap partition in turn. In the presence of swapping, a GC tends to imply an awful lot of disk activity.
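
A rough way to see this effect is to simulate RAM as an LRU cache of pages and count page faults with and without a collector that scans the whole heap. Everything here is an assumption for illustration: the `Ram` class, the `run` function, the LRU eviction policy, and the sizes (a 100-page heap, a 10-page working set, 20 frames of RAM).

```python
from collections import OrderedDict


class Ram:
    """LRU model of physical memory holding at most `frames` pages."""

    def __init__(self, frames):
        self.frames = frames
        self.resident = OrderedDict()
        self.faults = 0

    def touch(self, page):
        if page in self.resident:
            self.resident.move_to_end(page)  # mark as most recently used
            return
        self.faults += 1                     # page must come back from swap
        if len(self.resident) >= self.frames:
            self.resident.popitem(last=False)  # evict least recently used
        self.resident[page] = True


def run(heap_pages, working_set, frames, rounds, full_gc):
    """Count faults for an app that reuses a small working set each round."""
    ram = Ram(frames)
    for _ in range(rounds):
        for page in range(working_set):   # application work
            ram.touch(page)
        if full_gc:
            for page in range(heap_pages):  # collector scans the whole heap
                ram.touch(page)
    return ram.faults


without_gc = run(heap_pages=100, working_set=10, frames=20, rounds=5, full_gc=False)
with_gc = run(heap_pages=100, working_set=10, frames=20, rounds=5, full_gc=True)
print(without_gc, with_gc)  # 10 500
```

In this model the application alone faults only while warming up, whereas each full scan pushes the working set out of RAM, so the application faults all over again on the next round.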

Some GCs apply intricate strategies to reduce swap-related activity. These include generational GC (which tries to explore old data less often) and strict typing (the GC looks at data because it needs to locate pointers; if it knows that a big chunk of RAM contains only non-pointers, e.g. picture data with only pixel values, then it can leave it alone, and in particular not force it back from the swap area). The GC in the Java virtual machine (the one from Sun/Oracle) is known to be quite good at that. But that's only relative: if your Java application hits swap, then you will suffer horribly. It just could have been much worse.
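
The strict-typing idea can be sketched with a toy mark phase that records which pages it has to read. This is an illustrative model only, not the JVM's algorithm: the `Obj` class, the `mark` function, and the assumption that the collector can tell an object is pointer-free without reading its payload (for instance from type information kept elsewhere) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    pages: range                 # pages holding the object's payload
    refs: tuple = ()             # names of objects this one points to
    pointer_free: bool = False   # True if the payload holds no pointers


def mark(heap, roots, use_type_info):
    """Return (names of live objects, pages whose contents the marker read)."""
    live, pages_read, stack = set(), set(), list(roots)
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        obj = heap[name]
        if use_type_info and obj.pointer_free:
            continue  # nothing to trace inside, so the payload is never read
        pages_read.update(obj.pages)  # scanning for pointers touches the pages
        stack.extend(obj.refs)
    return live, pages_read


# Hypothetical heap: two small objects on page 0 and a 1000-page image.
heap = {
    "root": Obj(pages=range(0, 1), refs=("node", "image")),
    "node": Obj(pages=range(0, 1)),
    "image": Obj(pages=range(1, 1001), pointer_free=True),
}

live_scan, pages_scan = mark(heap, ["root"], use_type_info=False)
live_typed, pages_typed = mark(heap, ["root"], use_type_info=True)
print(len(pages_scan), len(pages_typed))  # 1001 1
```

Both runs find the same live objects, but only the untyped scan has to pull the image's pages back into RAM.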

Just buy some extra RAM.

Thomas Pornin