At work, one of our target platforms is a resource-constrained mini-server running Linux (kernel 2.6.13, custom distribution based on an old Fedora Core). The application is written in Java (Sun JDK 1.6_04). The Linux OOM killer is configured to kill processes when memory usage exceeds 160MB. Even during high load our application never goes over 120MB, and together with the other native processes that are active we stay well within the OOM limit.
However, it turns out that Runtime.getRuntime().exec(), the canonical way to execute external processes from Java, has a particularly unfortunate implementation on Linux: the spawned child process (temporarily) requires the same amount of memory as the parent, since the parent's address space is copied. The net result is that our application gets killed by the OOM killer as soon as we call Runtime.getRuntime().exec().
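For reference, the call that triggers the problem looks roughly like this (the command shown is just a placeholder; any external command has the same effect):

    import java.io.IOException;

    public class ExecExample {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Internally this fork()s the JVM and then exec()s the command;
            // the fork is where the temporary doubling of address space happens.
            Process p = Runtime.getRuntime().exec(new String[] {"/bin/true"});
            int exit = p.waitFor();
            System.out.println("exit code: " + exit);
        }
    }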
We currently work around this by having a separate native program do all external command execution and we communicate with that program over a socket. This is less than optimal.
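Roughly, the Java side of that workaround looks like the sketch below. The port number and the line-based protocol are invented here for illustration, not what we actually use:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.Socket;

    public class CommandClient {
        // Port and single-line protocol are placeholders; the real native
        // helper listens on its own socket with its own protocol.
        public static String run(String command) throws Exception {
            Socket socket = new Socket("localhost", 9000);
            try {
                PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(socket.getInputStream()));
                out.println(command);   // hand the command line to the native helper
                return in.readLine();    // read back a single-line result
            } finally {
                socket.close();
            }
        }
    }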
After posting about this problem online, I got feedback indicating that this should not occur on "newer" versions of Linux, since they implement the POSIX fork() call using copy-on-write, presumably meaning pages are only copied when they actually need to be modified, rather than the entire address space being duplicated immediately.
My questions are:
- Is this true?
- Is this something in the kernel, the libc implementation or somewhere else entirely?
- From what version of the kernel/libc/whatever is copy-on-write for fork() available?