views: 1666
answers: 4
At work one of our target platforms is a resource-constrained mini-server running Linux (kernel 2.6.13, custom distribution based on an old Fedora Core). The application is written in Java (Sun JDK 1.6_04). The Linux OOM killer is configured to kill processes when memory usage exceeds 160 MB. Even during high load our application never goes over 120 MB, and together with some other native processes that are active we stay well within the OOM limit.

However, it turns out that the Java Runtime.getRuntime().exec() method, the canonical way to execute external processes from Java, has a particularly unfortunate implementation on Linux that causes spawned child processes to (temporarily) require the same amount of memory as the parent process since the address space is copied. The net result is that our application gets killed by the OOM killer as soon as we do Runtime.getRuntime().exec().
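
For concreteness, the call that triggers the kill is nothing exotic; a minimal sketch (the command being run is just a placeholder):

    public class ExecRepro {
        public static void main(String[] args) throws Exception {
            // On Linux the JVM implements exec() via fork(), which momentarily
            // duplicates the parent's address space before the child exec()s.
            // With a large JVM footprint this doubles apparent memory use and
            // can trip an OOM limit sized against the parent process alone.
            Process p = Runtime.getRuntime().exec(new String[] { "/bin/true" });
            p.waitFor();
            System.out.println("child exited with " + p.exitValue());
        }
    }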

We currently work around this by having a separate native program do all external command execution and we communicate with that program over a socket. This is less than optimal.
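
Roughly, the Java side of that workaround looks like the sketch below; the port number and the one-line request/response protocol are made up for illustration (our real helper protocol differs):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.Socket;

    // Sketch: instead of fork()ing from the large JVM process, hand the
    // command line to a small native helper daemon that does the actual
    // fork()/exec(). Host, port, and protocol here are hypothetical.
    public class ExecClient {
        public static String run(String commandLine) throws Exception {
            Socket s = new Socket("localhost", 9000);
            try {
                PrintWriter out = new PrintWriter(s.getOutputStream(), true);
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(s.getInputStream()));
                out.println(commandLine);   // one request per line
                return in.readLine();       // helper replies with the exit status
            } finally {
                s.close();
            }
        }
    }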

After posting about this problem online I got some feedback indicating that this should not occur on "newer" versions of Linux, since they implement the POSIX fork() call using copy-on-write: presumably only the pages that actually get modified are copied, at the moment they are modified, instead of the entire address space being copied up front.

My questions are:

  • Is this true?
  • Is this something in the kernel, the libc implementation or somewhere else entirely?
  • From what version of the kernel/libc/whatever is copy-on-write for fork() available?
A: 

1: Yes.

2: It is split between the two: a system call like fork() is exposed through a glibc wrapper that traps into the kernel, and the kernel side of the call lives in kernel/fork.c.

3: I don't know exactly, but I would bet that your kernel has it.

The OOM killer kicks in when low memory is threatened on 32-bit boxes. I've never had an issue with this, but there are ways to keep the OOM killer at bay. Your problem could be an OOM configuration issue.

Since you are running a Java application, you should consider moving to 64-bit Linux. That should definitely fix it; most 32-bit apps can run on a 64-bit kernel with no issues as long as the relevant libraries are installed.

You could also try the PAE kernel for 32-bit Fedora.

Bash
+2  A: 

Well, I personally doubt that this is true, since Linux's fork() has been done via copy-on-write since God knows when (at least the 2.2.x kernels had it, and that was somewhere in the 1990s).

Since the OOM killer is a rather crude instrument which is known to misfire (for example, it does not necessarily kill the process that actually allocated most of the memory) and should be used only as a last resort, it is not clear to me why you have it configured to fire at 160 MB.

If you want to impose a limit on memory allocation, then ulimit is your friend, not the OOM killer.

My advice is to leave OOM alone (or disable it altogether), configure ulimits, and forget about this problem.

ADEpt
Thanks for the information, but sadly we have no control over the platform: the hardware, the OS, and its configuration are mostly set in stone (we "just" deploy onto it, so to speak).
Boris Terzic
+1  A: 

Yes, this absolutely is the case even with new versions of Linux (we're on 64-bit Red Hat 5.2). I have had a problem with slow-running subprocesses for about 18 months, and could never figure out the cause until I read your question and ran a test to verify it.

We have a 32 GB box with 16 cores, and if we run the JVM with settings like -Xms4g and -Xmx8g and run subprocesses using Runtime.exec() with 16 threads, we are not able to run our process faster than about 20 process calls per second.

Try this with the simple "date" command in Linux about 10,000 times. If you add profiling code to watch what is happening, it starts off quickly but slows down over time.
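
A sketch of that timing test, assuming plain Runtime.exec() and the loop count above (the progress reporting is just illustrative):

    public class SpawnBenchmark {
        public static void main(String[] args) throws Exception {
            int n = 10000;
            long start = System.currentTimeMillis();
            for (int i = 1; i <= n; i++) {
                Process p = Runtime.getRuntime().exec(new String[] { "date" });
                p.waitFor();
                // Close the pipe streams so the loop doesn't exhaust file
                // descriptors before the garbage collector gets to them.
                p.getOutputStream().close();
                p.getInputStream().close();
                p.getErrorStream().close();
                if (i % 1000 == 0) {
                    long elapsed = System.currentTimeMillis() - start;
                    System.out.println(i + " spawns, "
                            + (i * 1000L / elapsed) + "/sec so far");
                }
            }
        }
    }

Run it once with the large heap settings and once with -Xmx128m to see the difference in spawn rate.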

After reading your question, I decided to try lowering my memory settings to -Xms128m and -Xmx128m. Now our process runs at about 80 process calls per second. The JVM memory settings were all I changed.

It doesn't seem to be consuming memory in such a way that I ever ran out, even when I tried it with 32 threads. It's just that the extra memory has to be allocated in some way on each spawn, which causes a heavy startup (and maybe shutdown) cost.

Anyway, it seems like there should be a setting to disable this behavior, in Linux or maybe even in the JVM.

chrism
A: 

This is pretty much the way *nix (and Linux) has worked since the dawn of time (or at least the dawn of MMUs).

To create a new process on *nix you call fork(). fork() creates a copy of the calling process with all its memory mappings, file descriptors, and so on. The memory mappings are done copy-on-write, so (in the optimal case) no memory is actually copied, only the mappings. A subsequent exec() call then replaces the current memory mappings with those of the new executable. So fork()/exec() is how you create a new process, and that's what the JVM uses.
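
As far as I know, both of the standard Java APIs end up on this same fork()/exec() path on Linux, so switching between them doesn't help; for example:

    public class ForkExecDemo {
        public static void main(String[] args) throws Exception {
            // Both calls below go through the same native fork()/exec()
            // sequence on Linux; ProcessBuilder is just the newer front end
            // that Runtime.exec() has delegated to since Java 5.
            Process a = Runtime.getRuntime().exec(new String[] { "/bin/true" });
            a.waitFor();

            Process b = new ProcessBuilder("/bin/true").start();
            b.waitFor();
        }
    }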

The caveat is that with huge processes on a busy system, the parent may keep running for a little while before the child exec()s, causing a large amount of memory to be copied because of the copy-on-write. In VMs, memory gets moved around a lot to facilitate the garbage collector, which produces even more copying.

The "workaround" is to do what you've already done, create an external lightweight process that takes care of spawning new processes - or use a more lightweight approach than fork/exec to spawn processes (Which linux does not have - and would anyway require a change in the jvm itself). Posix specifies the posix_spawn() function, which in theory can be implemented without copying the memory mapping of the calling process - but on linux it isn't.

nos