This happens on a 2.6.26-2-amd64 Linux kernel when trying to mmap a 5GB file with copy-on-write semantics (PROT_READ | PROT_WRITE and MAP_PRIVATE). Mapping files smaller than 4GB, or using only PROT_READ, works fine. This is not a soft resource limit issue as reported in another question; the virtual memory size limit is unlimited.

Here is the code that reproduces the problem (the actual code is part of Boost.Interprocess).

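#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        struct stat b;
        void *base;
        int fd = open("foo.bin", O_RDWR);

        if (fd == -1) {
                perror("open");
                return 1;
        }
        if (fstat(fd, &b) == -1) {
                perror("fstat");
                return 1;
        }
        /* Private (copy-on-write), writable mapping of the whole file. */
        base = mmap(0, b.st_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (base == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        return 0;
}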

and here is what happens:

dd if=/dev/zero of=foo.bin bs=1M seek=5000 count=1
./test-mmap
mmap: Cannot allocate memory

Here is the relevant strace (freshly compiled 4.5.20) output, as asked by nos.

open("foo.bin", O_RDWR)                 = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=5243928576, ...}) = 0
mmap(NULL, 5243928576, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = -1 ENOMEM (Cannot allocate memory)
dup(2)                                  = 4
[...]
write(4, "mmap: Cannot allocate memory\n", 29mmap: Cannot allocate memory
) = 29
+4  A: 

Try passing MAP_NORESERVE in the flags field like this:

mmap(NULL, b.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_NORESERVE, fd, 0);

It's likely that your swap and physical memory combined are less than the 5GB requested.
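A minimal sketch of that change, written as a helper that could replace the open/fstat/mmap sequence in the test program from the question. The name map_private_noreserve is hypothetical, and the error handling is only one reasonable choice, not what Boost.Interprocess does.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1       /* exposes MAP_NORESERVE under strict -std modes */
#endif
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: map `path` copy-on-write without reserving swap.
 * Returns the base address, or MAP_FAILED with errno set.
 * On success *len receives the file size. */
void *map_private_noreserve(const char *path, size_t *len)
{
        struct stat b;
        void *base;
        int saved;
        int fd = open(path, O_RDWR);

        if (fd == -1)
                return MAP_FAILED;
        if (fstat(fd, &b) == -1) {
                saved = errno;
                close(fd);
                errno = saved;
                return MAP_FAILED;
        }
        *len = (size_t)b.st_size;
        base = mmap(NULL, *len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_NORESERVE, fd, 0);
        saved = errno;
        close(fd);              /* the mapping remains valid after close */
        errno = saved;
        return base;
}
```

The caller still checks the result against MAP_FAILED and calls perror("mmap") as before. Note that with MAP_NORESERVE a later write to the mapping can fail if no memory is available at that point.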

Alternatively, for testing purposes you can tell the kernel to always overcommit. If the mapping then succeeds, you can make the code change above:

# echo 1 > /proc/sys/vm/overcommit_memory
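Before changing the setting it can help to confirm which mode is active. The following is a small sketch of reading it from C; read_overcommit_mode is a hypothetical helper, and the path is passed in as a parameter so that it can be pointed at any file with the same format.

```c
#include <stdio.h>

/* Hypothetical helper: read the accounting mode (0, 1 or 2) from `path`,
 * normally "/proc/sys/vm/overcommit_memory". Returns -1 on any error. */
int read_overcommit_mode(const char *path)
{
        int mode = -1;
        FILE *f = fopen(path, "r");

        if (f == NULL)
                return -1;
        if (fscanf(f, "%d", &mode) != 1)
                mode = -1;      /* file did not start with an integer */
        fclose(f);
        return mode;
}
```

Calling read_overcommit_mode("/proc/sys/vm/overcommit_memory") returns the same number that cat would print for that file.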

Below are the relevant extracts from the manual pages.

mmap(2):

   MAP_NORESERVE
          Do  not reserve swap space for this mapping.  When swap space is
          reserved, one has the guarantee that it is  possible  to  modify
          the  mapping.   When  swap  space  is not reserved one might get
          SIGSEGV upon a write if no physical memory  is  available.   See
          also  the  discussion of the file /proc/sys/vm/overcommit_memory
          in proc(5).  In kernels before 2.6, this flag  only  had  effect
          for private writable mappings.

proc(5):

   /proc/sys/vm/overcommit_memory
          This file contains the kernel virtual  memory  accounting  mode.
          Values are:

                 0: heuristic overcommit (this is the default)
                 1: always overcommit, never check
                 2: always check, never overcommit

          In  mode 0, calls of mmap(2) with MAP_NORESERVE are not checked,
          and the default check is very weak, leading to the risk of  get‐
          ting a process "OOM-killed".  Under Linux 2.4 any non-zero value
          implies mode 1.  In mode 2  (available  since  Linux  2.6),  the
          total  virtual  address  space on the system is limited to (SS +
          RAM*(r/100)), where SS is the size of the swap space, and RAM is
          the  size  of  the physical memory, and r is the contents of the
          file /proc/sys/vm/overcommit_ratio.
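
The mode 2 formula quoted above can be written out as a short calculation. This is only a sketch: commit_limit_kb is a hypothetical name, and the integer arithmetic is an assumption about rounding, not the kernel's exact computation.

```c
#include <stdint.h>

/* Hypothetical helper for the proc(5) formula SS + RAM*(r/100).
 * swap_kb and ram_kb are in kB; ratio is the overcommit_ratio percentage. */
uint64_t commit_limit_kb(uint64_t swap_kb, uint64_t ram_kb, uint64_t ratio)
{
        /* Multiply before dividing so the percentage is not truncated early. */
        return swap_kb + ram_kb * ratio / 100;
}
```

For instance, 514072 kB of swap, 4063428 kB of RAM and a ratio of 50 give a limit of 2545786 kB.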
Matt Joiner
This is it! Copy-on-write requires a corresponding backing store for the whole file. The kernel isn't psychic; it can't know that I only intend to modify a few pages. I'll patch Boost and contact the author to propose a corresponding option.
Diomidis Spinellis
@Diomidis Spinellis: Let us know how the request goes. I imagine the authors intentionally left this out, either because the mmap code they provide sticks to POSIX rather than anything Linux-specific, or to avoid the OOM killer (which is why this wouldn't be the default).
Matt Joiner
+1  A: 

Quoting your memory, swap size and overcommit settings from your comment:

MemTotal: 4063428 kB
SwapTotal: 514072 kB
$ cat /proc/sys/vm/overcommit_memory
0
$ cat /proc/sys/vm/overcommit_ratio 
50

With overcommit_memory set to 0 ("heuristic overcommit"), you can't create a private, writeable mapping that's larger than the current free memory plus swap. Since you only have about 4.5GB of memory and swap in total, a 5GB mapping can never succeed.
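
The arithmetic behind that claim can be sketched in a few lines of C, using the MemTotal and SwapTotal figures quoted above and the st_size from the strace output in the question. The helper name is hypothetical, and the comparison is a simplification of the kernel's heuristic as described here, not the kernel's actual code.

```c
#include <stdint.h>

/* Hypothetical helper: does a mapping of `bytes` fit within total RAM
 * plus swap? Both totals are in kB, as reported by /proc/meminfo. */
int fits_in_ram_plus_swap(uint64_t bytes, uint64_t mem_total_kb,
                          uint64_t swap_total_kb)
{
        uint64_t limit = (mem_total_kb + swap_total_kb) * 1024;

        return bytes <= limit;
}
```

With the quoted values, the limit is (4063428 + 514072) * 1024 = 4687360000 bytes, so fits_in_ram_plus_swap(5243928576ULL, 4063428, 514072) is false, while a 4GB file (4294967296 bytes) still fits.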

Your options are either to use MAP_NORESERVE (as Matt Joiner suggests), if you're sure that you'll never dirty (write to) more pages in the mapping than you have free memory and swap for; or to significantly increase the size of your swap space.

caf