I need to zero out records of varying sizes in a file. To do this, I'm currently allocating dummy records, memsetting them to zero, and passing these to a write function.

Is there some region that is guaranteed to always be zeroed (and of a large enough size) that I can point to instead, removing the need for repeated allocation and zeroing of memory?

A: 

Get the system page size with the system information API (I just can't remember the exact name), allocate one page of memory, set it to zero, and write it sequentially over and over.
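
A minimal sketch of this approach, assuming POSIX (where the page-size query is `sysconf(_SC_PAGESIZE)`) and a caller-supplied file descriptor `fd`; `zero_records` is a hypothetical name:

#include <stdlib.h>
#include <unistd.h>

/* Zero out len bytes at the current offset of fd by writing one
 * zeroed page repeatedly. Returns 0 on success, -1 on error. */
int zero_records(int fd, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page < 0)
        return -1;
    unsigned char *buf = calloc(1, (size_t)page); /* already zeroed */
    if (buf == NULL)
        return -1;
    while (len > 0) {
        size_t chunk = len < (size_t)page ? len : (size_t)page;
        ssize_t n = write(fd, buf, chunk);
        if (n < 0) {
            free(buf);
            return -1;
        }
        len -= (size_t)n; /* handles partial writes */
    }
    free(buf);
    return 0;
}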

Behrooz
**Do not** set it to zero. It will naturally be zero and will not use any physical memory as long as you leave it that way (unwritten), but it will use physical memory if you write anything to it. Best would be to `mmap` it read-only.
R..
@R: I don't know much about the internals of memory allocation functions, but I don't think allocating one page affects anything.
Behrooz
+1  A: 

Yes: just allocate a block big enough for the largest of these records, and zero it once. Pass the address of that block to your write function every time, with the size of the record you actually want to zero out. Passing a buffer to `write` doesn't invalidate it in any way. Mind you, `write` also doesn't free the buffer you pass it; that's up to you.

Novelocrat
+1  A: 

If you want a large region of memory that is always zeroed, you should allocate it yourself and memset it to zero. No getting around that, but you should only have to do it once. Make sure it's at least as big as the biggest amount of zeroed memory you are going to need at any one time.

Then whenever you need to pass a pointer to zeroed memory, you can pass a pointer into this allocated block.

thomasrutter
+3  A: 

See `calloc`:

The calloc() function shall allocate unused space for an array of nelem elements each of whose size in bytes is elsize. The space shall be initialized to all bits 0.

Alternatively (I did not try this), if you do not want any allocation at all, you could open and/or `mmap` `/dev/zero` and read `record_size` blocks from it and write them to the file in which you are overwriting records.
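
A hedged sketch of the `/dev/zero` variant (untried, per the answer; `RECORD_SIZE` is illustrative):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define RECORD_SIZE 4096 /* illustrative */

int main(void)
{
    int zfd = open("/dev/zero", O_RDONLY);
    if (zfd < 0)
        return 1;
    /* Read-only, zero-filled mapping backed by /dev/zero. */
    char const *zeros = mmap(NULL, RECORD_SIZE, PROT_READ, MAP_PRIVATE, zfd, 0);
    close(zfd); /* the mapping remains valid after close */
    if (zeros == MAP_FAILED)
        return 1;
    /* zeros can now be handed to any write function that needs
     * up to RECORD_SIZE zero bytes. */
    munmap((void *)zeros, RECORD_SIZE);
    return 0;
}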

Sinan Ünür
No, I think you missed the point.
Matt Joiner
@Matt How did I miss the point? Once you figure out the largest block of zeros you want, you can allocate it once with `calloc` and use that.
Sinan Ünür
A: 

The write function itself will be orders* of magnitude slower than the memset.

Profile it!

* even with Flash drives

Will
Oh, I don't disagree; it's the allocation/deallocation that I want to avoid.
Matt Joiner
+2  A: 

At least on Linux, allocating memory through `mmap()` will give you a zero-filled buffer. The downside is that you can't allocate exactly the amount of memory you need, only multiples of the page size:

#include <unistd.h>
long sz = sysconf(_SC_PAGESIZE); /* mmap's allocation granularity */
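
A minimal sketch of the allocation itself, assuming Linux's MAP_ANONYMOUS (`alloc_zeroed_page` is a hypothetical name; release the page with `munmap` when done):

#include <sys/mman.h>
#include <unistd.h>

/* Returns one zero-filled page from the kernel, or NULL on failure. */
void *alloc_zeroed_page(void)
{
    long sz = sysconf(_SC_PAGESIZE);
    /* Anonymous private mapping: pages come back zero-filled. */
    void *buf = mmap(NULL, (size_t)sz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return buf == MAP_FAILED ? NULL : buf;
}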
hroptatyr
+7  A: 

If there's a reasonable upper bound on the record size, declare a global read-only variable containing zeros. (Since it's a static-duration object, it's automatically initialized to zero.)

const unsigned char zero_filled_buffer[MAX_RECORD_SIZE]; /*at file scope*/

If the write function is C's `fwrite`, POSIX's `write`, or some other function, you can (must, for `write`) call it in a loop, so the buffer doesn't have to be as big as the biggest record, just as big as the biggest chunk you write at once.
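
A minimal sketch of such a loop, assuming C stdio and an illustrative MAX_RECORD_SIZE (`write_zeros` is a hypothetical name):

#include <stdio.h>

#define MAX_RECORD_SIZE 4096 /* illustrative; use your real maximum */

const unsigned char zero_filled_buffer[MAX_RECORD_SIZE]; /* at file scope */

/* Write n zero bytes to out, chunking through the static buffer. */
int write_zeros(FILE *out, size_t n)
{
    while (n > 0) {
        size_t chunk = n < sizeof zero_filled_buffer
                       ? n : sizeof zero_filled_buffer;
        if (fwrite(zero_filled_buffer, 1, chunk, out) != chunk)
            return -1;
        n -= chunk;
    }
    return 0;
}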

Such a variable will take zero space in your executable file under typical hosted implementations. ADDED: Note that as far as the C standard is concerned, the declaration above is exactly equivalent to `const unsigned char zero_filled_buffer[MAX_RECORD_SIZE] = {0};`; however, some compilers (including gcc) include the zeros in the executable if you explicitly add `= {0}`, but not if you leave off the initializer.

A smart program loader on a system with virtual memory could take advantage of the virtual memory system to use a single shared read-only zero-filled page of physical RAM for all such objects; I don't know if any do in practice. ADDED: For example, Linux (Debian lenny amd64) doesn't.

An alternative POSIX approach is to `mmap` the file and call `memset` to zero-fill buffers.

Gilles
Actually I could put `= {0}` on that, I believe.
Matt Joiner
@Matt Joiner: You could but it wouldn't make any difference.
Job
@Job: It wouldn't make any difference as far as the C standard is concerned. But I've just tested, and it does make a difference to gcc.
Gilles
Most compilers will write the object into the binary if you explicitly initialize it. This is not purely being stupid, but rather maintaining compatibility with some traditional tricks used in systems-level code where the author has an implementation-specific reason to want the object to be either in `.data` or `.bss`.
R..
You don't actually have to declare it as a file-scope identifier, either - if you want to declare it in your zero-ing function, just use `static const ...`
caf
@caf: Great input, thanks!
Matt Joiner
Guys, care to look at and comment on the answer I've posted below? http://stackoverflow.com/questions/3487836/pass-pointer-to-guaranteed-zeroed-memory/3490709#3490709 This answer is the neatest and fastest way to do what I requested in the question. It would be great if someone could confirm whether the compiler can optimize these regions to overlap, or at least put them in an unwritable page.
Matt Joiner
+1  A: 

As noted, you only have to allocate the largest region you'll ever need once; you can then pass it any time you need a region of that size or smaller.

In most implementations, there is no portion of the address space that is not mapped to RAM yet harmlessly reads as zero. Such a thing might be nice to have, but I'm unaware of one.

In some embedded systems, I've written flash-memory write routines so that, if given a null pointer, they assume the source data is (depending upon the application) all 0xFF, since I sometimes need to clear out a chunk of a file. Having the final write code handle the null-pointer case means the code to find and allocate flash blocks can be shared between the write-meaningful-data case and the write-blank-data case. One caveat: if one splits the write into multiple pieces, one must not add offsets to the null pointer before passing it to the I/O write.
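
A hedged sketch of that convention, with hypothetical names (`flash_program_chunk` stands in for the low-level primitive):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical low-level primitive that programs one chunk of flash. */
extern int flash_program_chunk(uint32_t addr, const uint8_t *src, size_t len);

/* Write len bytes to flash at addr. A NULL src means "erased value
 * (0xFF) throughout". Offsets are added to addr, never to src. */
int flash_write(uint32_t addr, const uint8_t *src, size_t len)
{
    static uint8_t blank[64];
    static int blank_ready;
    if (!blank_ready) {
        memset(blank, 0xFF, sizeof blank); /* one-time fill */
        blank_ready = 1;
    }
    while (len > 0) {
        size_t chunk = len < sizeof blank ? len : sizeof blank;
        if (flash_program_chunk(addr, src ? src : blank, chunk) != 0)
            return -1;
        addr += (uint32_t)chunk;
        if (src)
            src += chunk; /* advance only a non-null source */
        len -= chunk;
    }
    return 0;
}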

supercat
A: 

Here's a guaranteed-to-work, run-time possibility (compile with `gcc zeroed_mem_region.c -Wall -std=gnu99`):

#include <sys/mman.h>
#include <assert.h>
#include <stdio.h>

size_t const zeroed_size = 512;
char const *zeroed;

int main()
{
    /* Anonymous read-only mapping: the kernel supplies zero-filled,
     * unwritable pages, so no explicit memset is needed. */
    zeroed = mmap(
            NULL,
            zeroed_size,
            PROT_READ,
            MAP_PRIVATE|MAP_ANONYMOUS,
            -1,
            0);
    assert(zeroed != MAP_FAILED); /* check that the mapping succeeded */
    printf("zeroed region at %p\n", (void const *)zeroed);
    for (size_t i = 0; i < zeroed_size; ++i) {
        assert(zeroed[i] == 0);
    }
    printf("testing for writability\n");
    ((char *)zeroed)[0] = 1; /* deliberately faults: SIGSEGV */
    return 0;
}

Note that `zeroed` is `char const *` for testing; in reality this would be `void const *`.

Pros

  • Avoids the malloc allocator
  • Region is guaranteed unwritable (writes generate SIGSEGV)
  • Faster than malloc
  • Won't put crap in the executable
  • No need for a memset step (see mmap(2))

Cons

  • Unix/Linux specific (anonymous mappings exist since Linux 2.4)
  • Reduces opportunities for compiler optimization (though none apparently exist in this area)
Matt Joiner
Faster than malloc doesn't matter for a few kB. SIGSEGV on write is an advantage over `const zeroed[]` (which gcc leaves writable on my machine); on the other hand, `const zeroed[]` gives a compile-time error with gcc if you assign directly. No crap in the executable also applies to static-duration `const zeroed[]`. Another pro is being able to choose the size at run time. `MAP_ANON` is not POSIX but is widely available on modern unices. Neither `const zeroed[]` nor mmap is smart enough to allocate the same physical page over and over on my machine (Linux/amd64) (not an issue for 512 B, though).
Gilles
I've accepted this answer for now because I believe it deserves some attention for the compiler shortcomings it addresses.
Matt Joiner