+10  Q: 

zeroing out memory

Hello,

gcc 4.4.4 c89

I am just wondering what most c programmers do when they want to zero out memory.

For example I have a buffer of 1024 bytes. Sometimes I do this:

char buffer[1024] = {0};

Which will zero all bytes.

However, should I declare like this and use memset?

char buffer[1024];
.
.
memset(buffer, 0, sizeof(buffer);

Is there any real reason you have to zero the memory? What is the worst that can happen by not doing it?

Many thanks for any suggestions,

+3  A: 

The worst that can happen by not doing it is that you write some data in character by character and later interpret it as a string (and you didn't write a null terminator). Or you end up failing to realise a section of it was uninitialised and read it as though it were valid data. Basically: all sorts of nastiness.

Memset should be fine (provided you correct the sizeof typo :-)). I prefer that to your first example because I think it's clearer.

For dynamically allocated memory, I use calloc rather than malloc and memset.
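As a rough sketch of the two dynamic-allocation options mentioned above (the helper names are made up for illustration), both of these hand back a block whose bytes are all zero:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate n bytes with malloc, then zero them. */
void *alloc_zeroed_memset(size_t n)
{
    void *p = malloc(n);
    if (p != NULL) {
        memset(p, 0, n);
    }
    return p;
}

/* Hypothetical helper: let calloc allocate and zero in one call. */
void *alloc_zeroed_calloc(size_t n)
{
    return calloc(n, 1);
}

/* Return 1 if the first n bytes at p are all zero, 0 otherwise. */
int all_zero(const void *p, size_t n)
{
    const unsigned char *bytes = (const unsigned char *)p;
    size_t i;

    for (i = 0; i < n; i++) {
        if (bytes[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

Either way the caller still has to check for a NULL return and free() the block when done.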

Vicky
+3  A: 

Depends how you're filling it: if you're planning on writing to it before even potentially reading anything, then why bother? It also depends what you're going to use the buffer for: if it's going to be treated as a string, then you just need to set the first byte to \0:

char buffer[1024];
buffer[0] = '\0';

However, if you're using it as a byte stream, then the contents of the entire array are probably going to be relevant, so memseting the entire thing or setting it to { 0 } as in your example is a smart move.
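To illustrate the distinction, here is a small sketch (function names are hypothetical). For a string, writing the first byte is enough because string functions stop at the first '\0'; for raw bytes, the whole array has to be cleared before its contents can be relied on:

```c
#include <stddef.h>
#include <string.h>

/* String use: only the first byte is written, the rest is never read. */
size_t empty_string_length(void)
{
    char buffer[1024];

    buffer[0] = '\0';        /* the other 1023 bytes stay indeterminate */
    return strlen(buffer);   /* strlen stops at the first '\0' */
}

/* Byte-stream use: every byte matters, so clear the whole array. */
int byte_buffer_is_clear(void)
{
    unsigned char buffer[1024];
    size_t i;

    memset(buffer, 0, sizeof buffer);
    for (i = 0; i < sizeof buffer; i++) {
        if (buffer[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```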

Samir Talwar
char buffer[1024] = "" zeros the *entire* buffer, which could make it an unexpectedly expensive operation.
Joseph Quinsey
@Joseph: So it does. I had no idea. I'll edit my answer to reflect that.
Samir Talwar
+1  A: 

This post has been heavily edited to make it correct. Many thanks to Tyler McHenry for pointing out what I missed.

char buffer[1024] = {0};

Will set the first char in the buffer to null, and the compiler will then expand all non-initialized chars to 0 too. In such a case it seems that the differences between the two techniques boil down to whether the compiler generates more optimized code for array initialization or whether memset is optimized faster than the generated compiled code.

Previously I stated:

char buffer[1024] = {0};

Will set the first char in the buffer to null. That technique is commonly used for null terminated strings, as all data past the first null is ignored by subsequent (non-buggy) functions that handle null terminated strings.

Which is not quite true. Sorry for the miscommunication, and thanks again for the corrections.
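For anyone who wants to check the rule for themselves, a minimal sketch (function names are only for illustration) that inspects every element after a short initializer:

```c
#include <stddef.h>

/* Count the non-zero bytes in a buffer initialized with = {0}. */
size_t nonzero_after_brace_init(void)
{
    char buffer[1024] = {0};
    size_t i;
    size_t count = 0;

    for (i = 0; i < sizeof buffer; i++) {
        if (buffer[i] != 0) {
            count++;
        }
    }
    return count;
}

/* Sum the elements that the initializer list did not mention. */
int sum_of_trailing_elements(void)
{
    int values[8] = {1, 2};   /* values[2] through values[7] become 0 */
    int sum = 0;
    size_t i;

    for (i = 2; i < sizeof values / sizeof values[0]; i++) {
        sum += values[i];
    }
    return sum;
}
```

Both functions should return 0, since the elements without an explicit initializer are set to zero.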

Edwin Buck
`char buffer[1024] = {0}` does indeed set *all* elements to `\0`. There's a rule where if you give an array an initializer but the initializer is shorter than the array, the rest of the array is set to zero bytes.
Tyler McHenry
The first one will set the whole buffer to 0 too, not just the first char.
Vicky
Careful with your vocabulary. `NULL` is not necessarily equal to 0.
Matt B.
+2  A: 

I prefer using memset to clear a chunk of memory, especially when working with strings. I want to know without a doubt that there will be a null delimiter after my string. Yes, I know you can append a \0 on the end of each string and some functions do this for you, but I want no doubt that this has taken place.

A function could fail when using your buffer, and the buffer remains unchanged. Would you rather have a buffer of unknown garbage, or nothing?

Heather
The first example is not compiler dependent - it's standard C.
Vicky
@Vicky - Thanks for the correction, I wasn't 100% sure on the first example.
Heather
Anonymous downvoter, if something I've said in my answer is incorrect and not what I have already rectified, I would really like to know what it is.
Heather
Is there a reason you prefer `memset` to `={0}`?
avakar
@avakar - To be honest, I wasn't fully aware of what `={0}` did. Learning C in college, this syntax was never used and never taught, we were always taught to always use `memset`. After reading the responses here, I have no logical reason to use one over the other, and may start using the first one to avoid a call to `memset`. Not sure if that would be any less expensive...
Heather
It is not about "expensive"/"inexpensive". It is about the simple fact that `memset` is a generally *invalid* approach, while `= { 0 }` is always valid.
AndreyT
A: 

I also use memset(buffer, 0, sizeof(buffer));

The risk of not using it is that there is no guarantee that the buffer you are using is completely empty, there might be garbage which may lead to unpredictable behavior.

Always memset-ing to 0 after malloc is a very good practice.

LoudNPossiblyRight
A: 
 char buffer[1024] = {0};

I don't think that zeroes all the bytes so much as it puts a zero in the first element and leaves the other 1023 elements the way they were before, probably zero anyway.

Whether or not you zero out the buffer (with memset() or bzero()) depends entirely on what you want to do with the buffer and whether that application can handle a buffer that starts out with crap in it. If you are just going to start out by filling the buffer anyway, say with image data, then don't bother to initialize it.

Vagrant
`char buffer[1024] = {0}` does indeed set *all* elements to `\0`. There's a rule where if you give an array an initializer but the initializer is shorter than the array, the rest of the array is set to zero bytes
Tyler McHenry
Quoting from K&R: `[...] if there are fewer, the trailing members are initialized with 0.`
jamessan
+10  A: 

I vastly prefer

char buffer[1024] = { 0 };

It's shorter, easier to read, and less error-prone. Only use memset on dynamically-allocated buffers, and then prefer calloc.

JSBangs
It's shorter, easier to type, and less prone to error. However, it isn't as obvious if you forget the rules for non-explicitly initialized elements of arrays (I admit being one of the guilty). So the alternative might introduce error if you pick the wrong bounds, but even a person who hasn't had his morning coffee won't be led astray by what a memset() is doing. Does this mean coding to a lower denominator of programmer? Perhaps, but my defensive style never allows for much "default" initialization. If you don't use default initialization, you forget the rules.
Edwin Buck
@Edwin Buck: If you forget the rules for non-explicitly initialized elements, what's the worst that will happen? This isn't really a downside.
jamesdlin
The worst that will happen is that someone else will mess with working (and correct) code thinking that only the first element is being initialized. It's not that the code will contain any issues as you write it, it's that the "other" person might mess it up trying to "help" you in fixing something that's not broken.
Edwin Buck
@Edwin Buck: And even then, it's likely that the other programmer will write a correct version with `memset`, which is likely to be fine anyway. Furthermore, other programmers in general shouldn't change things that aren't broken, and if *you* don't use it in your code, then you're perpetuating the problem of other people not being as familiar with the idiom.
jamesdlin
The rule is very simple to remember, once you know it: "Objects are never partially initialised in C". So anything (array, struct, etc) is either completely initialised, or not initialised at all.
caf
It makes a lot more sense to write "char buffer[1024] = {}" - then there is less confusion, everything will still be initialized, but no one will think that only the first element is initialized.
Stefan Monov
@Stefan Monov: I think that's legal only in C++, not in standard C.
jamesdlin
+4  A: 

When you define char buffer[1024] without initializing, you're going to get undefined data in it. For instance, Visual C++ in debug mode will initialize with 0xcd. In Release mode, it will simply allocate the memory and not care what happens to be in that block from previous use.

Also, your examples demonstrate runtime vs. compile time initialization. If your char buffer[1024] = { 0 } is a global or static declaration, it will be stored in the binary's data segment with its initialized data, thus increasing your binary size by about 1024 bytes (in this case). If the definition is in a function, it's stored on the stack and is allocated at runtime and not stored in the binary. If you provide an initializer in this case, the initializer is stored in the binary and an equivalent of a memcpy() is done to initialize buffer at runtime.
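A small sketch of the two storage cases as seen from the program itself (names are hypothetical). Where the bytes end up in the binary depends on the compiler and linker, so this only shows the values the code can observe: a static buffer is zero even without an initializer, and a local buffer with = { 0 } is zeroed each time the function is entered.

```c
#include <stddef.h>

/* Static storage duration: zero-initialized even without an initializer. */
static char static_buffer[1024];

/* Return 1 if the first n bytes at p are all zero, 0 otherwise. */
static int bytes_are_zero(const char *p, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (p[i] != 0) {
            return 0;
        }
    }
    return 1;
}

int static_buffer_is_zero(void)
{
    return bytes_are_zero(static_buffer, sizeof static_buffer);
}

int local_buffer_is_zero(void)
{
    /* Automatic storage: the initializer runs on every call. */
    char local_buffer[1024] = { 0 };

    return bytes_are_zero(local_buffer, sizeof local_buffer);
}
```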

Hopefully, this helps you decide which method works best for you.

spoulson
Exactly what I was going to post. The difference is the size of the produced binary output!
Robert
A: 

I'm not familiar with the:

char buffer[1024] = {0};

technique. But assuming it does what I think it does, there's a (potential) difference to the two techniques.

The first one is done at COMPILE time, and the buffer will be part of the static image of the executable, and thus be 0's when you load.

The latter will be done at RUN TIME.

The first may incur some load time behaviour. If you just have:

char buffer[1024];

the modern loaders may well "virtually" load that...that is, it won't take any real space in the file, it'll simply be an instruction to the loader to carve out a block when the program is loaded. I'm not comfortable enough with modern loaders to say if that's true or not.

But if you pre-initialize it, then that will certainly need to be loaded from the executable.

Mind, neither of these have "real" performance impacts in the small. They may not have any in the "large". Just saying there's potential here, and the two techniques are in fact doing something quite different.

Will Hartung
This isn't true. `char buffer[1024] = { 0 };` is allocated on the stack at runtime. The compiler may even translate this into a call to `memset`.
JSBangs
`char buffer[1024] = {0}` can only potentially be done at compile time if `buffer` is global or static (and then it would be done automatically even if you left off the initializer). If `buffer` is a local variable, it's on the stack, which means that it must be initialized to zero at run-time, since the contents of the stack at the time of a function call are undetermined until it actually happens.
Tyler McHenry
+5  A: 

The worst that can happen? You end up (unwittingly) with a string that is not NULL terminated, or an integer that inherits whatever happened to be to the right of it after you printed to part of the buffer. Yet, unterminated strings can happen other ways, too, even if you initialized the buffer.

Edit (from comments) The end of the world is also a remote possibility, depending on what you are doing.

Either is undesirable. However, unless completely eschewing dynamically allocated memory, most statically allocated buffers are typically rather small, which makes memset() relatively cheap. In fact, much cheaper than most calls to calloc() for dynamic blocks, which tend to be bigger than ~2k.

C99 contains language regarding default initialization values. I can't, however, seem to make gcc -std=c99 agree with that, using any kind of storage.

Still, with a lot of older compilers (and compilers that aren't quite c99) still in use, I prefer to just use memset()

Tim Post
"What's the worst that can happen"? A non-null-terminated string or buffer overflow can overwrite key data, smash your stack pointers, lead to security holes, get you fired, eat your children, cause famine in Africa, incite nuclear war, and summon the dread Cthulhu. The wise programmer protects himself from them with every weapon he has. Also, I'm pretty sure that no commercial C compiler actually implements that part of the spec except maybe in debug builds. Uninitialized variables get uninitialized memory.
JSBangs
@JS Bangs - I can assure you that my children and warheads will not suffer from this fate. On an unrelated note, how much coffee or overly caffeinated drinks (in liters or gallons) have you consumed in the last 24 hours?
Tim Post
@JS Bangs can you define 'unlimited' ? especially for static allocation?
Tim Post
@Tim: My copy of C99 has no section 8.5.1 - can you post a snippet of what it might say about initializing to something consistent? As far as I know, only statically allocated objects get initialized to zero; automatic variables (most locals) do not get the promise of any kind of initialization in C unless explicitly initialized by the programmer.
Michael Burr
@Michael Burr - I misquoted. I'll find the exact passage and edit again once I have. I _know_ I read that uninitialized storage (it could have been specific to static) shall be initialized by the compiler in a consistent manner. 8.5.1 does deal with initialization of members, see http://gcc.gnu.org/ml/gcc-bugs/2000-07/msg00639.html , but I can't find the reference that I'm looking for.
Tim Post
@Tim: the 8.5.1 reference is from the C++ standard (I should have guessed). As far as C99 goes, 6.7.8/10 (Initialization) says, "If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate". That part of the standard then goes on to detail how static duration objects without explicit initialization are initialized (which is basically to set pointers to the null pointer value and zero initialize everything else).
Michael Burr
A: 

yup, calloc() method defined in stdlib.h allocates memory initialized with zeros.

Adil Butt
+2  A: 

In this particular case, there's not much difference. I prefer = { 0 } over memset because memset is more error-prone:

  • It provides an opportunity to get the bounds wrong.
  • It provides an opportunity to mix up the arguments to memset (e.g. memset(buf, sizeof buf, 0) instead of memset(buf, 0, sizeof buf)).

In general, = { 0 } is better for initializing structs too. It effectively initializes all members as if you had written = 0 to initialize each. This means that pointer members are guaranteed to be initialized to the null pointer (which might not be all-bits-zero, and all-bits-zero is what you'd get if you had used memset).

On the other hand, = { 0 } can leave padding bits in a struct as garbage, so it might not be appropriate if you plan to use memcmp to compare them later.
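A minimal sketch of the struct case (the struct and function names are invented for the example). Each member is initialized as if by = 0, so the pointer member compares equal to NULL regardless of how the platform represents null pointers:

```c
#include <stddef.h>

/* Hypothetical struct mixing integer, floating-point and pointer members. */
struct record {
    int id;
    double weight;
    char *name;
};

/* Return 1 if = {0} gave every member its zero / null value. */
int record_is_zero_initialised(void)
{
    struct record r = {0};   /* id = 0, weight = 0.0, name = null pointer */

    return r.id == 0 && r.weight == 0.0 && r.name == NULL;
}
```

Note that this checks the members only; it says nothing about any padding between them, which is the memcmp caveat mentioned above.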

jamesdlin
+1  A: 

One of the things that can happen if you don't initialize is that you run the risk of leaking sensitive information.

Uninitialized memory may have something sensitive in it from a previous use of that memory. Maybe a password or crypto key or part of a private email. Your code may later transmit that buffer or struct somewhere, or write it to disk, and if you only partially filled it the rest of it still contains those previous contents. Certain secure systems require zeroizing buffers when an address space can contain sensitive information.
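One caveat when zeroizing for this reason: if the buffer is never read again, an optimizing compiler is allowed to drop a plain memset as a dead store. A commonly used workaround, sketched below with a hypothetical helper name, is to write through a volatile pointer. This is not something the C standard promises will defeat every optimizer, and where available a platform facility such as explicit_bzero, memset_s or SecureZeroMemory is probably the safer choice.

```c
#include <stddef.h>

/* Hypothetical helper: overwrite n bytes at p with zeros.
 * The volatile qualifier is there to discourage the compiler from
 * removing the stores when the buffer is not used afterwards. */
void scrub(void *p, size_t n)
{
    volatile unsigned char *bytes = (volatile unsigned char *)p;
    size_t i;

    for (i = 0; i < n; i++) {
        bytes[i] = 0;
    }
}
```

Typical use would be calling scrub(key, sizeof key) just before a buffer holding a password or key goes out of scope or is freed.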

progrmr