views: 537
answers: 6
Hi,

What is the advantage of zeroing out memory (i.e. calloc over malloc)? Won't you change the value to something else anyway?

-Chris

+1  A: 
SB
As I said in another comment: using `calloc` to get pointers set to `NULL` is useless, because it's not guaranteed by the standard that all-bits-zero == `NULL`.
Alok
+3  A: 

Assume you want to write a counting sort implementation, or depth-first search a graph and keep track of visited vertices. You'll update the memory as the algorithm runs (rather than assigning each value just once), but it needs to start out all zero. If you didn't have calloc, you'd have to walk through the buffer and zero it manually first; calloc can potentially do this more efficiently for you.
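For instance, here's a minimal sketch of the DFS case (the adjacency-matrix graph and all names are illustrative, not from the original answer):

```c
#include <stdlib.h>

/* A hypothetical adjacency-matrix graph, just for illustration. */
struct graph {
    size_t n;    /* number of vertices */
    int   *adj;  /* n*n matrix; adj[v*n + w] != 0 means an edge v->w */
};

static void dfs(const struct graph *g, size_t v, char *visited) {
    visited[v] = 1;  /* mark v as visited */
    for (size_t w = 0; w < g->n; w++)
        if (g->adj[v * g->n + w] && !visited[w])
            dfs(g, w, visited);
}

void traverse(const struct graph *g) {
    /* calloc: every vertex starts out unvisited (0), no manual zeroing loop */
    char *visited = calloc(g->n, sizeof *visited);
    if (visited == NULL)
        return;
    dfs(g, 0, visited);
    free(visited);
}
```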

Mehrdad Afshari
Upvoting because the rest of the answers only mention the bug-finding or short-cutting advantages (which are great info, too). They don't mention (what is probably) the original intended purpose of calloc. Sometimes you have an algorithm that allocates memory, and requires that data to be initialized to zero. Simple as that. Just because you'll eventually store a different value doesn't mean that zero isn't the correct initial value.
Merlyn Morgan-Graham
@Merlyn: for some value of "the rest of the answers" :-)
Alok
Oh, sorry. I didn't realize you were still talking! ;)
Merlyn Morgan-Graham
LOL. I am not dead yet. :-)
Alok
+7  A: 
  1. By knowing what value is already there, a programmer can take some shortcuts and make certain optimizations. Most frequently, callocing a structure with pointers: they are initialized to NULL (see the sketch below).
  2. What if the programmer forgot to initialize something in the allocation? Instead of random stuff, zero is a great default value.
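For example, a minimal sketch of the shortcut in point 1 (with the caveat, raised in the comments below, that the standard does not actually guarantee all-bits-zero pointers are `NULL`, although they are on mainstream platforms):

```c
#include <stdlib.h>

struct node {
    int          value;
    struct node *next;  /* pointer member */
};

int main(void) {
    /* The whole struct comes back all-bits-zero, so on common
       platforms n->value is 0 and n->next compares equal to NULL
       without any explicit assignments. */
    struct node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return EXIT_FAILURE;

    free(n);
    return 0;
}
```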

In a realtime process control system I worked on long ago, we settled on having the power-on logic initialize all of RAM to 0xCC, the 8086's interrupt 3 instruction. This would cause the processor to enter the monitor (a primitive debugger) if it somehow executed uninitialized memory. (Unhelpfully, the 8086 merrily executes memory containing zeros, since they are add [bx+si],al instructions. Even in 32-bit mode they execute, as add [eax],al instructions.)

I don't recall if we ever found a runaway program, but the values corresponding to 0xCC-filled memory interpreted as various types: 52,428 (unsigned 16-bit), -13,108 (signed 16-bit), -107,374,176 (32-bit float), and -9.25596313493e+61 (64-bit float) popped up in a lot of unexpected places. Also, some code expecting characters to be 7-bit ASCII (that is, a bug) alerted us to its presence when it tried to process 0xCC.

wallyk
Using `calloc` to get pointers set to `NULL` is useless, because it's not guaranteed by the standard that all-bits-zero == `NULL`. Similarly for floating-point values.
Alok
That hasn't been true since the late 1970s. All important architectures use `(void *) 0` as NULL, and the IEEE floating point formats all use "all bits zero" as true zero.
wallyk
`(void *)0` is not necessarily all-bits-zero. The compiler must translate `(void *)0` into the platform's actual null pointer representation. Similarly, when you write `p = 0;` and `p` is a pointer, the compiler must set `p` to a bit pattern that represents a null pointer, which may not be all bits zero.
Alok
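A small illustration of the distinction Alok is drawing (a sketch only; on mainstream platforms both comparisons succeed, but only the first is guaranteed by the standard):

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    int *p;

    p = 0;  /* the compiler translates the null pointer constant 0
               into whatever bit pattern a null pointer has here */
    printf("after p = 0:  p == NULL is %d\n", p == NULL);

    memset(&p, 0, sizeof p);  /* forces all-bits-zero, which the
                                 standard does NOT promise is null */
    printf("after memset: p == NULL is %d\n", p == NULL);

    return 0;
}
```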
Similarly, integer types (other than `unsigned char` and the C99 fixed-size types `(u)intN_t`) are permitted to have padding bits, and setting those to 0 can be a trap representation (for a far-fetched example, if there's an inverse-parity bit, then all-bits-zero is a parity violation). Unlikely, downright evil, but if you live by the standard then you can die by the standard...
Steve Jessop
Anyone have real world examples of architectures that don't use all bits zero as null?
Kevin Gale
http://c-faq.com/null/machexamp.html has some examples.
Alok
@Steve: thanks.
Alok
@Alok, got any modern examples of the non-zero null? For reference, here are the dates I came up with for those in that list: Honeywell-Bull = early 1990s; Prime-50 = 1970s; Eclipse MV = 1980s; CDC Cyber 180 = 1984; old HP 3000 = 1970s; Symbolics Lisp Machine = 1980s; "some" 64-bit Cray machines... the era is unknown, but likely early models, therefore 1980s/early 1990s; 8086 - this one is inaccurate in its mention: 16-bit mode was "SEGMENT:OFFSET", each 16 bits in size, and the offset always started at 0 and was analogous to a pointer.
Jason D
@Alok: Prime is hardly a modern architectural example, and Data Generals were obsolete in the 1980s. I'd be surprised if there was a real C compiler for them. In fact, all the cited examples are either rarely seen (Cray) or long obsolete.
wallyk
@wallyk and @Jason: the fact is that there were architectures with not-all-bits-zero NULL pointers. We don't know that there won't be any in the future, though. Basically, I find it easy to write code without the assumption, so I do: you are right that the assumption is valid for current computers, but since it's not guaranteed by the standard, and since I can get the same effect portably very easily, I never feel the need to rely on all-bits-zero == 0 for pointers or for floating-point values.
Alok
Obsolete or not, I saw a Data General Eclipse humming away in a modern data centre only recently - presumably still faithfully running something of some importance. It did look somewhat out of place.
caf
Interesting to see what different people's idea of "old" is, too. You might have a different opinion of computers from the early 90s if you were programming in the early 90s, than if your idea of obsolete tech is Pentium D ;-)
Steve Jessop
The binary representation of NULL does not depend on the architecture (AFAIK), but is the compiler vendor's choice. The old Watcom C Compiler I've used in the past (on x86) represented a NULL-pointer with 0xffffffff.
Secure
A: 

In addition to the benefits of initializing variables, calloc also helps track down bugs.

If you accidentally use a bit of the allocated memory without properly initializing it, the application will always fail the same way: for example, with an access violation from a null pointer. With malloc the memory has random values, and this can cause the program to fail in random ways.

Random failures are very hard to track down and calloc helps avoid those.

Kevin Gale
+6  A: 

There are two camps: one says that initializing variables when they are declared helps find bugs. The people in this camp make sure everything they declare is initialized. They initialize pointers to NULL, ints to 0, etc. The idea is that everything is determinate, and when they see a NULL-pointer in a debugger, they immediately know it wasn't set properly. It can also help your program crash during testing because of NULL-pointer dereferencing rather than mysteriously crashing in production runs.

The other camp says that initializing variables at declaration makes things harder to debug, because now a compiler can't warn you about variables "used without being set".

Without telling you my personal preference1: if you belong to the first camp, you would want to calloc() instead of malloc(). If you belong to the second camp (which apparently you do) then you prefer malloc() over calloc().

Now there are two exceptions:

  • If you belong to the "initialize everything" camp, you don't calloc() but malloc() because you are initializing floating-point numbers or pointers, and you know that all bits zero doesn't necessarily mean 0 for them. Or, you don't want the extra overhead.
  • If you belong to the "set when you need to" camp, you may want to calloc() when you are allocating some data and want it to be all zeroes. For example, if you want to calculate the row-wise sums of an n by m dynamically allocated array of ints (see the sketch below).
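A minimal sketch of that last example (dimensions and names are illustrative): the accumulation works without a separate zeroing pass only because calloc() hands back zeroed memory.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t n = 3, m = 4;  /* example dimensions */
    int *data = calloc(n * m, sizeof *data);      /* n-by-m, all zero */
    int *row_sums = calloc(n, sizeof *row_sums);  /* sums start at 0 */
    if (data == NULL || row_sums == NULL)
        return EXIT_FAILURE;

    for (size_t i = 0; i < n; i++)        /* fill in sample values */
        for (size_t j = 0; j < m; j++)
            data[i * m + j] = (int)(i + j);

    for (size_t i = 0; i < n; i++)        /* accumulate row-wise sums */
        for (size_t j = 0; j < m; j++)
            row_sums[i] += data[i * m + j];

    for (size_t i = 0; i < n; i++)
        printf("row %zu: %d\n", i, row_sums[i]);

    free(row_sums);
    free(data);
    return 0;
}
```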

1 You can see my answers to many of the questions here on SO to see which camp I belong to :-).

Alok
I'd like you to name a computer made this side of 1970 that doesn't use zeros for NULL pointers or zeroed-out floating point values. Not to mention that on Windows and Linux, which I've programmed on for the last few years, a simple malloc does zero the memory. I believe it's supposed to be a security feature.
Arthur Kalliokoski
All the IEEE floating point formats define the special case of "all bits zero" represents true zero.
wallyk
It's more a question of what is guaranteed by the standard. If I really want `NULL` pointers, I will set them to `NULL` in a loop.
Alok
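What that loop looks like in practice (a sketch with hypothetical names; this is the strictly portable alternative to relying on calloc's zero fill):

```c
#include <stdlib.h>

#define NPTRS 64  /* arbitrary example size */

int main(void) {
    /* malloc leaves the contents indeterminate... */
    int **ptrs = malloc(NPTRS * sizeof *ptrs);
    if (ptrs == NULL)
        return EXIT_FAILURE;

    /* ...so assign NULL explicitly; this is guaranteed to yield
       null pointers whatever their bit pattern happens to be */
    for (size_t i = 0; i < NPTRS; i++)
        ptrs[i] = NULL;

    free(ptrs);
    return 0;
}
```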
@akillio: that's almost a philosophical question, though: do you program to the standard, or do you program to the implementations you know about? Personally, I try to program to the standard *if* I'm writing supposedly-portable code: it's easier than learning about every computer made this side of 1970. I have in the past been genuinely surprised by compilers doing something I thought "never happens". One example which springs to mind is an ARM ABI with middle-endian data types, ffs. Obviously 0 was still bitwise 0, so not the example you asked for.
Steve Jessop
http://c-faq.com/null/machexamp.html
Alok
Also, I vaguely recall that on linux at least, malloc only clears the memory the *first* time it hands that block to user-level code in the process. If an allocation is freed and then assigned again, all in the same process, it could be non-zero. But I can't remember why I think that. If I'm right, it just goes to show you shouldn't count on "well-known" but unspecified behavior, and if I'm wrong it just goes to show that *I* shouldn't count on "well-known" behavior that I can't find any documentation to support, because my memory is dodgy ;-)
Steve Jessop
@Steve: The `brk()` and `sbrk()` system calls (the Linux system calls that expand a process's memory allocation) return zeroed memory for security reasons. The heap manager requests additional memory from sbrk(). But when a local allocation frees something, the heap manager probably doesn't clear it out before a subsequent allocation; it only sets the header information used to manage it.
wallyk
"you are initializing floating-point numbers or pointers". Or `int` (see my comment on wallyk's answer). I say if you're going to be pedantic, be *really* pedantic.
Steve Jessop
Upvote for objectively presenting an interesting and relevant controversy :)
Merlyn Morgan-Graham
@Merlyn: thanks :-)
Alok
@Steve Jessop: do you have a pointer to information on the 'middle-endian' data types? That sounds terrible and interesting (well at least a bit) at the same time.
Michael Burr
@Michael: http://catb.org/jargon/html/M/middle-endian.html - but it has very little information.
Alok
@Michael Burr: well, one day our compiler team started running around like headless chickens, because one of the ARM calling conventions they had to deal with had the bytes within each word of a double in little-endian order, but the most significant of the two words first. uint64_t didn't do the same, it was regular LE. There's a mention of it here: http://www.khronos.org/registry/kode/extensions/KHR/float64.html. It applies to ARM cores with the old FPA floating-point unit: http://wiki.debian.org/ArmEabiPort
Steve Jessop
I really like the comments on this question (both my answer and wallyk's answer). Thanks everyone!
Alok
A: 

First of all, you cannot calloc pointers, at least not if you want to follow standard C.

Second, bugs just become masked when you clobber the memory with all zeros. It is much better practice to have a debug version of malloc that initialises the memory to something that will always crash, such as 0xCDCDCDCD.

Then when you see an access violation you know the problem straight away. It is also beneficial to have a debug free function that wipes the memory with a different pattern, so code that touches the memory after it is freed gets an unexpected surprise.
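A rough sketch of such debug wrappers (the names debug_malloc/debug_free and passing the size to free are illustrative; a real implementation would record the size in a block header, and MSVC's debug heap uses the same idea with 0xCD and 0xDD fills):

```c
#include <stdlib.h>
#include <string.h>

/* Fill fresh allocations with a pattern likely to crash if used as a
   pointer, and wipe freed blocks with a different pattern so that
   use-after-free is just as conspicuous. */

void *debug_malloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL)
        memset(p, 0xCD, size);  /* "allocated but uninitialized" */
    return p;
}

void debug_free(void *p, size_t size) {
    if (p != NULL)
        memset(p, 0xDD, size);  /* "freed" pattern */
    free(p);
}
```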

Working on an embedded system, callocing just to "be sure" is usually not an option. You typically allocate and populate in one go, so calloc just means you are touching the memory twice.

TGF