views: 818
answers: 11

A long time ago I used to program in C for school. I remember something that I really hated about C: unassigned pointers do not point to NULL.

I asked many people, including teachers, why in the world the default behavior for an unassigned pointer is not to point to NULL, since it seems far more dangerous for it to be unpredictable.

The answer was supposedly performance, but I never bought that. I think many, many bugs in the history of programming could have been avoided had C defaulted to NULL.

Here is some C code to point out (pun intended) what I am talking about:

#include <stdio.h>

int main(void) {

  int * randomA;   /* deliberately left uninitialized */
  int * randomB;   /* deliberately left uninitialized */
  int * nullA = NULL;
  int * nullB = NULL;

  printf("randomA: %p, randomB: %p, nullA: %p, nullB: %p\n\n",
     randomA, randomB, nullA, nullB);

  return 0;
}

Which compiles with warnings (it's nice to see that C compilers are much friendlier than when I was in school) and outputs:

randomA: 0xb779eff4, randomB: 0x804844b, nullA: (nil), nullB: (nil)

+10  A: 

It's for performance.

C was first developed around the time of the PDP-11, for which 60K was a common maximum amount of memory, and many machines had a lot less. Unnecessary assignments would be particularly expensive in this kind of environment.

These days there are many, many embedded devices that use C for which 60K of memory would seem infinite; the PIC 12F675 has 1K of memory.

David Sykes
I just don't get it though. It's getting a value from somewhere, right? Somewhere it's getting assigned. How could it be more costly for the C runtime to point them to NULL than to assign them some random value?
Adam Gent
Couldn't the compiler do some sort of optimization? Usually things that are invariant, like always pointing to NULL, are easier to optimize for.
Adam Gent
@Adam: The value was there before. It's just a reuse of a specific memory location.
tur1ng
There's no need to bring up embedded devices; just remember that C was designed for a computer which allowed 64K of code and 64K of data at the same time. Other times, other constraints, other decisions.
AProgrammer
The runtime doesn't assign anything; it just reuses whatever happens to be there.
AProgrammer
@tur1ng Now I remember the more specific reason. I wish future versions of C would change this, or at least make it a compile-time option. I can't believe that, with high-level languages like Haskell able to compile code that runs faster than C, C cannot have pointers that default to NULL.
Adam Gent
@Adam, BTW, the compilers I commonly use are able to warn about use of uninitialized variables. Increase your warning level and fix what is found. Another thing: warnings and compile-time options are outside the scope of the standard; if you want them and don't have them, just lobby your compiler vendor.
AProgrammer
@Adam Gent: A given C implementation certainly could have pointers that defaulted to NULL, either normally or as a compile-time option. The Standard has no requirements for the values of certain uninitialized variables, so an implementation may do whatever it pleases in this case. BTW, are you claiming that compiled Haskell code is always faster than compiled C code, or only in your particular area of interest?
David Thornley
@Adam: The compiler doesn't init the pointer to a random value--it doesn't init it at all. (Technically, the value is not "random," but just difficult to predict.) Initializing a value to anything at all means the compiler has to generate (and the program has to execute) additional code, so it costs both space and time. C tries very hard not to waste space/time unless you tell it to. There is a bigger issue at play, though: if the value of an uninitialized variable is somehow relevant to the execution of your program, you're doing something wrong. :)
Casey Barker
@Casey Barker You hit the nail on the head. In Java and .NET land there are some people that use the keyword "final" on all their local variables to avoid this. I like how Scala distinguishes between "values" and "variables".
Adam Gent
@AProgrammer the PDP-11 on which C was developed had 24K bytes; the PDP-7 on which B was developed had 8K 18-bit words. See http://cm.bell-labs.com/cm/cs/who/dmr/chist.html
ninjalj
@ninjalj, I was speaking of the architectural limit of the PDP-11. Obviously not all systems were maxed out, and some models were even more limited than what the architecture would allow. My main point is that to understand C, you have to think about how things were during its time of rapid evolution (say, until the mid-80s).
AProgrammer
@David Thornley I'm not claiming Haskell is always faster than C. As a developer I never claim something "always" to be the case :)
Adam Gent
+5  A: 

This is because when you declare a pointer, your C compiler just reserves the necessary space to hold it. So when you run your program, that space may already have a value in it, probably left over from data previously stored in that part of memory.

The C compiler could assign this pointer a value, but that would be a waste of time in most cases, since you are expected to assign it a value yourself somewhere in the code.

That is why good compilers give a warning when you do not initialize your variables, so I don't think there are that many bugs because of this behavior. You just have to read the warnings.

kbok
I believe he meant "many bugs" historically; apparently older C compilers were not as friendly as their modern counterparts.
Kenny Evitt
IMHO any attempt to access the value of an uninitialized variable is a bug. It doesn't matter whether it's zero or random; you shouldn't be trying to read it if you didn't explicitly set it to something.
joefis
@joefis but it's easier to find and understand the bug if it's NULL and not some random value. This is particularly useful if you are doing concurrent programming.
Adam Gent
@Adam Gent: Hmm, it depends on your perspective. Personally I would think that garbage was a clearer indicator that I forgot to set the variable, rather than NULL, which I might have set on purpose...
joefis
+27  A: 

Actually, it depends on the storage duration of the pointer. Pointers with static storage duration are initialized to null pointers. Pointers with automatic storage duration are not initialized. See ISO C99 6.7.8.10:

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:

  • if it has pointer type, it is initialized to a null pointer;
  • if it has arithmetic type, it is initialized to (positive or unsigned) zero;
  • if it is an aggregate, every member is initialized (recursively) according to these rules;
  • if it is a union, the first named member is initialized (recursively) according to these rules.

And yes, objects with automatic storage duration are not initialized for performance reasons. Just imagine initializing a 4K array on every call to a logging function (something I saw on a project I worked on; thankfully C let me avoid the initialization, resulting in a nice performance boost).
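
For illustration, here is a minimal sketch of that scenario (the function name, flag, and buffer size are made up). Because buf has automatic storage, the compiler emits no code to clear it, so the common "logging disabled" path pays nothing for the 4K of scratch space; writing char buf[4096] = {0}; instead would force the whole array to be zeroed on every call.

#include <stdarg.h>
#include <stdio.h>

static int logging_enabled = 0;   /* hypothetical flag; stands in for the shared-memory check */

void log_message(const char *fmt, ...)
{
  char buf[4096];                 /* automatic storage: left uninitialized, costs nothing */
  va_list ap;

  if (!logging_enabled)
    return;                       /* the common fast path never touches buf */

  va_start(ap, fmt);
  vsnprintf(buf, sizeof buf, fmt, ap);
  va_end(ap);
  fputs(buf, stderr);
}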

ninjalj
Certainly initializing a 4K array can't be that slow, since languages like Java do this all the time (initializing all references). You must have had a very high-performance project.
Adam Gent
Yes, performance was important on that project. Add to that the fact that ~99.99% of the time the logging function just checked its parameters against some flags in shared memory, saw that logging was disabled, and returned. Imagine my expression when I discovered that the initialization was in one of the top 5 places of a cachegrind profile.
ninjalj
+5  A: 

Pointers are not special in this regard; other types of variables have exactly the same issue if you use them uninitialised:

int a;
double b;

printf("%d, %f\n", a, b);

The reason is simple: requiring the runtime to set uninitialised values to a known value adds an overhead to each function call. The overhead might not be much with a single value, but consider if you have a large array of pointers:

int *a[20000];
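
To make the overhead concrete, here is a minimal sketch (the function names and sizes are chosen only for illustration): the uninitialized array costs nothing beyond moving the stack pointer, while the explicitly zero-initialized one forces the compiler to emit code that clears tens of kilobytes on every single call.

void cheap_every_call(void)
{
  int *a[20000];          /* stack pointer moves once; no per-element work */
  a[0] = 0;               /* initialize only what you actually use */
}

void zeroes_the_lot_every_call(void)
{
  int *b[20000] = {0};    /* compiler must emit code to zero the whole array on every call */
  (void)b[0];
}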
caf
Yes, due to the procedural nature of C you frequently define variables before assigning them. So I can see how this becomes a performance problem.
Adam Gent
+15  A: 

Because in C, declaration and initialisation are deliberately different steps. They are deliberately different because that is how C is designed.

When you say this inside a function:

void demo(void)
{
    int *param;
    /* ... */
}

You are saying, "my dear C compiler, when you create the stack frame for this function, please remember to reserve sizeof(int*) bytes for storing a pointer." The compiler does not ask what's going there - it assumes you're going to tell it soon. If you don't, maybe there's a better language for you ;)

Maybe it wouldn't be diabolically hard to generate some safe stack clearing code. But it'd have to be called on every function invocation, and I doubt that many C developers would appreciate the hit when they're just going to fill it themselves anyway. Incidentally, there's a lot you can do for performance if you're allowed to be flexible with the stack. For example, the compiler can make the optimisation where...

If your function1 calls another function2 and stores its return value, or maybe there are some parameters passed in to function2 that aren't changed inside function2... we don't have to create extra space, do we? Just use the same part of the stack for both! Note that this is in direct conflict with the concept of initialising the stack before every use.

But in a wider sense (and, to my mind, more importantly), it's aligned with C's philosophy of not doing very much more than is absolutely necessary. And this applies whether you're working on a PDP11, a PIC32MX (what I use it for) or a Cray XT3. It's exactly why people might choose to use C instead of other languages.

  • If I want to write a program with no trace of malloc and free, I don't have to! No memory management is forced upon me!
  • If I want to bit-pack and type-pun a data union, I can! (As long as I read my implementation's notes on standard adherence, of course.)
  • If I know exactly what I'm doing with my stack frame, the compiler doesn't have to do anything else for me!

In short, when you ask the C compiler to jump, it doesn't ask how high. The resulting code probably won't even come back down again.

Since most people who choose to develop in C like it that way, it has enough inertia not to change. Your way might not be an inherently bad idea; it's just not really asked for by many other C developers.

detly
malloc and delete? no conforming program has any trace of that! ;)
jk
I guess this is where my true annoyance with the language is. I prefer declarative, immutable functional programming over mutable procedural languages like C. That being said, I find it seriously ironic that C is the de facto language used for embedded programming. Today I think SAFETY should be far more important than performance (instead buy faster chips). I mean, do you really want a memory leak in your elevator's braking system?
Adam Gent
@Adam Gent: Criticizing a language on the grounds that it isn't the sort of language you like is rather futile, isn't it? The reason C is heavily used for embedded programming is that it's efficient. If you're selling systems in the tens of millions, it's a lot cheaper to have engineers making the code safe than to have to spend an extra dollar each on a faster CPU or bigger program storage.
David Thornley
@Adam Gent - I want strict control over timing and memory in my high precision scientific instruments :) And it is safe, because I know at all times exactly what my code is doing - what I told it to do. Your definition of safety may vary. (Besides, find me a Haskell compiler for the PIC32MX series, or the dsPIC24.)
detly
You might want to change that to “`malloc` and `free`” there.
Donal Fellows
@detly Touché, touché... your point is taken and I didn't mean to offend. However, if that is true, why not program in assembly? It is 2010, right, and chips are getting pretty cheap? When can we stop programming in C, or do you believe C is that superior? Aren't we moving to more concurrent architectures these days? It seems C is kind of weak in this area.
Adam Gent
Additionally, your frustration with the language is perfectly rational, but C has a context and a use like any other tool. The weakness of functional languages is that they are completely antithetical to maintaining, controlling and persisting **state**, which is exactly how many people design a control system.
detly
@jk - `malloc` and `free`! FREE! Arg! ... Can you tell I never use them? :P
detly
@Adam Gent - no offense, I don't want you to think you're being shouted down about it :) Anyway, assembly is a nightmare to read and write, and worse to maintain. Remember, it's *always* a practical consideration as to what language I use for the job - C is infinitely more readable, but still has a fairly close mapping to the same level of control (if desired). If a functional language were portable to my platform, I might like to try it, but I'd really need to rethink my design, and I'd *still* have to (re)learn all the low level details to make sure nothing blows up.
detly
In short... picking your language should not be done via [categorical imperative](http://en.wikipedia.org/wiki/Categorical_imperative) :)
detly
The upside of C is it lets you write code that works closer to the way the CPU works than some other languages. Something like Ada on an embedded system tends to have a lot of undocumented features in the runtime support library to do all the hand-holding it does for you. It practically includes its own OS. That's not practical in all embedded systems. In the end, not knowing how something is done can be more dangerous than having to do it yourself on an embedded system. I.e., when in doubt, init your own pointers with NULL.
NoMoreZealots
@NoMoreZealots and @detly I will agree that C does map very well to how **stuff** really works and I am glad I learned the language as it helped me comprehend CPU architecture when I was in school.
Adam Gent
A: 

I think it comes from the following: there's no reason why memory should contain (when powered up) a specific value (0, NULL or whatever). So, if not previously written to, a memory location can contain whatever value; from your point of view it is effectively random (though that very location may have been used before by some other software, and so contain a value that was meaningful for that application, e.g. a counter, but from "your" point of view it is just a random number).

To initialize it to a specific value, you need at least one more instruction, but there are situations where you don't need that initialization a priori; e.g. v = malloc(x) will assign to v a valid address or NULL no matter the initial content of v. So initializing it up front could be considered a waste of time, and a language (like C) can choose not to do it a priori.

Of course, nowadays this is mostly insignificant, and there are languages where uninitialized variables have default values (null for pointers, where supported; 0/0.0 for numerics, and so on). Lazy initialization also makes it not so expensive to initialize an array of, say, one million elements, since the elements are initialized for real only if they are accessed before an assignment.
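
As a minimal sketch of the malloc point above (the function name is made up for illustration): v's initial, indeterminate value is never read, because malloc's return value overwrites it unconditionally, so a compiler-forced "v = NULL" up front would be pure overhead here.

#include <stdlib.h>

void demo_malloc(void)
{
  int *v;                          /* indeterminate, but never read */
  v = malloc(1000 * sizeof *v);    /* overwritten: now a valid address or NULL */
  if (v != NULL) {
    v[0] = 42;                     /* only use it after the check */
    free(v);
  }
}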

ShinTakezou
+3  A: 

First, forced initialization doesn't fix bugs. It masks them. Using a variable that doesn't have a valid value (and what that is varies by application) is a bug.

Second, you can often do your own initialization. Instead of int *p;, write int *p = NULL; or int *p = 0;. Use calloc() (which initializes memory to zero) rather than malloc() (which doesn't). (No, all bits zero doesn't necessarily mean NULL pointers or floating-point values of zero. Yes, it does on most modern implementations.)
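
A short sketch collecting those idioms (the function name is made up for illustration):

#include <stdlib.h>

void init_yourself(void)
{
  int *p = NULL;                     /* explicit null pointer */
  int *q = 0;                        /* equivalent spelling */
  int *r = malloc(10 * sizeof *r);   /* contents of r[0..9] are indeterminate */
  int *s = calloc(10, sizeof *s);    /* all bytes zeroed by calloc */

  (void)p; (void)q;
  free(r);                           /* free(NULL) is harmless if malloc failed */
  free(s);
}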

Third, the C (and C++) philosophy is to give you the means to do something fast. Suppose you have the choice of implementing, in the language, a safe way to do something and a fast way to do something. You can't make a safe way any faster by adding more code around it, but you can make a fast way safer by doing so. Moreover, you can sometimes make operations fast and safe, by ensuring that the operation is going to be safe without additional checks - assuming, of course, that you have the fast option to begin with.

C was originally designed to write an operating system and associated code in, and some parts of operating systems have to be as fast as possible. This is possible in C, but less so in safer languages. Moreover, C was developed when the largest computers were less powerful than the telephone in my pocket (which I'm upgrading soon because it's feeling old and slow). Saving a few machine cycles in frequently used code could have visible results.

David Thornley
+3  A: 

When you declare a (pointer) variable at the beginning of the function, the compiler will do one of two things: set aside a register to use as that variable, or allocate space on the stack for it. For most processors, allocating the memory for all local variables on the stack is done with one instruction; it figures out how much memory all the local vars will need, and pulls down (or pushes up, on some processors) the stack pointer by that much. Whatever is already in that memory at the time is not changed unless you explicitly change it.

The pointer is not "set" to a "random" value. Before allocation, the stack memory below the stack pointer (SP) contains whatever is there from earlier use:

         .
         .
 SP ---> 45
         ff
         04
         f9
         44
         23
         01
         40
         . 
         .
         .

After it allocates memory for a local pointer, the only thing that has changed is the stack pointer:

         .
         .
         45
         ff |
         04 | allocated memory for pointer.
         f9 |
 SP ---> 44 |
         23
         01
         40
         . 
         .
         .

This allows the compiler to allocate all local vars in one instruction that moves the stack pointer down the stack (and free them all in one instruction, by moving the stack pointer back up), but forces you to initialize them yourself, if you need to do that.

In C99, you can mix code and declarations, so you can postpone your declaration in the code until you are able to initialize it. This will allow you to avoid having to set it to NULL.
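
For illustration, a minimal sketch of the two styles (function names are made up):

#include <stdlib.h>

/* C89 style: declare at the top, initialize later; p is briefly indeterminate. */
void c89_style(void)
{
  int *p;
  /* ... other work ... */
  p = malloc(sizeof *p);
  free(p);
}

/* C99 style: declare at the point where you can initialize it, so there is
   never a moment when p holds an indeterminate value. */
void c99_style(void)
{
  int count = 1;                      /* some earlier work */
  int *p = malloc(count * sizeof *p);
  free(p);
}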

Tim Schaeffer
A: 

For it to point to NULL, it would have to have NULL assigned to it (even if that were done automatically and transparently).

So, to answer your question, the reason a pointer can't be both unassigned and NULL is that a pointer cannot be both unassigned and assigned at the same time.

In other languages such as Java and C#, unassigned values still get some predictable value. Your argument is based purely on your semantics and not mine.
Adam Gent
+1  A: 

So, to sum up what ninjalj explained, if you change your example program slightly, your pointers will in fact initialize to NULL:

#include <stdio.h>

// Change the "storage" of the pointer-variables from "stack" to "bss"  
int * randomA;
int * randomB;

int main(void)
{
  int * nullA = NULL;
  int * nullB = NULL;

  printf("randomA: %p, randomB: %p, nullA: %p, nullB: %p\n\n", 
     randomA, randomB, nullA, nullB);
}

On my machine this prints

randomA: 00000000, randomB: 00000000, nullA: 00000000, nullB: 00000000

S.C. Madsen
Interestingly, my C runtime seems to print out (nil) instead of 00000000. What OS/C compiler are you using?
Adam Gent
MSYS / MinGW, GCC version 3.4.2 (mingw-special)
S.C. Madsen
+1  A: 

The idea that this has anything to do with random memory contents when a machine is powered up is bogus, except on embedded systems. Any machine with virtual memory and a multiprocess/multiuser operating system will initialize memory (usually to 0) before giving it to a process. Failure to do so would be a major security breach. The 'random' values in automatic-storage variables come from previous use of the stack by the same process. Similarly, the 'random' values in memory returned by malloc/new/etc. come from previous allocations (that were subsequently freed) in the same process.
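
Purely as an illustration of that stack reuse (and nothing more): reading y below is undefined behavior, and whether it actually shows the stale value depends entirely on the compiler, optimization level, and platform, but it demonstrates where the "random" value can come from.

#include <stdio.h>

void leaves_a_value_on_the_stack(void)
{
  volatile int x = 42;
  (void)x;
}

void reads_stale_stack(void)
{
  int y;                  /* uninitialized: may land in x's old stack slot */
  printf("%d\n", y);      /* undefined behavior; may print 42 at -O0 */
}

int main(void)
{
  leaves_a_value_on_the_stack();
  reads_stale_stack();
  return 0;
}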

R..