Where are garbage values stored, and for what purpose?

+7  A: 

C deliberately does not give local variables an automatic initial value, for efficiency reasons: initializing the data requires extra instructions. Here's an example:

int main(int argc, const char *argv[])
{
    int x;
    return x;
}

generates:

pushl %ebp
movl  %esp, %ebp
subl  $16, %esp
movl  -4(%ebp), %eax
leave
ret

While this code:

int main(int argc, const char *argv[])
{
    int x = 1;
    return x;
}

generates:

pushl %ebp
movl  %esp, %ebp
subl  $16, %esp
movl  $1, -4(%ebp)
movl  -4(%ebp), %eax
leave
ret

As you can see, one full extra instruction (movl $1, -4(%ebp)) is needed to store 1 in x. That cost used to matter, and it still does on embedded systems.

Rannick
Your 1st snippet is not a very good random number generator.
Hans Passant
I'm not sure what is meant by: "this used to matter, and still does on embedded systems".
Michael Burr
@Michael Burr: I interpreted it as "unless you're working on an embedded system, you shouldn't worry about the performance impact of the extra instruction."
Cogwheel - Matthew Orlando
It does still matter if your auto variable is a large array and your function happens to not do a lot in that call.
ninjalj
@Cogwheel: That was what it was intended to get across. You know the rules: make it work, make it right, make it fast... In THAT order.
Rannick
@ninjalj: given that my commodity laptop has a processor capable of doing 120 million instructions per second, you would have to be dealing with a VERY large array. Obviously you can make the case occur, but in reality, variable initialization does not take enough overhead to be given a second thought outside of the embedded world.
Rannick
@Rannick: or you could be calling that function thousands of times a second. It happened in a project I worked on. Some logging function was initializing a buffer, and then deciding that logging was disabled for anything with less priority than ERROR and returning.
ninjalj
@ninjalj: Ah, I gotcha... That does sound like some scary code as well. I accept your large-array exception, but I still say most variables are better served by being initialized to their default value, or to an error value. I bet you run into the same issues with your workplace's definition of "ANSI compliant". There's nothing wrong with defining variables on a per-block basis; it further limits their scope, as well. Initialize said buffer as the first thing in your if statement, or maybe a new function is in order?
Rannick
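A minimal sketch of the restructuring suggested in this thread, assuming a hypothetical logging function log_message with hypothetical priority levels (LOG_DEBUG, LOG_INFO, LOG_ERROR) and a hypothetical threshold: the priority check runs first, and the buffer is declared and zero-filled only inside the block that actually uses it. Declaring the buffer at the top of the if block is fine even in C89, although snprintf itself requires C99 or later.

```c
#include <stdio.h>

/* Hypothetical priority levels and threshold, for illustration only. */
enum { LOG_DEBUG, LOG_INFO, LOG_ERROR };
static int log_threshold = LOG_ERROR;

/* Hypothetical logger: returns the number of characters formatted,
   or 0 if the message was filtered out by priority. */
int log_message(int priority, const char *msg)
{
    int n = 0;

    if (priority >= log_threshold) {
        /* The buffer lives only in this block, so the cost of zero-filling
           all 4096 bytes is paid only when the message is really logged. */
        char buf[4096] = {0};

        n = snprintf(buf, sizeof buf, "[%d] %s", priority, msg);
        fprintf(stderr, "%s\n", buf);
    }
    return n;   /* filtered-out calls never touch buf at all */
}
```

With this shape, a call such as log_message(LOG_DEBUG, "...") returns immediately, so a function called thousands of times a second no longer pays for initializing a buffer it never uses.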
+3  A: 

IIRC, Thompson or Ritchie did an interview some years ago in which they said the language definition purposely left things vague in some places, so that implementers on specific platforms had leeway to do whatever made sense (cycles, memory, etc.) on that platform. Sorry I don't have a reference to link to.

DaveE
+2  A: 

C was designed to be a relatively low-level language so that it could be used for writing low-level software such as operating systems (in fact, it was designed so that UNIX could be written in C). You can think of it as assembly code with readable syntax and higher-level constructs. For this reason, C (optimizations aside) does exactly what you ask it to do, nothing more and nothing less.

When you write "int x;", the compiler simply sets aside memory for the integer. You never asked it to store anything there, so whatever happened to be in that location beforehand (for example, data left on the stack by an earlier function call) stays there. That leftover value is what we call "garbage".
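As a small illustration (the names call_count, sum_fresh_buffer and get_call_count are hypothetical): the C standard guarantees that objects with static storage duration, such as file-scope variables, start out as zero, while automatic locals get a defined value only if you give them an initializer. Reading an uninitialized automatic variable is exactly the "garbage" case, so this sketch deliberately does not do it.

```c
static int call_count;          /* static storage duration: guaranteed to start at 0 */

/* Hypothetical helper: sums the first n elements of a freshly initialized local array. */
int sum_fresh_buffer(int n)
{
    int buf[8] = {0};           /* automatic: all 8 elements zeroed by the initializer */
    int total = 0;              /* automatic: needs an explicit initializer */
    int i;

    call_count++;               /* well defined: call_count began at 0 */
    for (i = 0; i < n && i < 8; i++)
        total += buf[i];
    return total;               /* always 0, never garbage */
}

int get_call_count(void)
{
    return call_count;
}
```

Note that an initializer like "= {0}" on an automatic array is executed on every call, so its cost grows with the size of the array, which is the trade-off discussed in the comments on the first answer.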

Sometimes an external program (e.g. a device driver) may write into some of your variables, in which case an extra instruction to initialize them would be wasted.

casablanca
+3  A: 

Garbage values are not really stored anywhere. In fact, garbage values do not really exist, as far as the abstract language is concerned.

You see, to generate the most efficient code it is not sufficient for the compiler to operate in terms of the lifetimes of objects (variables). It must operate at a much finer level: it must "think" in terms of the lifetimes of values. This is absolutely necessary for efficient allocation of CPU registers, for one example.

The abstract language has no concept of "lifetime of a value". However, the language authors recognize how important that concept is to optimizing compilers. To give compilers enough freedom, the language is intentionally specified so that it doesn't interfere with important optimizations. This is where "garbage values" come into the picture. The language does not state that garbage values are stored anywhere, and it does not guarantee that they are stable (i.e. repeated attempts to read the same uninitialized variable might easily produce different "garbage values"). This is done specifically to allow optimizing compilers to implement the vital concept of "lifetime of a value", and thus to manipulate variables more efficiently than the language's concept of "object lifetime" alone would allow.
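A small sketch of what "lifetime of a value" means, using a hypothetical function two_values: the single object t holds two unrelated values in turn. An optimizing compiler is free to keep each value in whatever register is convenient, and it need not give t any fixed storage before its first assignment, which is in the spirit of why the contents of an uninitialized variable are not guaranteed to be stable.

```c
/* Hypothetical example: one object (t), two independent values. */
int two_values(int a, int b)
{
    int t;          /* the object t exists from here, but holds no value yet */
    int r;

    t = a * 2;      /* value #1 of t begins */
    r = t + 1;      /* last use of value #1; it may "die" right here */
    t = b * 3;      /* value #2 of t begins; it may live in a different register */
    return r + t;
}
```

From the compiler's point of view, value #1 and value #2 of t are as unrelated as two separate variables; nothing requires them to share a memory location or a register.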

AndreyT
A: 

See my response to this question about uninitialized pointers.

ninjalj