views:

586

answers:

8

I recently encountered a case where I need to compare two files (golden and expected) for verification of test results and even though the data written to both the files were same, the files does not match.

On further investigation, I found that there is a structure which contains some integers and a char array of 64 bytes, and not all the bytes of char array were getting used in most of the cases and unused fields from the array contain random data and that was causing the mismatch.

This brought me ask the question whether it is good practice to initialize the array in C/C++ as well, as it is done in Java?

+1  A: 

If you don't initialize the values in a c++ array, then the values could be anything, so it would be good practice to zero them out if you want predictable results.

But if you use the char array like a null terminated string, then you should be able to write it to a file with the proper function.

Although in c++ it might be better to use a more OOP solution. I.E. vectors, strings, etc.

zipcodeman
No, if you access uninitialized memory you have a bug in your program. Systematically initializing variables often generate unnecessary write accesses.
tristopia
+21  A: 

It is good practice to initialise memory/variables before you use them - uninitialised variables are a big source of bugs that are often very hard to track down.

Initialising all the data is a very good idea when writing it to a file format: It keeps the file contents cleaner so they are easier to work with, less prone to problems if someone incorrectly tries to "use" the uninitialised data (remember it may not just be your own code that reads the data in future), and makes the files much more compressible.

The only good reason not to initialise variables before you use them is in performance-critical situations, where the initialisation is technically "unnecessary" and incurs a significant overhead. But in most cases initialising variables won't cause significant harm (especially if they are only declared immediately before they are used), but will save you a lot of development time by eliminating a common source of bugs.

Jason Williams
just a few hints that go in the same direction: use the smallest scope possible for your variables. Initialize directly after declaration. Use variables only for one purpose. Use const if you intend your variable not to change value. The main reason for these hints: they improve readability and reduce the chance for subtle bugs during code changes. The code may be bugfree now, without following these hints - but following them it's easier to keep it bug free with each code change.
Tobias Langner
Define and initialize in one statement whenever possible.
No, initialize your variable where it is appropriate. A for loop is more readable when the loop variable is initialized in the for statement. Your variable might have been declared before the loop.
tristopia
+5  A: 

Using an undefined value in an array results in undefined behaviour. Thus the program is free to produce differing results. This may mean your files end up slightly different, or that the program crashes, or the program formats your hard drive, or the program causes demons to fly out the users nose ( http://catb.org/jargon/html/N/nasal-demons.html )

This doesn't mean you need to define your array values when you create the array, but you must ensure you initialise any array value before you use it. Of course the simplest way to ensure this is to do this when you create the array.

MyStruct array[10];
printf( "%f", array[2].v ); // POTENTIAL BANG!
array[3].v = 7.0;
...
printf( "%f", array[3].v ); // THIS IS OK.

Don't forget that for huge arrays of PODs there's a nice shorthand to initialise all members to zero

MyPODStruct bigArray[1000] = { 0 };
Michael Anderson
+1 to the shorthand, I use this style a fair amount. One word of caution without specific proof of it, but I believe the = {0} format can be optimized out much like the Win32 ZeroMemory (http://msdn.microsoft.com/en-us/library/aa366920%28VS.85%29.aspx) API, subject to the compilers desires and optimization settings.
dirtybird
It can only be optimised out if the compiler can prove that it won't make an observable difference (ie, all the values are subequently set anyway). In this case, with the array being written out to a file, it definitely would make an observable difference.
caf
A: 

First, you should initialize arrays, variables, etc. if not doing so will mess with your program's correctness.

Second, it appears that in this particular case, not initializing the array did not affect the correctness of the original program. Instead, the program meant to compare the files does not know enough about the file format used to tell if the files differ in a meaningful way ("meaningful" defined by the first program).

Instead of complaining about the original program, I would fix the comparison program to know more about the file format in question. If the file format isn't well documented then you've got a good reason to complain.

Max Lybbert
+3  A: 

I strongly disagree with the given opinions that doing so is "eliminating a common source of bugs" or "not doing so will mess with your program's correctness". If the program works with unitialized values then it has a bug and is incorrect. Initializing the values does not eliminate this bug, because they often still do not have the expected values at the first use. However, when they contain random garbage, the program is more likely to crash in a random way at every try. Always having the same values may give a more deterministic behaviour in crashing and makes debugging easier.

For your specific question, it is also good security practice to overwrite unused parts before they are written to a file, because they may contain something from a previous use that you do not want to be written, like passwords.

Secure
+1. I have seen folks just believe that initializing a variable solves all the problems. No...it just helps when you initialize that pointer to NULL that a kaboom *always* happens instead of only seeing the issue crop up here and there and pulling your hair out attempting to track it down. There are also the scenarios where you are debugging and execute the same blob over several times, and if you have a logic flaw, you may get down unexpected paths because the variable/array/etc just happens to wind up at the same memory location, using the previous value.
dirtybird
A: 

I would say that the good practice in C++ is using a std::vector<> instead of an array. This is not valid for C, of course.

Gorpik
+1  A: 

Keep in mind that keeping arrays uninitialized may have advantages like performance.

It's only bad reading from uninitialized arrays. Having them around without ever reading from uninitialized places is fine.

Moreover if your program has bug that makes it read from uninitialized place in array, then "covering it up" by defensively initializing all array to known value is not the solution for bug, and can only make it surface later.

zaharpopov
+1 It happens more often than one would think.
tristopia
A: 

One could write a big article on the difference between the two styles one can encounter, people who initialize variables always when declaring them and people who initialize them when necessary. I share a big project with someone who is in the first category and I am now definitly more of the second type. Always initializing variables has brought more subtle bugs and problems than not and I will try to explain why, remembering the cases I found. First example:

struct NODE Pop(STACK * Stack)
{
  struct NODE node = EMPTY_STACK;

  if(Stack && Stack->stackPointer)
    node = Stack->node[--Stack->stackPointer];

  return node;
}

This was the code written by the other guy. This function is the hottest function in our application (you imagine a text index on 500 000 000 sentences in a ternary tree, the FIFO stack is used to handle the recursion as we do not want to use recursive function calls). This was typical of his programming style because of his systematic initialization of variables. The problem with that code was the hidden memcpy of the initialization and the two other copies of the structures (which btw were not calls to memcpy gcc's strange sometimes), so we had 3 copies + a hidden function call in the hottest function of the project. Rewriting it to

struct NODE Pop(STACK * Stack)
{
  if(Stack && Stack->stackPointer)
    return Stack->node[--Stack->stackPointer];
  return EMPTY_STACK;
}

Only one copy (and supplemental benefit on SPARC where it runs, the function is a leaf function thanks to the avoided call to memcpy and does not need to build a new register window). So the function was 4 times faster.

Another problem I found ounce but do not remember where exactly (so no code example, sorry). A variable that was initialized when declared but it was used in a loop, with switch in a finite state automaton. The problem the initialization value was not one of the states of the automaton and in some extremly rare cases the automaton didn't work correctly. By removing the initializer, the warning the compiler emitted made it obvious that the variable could be used before it was properly initialized. Fixing the automaton was easy then. Morality: defensively initialising a variable may suppress a very usefull warning of the compiler.

Conclusion: Initialise your variables wisely. Doing it systematicaly is nothing more than following a cargo-cult (my buddy at work is the worse cargo-culter one can imagine, he never uses goto, always initialize a variable, use a lot of static declarations (it's faster ye know (it's in fact even really slow on SPARC 64bit), makes all functions inline even if they have 500 lines (using __attribute__((always_inline)) when the compiler does not want)

tristopia