views:

1991

answers:

24

I'd like to prepare a little educational tool for SO which should help beginner (and intermediate) programmers to recognize and challenge their unwarranted assumptions in C, C++ and their platforms.

Examples:

  • "integers wrap around"
  • "everyone has ASCII"
  • "I can store a function pointer in a void*"

I figured that a small test program could be run on various platforms. It would test the "plausible" assumptions that, in our experience on SO, many inexperienced or semi-experienced mainstream developers make, and record the ways they break on diverse machines.

The goal of this is not to prove that it is "safe" to do something (which would be impossible; the tests only prove anything when they fail), but to demonstrate even to the most uncomprehending individual how the most inconspicuous expression can break on a different machine if it has undefined or implementation-defined behavior.

To achieve this I would like to ask you:

  • How can this idea be improved?
  • Which tests would be good and what should they look like?
  • Would you run the tests on the platforms you can get your hands on and post the results, so that we end up with a database of platforms, how they differ and why this difference is allowed?

Here's the current version for the test toy:

#include <stdio.h>
#include <limits.h>
#include <stdlib.h>
#include <stddef.h>
int count=0;
int total=0;
void expect(const char *info, const char *expr)
{
    printf("..%s\n   but '%s' is false.\n",info,expr);
    fflush(stdout);
    count++;
}
#define EXPECT(INFO,EXPR) if (total++,!(EXPR)) expect(INFO,#EXPR)

/* stack check..How can I do this better? */
ptrdiff_t check_grow(int k, int *p)
{
    if (p==0) p=&k;
    if (k==0) return &k-p;
    else return check_grow(k-1,p);
}
#define BITS_PER_INT (sizeof(int)*CHAR_BIT)

int bits_per_int=BITS_PER_INT;
int int_max=INT_MAX;
int int_min=INT_MIN;

/* for 21 - left to right */
int ltr_result=0;
unsigned ltr_fun(int k)
{
    ltr_result=ltr_result*10+k;
    return 1;
}

int main()
{
    printf("We like to think that:\n");
    /* characters */
    EXPECT("00 we have ASCII",('A'==65));
    EXPECT("01 A-Z is in a block",('Z'-'A')+1==26);
    EXPECT("02 big letters come before small letters",('A'<'a'));
    EXPECT("03 a char is 8 bits",CHAR_BIT==8);
    EXPECT("04 a char is signed",CHAR_MIN==SCHAR_MIN);

    /* integers */
    EXPECT("05 int has the size of pointers",sizeof(int)==sizeof(void*));
    /* not true for Windows-64 */
    EXPECT("05a long has at least the size of pointers",sizeof(long)>=sizeof(void*));

    EXPECT("06 integers are 2-complement and wrap around",(int_max+1)==(int_min));
    EXPECT("07 integers are 2-complement and *always* wrap around",(INT_MAX+1)==(INT_MIN));
    EXPECT("08 overshifting is okay",(1<<bits_per_int)==0);
    EXPECT("09 overshifting is *always* okay",(1<<BITS_PER_INT)==0);
    {
        int t;
        EXPECT("09a minus shifts backwards",(t=-1,(15<<t)==7));
    }
    /* pointers */
    /* Suggested by jalf */
    EXPECT("10 void* can store function pointers",sizeof(void*)>=sizeof(void(*)()));
    /* execution */
    EXPECT("11 Detecting how the stack grows is easy",check_grow(5,0)!=0);
    EXPECT("12 the stack grows downwards",check_grow(5,0)<0);

    {
        int t;
        /* suggested by jk */
        EXPECT("13 The smallest bits always come first",(t=0x1234,0x34==*(char*)&t));
    }
    {
        /* Suggested by S.Lott */
        int a[2]={0,0};
        int i=0;
        EXPECT("14 i++ is strictly left to right",(i=0,a[i++]=i,a[0]==1));
    }
    {
        struct {
            char c;
            int i;
        } char_int;
        EXPECT("15 structs are packed",sizeof(char_int)==(sizeof(char)+sizeof(int)));
    }
    {
        EXPECT("16 malloc()=NULL means out of memory",(malloc(0)!=NULL));
    }

    /* suggested by David Thornley */
    EXPECT("17 size_t is unsigned int",sizeof(size_t)==sizeof(unsigned int));
    /* this is true for C99, but not for C90. */
    EXPECT("18 a%b has the same sign as a",((-10%3)==-1) && ((10%-3)==1));

    /* suggested by nos */
    EXPECT("19-1 char<short",sizeof(char)<sizeof(short));
    EXPECT("19-2 short<int",sizeof(short)<sizeof(int));
    EXPECT("19-3 int<long",sizeof(int)<sizeof(long));
    EXPECT("20 ptrdiff_t and size_t have the same size",(sizeof(ptrdiff_t)==sizeof(size_t)));
#if 0
    {
        /* suggested by R. */
        /* this crashed on TC 3.0++, compact. */
        char buf[10];
        EXPECT("21 You can use snprintf to append a string",
               (snprintf(buf,10,"OK"),snprintf(buf,10,"%s!!",buf),strcmp(buf,"OK!!")==0));
    }
#endif

    EXPECT("21 Evaluation is left to right",
           (ltr_fun(1)*ltr_fun(2)*ltr_fun(3)*ltr_fun(4),ltr_result==1234));

    {
    #ifdef __STDC_IEC_559__
    int STDC_IEC_559_is_defined=1;
    #else 
    /* This either means, there is no FP support
     *or* the compiler is not C99 enough to define  __STDC_IEC_559__
     *or* the FP support is not IEEE compliant. */
    int STDC_IEC_559_is_defined=0;
    #endif
    EXPECT("22 floating point is always IEEE",STDC_IEC_559_is_defined);
    }

    printf("From what I can say with my puny test cases, you are %d%% mainstream\n",100-(100*count)/total);
    return 0;
}

Oh, and I made this community wiki right from the start because I figured that people want to edit my blabber when they read this.

UPDATE Thanks for your input. I've added a few cases from your answers and will see if I can set up a github for this like Greg suggested.

UPDATE: I've created a github repo for this, the file is "gotcha.c":

Please answer here with patches or new ideas, so they can be discussed or clarified here. I will merge them into gotcha.c then.

+24  A: 

sdcc 29.7/ucSim/Z80

We like to think that:
..09a minus shifts backwards
   but '(t=-1,(15<<t)==7)' is false.
..19-2 short<int
   but 'sizeof(short)<sizeof(int)' is false.
..22 floating point is always IEEE
   but 'STDC_IEC_559_is_defined' is false.
..25 pointer arithmetic works outside arrays
   but '(diff=&var.int2-&var.int1, &var.int1+diff==&var.int2)' is false.
From what I can say with my puny test cases, you are Stop at 0x0013f3: (106) Invalid instruction 0x00dd

printf crashes. "O_O"


gcc 4.4@x86_64-suse-linux

We like to think that:
..05 int has the size of pointers
but 'sizeof(int)==sizeof(void*)' is false.
..08 overshifting is okay
but '(1<<bits_per_int)==0' is false.
..09a minus shifts backwards
but '(t=-1,(15<<t)==7)' is false.
..14 i++ is strictly left to right
but '(i=0,a[i++]=i,a[0]==1)' is false.
..15 structs are packed
but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..17 size_t is unsigned int
but 'sizeof(size_t)==sizeof(unsigned int)' is false.
..26 sizeof() does not evaluate its arguments
but '(i=10,sizeof(char[((i=20),10)]),i==10)' is false.
From what I can say with my puny test cases, you are 79% mainstream

gcc 4.4@x86_64-suse-linux(-O2)

We like to think that:
..05 int has the size of pointers
but 'sizeof(int)==sizeof(void*)' is false.
..08 overshifting is okay
but '(1<<bits_per_int)==0' is false.
..14 i++ is strictly left to right
but '(i=0,a[i++]=i,a[0]==1)' is false.
..15 structs are packed
but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..17 size_t is unsigned int
but 'sizeof(size_t)==sizeof(unsigned int)' is false.
..26 sizeof() does not evaluate its arguments
but '(i=10,sizeof(char[((i=20),10)]),i==10)' is false.
From what I can say with my puny test cases, you are 82% mainstream

clang 2.7@x86_64-suse-linux

We like to think that:
..05 int has the size of pointers
but 'sizeof(int)==sizeof(void*)' is false.
..08 overshifting is okay
but '(1<<bits_per_int)==0' is false.
..09a minus shifts backwards
but '(t=-1,(15<<t)==7)' is false.
..14 i++ is strictly left to right
but '(i=0,a[i++]=i,a[0]==1)' is false.
..15 structs are packed
but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..17 size_t is unsigned int
but 'sizeof(size_t)==sizeof(unsigned int)' is false.
..21a Function Arguments are evaluated right to left
but '(gobble_args(0,ltr_fun(1),ltr_fun(2),ltr_fun(3),ltr_fun(4)),ltr_result==4321)' is false.
ltr_result is 1234 in this case
..25a pointer arithmetic works outside arrays
but '(diff=&p1-&p2, &p2+diff==&p1)' is false.
..26 sizeof() does not evaluate its arguments
but '(i=10,sizeof(char[((i=20),10)]),i==10)' is false.
From what I can say with my puny test cases, you are 72% mainstream

open64 4.2.3@x86_64-suse-linux

We like to think that:
..05 int has the size of pointers
but 'sizeof(int)==sizeof(void*)' is false.
..08 overshifting is okay
but '(1<<bits_per_int)==0' is false.
..09a minus shifts backwards
but '(t=-1,(15<<t)==7)' is false.
..15 structs are packed
but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..17 size_t is unsigned int
but 'sizeof(size_t)==sizeof(unsigned int)' is false.
..21a Function Arguments are evaluated right to left
but '(gobble_args(0,ltr_fun(1),ltr_fun(2),ltr_fun(3),ltr_fun(4)),ltr_result==4321)' is false.
ltr_result is 1234 in this case
..25a pointer arithmetic works outside arrays
but '(diff=&p1-&p2, &p2+diff==&p1)' is false.
..26 sizeof() does not evaluate its arguments
but '(i=10,sizeof(char[((i=20),10)]),i==10)' is false.
From what I can say with my puny test cases, you are 75% mainstream

intel 11.1@x86_64-suse-linux

We like to think that:
..05 int has the size of pointers
but 'sizeof(int)==sizeof(void*)' is false.
..08 overshifting is okay
but '(1<<bits_per_int)==0' is false.
..09a minus shifts backwards
but '(t=-1,(15<<t)==7)' is false.
..14 i++ is strictly left to right
but '(i=0,a[i++]=i,a[0]==1)' is false.
..15 structs are packed
but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..17 size_t is unsigned int
but 'sizeof(size_t)==sizeof(unsigned int)' is false.
..21a Function Arguments are evaluated right to left
but '(gobble_args(0,ltr_fun(1),ltr_fun(2),ltr_fun(3),ltr_fun(4)),ltr_result==4321)' is false.
ltr_result is 1234 in this case
..26 sizeof() does not evaluate its arguments
but '(i=10,sizeof(char[((i=20),10)]),i==10)' is false.
From what I can say with my puny test cases, you are 75% mainstream

Turbo C++/DOS/Small Memory

We like to think that:
..09a minus shifts backwards
but '(t=-1,(15<<t)==7)' is false.
..16 malloc()=NULL means out of memory
but '(malloc(0)!=NULL)' is false.
..19-2 short<int
but 'sizeof(short)<sizeof(int)' is false.
..22 floating point is always IEEE
but 'STDC_IEC_559_is_defined' is false.
..25 pointer arithmetic works outside arrays
but '(diff=&var.int2-&var.int1, &var.int1+diff==&var.int2)' is false.
..25a pointer arithmetic works outside arrays
but '(diff=&p1-&p2, &p2+diff==&p1)' is false.
From what I can say with my puny test cases, you are 81% mainstream

Turbo C++/DOS/Medium Memory

We like to think that:
..09a minus shifts backwards
but '(t=-1,(15<<t)==7)' is false.
..10 void* can store function pointers
but 'sizeof(void*)>=sizeof(void(*)())' is false.
..16 malloc()=NULL means out of memory
but '(malloc(0)!=NULL)' is false.
..19-2 short<int
but 'sizeof(short)<sizeof(int)' is false.
..22 floating point is always IEEE
but 'STDC_IEC_559_is_defined' is false.
..25 pointer arithmetic works outside arrays
but '(diff=&var.int2-&var.int1, &var.int1+diff==&var.int2)' is false.
..25a pointer arithmetic works outside arrays
but '(diff=&p1-&p2, &p2+diff==&p1)' is false.
From what I can say with my puny test cases, you are 78% mainstream

Turbo C++/DOS/Compact Memory

We like to think that:
..05 int has the size of pointers
but 'sizeof(int)==sizeof(void*)' is false.
..09a minus shifts backwards
but '(t=-1,(15<<t)==7)' is false.
..16 malloc()=NULL means out of memory
but '(malloc(0)!=NULL)' is false.
..19-2 short<int
but 'sizeof(short)<sizeof(int)' is false.
..20 ptrdiff_t and size_t have the same size
but '(sizeof(ptrdiff_t)==sizeof(size_t))' is false.
..22 floating point is always IEEE
but 'STDC_IEC_559_is_defined' is false.
..25 pointer arithmetic works outside arrays
but '(diff=&var.int2-&var.int1, &var.int1+diff==&var.int2)' is false.
..25a pointer arithmetic works outside arrays
but '(diff=&p1-&p2, &p2+diff==&p1)' is false.
From what I can say with my puny test cases, you are 75% mainstream

cl65@Commodore PET (vice emulator)

(image not preserved)


I'll be updating these later:


Borland C++ Builder 6.0 on Windows XP

..04 a char is signed
   but 'CHAR_MIN==SCHAR_MIN' is false.
..08 overshifting is okay
   but '(1<<bits_per_int)==0' is false.
..09 overshifting is *always* okay
   but '(1<<BITS_PER_INT)==0' is false.
..09a minus shifts backwards
   but '(t=-1,(15<<t)==7)' is false.
..15 structs are packed
   but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..16 malloc()=NULL means out of memory
   but '(malloc(0)!=NULL)' is false.
..19-3 int<long
   but 'sizeof(int)<sizeof(long)' is false.
..22 floating point is always IEEE
   but 'STDC_IEC_559_is_defined' is false.
From what I can say with my puny test cases, you are 71% mainstream

Visual Studio Express 2010 C++ CLR, Windows 7 64bit

(must be compiled as C++ because the CLR compiler does not support pure C)

We like to think that:
..08 overshifting is okay
   but '(1<<bits_per_int)==0' is false.
..09a minus shifts backwards
   but '(t=-1,(15<<t)==7)' is false.
..14 i++ is structly left to right
   but '(i=0,a[i++]=i,a[0]==1)' is false.
..15 structs are packed
   but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..19-3 int<long
   but 'sizeof(int)<sizeof(long)' is false.
..22 floating point is always IEEE
   but 'STDC_IEC_559_is_defined' is false.
From what I can say with my puny test cases, you are 78% mainstream

MINGW64 (gcc-4.5.2 prerelease)

-- http://mingw-w64.sourceforge.net/

We like to think that:
..05 int has the size of pointers
   but 'sizeof(int)==sizeof(void*)' is false.
..05a long has at least the size of pointers
   but 'sizeof(long)>=sizeof(void*)' is false.
..08 overshifting is okay
   but '(1<<bits_per_int)==0' is false.
..09a minus shifts backwards
   but '(t=-1,(15<<t)==7)' is false.
..14 i++ is structly left to right
   but '(i=0,a[i++]=i,a[0]==1)' is false.
..15 structs are packed
   but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..17 size_t is unsigned int
   but 'sizeof(size_t)==sizeof(unsigned int)' is false.
..19-3 int<long
   but 'sizeof(int)<sizeof(long)' is false.
..22 floating point is always IEEE
   but 'STDC_IEC_559_is_defined' is false.
From what I can say with my puny test cases, you are 67% mainstream

64 bit Windows uses the LLP64 model: Both int and long are defined as 32-bit, which means that neither is long enough for a pointer.


avr-gcc 4.3.2 / ATmega168 (Arduino Diecimila)

The failed assumptions are:

..14 i++ is structly left to right
..16 malloc()=NULL means out of memory
..19-2 short<int
..21 Evaluation is left to right
..22 floating point is always IEEE

The Atmega168 has a 16 bit PC, but code and data are in separate address spaces. Larger Atmegas have a 22 bit PC!


gcc 4.2.1 on MacOSX 10.6, compiled with -arch ppc

We like to think that:
..09a minus shifts backwards
   but '(t=-1,(15<<t)==7)' is false.
..13 The smallest bits come always first
   but '(t=0x1234,0x34==*(char*)&t)' is false.
..14 i++ is structly left to right
   but '(i=0,a[i++]=i,a[0]==1)' is false.
..15 structs are packed
   but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..19-3 int<long
   but 'sizeof(int)<sizeof(long)' is false.
..22 floating point is always IEEE
   but 'STDC_IEC_559_is_defined' is false.
From what I can say with my puny test cases, you are 78% mainstream

Luther Blissett
And you've identified another assumption: that you can fit 80 characters on a terminal line.
Mike Seymour
`sizeof(void*)>=sizeof(void(*)())` would be more relevant than ==. All we care about is "can we store a function pointer in a void pointer", so the assumption you need to test is whether a `void*` is *at least* as big as a function pointer.
jalf
@jalf, good point
Luther Blissett
If your environment is POSIX-compliant, you should be okay with `sizeof(void*)>=sizeof(void(*)())` - see http://www.opengroup.org/onlinepubs/009695399/functions/dlsym.html
Daniel Earwicker
+3  A: 
  • Discretization errors due to floating point representation. For example, if you use the standard formula to solve quadratic equations, or finite differences to approximate derivatives, or the standard formula to calculate variances, precision will be lost due to the calculation of differences between similar numbers. The Gauß algorithm to solve linear systems is bad because rounding errors accumulate, thus one uses QR or LU decomposition, Cholesky decomposition, SVD, etc. Addition of floating point numbers is not associative. There are denormal, infinite and NaN values. a + b − a ≠ b.

  • Strings: Difference between characters, code points, and code units. How Unicode is implemented on the various operating systems; Unicode encodings. Opening a file with an arbitrary Unicode file name is not possible with C++ in a portable way.

  • Race conditions, even without threading: if you test whether a file exists, the result could become invalid at any time.

  • ERROR_SUCCESS = 0

Philipp
+11  A: 

You need to include the ++ and -- assumptions people make.

a[i++]= i;

This, for example, is syntactically legal, but produces varying results depending on too many things to reason out.

Any statement that has ++ (or --) and a variable which occurs more than once is a problem.

S.Lott
And it's just such a common question too!
Matthieu M.
+6  A: 

Very interesting!

Other things I can think of that might be useful to check for:

  • do function pointers and data pointers exist in the same address space? (Breaks in Harvard architecture machines like DOS small mode. Don't know how you'd test for it, though.)

  • if you take a NULL data pointer and cast it to the appropriate integer type, does it have the numeric value 0? (Breaks on some really ancient machines --- see http://c-faq.com/null/machexamp.html.) Ditto with function pointer. Also, they may be different values.

  • does incrementing a pointer past the end of its corresponding storage object, and then back again, cause sensible results? (I don't know of any machines this actually breaks on, but I believe the C spec does not allow you to even think about pointers that don't point to either (a) the contents of an array or (b) the element immediately after the array or (c) NULL. See http://c-faq.com/aryptr/non0based.html.)

  • does comparing two pointers to different storage objects with < and > produce consistent results? (I can imagine this breaking on exotic segment-based machines; the spec forbids such comparisons, so the compiler would be entitled to compare the offset part of the pointer only, and not the segment part.)

Hmm. I'll try and think of some more.

Edit: Added some clarifying links to the excellent C FAQ.

David Given
Incidentally, a while back I did an experimental project called Clue (http://cluecc.sourceforge.net) which allowed you to compile C into Lua, Javascript, Perl, LISP, etc. It ruthlessly exploited the undefined behaviour in the C standard to make pointers work. It may be interesting to try this test on it.
David Given
IIRC C allows you to increment a pointer by **1** beyond the end of an object, but not any further. Decrementing it to a position before the beginning of an object is not allowed, however.
R..
@R. Same in C++. And incrementing further might break if incrementing the pointer causes an overflow, on CPUs which don't just treat pointers as integers.
jalf
+3  A: 

Some of them can't easily be tested from inside C because the program is likely to crash on the implementations where the assumption doesn't hold.


"It's ok to do anything with a pointer-valued variable. It only needs to contain a valid pointer value if you dereference it."

#include <stdlib.h>

void noop(void *p); /* A no-op function that the compiler doesn't know to optimize away */
int main () {
    char *p = malloc(1);
    free(p);
    noop(p); /* may crash in implementations that verify pointer accesses */
    noop(p - 42000); /* and if not the previous instruction, maybe this one */
}

Same with integral and floating point types (other than unsigned char), which are allowed to have trap representations.


"Integer calculations wrap around. So this program prints a large negative integer."

#include <stdio.h>
#include <limits.h> /* for INT_MAX */
int main () {
    printf("%d\n", INT_MAX+1); /* may crash due to signed integer overflow */
    return 0;
}

(C89 only.) "It's ok to fall off the end of main."

#include <stdio.h>
int main () {
    puts("Hello.");
} /* The status code is 7 on many implementations. */
Gilles
As a concrete example: When compiled with `gcc -ftrapv -O`, the output is `We like to think that:` followed by `Aborted`
caf
@caf: "This option generates traps for signed overflow on addition, subtraction, multiplication operations." Nice to know, thanks.
Gilles
The last one is ok in C++ (98, 03 and 0x) as well, and implicitly returns 0.
jalf
Which is nasty because pre-ANSI C allowed this and C99 does as well.
Joshua
@Joshua: AFAIK there is no difference between pre-ANSI C and C89 on return from `main` with no value: the program is correct but returns an undefined termination status (C89 §2.1.2.2). With many implementations (such as gcc, and older unix compilers) you get whatever was in a certain register at that point. The program typically works until it's used in a makefile or other environment that checks the termination status.
Gilles
This could be a real issue with the DOS protected-mode architecture. Trying to load an invalid pointer into a segment register:register pair would blow up if the segment register wasn't a valid segment. I had a long debug session with this once when my pointer was to a real-mode address. I handled it safely, the library didn't. (And the debugger was heisenbugged also--single-stepping through assembly code worked despite the invalid segment.)
Loren Pechtel
Gilles, you're right about random return. The guarantee in pre-ANSI was only that it wouldn't crash as that -ftrapv compilation did.
Joshua
+3  A: 

EDIT: Updated to the last version of the program

Solaris-SPARC

gcc 3.4.6 in 32 bit

We like to think that:
..08 overshifting is okay
   but '(1<<bits_per_int)==0' is false.
..09 overshifting is *always* okay
   but '(1<<BITS_PER_INT)==0' is false.
..09a minus shifts backwards
   but '(t=-1,(15<<t)==7)' is false.
..13 The smallest bits always come first
   but '(t=0x1234,0x34==*(char*)&t)' is false.
..14 i++ is strictly left to right
   but '(i=0,a[i++]=i,a[0]==1)' is false.
..15 structs are packed
   but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..19-3 int<long
   but 'sizeof(int)<sizeof(long)' is false.
..22 floating point is always IEEE
   but 'STDC_IEC_559_is_defined' is false.
From what I can say with my puny test cases, you are 72% mainstream

gcc 3.4.6 in 64 bit

We like to think that:
..05 int has the size of pointers
   but 'sizeof(int)==sizeof(void*)' is false.
..08 overshifting is okay
   but '(1<<bits_per_int)==0' is false.
..09 overshifting is *always* okay
   but '(1<<BITS_PER_INT)==0' is false.
..09a minus shifts backwards
   but '(t=-1,(15<<t)==7)' is false.
..13 The smallest bits always come first
   but '(t=0x1234,0x34==*(char*)&t)' is false.
..14 i++ is strictly left to right
   but '(i=0,a[i++]=i,a[0]==1)' is false.
..15 structs are packed
   but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..17 size_t is unsigned int
   but 'sizeof(size_t)==sizeof(unsigned int)' is false.
..22 floating point is always IEEE
   but 'STDC_IEC_559_is_defined' is false.
From what I can say with my puny test cases, you are 68% mainstream

and with SUNStudio 11 32 bit

We like to think that:
..08 overshifting is okay
   but '(1<<bits_per_int)==0' is false.
..09a minus shifts backwards
   but '(t=-1,(15<<t)==7)' is false.
..13 The smallest bits always come first
   but '(t=0x1234,0x34==*(char*)&t)' is false.
..14 i++ is strictly left to right
   but '(i=0,a[i++]=i,a[0]==1)' is false.
..15 structs are packed
   but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..19-3 int<long
   but 'sizeof(int)<sizeof(long)' is false.
From what I can say with my puny test cases, you are 79% mainstream

and with SUNStudio 11 64 bit

We like to think that:
..05 int has the size of pointers
   but 'sizeof(int)==sizeof(void*)' is false.
..08 overshifting is okay
   but '(1<<bits_per_int)==0' is false.
..09a minus shifts backwards
   but '(t=-1,(15<<t)==7)' is false.
..13 The smallest bits always come first
   but '(t=0x1234,0x34==*(char*)&t)' is false.
..14 i++ is strictly left to right
   but '(i=0,a[i++]=i,a[0]==1)' is false.
..15 structs are packed
   but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..17 size_t is unsigned int
   but 'sizeof(size_t)==sizeof(unsigned int)' is false.
From what I can say with my puny test cases, you are 75% mainstream
tristopia
+3  A: 

Well, the classic portability assumptions not mentioned yet are

  • assumptions about size of integral types
  • endianness
jk
"Endianness", including "There is an endianness": there are middle-endian machines, and the standard allows weird things like storing a `short` value fedcba9876543210 (that's 16 binary digits) as the two bytes 02468ace and fdb97531.
Gilles
Yes, endianness for sure includes mixed/middle endian as well as big and little. If you go to custom hardware you could have any endianness you like on any bus.
jk
Middle endian?!?!?! Wow...you learn something new every day!
A. Levy
Middle endian is known as PDP endian. Gilles describes something even weirder though that would cause headaches for implementing TCP/IP.
Joshua
@Gilles: middle-endian... I am very glad I'm not developing on that one. (but now I'll get asked to do a middle-endian networking project, I'm sure)...
Paul Nathan
ARM FPE used middle-endian doubles, where they were stored as a <high quad> <low quad> pair but the ordering of the bits inside each quad was the wrong way round. (Thankfully, ARM VFP doesn't do this any more.)
David Given
What about the pure binary representation in C99 §6.2.6.2? Bytes inside a word may have any ordering, but value bits inside a byte should follow a pure binary representation.
Alek
+4  A: 

I think you should make an effort to distinguish between two very different classes of "incorrect" assumptions. A good half (right shift and sign extension, ASCII-compatible encoding, memory is linear, data and function pointers are compatible, etc.) are pretty reasonable assumptions for most C coders to make, and might even be included as part of the standard if C were being designed today and if we didn't have legacy IBM junk grandfathered in. The other half (things related to memory aliasing, behavior of library functions when input and output memory overlap, 32-bit assumptions such as that pointers fit in int or that you can use malloc without a prototype, that the calling convention is identical for variadic and non-variadic functions, ...) either conflict with optimizations modern compilers want to perform or with migration to 64-bit machines or other new technology.

R..
it's not just "IBM junk" (though I agree the IBM stuff is junk). Many embedded systems today have similar problems.
rmeador
To clarify, using `malloc` without a prototype means not including `<stdlib.h>`, which causes `malloc` to default to `int malloc(int)`, a no-no if you want to support 64-bit.
Joey Adams
Technically you're free not to include `<stdlib.h>` as long as you include another header that defines `size_t` and you then declare `malloc` with a correct prototype yourself.
R..
+35  A: 

The order of evaluation of subexpressions, including

  • the arguments of a function call and
  • operands of operators (e.g., +, -, =, * , /), with the exception of:
    • the binary logical operators (&& and ||),
    • the ternary conditional operator (?:), and
    • the comma operator (,)

is unspecified.

For example

  #include <stdio.h>

  int Hello(void)
  {
       return printf("Hello"); /* printf() returns the number of
                                  characters successfully printed by it
                               */
  }

  int World(void)
  {
       return printf("World !");
  }

  int main(void)
  {
      /* might print "HelloWorld !" or "World !Hello" */
      int a = Hello() + World();
      /**             ^
                      |
                Functions can be called in either order
      **/
      (void)a; /* the sum itself is not used */
      return 0;
  }
Prasoon Saurav
+1 Damn, never knew that
mmsmatt
I had always known that about function parameters, but I never thought of it in terms of operators ... ... and if I ever see you writing code like that in a production environment, I will slap you with a wet noodle.
Stargazer712
Billy ONeal
@Billy: But only for the primitive versions of the operators.
Dennis Zickefoose
@Dennis @Billy Which is one of the more confusing parts of reading C++ code, and the reason I never overload those operators
Michael Mrozek
@Dennis: That is true. (Which is why it's an item in Effective/MoreEffective C++ to never overload those (unless you're writing `boost::spirit`).)
Billy ONeal
@Billy ONeal - item 30 in Sutter/Alexandrescu's Coding Standards book. But I'm not sure who it should be aimed at. If the problem is that users expect short-circuiting to work consistently, then the advice should be: don't *use* libraries that overload ` it's your caller who has to be careful. (Ultimately I think it's hollow advice anyway: internal DSLs have different semantics, get used to it.)
Daniel Earwicker
@Daniel: I'm not sure what you're trying to say. It sounds like you are suggesting it's okay to overload the operators because it's only the users of your class that might get it wrong, and if you aren't writing in straight C++ it doesn't matter. Neither of which makes any sense at all.
Dennis Zickefoose
Daniel Earwicker
... then an exception is made in the rule for some libraries; how are those exceptional libraries identified? Do they need to have smarter users, or do they take some other steps to ensure the changed semantics are not harmful?
Daniel Earwicker
A: 

Wow, overshift wasn't always OK?

That teaches me something!

Does anyone know why?

Hernán Eche
Suppose you're designing a processor with a 32-bit left shift instruction. You need a circuit that shifts the 32-bit number left by 1 position, used if bit 0 of the shift amount is set, a circuit that shifts by 2 used if bit 1 of the amount is set; etc. Each circuit requires N more transistors and one more cycle running time. The useful values for the shift amount are 0 to 32, but 32 means that the result is 0 so who cares anyway. So most processors have a shift depth of 5 (bits 5–31 of the amount are ignored), meaning that they can shift by up to 31 bits.
Gilles
AFAIR on 80[1]86 shifting was in reality `x << (y % 16)` because only the low 4 bits of register CX were used.
tristopia
No, the 8086 worked rather more simplistically. It shifted by one bit repeatedly for the number you gave it. It used all 8 bits of the shift count in register CL, as this did not require any additional circuitry compared to limiting it to 31. Of course, shifting by 255 would be rather slow. The 386 introduced the faster circuitry described by Gilles together with the limit of 31.
jilles
Well, it really doesn't matter how the hardware is designed. C is a standard language. Take the multiply operator: there are microprocessors that have no multiply instruction, but it's the compiler's job to generate correct code using repeated additions, because that's what is intended. The reason for "undefined" behaviours isn't always a hardware difficulty; some things that are undefined in one language are well defined in others. Does doing 32 single shifts give zero? If yes, then the compiler could handle it where possible, provided of course that the language standard defined it.
Hernán Eche
@jilles I just found it and that was what I had remembered, in fact what I described was the way the 80186 worked (except with 5 bits not 4 as I said) and the 8086 worked like you said with an 8 bit counter. It was a distinct feature that was often used to distinguish between both, as can be seen in this example: http://www.assembly.happycodings.com/code18.html
tristopia
On the old Transputer, shifting right by 0 would hang the device for about a week. This was because the microcode implemented it as 'do value >>= 1; while (--shift);' --- notice the predecrement? It would then happily shift value by 4294967296 bits, with interrupts off... and then return 0.
David Given
Implementing overshift (for logical shifts) correctly wouldn't even be that difficult in hardware. Just NOR together all but the last 5 bits of the shift amount, and AND this bit with each of the bits of the shift result. Then overshift will give 0.
dan04
@Gilles: I can understand why shifting left by more than the bit size would be undefined for any value other than zero, since overflow conditions are undefined, period. But why wouldn't compilers be expected to properly handle large right-shift values?
supercat
@supercat: overflow conditions are actually defined for unsigned integers. Compilers aren't expected to handle large shifts in either direction because 1: a lot of underlying hardware doesn't handle it (see previous comments), 2: C rarely expects the compiler to compensate for the hardware.
Gilles
@dan04: Sure, but your CPU would cost 0.1¢ more!
Gilles
Another reason why (overshift ok != overshift always ok) is cross compiling. Any implementation dependent arithmetic the compiler can resolve gets resolved at compile time using the host rules, while any other implementation dependent arithmetic gets resolved at runtime using the target rules.
Joshua
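For code that cannot rule out an oversized shift count, the guard dan04 describes in hardware can be written in software instead. The following is only a minimal sketch: the names `shl_checked`, `shr_checked` and `UINT_WIDTH_BITS` are made up for illustration, and the width computation assumes `unsigned int` has no padding bits, which the standard does not guarantee.

```c
#include <limits.h>

/* Width of unsigned int in bits, ASSUMING no padding bits
   (true on mainstream platforms, not guaranteed by the standard). */
#define UINT_WIDTH_BITS (sizeof(unsigned int) * CHAR_BIT)

/* Left shift that yields 0 for any count >= the width, instead of
   relying on whatever the hardware does with an oversized count. */
unsigned int shl_checked(unsigned int value, unsigned int count)
{
    return count < UINT_WIDTH_BITS ? value << count : 0u;
}

/* Logical right shift with the same guard. */
unsigned int shr_checked(unsigned int value, unsigned int count)
{
    return count < UINT_WIDTH_BITS ? value >> count : 0u;
}
```

Using unsigned operands sidesteps the separate problems of shifting negative values or shifting into the sign bit; the cost is one comparison per shift.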
+3  A: 

Include a check for integer sizes. Most people assume that an int is bigger than a short is bigger than a char. However, these might all be false: sizeof(char) < sizeof(int); sizeof(short) < sizeof(int); sizeof(char) < sizeof(short)

This code might fail (it can crash on an unaligned access):

unsigned char buf[64];

int i = 234;
int *p = &buf[1];
*p = i;
i = *p;
nos
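A common way to avoid the unaligned access shown above is to copy the bytes instead of dereferencing a misaligned `int *`. This is a sketch, not part of the original answer; `store_int` and `load_int` are hypothetical helper names.

```c
#include <string.h>

/* Stores an int at an arbitrary (possibly unaligned) byte address. */
void store_int(unsigned char *dst, int value)
{
    memcpy(dst, &value, sizeof value);
}

/* Loads an int from an arbitrary (possibly unaligned) byte address. */
int load_int(const unsigned char *src)
{
    int value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

With these, `store_int(&buf[1], 234)` followed by `load_int(&buf[1])` gives back 234 without ever forming a misaligned `int` pointer. The byte layout inside `buf` still depends on the platform's endianness.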
would this code fail in C++? IIRC, it is illegal to cast pointers between unrelated types, EXCEPT for char*, which can be cast to any type (or is it the other way around?).
rmeador
You could just do `int *p = (int*)` in c++, people expect that to work too.
nos
@nos, yeah, that can fail, but the failure is a crash, so his program can't test for that one. :(
Joshua
`sizeof(char) < sizeof(int)` is required. For example, fgetc() returns the value of the character as an unsigned char converted to int, or `EOF` which is a negative value. `unsigned char` may not have padding bits, so the only way this can be done is by making int larger than char. Also, (most versions of) the C spec require that any value from the range -32767..32767 can be stored in an int.
jilles
@jilles: still, there are DSPs with 32-bit chars and 32-bit ints.
nos
@jilles: there is no requirement that `char` is only 8 bits though. And fgetc's return type doesn't by itself prove anything.
jalf
@nos: I believe those DSP implementations are *freestanding* implementations. A hosted implementation, unlike a freestanding one, is required to have `fgetc`, and thus jilles' point about the impossibility of implementing `fgetc` when `sizeof(int)==1` is a valid point and it precludes the existence of any such hosted implementation.
R..
+16  A: 

A long time ago, I was teaching C from a textbook that had

printf("sizeof(int)=%d\n", sizeof(int));

as a sample question. It failed for a student because sizeof yields values of type size_t, not int: on this implementation int was 16 bits, size_t was 32 bits, and the machine was big-endian. (The platform was Lightspeed C on 680x0-based Macintoshes. I said it was a long time ago.)

David Thornley
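For reference, here are two ways to print a size_t without the mismatch described above. This is a sketch; the function names `format_size` and `format_size_c89` are invented for illustration, and the second variant assumes size_t is no wider than unsigned long, which does not hold on Win64.

```c
#include <stdio.h>
#include <stddef.h>

/* C99 and later: %zu is the conversion specifier for size_t.
   Returns the number of characters written, excluding the NUL. */
int format_size(char *buf, size_t buflen, size_t value)
{
    return snprintf(buf, buflen, "%zu", value);
}

/* C89 fallback: convert explicitly to a type printf knows.
   ASSUMES size_t fits in unsigned long and buf is large enough. */
int format_size_c89(char *buf, size_t value)
{
    return sprintf(buf, "%lu", (unsigned long)value);
}
```

The textbook line would then read `printf("sizeof(int)=%zu\n", sizeof(int));` on a C99 implementation.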
+1 for pointing out one of the most common, and commonly-overlooked, errors of this sort.
R..
This also happens on 64-bit systems, where size_t is 64 bit and ints are almost always shorter. Win64 is still weirder, because size_t is an `unsigned long long` there. Added as Test 17.
Luther Blissett
+2  A: 

How about this one:

No data pointer can ever be the same as a valid function pointer.

This is TRUE for all flat models, MS-DOS TINY, LARGE, and HUGE models, false for MS-DOS SMALL model, and almost always false for MEDIUM and COMPACT models (depends on load address, you will need a really old DOS to make it true).

I can't write a test for this

And worse: pointers cast to ptrdiff_t may be compared. This is not true for MS-DOS LARGE model (the only difference between LARGE and HUGE is HUGE adds compiler code to normalize pointers).

I can't write a test because the environment where this bombs hard won't allocate a buffer greater than 64K so the code that demonstrates it would crash on other platforms.

This particular test would pass on one now-defunct system (notice it depends on the internals of malloc):

  char *ptr1 = malloc(16);
  char *ptr2 = malloc(16);
  if ((ptrdiff_t)ptr2 - 0x20000 == (ptrdiff_t)ptr1)
      printf("We like to think that unrelated pointers are equality comparable when cast to the appropriate integer, but they're not.");
Joshua
+3  A: 

A couple of things about built-in data types:

  • char and signed char are actually two distinct types (unlike int and signed int which refer to the same signed integer type).
  • signed integers are not required to use two's complement. Ones' complement and sign+magnitude are also valid representations of negative numbers. This makes bit operations involving negative numbers implementation-defined.
  • If you assign an out-of-range integer to a signed integer variable, the behaviour is implementation-defined.
  • In C90, -3/5 could return 0 or -1. Rounding towards zero in case one operand was negative is only guaranteed in C99 upwards and C++0x upwards.
  • There are no exact size guarantees for the built-in types. The standard only covers minimal requirements such as an int has at least 16 bits, a long has at least 32 bits, a long long has at least 64 bits. A float can at least represent 6 most significant decimal digits correctly. A double can at least represent 10 most significant decimal digits correctly.
  • IEEE 754 is not mandatory for representing floating point numbers.

Admittedly, on most machines we'll have two's complement and IEEE 754 floats.

sellibitze
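On the division point in the list above: the library function `div()` is specified to truncate toward zero even in C90, so it gives the same quotient where the `/` operator may not. The sketch below builds on that; `quot_trunc` and `quot_floor` are hypothetical names, and neither handles a zero divisor or the `INT_MIN / -1` overflow case.

```c
#include <stdlib.h>

/* Quotient truncated toward zero, via div(), on C90 and C99 alike. */
int quot_trunc(int numer, int denom)
{
    return div(numer, denom).quot;
}

/* Quotient rounded toward negative infinity, built on the truncating div(). */
int quot_floor(int numer, int denom)
{
    div_t d = div(numer, denom);
    /* Step down by one when the division is inexact and the
       remainder's sign differs from the divisor's sign. */
    if (d.rem != 0 && ((d.rem < 0) != (denom < 0)))
        d.quot -= 1;
    return d.quot;
}
```

For -3 and 5, `quot_trunc` returns 0 and `quot_floor` returns -1, which are the two results a C90 compiler was allowed to pick from for `-3/5`.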
+1  A: 

gcc 3.3.2 on AIX 5.3 (yeah, we need to update gcc)

We like to think that:
..04 a char is signed
   but 'CHAR_MIN==SCHAR_MIN' is false.
..09a minus shifts backwards
   but '(t=-1,(15<<t)==7)' is false.
..13 The smallest bits come always first
   but '(t=0x1234,0x34==*(char*)&t)' is false.
..14 i++ is structly left to right
   but '(i=0,a[i++]=i,a[0]==1)' is false.
..15 structs are packed
   but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..16 malloc()=NULL means out of memory
   but '(malloc(0)!=NULL)' is false.
..19-3 int<long
   but 'sizeof(int)<sizeof(long)' is false.
..22 floating point is always IEEE
   but 'STDC_IEC_559_is_defined' is false.
From what I can say with my puny test cases, you are 71% mainstream
chauncey
+1  A: 

An assumption some may make in C++ is that a struct is limited to what it can do in C. In fact, a C++ struct is like a class except that its members are public by default.

C++ struct:

struct Foo
{
  int number1_;  //this is public by default


//this is valid in C++:    
private: 
  void Testing1();
  int number2_;

protected:
  void Testing2();
};
Alerty
+1  A: 

Standard math functions on different systems don't give identical results.

arsenm
+1  A: 

Visual Studio Express 2010 on 32-bit x86.

Z:\sandbox>cl testtoy.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

testtoy.c
testtoy.c(54) : warning C4293: '<<' : shift count negative or too big, undefined
 behavior
Microsoft (R) Incremental Linker Version 10.00.30319.01
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:testtoy.exe
testtoy.obj

Z:\sandbox>testtoy.exe
We like to think that:
..08 overshifting is okay
   but '(1<<bits_per_int)==0' is false.
..09a minus shifts backwards
   but '(t=-1,(15<<t)==7)' is false.
..14 i++ is structly left to right
   but '(i=0,a[i++]=i,a[0]==1)' is false.
..15 structs are packed
   but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..19-3 int<long
   but 'sizeof(int)<sizeof(long)' is false.
..22 floating point is always IEEE
   but 'STDC_IEC_559_is_defined' is false.
From what I can say with my puny test cases, you are 78% mainstream
Paul Nathan
+1  A: 

Via Codepad.org (C++: g++ 4.1.2 flags: -O -std=c++98 -pedantic-errors -Wfatal-errors -Werror -Wall -Wextra -Wno-missing-field-initializers -Wwrite-strings -Wno-deprecated -Wno-unused -Wno-non-virtual-dtor -Wno-variadic-macros -fmessage-length=0 -ftemplate-depth-128 -fno-merge-constants -fno-nonansi-builtins -fno-gnu-keywords -fno-elide-constructors -fstrict-aliasing -fstack-protector-all -Winvalid-pch) .

Note that Codepad did not have stddef.h. I removed test 9 due to codepad using warnings as errors. I also renamed the count variable since it was already defined for some reason.

We like to think that:
..08 overshifting is okay
   but '(1<<bits_per_int)==0' is false.
..14 i++ is structly left to right
   but '(i=0,a[i++]=i,a[0]==1)' is false.
..15 structs are packed
   but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..19-3 int<long
   but 'sizeof(int)<sizeof(long)' is false.
From what I can say with my puny test cases, you are 84% mainstream
Brian
+4  A: 

Here's a fun one: What's wrong with this function?

float sum(unsigned int n, ...)
{
    float v = 0;
    va_list ap;
    va_start(ap, n);
    while (n--)
        v += va_arg(ap, float);
    va_end(ap);
    return v;
}

[Answer (rot13): Inevnqvp nethzragf borl gur byq X&E cebzbgvba ehyrf, juvpu zrnaf lbh pnaabg hfr 'sybng' (be 'pune' be 'fubeg') va in_net! Naq gur pbzcvyre vf erdhverq abg gb gerng guvf nf n pbzcvyr-gvzr reebe. (TPP qbrf rzvg n jneavat, gubhtu.)]

Zack
Oh, that's a good one. clang 2.7 eats this and produces complete nonsense without a warning.
Luther Blissett
va_arg expands if it's a macro and the while loop only executes the first statement, of perhaps many?
Maister
Nope (if that happened it would be a bug in the implementation).
Zack
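Spoiler for anyone who has not decoded the rot13 answer yet: a version that follows the default argument promotions could look like the sketch below. The name `sum_floats` is made up to keep it apart from the puzzle's `sum`; the only substantive change is reading each argument as `double`.

```c
#include <stdarg.h>

/* Sums n variadic arguments that the caller passes as float or double. */
float sum_floats(unsigned int n, ...)
{
    float v = 0;
    va_list ap;
    va_start(ap, n);
    while (n--)
        /* float arguments arrive promoted to double, so read a double */
        v += (float)va_arg(ap, double);
    va_end(ap);
    return v;
}
```

A call such as `sum_floats(3, 1.0f, 2.0f, 3.5f)` then yields 6.5. Passing an `int` such as `1` instead of `1.0f` would still be undefined, since nothing converts it to `double`.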
A: 

Figured this one out the hard way: casting between different types is implementation-defined:

int i = 0xdefaced;
short a = 0;
i = a;
printf("%x\n",i);

On some architectures (including some old ARM processors), i == 0xdef0000, because the underlying processor has no "sign extend" operation and only clears out the two least significant bytes. And this was when compiled with GCC.

BatchyX
I can't imagine a clause that would allow this. A compiler bug / instruction bug probably?
Luther Blissett
Assuming that the initial value of `i` fits in an `int`, the behavior down to the assignment is defined (C89 §3.2.1.1: "The integral promotions preserve value including sign.", so the assignment promotes `short` 0 to `int` 0 and assigns that to `int` 0). IIRC the behavior of `printf` is potentially undefined in C89 because `%x` expects an `unsigned int`, but a later DR (or maybe just C99) specifies that signed and unsigned types must have the same size and use the same representation for values that are common to both.
Gilles
+4  A: 
EXPECT("## pow() gives exact results for integer arguments", pow(2, 4) == 16);

Another one is about text mode in fopen. Most programmers assume either that text and binary are the same (Unix) or that text mode adds \r characters (Windows). But C has been ported to systems that use fixed-width records, on which fputc('\n', file) on a text file means to add spaces or something until the file size is a multiple of the record length.

And here are my results:

gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3 on x86-64

We like to think that:
..05 int has the size of pointers
   but 'sizeof(int)==sizeof(void*)' is false.
..08 overshifting is okay
   but '(1<<bits_per_int)==0' is false.
..09a minus shifts backwards
   but '(t=-1,(15<<t)==7)' is false.
..14 i++ is strictly left to right
   but '(i=0,a[i++]=i,a[0]==1)' is false.
..15 structs are packed
   but 'sizeof(char_int)==(sizeof(char)+sizeof(int))' is false.
..17 size_t is unsigned int
   but 'sizeof(size_t)==sizeof(unsigned int)' is false.
From what I can say with my puny test cases, you are 78% mainstream
dan04
pow??? Would not have expected to hear that.
trinithis
I've actually seen code that combined `pow(2, n)` with bit operations.
dan04
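When the operands are integers, staying in integer arithmetic avoids the rounding question entirely. The sketch below is one way to do that; `ipow` and `pow2` are hypothetical names, and `pow2` assumes `unsigned long` has no padding bits.

```c
#include <limits.h>

/* Integer power by repeated squaring. The result wraps modulo
   ULONG_MAX + 1 on overflow, since unsigned arithmetic is defined to wrap. */
unsigned long ipow(unsigned long base, unsigned int exp)
{
    unsigned long result = 1;
    while (exp != 0) {
        if (exp & 1u)
            result *= base;
        base *= base;
        exp >>= 1;
    }
    return result;
}

/* 2 to the power n as a shift; returns 0 when the bit does not fit. */
unsigned long pow2(unsigned int n)
{
    return n < sizeof(unsigned long) * CHAR_BIT ? 1UL << n : 0UL;
}
```

`ipow(2, 4)` and `pow2(4)` both give exactly 16, and the results can be combined with bit operations without any conversion from `double`.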
+1  A: 

You can use text-mode (fopen("filename", "r")) to read any sort of text file.

While this should in theory work just fine, if you also use ftell() in your code, and your text file has UNIX-style line-endings, in some versions of the Windows standard library, ftell() will often return invalid values. The solution is to use binary mode instead (fopen("filename", "rb")).

Chinmay Kanchi
+1  A: 

How about right-shifting by excessive amounts--is that allowed by the standard, or worth testing?

Does Standard C specify the behavior of the following program:

void print_string(char *st)
{
  char ch;
  while((ch = *st++) != 0)
    putch(ch);  /* Assume this is defined */
}
int main(void)
{
  print_string("Hello");
  return 0;
}

On at least one compiler I use, that code will fail unless the argument to print_string is a "char const *". Does the standard permit such a restriction?

Some systems allow one to produce pointers to unaligned 'int's and others don't. Might be worth testing.

supercat
@supercat: C89 §3.3.7: “If the value of the right operand is negative or is greater than or equal to the width in bits of the promoted left operand, the behavior is undefined.” (applies to both `<<` and `>>`). C99 has identical language in §6.5.7-3.
Gilles
@supercat: Apart from `putch` (why didn't you use the standard `putchar`?), I can't see any undefined behavior in your program. C89 §3.1.4 specifies that “a character string literal has […] type ‘array of char’” (note: no `const`), and that “if the program attempts to modify a string literal […], the behavior is undefined”. What compiler is that, and how does it translate this program?
Gilles
In C++ string constants are *not* char[], they're const char[]. However... there *used* to be a specific hole in the type system to allow you to use a string constant in a context where a char* was expected and not get a type error. This led to situations where print_string("foo") would work but print_string("foo"+0) would not. This was deeply confusing, particularly in environments where C files are compiled using a C++ compiler by default. The hole has been removed in new compilers but there are still plenty of old ones around. AFAIK C99 still defines string constants to be char[].
David Given
On the HiTech compilers for the Microchip PIC series of controllers, a pointer without a storage qualifier can only point to RAM. A const-qualified pointer may point to either RAM or ROM. Non-const-qualified pointers are dereferenced directly in the code; const-qualified pointers are dereferenced via library routine. Depending upon the particular type of PIC, non-const-qualified pointers are 1 or 2 bytes; const-qualified ones are 2 or 3. Since ROM is much more plentiful than RAM, having constants in ROM is generally a good thing.
supercat
@David Given: Note my previous comment too. I prefer compilers which use qualifiers other than "const" to denote hardware storage class; the HiTech compiler has some rather annoying quirks with its storage class allocation (e.g. data items whose "component size" is a byte, or data items which are over 256 bytes, go in a "big" segment. Other data items go in the "bss" segment for the module they're defined in; all the "bss" items in a module must fit within 256 bytes. Arrays that are slightly short of 256 bytes can be a real nuisance).
supercat