views: 434
answers: 6

Hello,

I've been told that C types are machine-dependent. Today I set out to verify it:

void legacyTypes(void)
{
    /* character types */
    char k_char = 'a';

        //Signedness --> signed & unsigned
        signed char k_char_s = 'a';
        unsigned char k_char_u = 'a';

    /* integer types */
    int k_int = 1; /* Same as "signed int" */

        //Signedness --> signed & unsigned
        signed int k_int_s = -2;
        unsigned int k_int_u = 3;

        //Size --> short, _____,  long, long long
        short int k_s_int = 4;
        long int k_l_int = 5;
        long long int k_ll_int = 6;

    /* real number types */
    float k_float = 7;
    double k_double = 8;
}

I compiled it on a 32-bit machine using the MinGW C compiler:

_legacyTypes:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $48, %esp
    movb    $97, -1(%ebp)       # char
    movb    $97, -2(%ebp)       # signed char
    movb    $97, -3(%ebp)       # unsigned char
    movl    $1, -8(%ebp)        # int
    movl    $-2, -12(%ebp)      # signed int
    movl    $3, -16(%ebp)       # unsigned int
    movw    $4, -18(%ebp)       # short int
    movl    $5, -24(%ebp)       # long int
    movl    $6, -32(%ebp)       # long long int (low 32 bits)
    movl    $0, -28(%ebp)       # long long int (high 32 bits)
    movl    $0x40e00000, %eax   # float (7.0f)
    movl    %eax, -36(%ebp)
    fldl    LC2                 # double (8.0)
    fstpl   -48(%ebp)
    leave
    ret

I compiled the same code on a 64-bit processor (Intel Core 2 Duo) with GCC on Linux:

legacyTypes:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movb    $97, -1(%rbp)       # char
    movb    $97, -2(%rbp)       # signed char
    movb    $97, -3(%rbp)       # unsigned char
    movl    $1, -12(%rbp)       # int
    movl    $-2, -16(%rbp)      # signed int
    movl    $3, -20(%rbp)       # unsigned int
    movw    $4, -6(%rbp)        # short int
    movq    $5, -32(%rbp)       # long int (now 64 bits)
    movq    $6, -40(%rbp)       # long long int
    movl    $0x40e00000, %eax   # float (7.0f)
    movl    %eax, -24(%rbp)
    movabsq $4620693217682128896, %rax  # double (8.0)
    movq    %rax, -48(%rbp)
    leave
    ret

Observations

  • char, signed char, unsigned char, int, signed int, unsigned int, short int, signed short int, and unsigned short int all occupy the same number of bytes on both the 32-bit and the 64-bit processor.

  • The only change is in long int: it occupies 32 bits on the 32-bit machine and 64 bits on the 64-bit machine. (long long int is actually 64 bits on both; on the 32-bit machine it is stored with two 32-bit moves, the second writing the high half.)

  • Pointers also change: they take 32 bits on the 32-bit CPU and 64 bits on the 64-bit CPU.

Questions:

  • I cannot say that what the books say is wrong, but I'm missing something here. What exactly does "variable types are machine-dependent" mean?
  • As you can see, there is no difference between the instructions for unsigned and signed numbers. Then how come the ranges of numbers that can be represented by them are different?
  • I was reading http://stackoverflow.com/questions/2511246/how-to-maintain-fixed-size-of-c-variable-types-over-different-machines and I didn't get the purpose of the question or its answers. What is there to maintain about fixed sizes? The sizes all came out the same in my test anyway, and I didn't understand how those answers would ensure the same size.

EDIT:

Isn't it impossible to provide the same sizes across different machines? I mean, how can one maintain the same pointer size on both a 64-bit and a 32-bit machine?

+6  A: 

There are a lot more platforms out there, and some of them are 16- or even 8-bit! On those, you would observe much bigger differences in the sizes of all the above types.

Signed and unsigned versions of the same basic type occupy the same number of bytes on any platform; however, their ranges differ, because the same set of bit patterns is split between negative and non-negative values in the signed case, while the unsigned case devotes all of them to non-negative values.

E.g. a 16-bit signed int can have values from -32767 (or -32768 on many platforms) to 32767. An unsigned int of the same size has the range 0 to 65535.
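
(To make the shared bit patterns concrete, here is a minimal sketch using the C99 exact-width types: the same 16-bit pattern prints as 32768 when read as unsigned and, on the usual two's-complement machines, as -32768 when reinterpreted as signed. The signed reinterpretation itself is implementation-defined.)

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t bits = 0x8000u;        /* one 16-bit pattern: only the top bit set */

    /* reinterpret the same pattern as signed; implementation-defined,
       but -32768 on the usual two's-complement machines */
    int16_t as_signed = (int16_t)bits;

    printf("unsigned view: %u\n", (unsigned)bits);  /* 32768 */
    printf("signed view:   %d\n", (int)as_signed);  /* -32768 */
    return 0;
}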

After this, hopefully you understand the point of the referenced question better. Basically, if you write a program assuming that, e.g., your signed int variables will be able to hold the value 2*10^9 (2 billion), your program is not portable, because on some platforms (16 bits and below) this value will cause an overflow, resulting in silent and hard-to-find bugs. So on a 16-bit platform, for example, you need to #define your ints to be long in order to avoid overflow. This is a simple example, which may not work across all platforms, but I hope it gives you a basic idea.
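
(And a minimal sketch of the portable side of that advice, relying only on the standard's guarantee that long spans at least +/-2147483647 while plain int is only guaranteed +/-32767:)

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* 2 billion may overflow int (only guaranteed up to 32767),
       but always fits in long (guaranteed at least 2147483647) */
    long big = 2000000000L;

    printf("INT_MAX on this platform = %d\n", INT_MAX);
    printf("big (stored in a long)   = %ld\n", big);
    return 0;
}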

The reason for all these differences between platforms is that by the time C was standardized, there were already many C compilers in use on a plethora of different platforms, so for backward compatibility all these varieties had to be accepted as valid.

Péter Török
+4  A: 

What exactly does "variable types are machine-dependent" mean?

It means exactly what it says: the sizes of most integral C types are machine-dependent (not really the machine so much as the architecture and compiler). When I was doing a lot of C in the early 90s, int was mostly 16 bits; now it's mostly 32 bits. Earlier than my C career, it may have been 8 bits. Etc.

Apparently the designers of the C compiler you're using for 64-bit compilation decided int should remain 32 bits. Designers of a different C compiler might make a different choice.
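
(A small sketch of that dependence: the same source file reports a different int width depending on the compiler and target it is built with, using only limits.h.)

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* the same source line gives a different answer per target */
#if INT_MAX == 32767
    puts("int is 16 bits here (e.g. old 16-bit compilers)");
#elif INT_MAX == 2147483647
    puts("int is 32 bits here (most current desktop targets)");
#else
    puts("int is some other width here");
#endif
    return 0;
}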

T.J. Crowder
Then what is the point of having <stdint.h> and all those uint32, uint16, etc. (http://en.wikipedia.org/wiki/Stdint.h#Exact-width_integer_types_.28optional.29)? Isn't it impossible to maintain a fixed size on different machines?
claws
@claws: If `uint32` and such are defined, you can rely on them to be the size they say they are (otherwise the implementation is lying to you). But they're optional. `int` and such are intentionally defined to be implementation-dependent.
T.J. Crowder
@Claws: As someone else commented to you earlier, stdint.h is *provided by the implementation*. The implementation knows the size of its intrinsic types (`int`, `long`, etc.), and so may define `uint32` as `int` (for instance) if it knows that `int` is 32 bits *in that implementation*. If another implementation has `int` as 16 bits but `long` as 32 bits, it'll alias `uint32` to `long` instead. You'd be right to be concerned if stdint.h weren't provided as part of the C implementation, but it is, and so it all plays together. So if you need exact sizes, you can safely use those (if provided).
T.J. Crowder
I get your point. But still: the user expects uint64 to be 64 bits and codes his application accordingly. But that is typedefed to long int, which is 32 bits in the *implementation* for a 32-bit processor. Then, when the code runs, wouldn't it crash, because it is using 64-bit values everywhere? Who should the user blame then?
claws
I've edited my question. Can you kindly comment on that? Is there any way I can have the pointer size be constant on different machines?
claws
@claws: You don't seem to understand. C compilers take C code and emit machine code for *a specific architecture*. The compiler comes with headers appropriate for what it's going to do. Machine code is not very portable; properly written C source code is *very* portable. So if your compiler provides `uint32` and you compile a program using that for a variable, the size in memory of that variable will be 32 bits on any machine that program will run on. If you need to run it on a different architecture, you have to compile it with a compiler that targets that architecture, which... (cont'd)
T.J. Crowder
@claws: (continuing) ...which will provide its own stdint.h. So say Compiler A targets 32-bit platforms and uses a 32-bit `int`. Its stdint.h may well just alias `uint32` to `int`. But say Compiler B targets 64-bit platforms and uses `int` as a 64-bit value. The stdint.h provided with Compiler B won't just alias `uint32` to `int`, because that would be lying to you. It'll alias it to whatever type it treats as a 32-bit value. If you use `uint32` in your program, whichever compiler you use, you'll get a 32-bit variable.
T.J. Crowder
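
(A hypothetical sketch of what two such implementations' stdint.h files might contain; the standard names are int32_t/uint32_t -- the comments above say "uint32" loosely -- and the exact typedefs below are illustrative, not taken from any real compiler:)

/* hypothetical excerpt from Compiler A's stdint.h
   (a target where int is 32 bits) */
typedef int           int32_t;
typedef unsigned int  uint32_t;

/* hypothetical excerpt from another implementation's stdint.h
   (a target where int is 16 bits but long is 32 bits):

typedef long          int32_t;
typedef unsigned long uint32_t;
*/
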
+1  A: 

If you were to repeat your test on, say, a Motorola 68000 processor, you'd find you'd get different results (with a word being 16 bits and a long being 32; typically an int is a word).

Rowland Shaw
+5  A: 

"Machine-dependent" is not quite exact. Actually, it's implementation-defined: it may depend on the compiler, the machine, compiler options, etc.

For example, with Visual C++, long is 32 bits even on 64-bit machines.

oefe
Then what is the point in having those types (typedefed) in <stdint.h>? There is no way we can maintain a fixed size.
claws
stdint.h is provided by the implementation. That means the people supplying the compiler also supply you with a version of stdint.h in which the right types are used in those typedefs.
Michael Madsen
Also, the exact-width types from stdint.h are **optional** for an implementation to provide. In practice, since the implementation knows what sizes its types are, it will provide those that it can and omit those it can't.
KTC
It should also be possible to define integer types in stdint.h that don't correspond to any of the normal C data types, if a compiler supported such types for a particular target, though they would need some __-prefixed real name so that the typedef could be done.
nategoose
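
(Since the exact-width types are optional, portable code can test for them before use; a minimal sketch, relying on the C99 rule that UINT32_MAX is defined exactly when uint32_t exists:)

#include <stdint.h>
#include <stdio.h>

int main(void)
{
#ifdef UINT32_MAX   /* defined only if uint32_t exists in this implementation */
    uint32_t x = 4000000000u;   /* exactly 32 bits, guaranteed */
    printf("uint32_t is available, x = %lu\n", (unsigned long)x);
#else
    puts("no exact-width 32-bit unsigned type here");
#endif
    return 0;
}
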
+1  A: 

Real compilers don't usually take advantage of all the variation allowed by the standard. The requirements in the standard just give a minimum range for each type: 8 bits for char, 16 bits for short and int, 32 bits for long, and (in C99) 64 bits for long long (and every type in that list must have at least as large a range as the preceding type).

For a real compiler, however, backward compatibility is almost always a major goal. That means they have a strong motivation to change as little as they can get away with. As a result, in practice, there's a great deal more commonality between compilers than the standard requires.
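
(Those minimums can be verified at compile time from limits.h; a minimal sketch using a hypothetical C_ASSERT helper built on the old negative-array-size trick, which works even on pre-C11 compilers:)

#include <limits.h>

/* a negative array size forces a compile error if a guarantee fails */
#define C_ASSERT(name, expr) typedef char name[(expr) ? 1 : -1]

C_ASSERT(char_has_8_bits_or_more, CHAR_BIT >= 8);
C_ASSERT(short_spans_16_bits,     SHRT_MAX >= 32767);
C_ASSERT(int_spans_16_bits,       INT_MAX  >= 32767);
C_ASSERT(long_spans_32_bits,      LONG_MAX >= 2147483647L);

int main(void) { return 0; }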

Jerry Coffin
+1  A: 

Here is the same experiment on one other implementation -- quite different from what you are used to, but one which is still present on the Internet today, even if it is no longer used for general-purpose computing except by retro-computing hobbyists. None of the sizes, in bits, are the same as yours:

@type sizes.c
#include <stdio.h>
#include <limits.h>

int main(void)
{
   printf("CHAR_BIT = %d\n", CHAR_BIT);
   /* sizeof yields size_t, so cast to int to match %d portably */
   printf("sizeof(char) = %d\n", (int)sizeof(char));
   printf("sizeof(short) = %d\n", (int)sizeof(short));
   printf("sizeof(int) = %d\n", (int)sizeof(int));
   printf("sizeof(long) = %d\n", (int)sizeof(long));
   printf("sizeof(float) = %d\n", (int)sizeof(float));
   printf("sizeof(double) = %d\n", (int)sizeof(double));
   return 0;
}
@run sizes.exe
CHAR_BIT = 9
sizeof(char) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 4
sizeof(float) = 4
sizeof(double) = 8
AProgrammer
I don't know what CHAR_BIT is, but other than `long`, as I said, all the values match mine.
claws
It gives the number of bits in a byte; char is 9 bits, short 18, int 36, long 36, float 36 and double 72.
AProgrammer
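
(A one-liner to see this on any implementation -- the width of a type in bits is its sizeof multiplied by CHAR_BIT; a minimal sketch:)

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* bits = bytes * bits-per-byte; CHAR_BIT is 8 on most machines, 9 here */
    printf("int is %d bits\n", (int)(sizeof(int) * CHAR_BIT));
    return 0;
}
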
Read here for some computing history: http://en.wikipedia.org/wiki/Byte -- There is something about C as well.
Secure
The CPU is an (emulated; there are still some real ones running, but those are out of my budget) PDP-10, and the compiler is KCC -- there was a gcc port too, see http://pdp10.nocrew.org/
AProgrammer
You have to remember that C was developed in the early 1970s, and the IT ecosystem was a *lot* different then. There was a lot of variety between architectures. Bytes could be anywhere from 6 to 9 bits wide, and you could have word sizes that weren't multiples of the byte size (e.g., 8-bit bytes and 36-bit words).
John Bode