We know it depends on the machine or the compiler... but why, and how?
The size of a data type depends on several things, including the language and the machine architecture.
In the case of languages with architecture-dependent data types, such as C, the variation comes from the word size (register size) of the machine. For example, in C/C++, a long integer is typically 4 bytes (32 bits) on a 32-bit machine and 8 bytes (64 bits) on a 64-bit machine (though not universally: 64-bit Windows keeps long at 4 bytes).
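If you want to see what your own compiler and target actually use, a quick sizeof check does the job. A minimal C++ sketch; the numbers it prints depend entirely on your compiler and platform:

```cpp
#include <climits>
#include <iostream>

int main() {
    // sizeof reports sizes in bytes; CHAR_BIT is the number of bits per byte.
    std::cout << "bits per byte: " << CHAR_BIT << '\n';
    std::cout << "sizeof(short): " << sizeof(short) << '\n';
    std::cout << "sizeof(int):   " << sizeof(int)   << '\n';
    std::cout << "sizeof(long):  " << sizeof(long)  << '\n';
    std::cout << "sizeof(void*): " << sizeof(void*) << '\n';
}
```

On a typical 64-bit Linux system this prints 4 for int and 8 for long and void*, while a 32-bit build of the same program prints 4 for all three.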
It depends on the language you're referring to. Most of the time, it's about the size of the registers on that machine, whether they are 32 bits long, 64 bits long, or something different. Values of any data type must be stored in those registers while being manipulated at a low level, so it's best to take advantage of the machine's native register length when doing computations.
There are languages where some data types are guaranteed to have a specific size, such as the int8, int16, int32, and int64 types in Matlab and their size-equivalents on the .NET platform.
For starters, the machine's architecture, say 16-bit or 64-bit, determines how wide the address bus is, and that effectively determines how much memory you can address without resorting to tricks like virtual memory.
Usually, the hardware registers inside the CPU have the same bit width as most of the rest of the architecture, so on a 32-bit CPU it is usually most efficient to load and store data in 32-bit chunks. In some environments it's even required that all your data be aligned on 32-bit boundaries, i.e. you can't grab data from a memory address whose byte address is not divisible by 4.
It's possible to circumvent all these restrictions in software, but for the most efficient programs, you will want your compiler's data types to be closely related to the system hardware's.
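You can see the effect of alignment without writing any assembly. A small C++ sketch; the padding it demonstrates is typical, but the exact layout is up to your compiler and ABI:

```cpp
#include <cstdint>
#include <iostream>

// On most platforms the compiler pads structs so that each member
// starts on a boundary matching its natural alignment.
struct Example {
    std::uint8_t  a;   // 1 byte
    std::uint32_t b;   // 4 bytes, usually required to start on a 4-byte boundary
};

int main() {
    std::cout << "alignof(std::uint32_t): " << alignof(std::uint32_t) << '\n';
    // Typically prints 8 rather than 5: three bytes of padding are inserted
    // after 'a' so that 'b' lands on a 4-byte boundary.
    std::cout << "sizeof(Example): " << sizeof(Example) << '\n';
}
```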
Because not all machines have 32-bit registers. Some have 64, others have 8. The compiler is allowed to pick a size that is better for the target processor.
They are often independent of architecture: the MS .NET byte and the SQL tinyint, for example, are always 8-bit unsigned.
It depends on the language as well.
Some languages do define specific data type sizes. .NET, for example, specifies that an int must be 32 bits wide, a short must be 16, and a long must be 64.
C and C++ take the opposite approach. They simply guarantee that an int must be at least as big as a short, and a long must be at least as big as an int. And past that, the compiler is free to pick the sizes that are most efficient on the target platform.
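Those guarantees can be written down as compile-time checks. The sketch below should compile on any conforming C++ implementation, because it only asserts the ordering and the minimum widths the standard requires, never exact sizes:

```cpp
#include <climits>

// What the standard actually promises: an ordering of sizes and
// minimum value ranges, not exact widths.
static_assert(sizeof(short) <= sizeof(int),  "int is at least as big as short");
static_assert(sizeof(int)   <= sizeof(long), "long is at least as big as int");
static_assert(sizeof(short) * CHAR_BIT >= 16, "short has at least 16 bits");
static_assert(sizeof(int)   * CHAR_BIT >= 16, "int has at least 16 bits");
static_assert(sizeof(long)  * CHAR_BIT >= 32, "long has at least 32 bits");

int main() {}
```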
Different CPUs deal with different data sizes more or less efficiently. Older (or small, embedded) CPUs may not have 32-bit-wide registers, so adding two 32-bit values might require splitting the data across multiple 16-bit registers and performing several additions, which takes more time.
Others have 64-bit registers and can perform a 64-bit addition as quickly as a 32-bit one. Some may not have hardware support for 16-bit addition at all, in which case extra instructions have to be introduced to mask out the upper bits of the data, which again is slow.
So C and C++ are simply designed to pick data types that the CPU can work with efficiently. If I declare a variable of type int in C++, I don't know exactly how big it's going to be, but I do know that it will be a size the CPU can process efficiently.
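And when you do need an exact width rather than "whatever is efficient here", C and C++ provide the fixed-width types from <cstdint> / <stdint.h>. A short sketch; the size of int and of int_fast32_t that it reports will vary by platform:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    int               platform_sized = 0;  // "whatever is efficient" - width varies
    std::int32_t      exactly_32     = 0;  // exactly 32 bits, where the platform provides it
    std::int_fast32_t fast_32        = 0;  // at least 32 bits, chosen for speed

    std::printf("int:          %zu bytes\n", sizeof platform_sized);
    std::printf("int32_t:      %zu bytes\n", sizeof exactly_32);
    std::printf("int_fast32_t: %zu bytes\n", sizeof fast_32);
}
```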