views:

136

answers:

6

The size of an integer type (or any type) in units of char/bytes is easily computed as sizeof(type). A common idiom is to multiply by CHAR_BIT to find the number of bits occupied by the type, but on implementations with padding bits, this will not be equal to the width in value bits. Worse yet, code like:

x>>CHAR_BIT*sizeof(type)-1

may actually have undefined behavior if CHAR_BIT*sizeof(type) is greater than the actual width of type.

For simplicity, let's assume our types are unsigned. Then the width of type is ceil(log2((type)-1). Is there any way to compute this value as a constant expression?

+1  A: 

Yes, since for all practical purposes, the number of possible widths is limited:

#if ~0 == 0xFFFF
# define INT_WIDTH 16
#elif ~0 == 0xFFFFFFFF
# define INT_WIDTH 32
#else
# define INT_WIDTH 64
#endif
Aaron Digulla
Are you guaranteed the integer type used by the pre-processor is the same as the integer type of compiled code? I don't know if the standard says this is the case or not.
Paul
@Paul: Good point. My gut feeling is that you should be safe unless you start using flags to switch the int width (like "use 32bit ints when 16bit is the default").
Aaron Digulla
@Paul: see C99 6.10.1 §4: "For the purposes of this token conversion and evaluation, all signed integer types and all unsigned integer types act as if they have the same representation as, respectively, the types `intmax_t` and `uintmax_t` defined in the header `<stdint.h>`"; use the `_MAX` macros from `<limits.h>` instead
Christoph
Please define "for all practical purposes". http://en.wikipedia.org/wiki/Byte "Various implementations of C and C++ define a byte as 8, 9, 16, 32, or 36 bits."
Secure
@Aaron, for the reasons that Christoph cites, your preprocessor approach only will work to determine the width of `uintmax_t`, which in any case has a width of at least 64, so your approach doesn't lead far, unfortunately. And you can't use this for signed types in any case, since you can't know how exactly how a signed and unsigned type are related.
Jens Gustedt
@Secure: True but today, almost any desktop CPU uses 32 or 64 bits. And if you have something else, you will *know* (and won't need to guess the bit size).
Aaron Digulla
@Jens: Interesting. When I used C the last time, there were no 64bit types.
Aaron Digulla
@Aaron, depends on the standard your compiler implements. What I said was for the current one, C99. If you happen to have a compiler that has now fallen 11 years behind, well yes, there might be no 64bit types.
Jens Gustedt
@Aaron, for your reply @Secure: if you think you always know on which platform your code will be executed, you are fine. If you want to write portable code that might end up e.g on embedded devices, you better not take the bet.
Jens Gustedt
@Jens: My point exactly.
Aaron Digulla
A: 

Usually, size of int is known for a given compiler/platform. If you have macros that identify compiler/platform, then you can use them to conditionnally define INT_WIDTH.

You can take a look at <sys/types.h> and its dependents for examples.

mouviciel
+2  A: 

Compare the macros from <limits.h> against known max values for specific integer widths:

#include <limits.h>

#if UINT_MAX == 0xFFFF
#define INT_WIDTH 16
#elif UINT_MAX == 0xFFFFFF
#define INT_WIDTH 24
#elif ...
#else
#error "unsupported integer width"
#endif
Christoph
+1  A: 

First approach, if you know what standard type you have (so your type is no typedef) go with the {U}INT_MAX macros and check against the possible sizes.

If you don't have that, for unsigned types this is relatively easy conceptually. For your favorite type T, just do (T)-1 and do a monster test macro that checks against all possible values with ?:. Since these then are only compile time constant expressions, any decent compiler will optimize that out a leave you with just the value that you are interested in.

This wouldn't work in #if etc, because of the type cast, but this can't be avoided in a simple way.

For signed types this is more complicated. For types at least as wide as int you can hope to do a trick to promote to the corresponding unsigned type and get the width of that type then. But to know whether or not your signed type has just one value bit less or not, no I don't think that there is a generic expression to know that.

Edit: Just to illustrate this a bit, I give some extracts of what you can do to make this approach (for unsigned types) not generate too large expressions in P99 I have something like

#ifndef P99_HIGH2
# if P99_UINTMAX_WIDTH == 64
#  define P99_HIGH2(X)                                         \
((((X) & P00_B0) ? P00_S0 : 0u)                              \
 | (((X) & P00_B1) ? P00_S1 : 0u)                            \
 | (((X) & P00_B2) ? P00_S2 : 0u)                            \
 | (((X) & P00_B3) ? P00_S3 : 0u)                            \
 | (((X) & P00_B4) ? P00_S4 : 0u)                            \
 | (((X) & P00_B5) ? P00_S5 : 0u))
# endif
#endif
#ifndef P99_HIGH2
# if P99_UINTMAX_WIDTH <= 128
#  define P99_HIGH2(X)                                         \
((((X) & P00_B0) ? P00_S0 : 0u)                              \
 | (((X) & P00_B1) ? P00_S1 : 0u)                            \
 | (((X) & P00_B2) ? P00_S2 : 0u)                            \
 | (((X) & P00_B3) ? P00_S3 : 0u)                            \
 | (((X) & P00_B4) ? P00_S4 : 0u)                            \
 | (((X) & P00_B5) ? P00_S5 : 0u)                            \
 | (((X) & P00_B6) ? P00_S6 : 0u))
# endif
#endif

where the magic constants are defined with a sequence of #if at the beginning. There it is important to not to expose too large constants for compilers that can't handle them.

/* The preprocessor always computes with the precision of uintmax_t */
/* so for the preprocessor this is equivalent to UINTMAX_MAX       */
#define P00_UNSIGNED_MAX ~0u

#define P00_S0 0x01
#define P00_S1 0x02
#define P00_S2 0x04
#define P00_S3 0x08
#define P00_S4 0x10
#define P00_S5 0x20
#define P00_S6 0x40

/* This has to be such ugly #if/#else to ensure that the            */
/* preprocessor never sees a constant that is too large.            */
#ifndef P99_UINTMAX_MAX
# if P00_UNSIGNED_MAX == 0xFFFFFFFFFFFFFFFF
#  define P99_UINTMAX_WIDTH 64
#  define P99_UINTMAX_MAX 0xFFFFFFFFFFFFFFFFU
#  define P00_B0 0xAAAAAAAAAAAAAAAAU
#  define P00_B1 0xCCCCCCCCCCCCCCCCU
#  define P00_B2 0xF0F0F0F0F0F0F0F0U
#  define P00_B3 0xFF00FF00FF00FF00U
#  define P00_B4 0xFFFF0000FFFF0000U
#  define P00_B5 0xFFFFFFFF00000000U
#  define P00_B6 0x0U
# endif /* P00_UNSIGNED_MAX */
#endif /* P99_UINTMAX_MAX */
#ifndef P99_UINTMAX_MAX
# if P00_UNSIGNED_MAX == 0x1FFFFFFFFFFFFFFFF
#  define P99_UINTMAX_WIDTH 65
#  define P99_UINTMAX_MAX 0x1FFFFFFFFFFFFFFFFU
#  define P00_B0 0xAAAAAAAAAAAAAAAAU
#  define P00_B1 0xCCCCCCCCCCCCCCCCU
#  define P00_B2 0xF0F0F0F0F0F0F0F0U
#  define P00_B3 0xFF00FF00FF00FF00U
#  define P00_B4 0xFFFF0000FFFF0000U
#  define P00_B5 0xFFFFFFFF00000000U
#  define P00_B6 0x10000000000000000U
# endif /* P00_UNSIGNED_MAX */
#endif /* P99_UINTMAX_MAX */
.
.
.
Jens Gustedt
As far as I know, a signed type always has just one value bit less than the corresponding unsigned type. This is a consequence of the representations being compatible for positive values of the signed type, and of the restriction that only twos complement, ones complement, and sign/magnitude allowed as signed representations.
R..
@R., in all practical cases that I know of you are right, but if you think of just that condition (positive signed -> unsigned) it is perfectly possible that the signed type has even less value bits. The C99 standard explicitly allows that, it just requires that the signed type has no more value bits than the unsigned type. So my guess would be that there was at least one implementor in the committee that vetoed a cleaner solution, implying in turn that such a weird things existed somewhere.
Jens Gustedt
@Jens: Yes, the important part is "no more value bits". The signed type is even allowed to have the same number of value bits as the unsigned type, plus the sign bit. So unsigned could have 32 bits and signed 33 bits.
Secure
@Secure: yes, I think this was what I tried to say. (signed and unsigned with 31 value bits is more likely than 32, though ;-) The point I was trying to make is that you can't know the maximum values of the signed integer types without inspecting the `MAX` macros. There is no such expression that is guaranteed to work in all cases.
Jens Gustedt
A: 

You can calculate it at runtime with a simple loop, well-defined and without the danger of UB:

unsigned int u;
int c;

for (c=0, u=1; u; c++, u<<=1);

total_bits   = CHAR_BIT * sizeof(unsigned int);
value_bits   = c;
padding_bits = total_bits - value_bits;

The simplest way would be to check in your unit tests (you have them, right?) that value_bits is identical to your current INT_WIDTH definition.

If you really need to calculate it at compile time, I'd go with one of the given #if-#elif cascades, either testing UINT_MAX or your target system.

What do you need it for? Maybe YAGNI?

Secure
The example usage I had in mind was implementing a bit array that uses a type that's expected to be efficient, like `unsigned int` or `size_t`.
R..
I would use separate definitions here, i.e. `typedef unsigned int t_bitarray_int;` `#define BITARRAY_WIDTH INT_WIDTH`. This gives you the possibility for fine-tuning and to test different types. On some exotic system with the integer width of 36, it may be more efficient to only use 32 bits and let the compiler optimize the div and mod operations for the array indexing to simple bit shifts and bit masking.
Secure
A: 

A general observation is that if you rely on the width of a data type in your calculations, you should use the explicit width data types defined in <stdint.h> e.g. uint32_t.

Trying to count the bytes in the standard types is begging the question of what your 'portable' code would do in the event of an overflow.

Will
I disagree. A very reasonable thing to do when implementing a bit array, for instance, would be to assume that `unsigned` or `size_t` is an efficient size to work with, and then you'd want to know the number of bits each "word" stores.
R..