views: 196
answers: 4

C code targeting x64, as has been previously discussed, should always use size_t instead of int for things like counts and array indexes.

Given that, it would arguably be simpler and less error-prone to just standardize on size_t (typedef'd to something shorter) instead of int as the usual integer type across the entire code base.

Is there anything I'm missing? Assuming you don't need signed integers, and you're not storing large arrays of small integers (where making them 32 bits instead of 64 bits could save memory), is there any reason to use int in preference to size_t?
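
For concreteness, a minimal sketch of the kind of thing I mean (the alias name `usize` is just an example, not an established convention):

```c
#include <stddef.h>
#include <stdio.h>

typedef size_t usize;   /* shorter alias for the "usual" unsigned integer type */

/* count how many elements of an array are non-zero */
usize count_nonzero(const int *a, usize n)
{
    usize count = 0;
    for (usize i = 0; i < n; i++)
        if (a[i] != 0)
            count++;
    return count;
}

int main(void)
{
    int data[] = { 1, 0, 3, 0, 5 };
    printf("%zu\n", count_nonzero(data, sizeof data / sizeof data[0]));
    return 0;
}
```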

A: 

Um, what? "Assuming you don't need signed integers"? Have I ever worked on a program that didn't??

egrunin
Heh, fair enough, different experiences I guess; signed integers could be deleted entirely from the language and I wouldn't miss them. Okay, let's say instead _except in cases where you need signed integers_.
rwallace
@rwa it really depends; consider int to be the jack of all trades.
aaa
@rwallace: Oddly I'm more in favour of scrapping unsigned integer types from all languages.
JUST MY correct OPINION
@JUST MY correct OPINION: Right, who needed well-defined modular arithmetic anyway.
schot
Signed integers are just unsigned integers with unspecified implementation, undefined overflow behavior, and a fixed cutoff between negative and positive. If you use unsigned types, **you** can decide where the cutoff between negative and positive lies. One popular design is only using a few small values (-1, -2, maybe -3... or in the case of the Linux kernel, -1 through -4096) as negative, and treating everything else as positive.
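
To make that convention concrete, a hypothetical sketch (the -4096 cutoff is just the one mentioned above, and the macro name is made up; it is not the kernel's actual macro):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical convention: the top 4096 values of the unsigned range encode
 * "negative" error codes (-1 .. -4096); everything below is a genuine result. */
#define IS_NEG_ERR(x) ((uintptr_t)(x) >= (uintptr_t)-4096)

int main(void)
{
    uintptr_t ok  = 42;
    uintptr_t err = (uintptr_t)-2;   /* a "negative" value stored in an unsigned type */

    printf("%d %d\n", (int)IS_NEG_ERR(ok), (int)IS_NEG_ERR(err));  /* prints: 0 1 */
    return 0;
}
```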
R..
@R: But what about unbound negative values for mathematical computations (FFT springs to mind)? How can you accommodate those while still having a sane, well-defined overflow behaviour? (the implementation cannot be reasonably specified for obvious reasons)
Michael Foukarakis
A: 

As Eli says, int is usually (not always) the word size, i.e. the preferred unit for moving objects around memory and the CPU. Thus, even if you ignore memory usage, you may still get better performance.

So I think it is quite reasonable to use int as the "regular" signed integral type when you don't need a range bigger than +/- (2^15 - 1), or a particular width.
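
For what it's worth, a quick way to see which data model your implementation actually uses; the printed sizes vary by platform (typical LP64 Unix gives 4/8/8, LLP64 Windows gives 4/4/8):

```c
#include <stdio.h>
#include <stddef.h>

int main(void)
{
    /* sizeof yields size_t, printed with %zu */
    printf("int: %zu, long: %zu, size_t: %zu\n",
           sizeof(int), sizeof(long), sizeof(size_t));
    return 0;
}
```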

Matthew Flaschen
By normal usage of the term, the word size on x64 is 64 bits; but in practice int is always 32 bits there for backward compatibility. In what sense would you say x64 has a word size of 32 bits? What operations are more efficient in 32 bits?
rwallace
@rwallace, that's certainly not true. ILP64 and SILP64 have both been implemented, and both have 64-bit ints. I'm not qualified to speak in depth about x86 performance. But I don't think 32-bit values will have *worse* performance on x86_64. So the general (not intended for any specific architecture) rule I stated still applies.
Matthew Flaschen
Right, unqualified _always_ was the wrong word; what I mean is that int = 32 bits is the mainstream: if you run e.g. Microsoft C++ under Windows or GCC under Linux, you'll get a 32-bit int regardless of the CPU. And there are cases where 32 bits has worse performance on x64, e.g. because a 32-bit index needs to be widened to 64 bits before being used as an offset. The hard question seems to be whether 32 bits ever has better performance.
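
For illustration, the kind of indexing I have in mind; whether the compiler actually emits an extra widening instruction for the `int` index depends on the compiler and optimization level:

```c
#include <stddef.h>

long sum_int_index(const long *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)     /* 32-bit index may need widening to 64 bits for addressing */
        s += a[i];
    return s;
}

long sum_size_t_index(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)  /* index is already pointer-width */
        s += a[i];
    return s;
}
```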
rwallace
@rwallace, @Matthew: If one day you want to have standard integer types of 5 different widths (8, ..., 128 bits), `int` will have to be fixed at 32 bits; there is not much choice.
Jens Gustedt
A: 

Using size_t "for counts" and as a generic unsigned integer type is almost always a design error. size_t is only enough to hold the size of the largest continuous object supported by the platform. This immediately means that it can be fairly reasonably used as a count of bytes in an object or a count (or index) of elements in an array (since array is always a continuous object). But once we get rid of the continuity requirement, size_t no longer works. You can't meaningfully use size_t to count elements in a linked list, since in general case the range of size_t will not be sufficient.

Of course, using size_t for such purposes is also wrong conceptually. size_t implements the concept of object size, not the concept of object count. Using size_t for array indexing is only justified for abstract arrays. Using size_t for indexing concrete application-specific arrays is, well, weird.

I personally prefer using unsigned for counts and array indexing (unless I have a more specific type for that purpose) assuming that the range of the type is sufficient within the domain of my application.
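
A small sketch of the distinction I'm drawing (the function names are just for illustration):

```c
#include <stddef.h>
#include <string.h>

struct node { struct node *next; int value; };

/* Application-level count: the bound comes from the application domain,
 * so an application-chosen type (plain unsigned here) is used. */
unsigned list_length(const struct node *head)
{
    unsigned n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Object size in bytes: this is exactly the concept size_t implements. */
size_t record_bytes(const char *s)
{
    return strlen(s) + 1;   /* size of the string object, including the NUL */
}
```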

AndreyT
`size_t` is much more appropriate than `unsigned` for counts too. Look at the arguments to `calloc`, `fread`, and `fwrite`. Sure there may be some *pathological* architectures where `size_t` is smaller than the actual maximal count of objects (e.g. in a linked list) because of memory segmentation or other such ugliness, but there are plenty of *real* architectures where `unsigned` is much smaller than the maximal count of objects - basically any 64-bit machine running any Unix, as well as Windows.
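
For example, a minimal use of those interfaces; every count parameter involved is a `size_t`:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t count = 100;                      /* element count, naturally a size_t */
    int *buf = calloc(count, sizeof *buf);   /* void *calloc(size_t nmemb, size_t size) */
    if (buf == NULL)
        return 1;

    FILE *f = fopen("out.bin", "wb");
    if (f != NULL) {
        /* size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream) */
        size_t written = fwrite(buf, sizeof *buf, count, f);
        printf("wrote %zu elements\n", written);
        fclose(f);
    }
    free(buf);
    return 0;
}
```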
R..
@R..: The maximal count of the objects in an application-specific container is dictated by the application domain, not by the architecture. I explicitly stated that I would use `unsigned` only if its range is sufficient within my application domain. `size_t` is *never* appropriate for that purpose, regardless of its range, since the concept it implements is not the right one. It is like using a type intended to designate *temperature* for counting apples in a basket, just because its range happens to be "large enough".
AndreyT
@R..: What the references to `calloc` and `fwrite` are doing here is not clear to me. All these functions work with *array element counts*, not with generic counts, and I already covered that in my answer.
AndreyT
@R..: As for "pathological architectures"... There's nothing pathological about them. What is really pathological is the inability of the current generation of programmers to think outside the flat memory model. In any case, if someone wanted to use an integer type that is "always enough" because of the total memory restriction, that would be `intptr_t` \ `uintptr_t`, not `size_t`. Using `uintptr_t` is still a conceptual error in application-domain-specific code, but at least it is not such a total disaster as `size_t` would be.
AndreyT
"size_t is only enough to hold the size of the largest continuous object supported by the platform.", in C this perversely is not necessarily true. Using `calloc` you can allocate objects larger than what `size_t` can represent, if your system supports it, I believe. The key point is that in C, objects don't in and of itself have any type, so the logic that `sizeof` must be applicable to their type does not apply here.
Johannes Schaub - litb
You can never allocate objects larger than `size_t` can represent, even with `calloc`. By definition, `size_t` can hold the size of the largest object possible in the implementation.
R..
Well, that's an interesting issue. I always thought that `calloc` is supposed to recognize the potential `size_t` overflow on multiplication of the size and count arguments and handle it gracefully as an allocation failure (return null) (some implementations are known to ignore the overflow, with disastrous results). But the standard doesn't seem to explicitly prohibit "larger than `size_t`" allocations through `calloc`.
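
Something along the lines of this (hypothetical) wrapper is the kind of graceful handling I mean:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: fail cleanly instead of letting nmemb * size wrap around. */
void *checked_calloc(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;              /* product would not fit in size_t */
    return calloc(nmemb, size);
}
```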
AndreyT
@R.. where is that definition stated? `size_t` is defined to be the result type of the `sizeof` operator.
Johannes Schaub - litb
Look for the definition of `SIZE_MAX`.
R..
A: 

I would say, on the contrary, that I would prefer something where the sizes of the integers are fixed: uint8_t ... uint64_t (and some time soon uint128_t), and these would be the base types. That way you know what you get.

Other typedefs like size_t would then just alias to these. You could, for example, simply inspect the typedef for uintptr_t and deduce your address width.

Also, people certainly need signed types. But the relation could be clarified: already now in the standard, signed types are sort of deduced from the unsigned types. This could be made explicit by requiring a `signed` prefix. But that will surely never happen; people are too emotionally attached to int :)
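
A small sketch of what I mean, using the `<stdint.h>` types that already exist and deducing the address width from `UINTPTR_MAX`:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t  small = 200;        /* fixed-width base types: you know what you get */
    uint64_t big   = 1u << 20;

    /* deduce the address width from the uintptr_t alias */
#if UINTPTR_MAX == UINT64_MAX
    puts("64-bit addresses");
#elif UINTPTR_MAX == UINT32_MAX
    puts("32-bit addresses");
#else
    puts("unusual address width");
#endif

    printf("%u %llu\n", (unsigned)small, (unsigned long long)big);
    return 0;
}
```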

Jens Gustedt