Is it important to declare a variable as unsigned if you know it should never be negative? Does it help prevent anything other than negative numbers being fed into a function that shouldn't have them?
It doesn't prevent people misusing your interface, but at least they should get a warning unless they add a C-style cast or static_cast to make it go away (in which case you cannot help them further).
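For illustration, a minimal sketch of that situation (the function name set_count and the warning flag mentioned in the comments are just assumptions for the example):
#include <cstdio>

void set_count(unsigned int /*count*/) { /* hypothetical interface that only accepts non-negative values */ }

int main() {
    int n = -5;
    // set_count(n);                          // most compilers can warn here, e.g. gcc/clang with -Wsign-conversion
    set_count(static_cast<unsigned int>(n));  // the cast silences the warning -- and hides the bug
}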
Yes, there is value in this as it properly expresses the semantics you intend.
It does two things:
1) It gives you double the range for your unsigned values. When signed, the highest bit is used as the sign bit (1 means negative, 0 means positive); when unsigned, you can use that bit for data. E.g., a char goes from -128 to 127, while an unsigned char goes from 0 to 255.
2) It affects how the >> operator acts, specifically when right-shifting negative values (sketched below).
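A small sketch of that difference (assuming a typical 32-bit two's-complement int; before C++20 the signed result is implementation-defined, though most compilers do an arithmetic shift):
#include <cstdio>

int main() {
    int s = -16;
    unsigned int u = 0xFFFFFFF0u;  // same bit pattern as -16 on a typical 32-bit int

    printf("%d\n", s >> 2);   // usually -4 (sign bit copied in), but not guaranteed by the standard
    printf("%u\n", u >> 2);   // always 1073741820: unsigned shifts fill with zero bits
}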
By using unsigned when signed values will not be needed, in addition to ensuring the datatype doesn't represent values below the desired lower bound, you increase the maximum upper bound. All the bit combinations that would otherwise be used to represent negative numbers are used to represent a larger set of positive numbers.
This has value for the same reason that "const correctness" has value. If you know that a particular value shouldn't change, declare it const and let the compiler help you. If you know a variable should always be non-negative, then declare it as unsigned and the compiler will help you catch inconsistencies. (That, and you can express numbers twice as big if you use unsigned int rather than int in this context.)
One minor nicety is that it cuts down on the amount of array bounds checking that might be necessary... e.g. instead of having to write:
int idx = [...];
if ((idx >= 0) && (idx < arrayLength)) printf("array value is %i\n", array[idx]);
you can just write:
unsigned int idx = [...];
if (idx < arrayLength) printf("array value is %i\n", array[idx]);
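A self-contained sketch of the same idea (the array contents are made up purely for illustration); an index that would have been negative wraps to a huge unsigned value and simply fails the single test:
#include <cstdio>

int main() {
    int array[] = {10, 20, 30};
    const unsigned int arrayLength = sizeof(array) / sizeof(array[0]);

    unsigned int idx = 1;    // imagine this came from user input
    if (idx < arrayLength)   // one comparison covers both bounds
        printf("array value is %i\n", array[idx]);
}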
The other answers are good, but sometimes it can lead to confusion, which is, I believe, why some languages have opted not to have unsigned integer types.
For example, suppose you have a structure that looks like this to represent a screen object:
struct T {
    int x;
    int y;
    unsigned int width;
    unsigned int height;
};
The idea being that it is impossible to have a negative width. Well, what data type do you use to store the right edge of the rectangle?
int right = r.x + r.width; // causes a warning on some compilers with certain flags
and certainly it still doesn't protect you from any integer overflows. So in this scenario, even though width and height cannot conceptually be negative, there is no real gain in making them unsigned except for requiring some casts to get rid of warnings about mixing signed and unsigned types. In the end, at least for cases like this, it is better to just make them all ints; after all, odds are you won't have a window wide enough to need width to be unsigned.
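For contrast, an all-int version of the same idea (a sketch; the name Rect is made up here to avoid clashing with T above) keeps the arithmetic within one type:
#include <cstdio>

struct Rect {
    int x;
    int y;
    int width;    // conceptually non-negative, but kept signed
    int height;   // so x + width stays within a single type
};

int right_edge(const Rect& r) { return r.x + r.width; }  // no mixed-sign warning, no cast

int main() {
    Rect r{10, 20, 300, 200};
    printf("right edge = %d\n", right_edge(r));  // 310
}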
It also keeps you from having to cast to/from unsigned whatever when interacting with other interfaces. For example:
for (int i = 0; i < some_vector.size(); ++i)
That comparison between the signed i and the unsigned value returned by size() will generally annoy the hell out of anyone who needs to compile without warnings.
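Two common ways to quiet that particular warning, sketched here with an assumed std::vector<int> named some_vector:
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> some_vector = {1, 2, 3};

    // Option 1: index with the container's own unsigned size type.
    for (std::vector<int>::size_type i = 0; i < some_vector.size(); ++i)
        printf("%d\n", some_vector[i]);

    // Option 2: stay signed and cast once, accepting possible truncation for huge containers.
    for (int i = 0; i < static_cast<int>(some_vector.size()); ++i)
        printf("%d\n", some_vector[i]);
}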
Declaring variables for semantically non-negative values as unsigned is a good style and good programming practice.
However, keep in mind that it doesn't prevent you from making errors. It is perfectly legal to assign negative values to unsigned integers, with the value getting implicitly converted to unsigned form in accordance with the rules of unsigned arithmetic. Some compilers might issue warnings in such cases, some will not.
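A two-line sketch of what that implicit conversion does (the printed value assumes a 32-bit unsigned int):
#include <cstdio>

int main() {
    unsigned int u = -1;   // perfectly legal: the value wraps modulo 2^n, no diagnostic is required
    printf("%u\n", u);     // prints 4294967295 on a platform with 32-bit unsigned int
}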
It is also worth noting that working with unsigned integers requires knowing some dedicated unsigned techniques. For example, a "classic" example that is often mentioned in relation to this issue is backward iteration:
for (int i = 99; i >= 0; --i) {
    /* whatever */
}
The above cycle looks natural with signed i, but it cannot be directly converted to unsigned form, meaning that
for (unsigned i = 99; i >= 0; --i) {
    /* whatever */
}
doesn't really do what it is intended to do (it is actually an endless cycle). The proper technique in this case is either
for (unsigned i = 100; i > 0; ) {
    --i;
    /* whatever */
}
or
for (unsigned i = 100; i-- > 0; ) {
    /* whatever */
}
This is often used as an argument against unsigned types, i.e. allegedly the above unsigned versions of the cycle look "unnatural" and "unreadable". In reality, though, the issue we are dealing with here is the generic issue of working near the left end of a closed-open range. This issue manifests itself in many different ways in C and C++ (like backward iteration over an array using the "sliding pointer" technique, or backward iteration over a standard container using an iterator). I.e. regardless of how inelegant the above unsigned cycles might look to you, there's no way to avoid them entirely, even if you never use unsigned integer types. So, it is better to learn these techniques and include them in your set of established idioms.
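For comparison, here is the "sliding pointer" form of backward iteration over a plain array (a sketch, not tied to any particular codebase); note that it has exactly the same decrement-before-use shape as the unsigned loops above:
#include <cstdio>

int main() {
    int a[] = {1, 2, 3, 4};
    // p starts one past the end and is moved back before each use.
    for (int* p = a + sizeof(a) / sizeof(a[0]); p != a; ) {
        --p;
        printf("%d\n", *p);
    }
}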
It won't prevent negative numbers from being fed into a function; instead it will interpret them as large positive numbers. This may be moderately useful if you know an upper bound for error checking, but you need to do the error checking yourself. Some compilers will issue warnings, but if you're using unsigned types a lot there may be too many warnings to deal with easily. These warnings can be covered up with casts, but that's worse than sticking to signed types only.
I wouldn't use an unsigned type if I knew the variable shouldn't be negative, but rather if it couldn't be. size_t is an unsigned type, for example, since a data type simply can't have negative size. If a value could conceivably be negative but shouldn't be, it's easier to express that by having it as a signed type and using something like i < 0 or i >= 0 (these conditions come out as false and true respectively if i is an unsigned type, regardless of its value).
If you're concerned about strict Standard conformance, it may be useful to know that unsigned arithmetic is fully defined when it wraps around (it works modulo 2^n), while signed overflow is undefined behavior.
A counter-argument to using unsigned is that you may find yourself in very reasonable situations where it gets awkward and unintentional bugs are introduced. Consider a class (for example a list class or some such) with the following method:
unsigned int length() { ... }
Seems very reasonable. But then when you want to iterate over it, you get the following:
for (unsigned int i = my_class.length(); i >= 0; --i) { ... }
Your loop won't terminate and now you're forced to cast or do some other awkwardness.
An alternative to using unsigned is just to assert that your values are non-negative.
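A minimal sketch of that alternative (the class and member names here are invented): keep the value signed and let the assert document and enforce the invariant.
#include <cassert>

class MyList {
public:
    int length() const {
        assert(len_ >= 0 && "length must never be negative");
        return len_;
    }
private:
    int len_ = 0;   // stored as signed; the invariant is checked, not encoded in the type
};

int main() {
    MyList my_class;
    for (int i = my_class.length() - 1; i >= 0; --i) { /* ... */ }  // terminates as expected
}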
Is it important to declare a variable as unsigned if you know it should never be negative?
Certainly it is not important. Some people (Stroustrup and Scott Meyers; see "Unsigned vs signed - Is Bjarne mistaken?") reject the idea that a variable should be unsigned just because it represents an unsigned quantity. If the point of using unsigned would be to indicate that a variable can only store non-negative values, you need to somehow check that. Otherwise, all you get is:
- A type that silently hides errors because it doesn't let negative values show up
- Double the positive range of the corresponding signed type
- Defined overflow/bit-shift/etc. semantics
Certainly it doesn't prevent people from supplying negative values to your function, and the compiler won't be able to warn you about such cases in general (think of a negative int value that is only computed at run time being passed). Why not assert in the function instead?
assert((idx >= 0) && "Index must be greater than or equal to 0!");
The unsigned type introduces many pitfalls too. You have to be careful when you use it in calculations that can temporarily go below zero (a down-counting loop, or something), and especially careful about the automatic conversions that happen in the C and C++ languages between unsigned and signed values:
// assume idx is unsigned. What if idx is 0 !?
if (idx - 1 > 3) /* do something */;
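Spelled out as a runnable sketch, the pitfall looks like this:
#include <cstdio>

int main() {
    unsigned int idx = 0;
    // idx - 1 wraps around to the largest unsigned value, so the test is true
    // even though the intent was the mathematical value -1.
    if (idx - 1 > 3)
        printf("entered the branch with idx == 0\n");
}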
Mixing signed and unsigned types can be a major headache. The resulting code will often be bloated, wrong, or both (*). In many cases, unless you need to store values between 2,147,483,648 and 4,294,967,295 within a 32-bit variable, or you need to work with values larger than 9,223,372,036,854,775,807, I'd recommend not bothering with unsigned types at all.
(*) What should happen, for example, if a programmer does:
{ Question would be applicable to C, Pascal, Basic, or any other language }
If SignedVar + UnsignedVar > OtherSignedVar Then DoSomething;
I believe Borland's old Pascal would handle the above scenario by converting SignedVar and UnsignedVar to a larger signed type (the largest supported type, btw, was signed, so every unsigned type could be converted to a larger signed one). This would produce big code, but it would be correct. In C, if one signed variable is negative the result is likely to be numerically wrong even if UnsignedVar holds zero. Many other bad scenarios exist as well.
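Here is the C/C++ version of that scenario as a sketch (variable names mirror the pseudocode above): the signed operand is converted to unsigned before the addition, so the comparison succeeds even though the mathematical sum is -1.
#include <cstdio>

int main() {
    int SignedVar = -1;
    unsigned int UnsignedVar = 0;
    int OtherSignedVar = 5;

    // SignedVar becomes a huge unsigned value here, so the branch is taken.
    if (SignedVar + UnsignedVar > OtherSignedVar)
        printf("DoSomething would run, even though -1 + 0 < 5\n");
}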
There are two main things using unsigned gives you:
1) It allows you to use the right-shift >> operator safely on any value, since the value can't be negative; right-shifting a negative value gives implementation-defined results.
2) It gives you wrap-around mod 2^n arithmetic. With signed values, the effect of underflow/overflow is undefined. With unsigned values you get mod 2^n arithmetic, so 0U - 1U will always give you the largest possible unsigned value (which will always be 1 less than a power of 2).
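A tiny sketch of that wrap-around guarantee:
#include <cstdio>
#include <climits>

int main() {
    unsigned int x = 0U - 1U;        // well defined: wraps modulo 2^n
    printf("%u %u\n", x, UINT_MAX);  // both print the same largest value, e.g. 4294967295
}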
As an aside, I seldom use int or unsigned int; rather, I use either {int16_t, int32_t, ...} or {uint16_t, uint32_t, ...}. (You have to include stdint.h to use them, though.) I am not sure how my colleagues find it, but I try to convey the size of the variable this way. In some places, I try to be more blatant by doing something like:
typedef uint32_t Counter32;
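A short sketch of that habit (the Counter32 typedef is the one mentioned above; the other names are made up):
#include <stdint.h>
#include <cstdio>

typedef uint32_t Counter32;   // the name spells out both the width and the intent

int main() {
    Counter32 requests = 0;
    uint16_t port = 8080;
    ++requests;
    printf("%u %u\n", (unsigned)requests, (unsigned)port);
}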