views:

473

answers:

14

Is it important to declare a variable as unsigned if you know it should never be negative? Does it help prevent anything other than negative numbers being fed into a function that shouldn't have them?

+5  A: 

It doesn't prevent people misusing your interface, but at least they should get a warning unless they add a C-style cast or static_cast to make it go away (in which case you cannot help them further).

Yes, there is value in this as it properly expresses the semantics you wish.

Steve Townsend
You don't need C-style casts to cast (away) unsignedness. It's a simple static cast.
Noah Roberts
noted, thanks...
Steve Townsend
+4  A: 

It does two things:

1) It gives you double the range for your unsigned values. When a value is "signed", the highest bit is used as the sign bit (1 means negative, 0 otherwise); when it is "unsigned", that bit is available for data. E.g., a char type goes from -128 to 127, while an unsigned char goes from 0 to 255.

2) It affects how the >> operator acts, specifically when right shifting negative values.

miked
It's not really a sign bit. 1 for negative, but 0 for non-negative.
csj
@csj: What value would you assign a "real" sign bit for zero? The statement "0 for positive" is correct; "0 means positive" is not what he said.
Potatoswatter
Conforming C++ platforms do not need to use two's complement. In practice, though, using `unsigned` does double the range of values available.
strager
@Potatoswatter I was being needlessly pedantic. When the "sign bit" is zero, the integer itself might be zero, which is neither positive nor negative.
csj
+1  A: 

By using unsigned when signed values will not be needed, in addition to ensuring the datatype doesn't represent values below the desired lower bound, you increase the maximum upper bound. All the bit combinations that would otherwise be used to represent negative numbers are used to represent a larger set of positive numbers.

csj
+2  A: 

This has value for the same reason that "const correctness" has value. If you know that a particular value shouldn't change, declare it const and let the compiler help you. If you know a variable should always be non-negative, then declare it as unsigned and the compiler will help you catch inconsistencies.

(That, and you can express numbers twice as big if you use unsigned int rather than int in this context.)

John at CashCommons
+3  A: 

One minor nicety is that it cuts down on the amount of array bounds checking that might be necessary... e.g. instead of having to write:

int idx = [...];
if ((idx >= 0)&&(idx < arrayLength)) printf("array value is %i\n", array[idx]);

you can just write:

unsigned int idx = [...];
if (idx < arrayLength) printf("array value is %i\n", array[idx]);
Jeremy Friesner
If `[...]` returns a negative value, you will catch the error in the first case. However in the second case, you will not catch the error, but you will work with another "random" positive index that resulted from the unsigned-wrap around behavior. That's much worse.
Johannes Schaub - litb
@Johannes: I don't see that. If idx was negative, it will be changed into a very large unsigned number (over two billion for 32-bit `int`s). Assuming `arrayLength` is a vaguely reasonable number, the second case will catch the error.
David Thornley
@David Well, it's a problem in principle with that way of thinking, not necessarily a problem in this particular case. What if instead of `idx` you have `len` and there is no upper limit? Applying the principle used in this answer means that you don't do any check anymore, and then you operate on that large length. No good.
Johannes Schaub - litb
+2  A: 

The other answers are good, but sometimes unsigned types can lead to confusion, which I believe is why some languages have opted not to have unsigned integer types.

For example, suppose you have a structure that looks like this to represent a screen object:

struct T {
    int x;
    int y;
    unsigned int width;
    unsigned int height;
};

The idea being that it is impossible to have a negative width. Well what data type do you use to store the right edge of the rectangle?

int right = r.x + r.width; // causes a warning on some compilers with certain flags

and certainly it still doesn't protect you from integer overflow. So in this scenario, even though width and height cannot conceptually be negative, there is no real gain in making them unsigned, except for requiring some casts to get rid of warnings about mixing signed and unsigned types. In the end, at least for cases like this, it is better to just make them all ints; after all, odds are you aren't going to have a window wide enough to need the extra range.

Evan Teran
This rect example is nice. Consider: In the above case, if `r.x` is negative then `r.x + r.width` will yield a totally weird result: `-5 + 4u` for example yields `UINT_MAX`.
Johannes Schaub - litb
Sometimes I wish languages would define unsigned 7, 15, 31, etc.-bit integer types; such types would be processed as unsigned, but operations between such types and larger (even if only by one bit) signed types would be signed. On some processors, certain operations may be faster with signed or unsigned values. For example, if a 32-bit processor always sign-extends 16-bit values when loading, an "unsigned 15-bit" value could be loaded in one instruction; an "unsigned 16-bit" value would require two. A compiler could use the shorter code with an unsigned 15-bit type.
supercat
It's a good example of a case where unsigned doesn't make sense, but removing unsigned values from a language completely -- a la Java -- merely causes other problems. I have several times now encountered subtle and hard-to-identify bugs in Java programs where someone wrote some binary I/O routines that failed in bizarre ways when they tried to work with bytes greater than 0x7F.
Porculus
+1  A: 

It also keeps you from having to cast to/from unsigned whatever when interacting with other interfaces. For example:

for (int i = 0; i < some_vector.size(); ++i)

That will generally annoy the hell out of anyone who needs to compile without warnings.

Noah Roberts
+15  A: 

Declaring variables for semantically non-negative values as unsigned is a good style and good programming practice.

However, keep in mind that it doesn't prevent you from making errors. It is perfectly legal to assign negative values to unsigned integers, with the value getting implicitly converted to unsigned form in accordance with the rules of unsigned arithmetic. Some compilers might issue warnings in such cases; some will not.

It is also worth noting that working with unsigned integers requires knowing some dedicated unsigned techniques. For example, a "classic" example that is often mentioned in relation to this issue is backward iteration:

for (int i = 99; i >= 0; --i) {
  /* whatever */
}

The above cycle looks natural with signed i, but it cannot be directly converted to unsigned form, meaning that

for (unsigned i = 99; i >= 0; --i) {
  /* whatever */
}

doesn't really do what it is intended to do (it is actually an endless cycle). The proper technique in this case is either

for (unsigned i = 100; i > 0; ) {
  --i;
  /* whatever */
}

or

for (unsigned i = 100; i-- > 0; ) {
  /* whatever */
}

This is often used as an argument against unsigned types, i.e. allegedly the above unsigned versions of the cycle look "unnatural" and "unreadable". In reality, though, the issue we are dealing with here is the generic issue of working near the left end of a closed-open range. It manifests itself in many different ways in C and C++ (like backward iteration over an array using the "sliding pointer" technique, or backward iteration over a standard container using an iterator). I.e. regardless of how inelegant the above unsigned cycles might look to you, there's no way to avoid them entirely, even if you never use unsigned integer types. So it is better to learn these techniques and include them in your set of established idioms.

AndreyT
I use unsigned exclusively for bit manipulation and, in the rarest cases, for the bigger range. I know how to code with unsigneds, but I do not expect all my coworkers to get it right every time. PS: The proper technique uses >0 instead of >=0 ;-).
Peter G.
@Peter G.: Fixed. Thanks for pointing it out.
AndreyT
@AndreyT: actually the reverse iteration is pretty well handled by the standard library I think. Anyway, I don't loop over integers often myself...
Matthieu M.
Using `size_t` in your signature, for example, signals that you're expecting/returning the size of something. I suspect the examples of the 'loop in decreasing order' you've given here are one of the reasons why there aren't any unsigned types in Java.
Andre Holzner
There's also the infamous "goes-to operator," ` i --> 0`.
Potatoswatter
@Potatoswatter: Actually, that's what I was supposed to use in my last example. Fixed.
AndreyT
How about `for (int i=(int)v.size()-1; i>=0; --i)`, where v is a vector or something. Is it bad style?
Inverse
@Inverse: C-style casts are always bad style.
Potatoswatter
@Inverse: There's no real need for a cast there. Otherwise, it is more of a "loser's way out" than "bad style". The proper type for indexing a `std::vector<>` is `std::vector<>::size_type`. And it happens to be unsigned.
AndreyT
@AndreyT if you don't cast then `v.size()-1` for `v.size() == 0` will be `UINT_MAX` (or something similarly high) and then your loop is all-screwed :(. I regard it as a bad design that `std::vector<>::size_type` is unsigned and wouldn't trust C++ Standard Library on the matter of design - I just have to look out to `vector<bool>`, `foo_facet` and all the other "infamous" things. :) This is one of the cases where I believe that Java made good design principles.
Johannes Schaub - litb
Oh, OK, somehow I missed the `- 1` bit.
AndreyT
@AndreyT so I wonder why it is a good style and good programming practice. What will you gain? This answer shows some excellent techniques on how to work with `unsigned`, but really doesn't bring up arguments for using `unsigned` in the first place. Your answer could be so much better if it contained a list with the benefits of unsigned types.
Johannes Schaub - litb
@Johannes Schaub: I don't see any obvious performance or "safety"-related benefits (actually I do see some performance benefits, but I don't think they are critical in any way). I simply believe that conceptually unsigned types better express the intent of the code designer. Unsigned types should be used to represent naturally non-negative quantities. They convey this information to those who will read the code later: this quantity is never negative.
AndreyT
@AndreyT this guy has a nice summary which coincides with my opinion: http://groups.google.com/group/comp.lang.c++.moderated/msg/5bce424269082624 , in particular the natural/modulo behavior bit.
Johannes Schaub - litb
+1  A: 

It won't prevent negative numbers from being fed into a function; instead it will interpret them as large positive numbers. This may be moderately useful if you know an upper bound for error checking, but you need to do the error checking yourself. Some compilers will issue warnings, but if you're using unsigned types a lot there may be too many warnings to deal with easily. These warnings can be covered up with casts, but that's worse than sticking to signed types only.

I wouldn't use an unsigned type if I knew the variable shouldn't be negative, but rather if it couldn't be. size_t is an unsigned type, for example, since a data type simply can't have negative size. If a value could conceivably be negative but shouldn't be, it's easier to express that by having it as a signed type and using something like i < 0 or i >= 0 (these conditions come out as false and true respectively if i is an unsigned type, regardless of its value).

If you're concerned about strict Standard conformance, it may be useful to know that overflows in unsigned arithmetic are fully defined, while in signed arithmetic they're undefined behavior.

David Thornley
A: 

A counter argument to using unsigned is that you may find yourself in very reasonable situations where it gets awkward and unintentional bugs are introduced. Consider a class—for example a list class or some such—with the following method:

unsigned int length() { ... }

Seems very reasonable. But then when you want to iterate over it, you get the following:

for (unsigned int i = my_class.length(); i >= 0; --i) { ... }

Your loop won't terminate and now you're forced to cast or do some other awkwardness.

An alternative to using unsigned is just to assert that your values are non-negative.

Reference.

jeffamaphone
+2  A: 

Is it important to declare a variable as unsigned if you know it should never be negative?

Certainly it is not important. Some people (Stroustrup and Scott Meyers, see "Unsigned vs signed - Is Bjarne mistaken?") reject the idea that a variable should be unsigned just because it represents an unsigned quantity. If the point of using unsigned would be to indicate that a variable can only store non-negative values, you need to somehow check that. Otherwise, all you get is

  • A type that silently hides errors, because negative values are converted instead of being exposed
  • Double of the positive range of the corresponding signed type
  • Defined overflow/bit-shift/etc semantics

Certainly it doesn't prevent people from supplying negative values to your function, and the compiler won't be able to warn you about every such case (think of a negative int value computed at runtime being passed). Why not assert in the function instead?

assert((idx >= 0) && "Index must be greater/equal than 0!");

The unsigned type introduces many pitfalls too. You have to be careful when you use it in calculations that can temporarily go below zero (a down-counting loop, or some such), and especially with the automatic promotions that happen in C and C++ between unsigned and signed values:

// assume idx is unsigned. What if idx is 0 !?
if(idx - 1 > 3) /* do something */;
Johannes Schaub - litb
“Why not assert in the function” – because compile-time is better than run-time and cutting down on asserts is a good thing. As you have pointed out, using `unsigned` doesn’t make the `assert` go away but it makes the condition simpler: `x < max` instead of `0 <= x and x < max` (or, alternatively, one assert in place of two). As far as I’m concerned, that’s a really powerful argument for `unsigned`.
Konrad Rudolph
Many if not most compilers are able to warn on signed-to-unsigned conversions. This once saved my butt.
Potatoswatter
@Konrad My point is that using `unsigned` will make the function unable to catch the error because the parameter in the function is always positive by definition. The compiler can't warn you in all cases and sometimes just casting to `unsigned` on the call-side just to get rid of a warning will not fix any bug - instead a negative value will silently wrap around.
Johannes Schaub - litb
+1  A: 

Mixing signed and unsigned types can be a major headache. The resulting code will often be bloated, wrong, or both(*). In many cases, unless you need to store values between 2,147,483,648 and 4,294,967,295 within a 32-bit variable, or you need to work with values larger than 9,223,372,036,854,775,807, I'd recommend not bothering with unsigned types at all.

(*)What should happen, for example, if a programmer does:

{ Question would be applicable to C, Pascal, Basic, or any other language }
  If SignedVar + UnsignedVar > OtherSignedVar Then DoSomething;

I believe Borland's old Pascal would handle the above scenario by converting SignedVar and UnsignedVar to a larger signed type (the largest supported type, btw, was signed, so every unsigned type could be converted to a larger signed one). This would produce big code, but it would be correct. In C, if one signed variable is negative the result is likely to be numerically wrong even if UnsignedVar holds zero. Many other bad scenarios exist as well.

supercat
I have never encountered code bloat (or headaches, even minor ones) related to unsigned values. It makes code more explicit, certainly. But that’s *good*.
Konrad Rudolph
@Konrad Rudolph: If all variables are unsigned, things work predictably. But for what values of variables will the above condition execute? The code isn't exactly complicated, but in many languages its exact behavior is.
supercat
@supercat: easy, it will warn (or, with my compiler settings: fail to compile) because of signed/unsigned comparison. That’s the behaviour I expected and the one I got.
Konrad Rudolph
>unless you need to store values between 2,147,483,648 and 4,294,967,295 within a 32-bit variable... Those of use working with embedded microcontrollers use 8 and 16-bit variables all the time, where the doubling of the values available with unsigned variables can sometimes be a big deal.
tcrosley
The one unambiguously useful unsigned type is `unsigned char`. I have never once wanted a char to be signed.
Porculus
tcrosley: I work with embedded controllers. As noted in one of my other comments, I've sometimes wished for 7, 15, and 31-bit unsigned types since a compiler could optimize for those. For example, the PIC-18 series has an "FSRn+W" addressing mode which uses the sign-extended value in the accumulator as a displacement. If the index were a 7-bit type, a compiler could use that addressing mode (a coder could specify a signed char, but that would look icky).
supercat
+1  A: 

There are two main things using unsigned gives you

  • it allows you to use the right shift >> operator safely on any value, as it can't be negative -- right-shifting a negative value is implementation-defined (it may be an arithmetic or a logical shift, depending on the platform).

  • it gives you wrap-around mod 2^n arithmetic. With signed values, the effect of underflow/overflow is undefined. With unsigned values you get mod 2^n arithmetic, so 0U - 1U will always give you the largest possible unsigned value (which will always be 1 less than a power of 2).

Chris Dodd
A: 

As an aside, I seldom use int or unsigned int; rather, I use either {int16_t, int32_t, ...} or {uint16_t, uint32_t, ...}. (You have to include stdint.h to use them, though.) I am not sure how my colleagues find it, but I try to convey the size of the variable this way. In some places, I try to be more blatant by doing something like: typedef uint32_t Counter32;

ArunSaha