Or to reformulate the question: is there a performance penalty in using unsigned values?
And in general: what is the most performant type (16bit signed?, 32bit signed? etc.) on the IPhone ARM processor?
The C99 standard lets you answer your general question: the fastest type on the target system that is at least a particular required width is defined in stdint.h. Imagine that I need an integer at least 8 bits wide:
#include <stdio.h>
#include <stdint.h>

int main (int argc, char **argv)
{
    uint_fast8_t i;
    printf("Width of uint_fast8_t is %zu\n", sizeof(i));
    return 0;
}
Regarding the use of signed or unsigned, there are other requirements than performance, such as whether you actually need to use unsigned types or what you want to happen in the case of an overflow. Given what I know about my own code, I'm willing to bet that there are other slow-downs in your code beyond the choice of primitive integer types ;-).
It always depends:
For loops, signed integers as counters and limits are a tad faster, because in C the compiler is free to assume that signed overflow never happens.
Consider this: You have a loop with an unsigned loop counter like this:
void function (unsigned int first, unsigned int last)
{
    unsigned int i;
    for (i = first; i != last; i++)
    {
        // do something here...
    }
}
In this loop the compiler must make sure that the loop terminates even if first is greater than last, because i will wrap from UINT_MAX to 0 on overflow (just to name one example; there are other cases as well). This removes the opportunity for some loop optimizations. With signed loop counters the compiler may assume that wrap-around does not occur and can generate better code, as sketched below.
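As a rough sketch, the signed counterpart looks like this; because signed overflow is undefined behaviour, the compiler may assume i never wraps and can, for example, compute the trip count up front and unroll or vectorize accordingly:

void function (int first, int last)
{
    int i;
    for (i = first; i != last; i++)
    {
        // do something here...
        // Since overflowing i would be undefined behaviour, the compiler
        // may assume the loop reaches last without wrapping and treat the
        // trip count as exactly (last - first).
    }
}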
For integer division, on the other hand, unsigned integers are a tad faster on the ARM. The ARM does not have a hardware divide unit, so division is done in software, and the core routine works on unsigned values. You'll save the few cycles of extra code required to turn a signed division into an unsigned one (sketched below).
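Roughly, that extra code looks like the following sketch; the helper name __udiv32 is made up here to stand in for the library's unsigned divide routine:

/* Hypothetical stand-in for the software unsigned divide routine. */
extern unsigned int __udiv32(unsigned int n, unsigned int d);

int sdiv32(int n, int d)
{
    /* Strip the signs and divide the magnitudes... */
    unsigned int un = (n < 0) ? 0u - (unsigned int)n : (unsigned int)n;
    unsigned int ud = (d < 0) ? 0u - (unsigned int)d : (unsigned int)d;
    unsigned int uq = __udiv32(un, ud);

    /* ...then restore the sign of the quotient (C99 truncates toward zero). */
    return ((n < 0) != (d < 0)) ? -(int)uq : (int)uq;
}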
For everything else (arithmetic, logic, loads and stores) the choice of signedness makes no difference.
Regarding the data size: as Rune pointed out, they are more or less of equal speed, with 32-bit types being the fastest. Bytes and halfwords sometimes need to be adjusted after processing, because they reside in a 32-bit register and the upper (unused) bits need to be sign- or zero-extended.
However, ARM CPUs have a relatively small data cache and are often connected to relatively slow memory. If you're able to utilize the cache more efficiently by choosing smaller data types, the code may execute faster even if the theoretical cycle count goes up.
You have to experiment here.
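One way to run that experiment is to do the same work on 8-bit and 32-bit elements and time both. This is only a sketch (no timing harness), but the 8-bit version touches a quarter of the memory:

#include <stdint.h>
#include <stddef.h>

int32_t sum8(const int8_t *p, size_t n)
{
    int32_t s = 0;
    size_t i;
    for (i = 0; i < n; i++)
        s += p[i];   /* each 8-bit load is sign-extended to 32 bits */
    return s;
}

int32_t sum32(const int32_t *p, size_t n)
{
    int32_t s = 0;
    size_t i;
    for (i = 0; i < n; i++)
        s += p[i];   /* full-width loads: 4x the memory traffic */
    return s;
}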
I'm curious about Nils' answer, so these questions are directed at him. This is not an answer to the original question.
In this loop the compiler must make sure that the loop terminates even if first is greater than last, because i will wrap from UINT_MAX to 0 on overflow
for (i = first; i != last; i++)
{
    // do something here...
}
I don't think it does. The compiler only needs to check that i!=last
at the beginning of each loop iteration:
i=first;
if (i == last) goto END;
START:
// do sth
++i;
if (i != last) goto START;
END:
Signedness of the variables won't change the code, so the example is wrong in my opinion. I even compiled the code with MSVC08 in release mode and compared the assembly output; it's basically the same (except for the jump types) in all signed/unsigned and !=/< combinations.
Now, I do agree that the compiler could optimize the code in some cases, but I can't think of any good examples; if anyone can, please respond.
I can think of only a "bad" example:
signed int i, j, k;

if (i > k)
{
    i += j;
    if (i > k)
    {
    }
}
The i += j could overflow, but signed overflow is undefined in C, so anything goes. Two things might happen: the compiler could assume the signed value overflows to INT_MIN (ordinary two's-complement wrap-around), or it could assume the overflow never happens and optimize based on that assumption.
As I said, I'm sure there are legitimate optimizations possible as Nils points out, but the posted loop isn't among them as far as I can tell.
As for the original question:
ARM is a 32-bit architecture, so 32-bit integers are the fastest. However, 16-bit and 8-bit integers are only slightly slower. Signed vs. unsigned doesn't matter much except in special circumstances (as noted by the other answers here). 64-bit integers are emulated with two or more 32-bit operations and are therefore slower.
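To illustrate, here is a sketch (not what any particular compiler emits) of how a 64-bit addition has to be built from two 32-bit additions plus carry handling on a 32-bit core:

#include <stdint.h>

typedef struct { uint32_t lo, hi; } u64parts;

u64parts add64(u64parts a, u64parts b)
{
    u64parts r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* add the carry out of the low word */
    return r;
}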
When it comes to floating-point types, on the iPhone's processor (an ARM11 with VFP hardware floating point) 32-bit floats are somewhat faster than 64-bit doubles.
Since unsigned and signed int have the same size and basically the same performance, worrying about this kind of optimization (if it even were possible, and it is not) at this stage is evil premature optimization (search for it on Google to learn more), even on an iPhone. Arguments about correctness and economy of thought come first, unless this is your topmost execution hotspot and you have measured an actual, significant performance difference. Otherwise, this is just a waste of time that you could have spent getting a 2x speedup by other means.
the compiler could assume the signed value overflows to INT_MIN
That would be strange, since many programs rely on 2's complement being used (I didn't know overflow was undefined behaviour, but that doesn't surprise me, since 2's complement is not universal). Also, with unsigned vars the same would happen, wouldn't it? At least where two's complement is a safe assumption, the behaviour would be the same. Or is the behaviour of signed and unsigned overflow allowed to be different?