Most code I have ever read uses an int for standard error handling (return values from functions and such). But I am wondering whether there is any benefit to using a uint8_t instead: will a compiler (read: most C compilers on most architectures) produce instructions that use the immediate addressing mode, i.e. embed the 1-byte integer in the instruction? The key instruction I'm thinking about is the compare that follows the return of a function whose return type is uint8_t.

I could be thinking about this incorrectly, as introducing a 1-byte type may just cause alignment issues. There is probably a perfectly sane reason why compilers like to pack things into 4 bytes, and this is possibly the reason everyone just uses ints; and since this is a stack-related issue rather than a heap one, there is no real overhead.

Doing the right thing is what I'm thinking about. But let's say, for the sake of argument, that this is a popular cheap microprocessor for an intelligent watch, configured with 1k of memory, that does have different addressing modes in its instruction set :D

Another question to slightly specialize the discussion (x86) would be: is the literal in:

uint32_t x=func(); x==1;

and

uint8_t x=func(); x==1;

the same type? Or will the compiler generate an 8-bit literal in the second case? If so, it may use it to generate a compare instruction that has the literal as an immediate value and the returned int as a register reference. See CMP instruction types.

Another reference for the x86 instruction set.

+2  A: 

Processors typically like to work with their natural register size, which in C is 'int'.

Although there are exceptions, you're overthinking a problem that does not exist.

nos
Indeed I am, but if it could use immediate addressing it would not tie up a register, no?
Hassan Syed
@Vainstah: That's completely pointless: registers are faster, the call/cmp/jz sequence can't be optimized further, and if someone else needed that register you'd have a context switch.
Michael Foukarakis
There are two elements in the instruction: the hard-coded compare literal and the returned value. The return value cannot be hard-coded, but the literal may be. For example, on x86 the CMP instruction may take its two operands as REG and IMMED. If both elements were 4 bytes wide, they would both have to be in registers. But if the hard-coded literal is 8 bits wide, it may be put into the instruction.
Hassan Syed
If you take a look at the instruction sets of popular architectures, you'll find that their respective implementations of CMP instructions support up to 32/64-bit immediate constants.
Michael Foukarakis
But for x86, wouldn't you then have to read in three 32-bit instruction words, versus just the one if uint8_t were used as the return type?
Hassan Syed
(Assuming an 8-bit arch) the relevant instruction would use a pointer to the 32-bit value instead of an immediate 8-bit value. For x86, you can see that this would not be the case: http://siyobik.info/index.php?module=x86
Michael Foukarakis
Also, doesn't the use of immediates depend on the *value* rather than the *type*? If you're thinking of using a uint8_t for errors, can't you instead use an int but make sure your error values just so happen to be in the range 0 ... 255? Then the compiler most likely will still be able to use 8-bit instruction forms if they're available on the architecture. Or am I missing something?
Steve Jessop
+3  A: 

There may be very small speed differences between the different integral types on a particular architecture. But you can't rely on it, it may change if you move to different hardware, and it may even run slower if you upgrade to newer hardware.

And if you are talking about x86 in the example you give, you are making a false assumption: that a value needs to be of type uint8_t for an 8-bit immediate to be used.

Actually 8-bit immediates embedded into the instruction are of type int8_t and can be used with bytes, words, dwords and qwords, in C notation: char, short, int and long long.

So on this architecture there would be no benefit at all, neither code size nor execution speed.
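
As a rough illustration of the point that the value decides, not the declared type, here is a minimal C sketch. The helper `fits_imm8` and the error-code names are hypothetical, made up for this example; the only fact it relies on is that the 8-bit immediate is sign-extended, so the usable range is -128..127.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if v can be encoded as a sign-extended
   8-bit immediate, i.e. lies in the range -128..127. */
bool fits_imm8(int32_t v)
{
    return v >= INT8_MIN && v <= INT8_MAX;
}

/* Hypothetical error codes returned as plain int. */
enum { ERR_NONE = 0, ERR_IO = 31, ERR_BIG = 200 };

/* Compile-time check that a small code stays in the imm8 range. */
static_assert(ERR_IO >= INT8_MIN && ERR_IO <= INT8_MAX,
              "ERR_IO fits in a sign-extended 8-bit immediate");
```

With these assumptions, `fits_imm8(ERR_IO)` is true while `fits_imm8(ERR_BIG)` is false: a code such as 200 fits in a uint8_t but not in a sign-extended 8-bit immediate, so for a 32-bit compare it would need the wider encoding.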

drhirsch
So if a 4-byte return type is used everywhere, may I assume that CPU architects and compiler writers agree on this, to put this niggle out of my head :D ?
Hassan Syed
I think it's some kind of co-evolution: C evolved first, as a very efficient abstraction around the typical von Neumann architecture. Soon microprocessors started to implement the typical C addressing modes efficiently. So, yes, compiler writers and processor designers agree on this ;-)
drhirsch
Fantastic :D Glad I never wrote any code with 8-bit return types. Thanks a lot drhirsch.
Hassan Syed
+3  A: 

You should use int or unsigned int types for your calculations, and use smaller types only in compounds (structs/arrays). The reason is that int is normally defined to be the "most natural" integral type for the processor; all other derived types may require extra processing to work correctly. In our project, compiled with gcc on Solaris for SPARC, we had a case where accesses to 8- and 16-bit variables added an instruction to the code. When loading a smaller type from memory, the compiler had to make sure the upper part of the register was properly set (sign extension for signed types, 0 for unsigned). This made the code longer and increased pressure on the registers, which hurt the other optimisations.

I've got a concrete example:

I declared two members of a struct as uint8_t and got this code in SPARC asm:

    if(p->BQ > p->AQ)

was translated into

ldub    [%l1+165], %o5 ! <variable>.BQ,
ldub    [%l1+166], %g5 ! <variable>.AQ,
and     %o5, 0xff, %g4 ! <variable>.BQ, <variable>.BQ
and     %g5, 0xff, %l0 ! <variable>.AQ, <variable>.AQ
cmp     %g4, %l0 ! <variable>.BQ, <variable>.AQ
bleu,a,pt %icc, .LL586 !

And here is what I got when I declared the two variables as uint_t:

lduw    [%l1+168], %g1 ! <variable>.BQ,
lduw    [%l1+172], %g4 ! <variable>.AQ,
cmp     %g1, %g4 ! <variable>.BQ, <variable>.AQ
bleu,a,pt %icc, .LL587 !

That is two fewer arithmetic operations, and two more registers free for other work.
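
For anyone who wants to try this on their own compiler, here is a minimal C sketch of the two variants. The struct names `narrow` and `wide` and the two functions are hypothetical stand-ins (only the two compared members are modelled), and whether the extra masking instructions show up depends on the compiler, its version and the target.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the struct in question. */
struct narrow { uint8_t BQ, AQ; };       /* 8-bit members */
struct wide   { unsigned int BQ, AQ; };  /* natural-width members */

/* Both members are promoted to int before the comparison, so the
   compiler has to produce zero-extended values in full registers. */
int narrow_gt(const struct narrow *p)
{
    return p->BQ > p->AQ;
}

/* The members already have the natural width; no widening is needed. */
int wide_gt(const struct wide *p)
{
    return p->BQ > p->AQ;
}
```

Both functions give the same answer for values up to 255. The trade-off is that `struct narrow` is smaller in memory, which is why the smaller types still make sense inside structs and arrays.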

tristopia
+4  A: 

Here's what one particular compiler will do for the following code:

extern int foo(void);
extern void do_something(void);
extern void do_something_else(void);

void bar(void)
{
        if (foo() == 31) { /* error code 31 */
                do_something();
        } else {
                do_something_else();
        }
}

   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   e8 fc ff ff ff          call   7 <bar+0x7>
   b:   83 f8 1f                cmp    $0x1f,%eax
   e:   74 08                   je     18 <bar+0x18>
  10:   c9                      leave
  11:   e9 fc ff ff ff          jmp    12 <bar+0x12>
  16:   89 f6                   mov    %esi,%esi
  18:   c9                      leave
  19:   e9 fc ff ff ff          jmp    1a <bar+0x1a>

That is a 3-byte instruction for the cmp. If foo() returns a char, we get a 2-byte one instead:

   b:   3c 1f                   cmp    $0x1f,%al

If you're looking for efficiency, though, don't assume that comparing against %al is faster than comparing against %eax.
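
On the question of whether the literal changes type: at the C language level it does not. Here is a small C11 sketch of that. The functions `foo_int`, `foo_u8` and the two `is_error_31_*` wrappers are hypothetical names standing in for foo() and its caller.

```c
#include <assert.h>
#include <stdint.h>

/* The literal 1 has type int, whatever it is later compared with. */
static_assert(_Generic(1, int: 1, default: 0),
              "literal 1 is an int");

/* A uint8_t operand is promoted to int in an expression, so the
   comparison is int against int as far as the language is concerned. */
static_assert(_Generic((uint8_t)0 + 0, int: 1, default: 0),
              "uint8_t is promoted to int");

/* Hypothetical callees standing in for foo(). */
int     foo_int(void) { return 31; }
uint8_t foo_u8(void)  { return 31; }

/* Same source-level comparison with either return type. */
int is_error_31_int(void) { return foo_int() == 31; }
int is_error_31_u8(void)  { return foo_u8() == 31; }
```

Which encoding the compiler then picks for the cmp is its own business. To look at what your compiler does, declare the callees extern instead of defining them (otherwise the optimizer will likely fold the comparison away) and disassemble the object file, as in the listing above.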

nos