views:

44

answers:

2

Suppose we've got two integer and character variables:

int adad=12345;
char character;

Assuming we're discussing a platform in which, length of an integer variable is longer than or equal to three bytes, I want to access third byte of this integer and put it in the character variable, with that said I'd write it like this:

character=*((char *)(&adad)+2);

Considering that line of code and the fact that I'm not a compiler or assembly expert, I know a little about addressing modes in assembly and I'm wondering the address of the third byte (or I guess it's better to say offset of the third byte) here would be within the instructions generated by that line of code themselves, or it'd be in a separate variable whose address (or offset) is within those instructions ?

+2  A: 

Without knowing anything about the compiler and the underlying CPU architecture no definitive answer can be given. For example, not all CPU architectures allow the addressing of every arbitrary byte in memory (though I believe all the currently popular ones do): on a CPU that's word-addressed, instead of byte-addressed, what the compiler will generate is inevitably going to be the loading into some register of the whole word adad (presumably by an offset from a base pointer register, if the variable in question is on stack [1]), followed by shifting and masking to isolate the byte of interest.

[1] note that, without knowing what CPU architecture we're talking about and how the compiler uses it, we can't even say whether "load a word at a fixed offset from a base register" is something that's done inline within the instruction (as one might hope, and many popular architectures definitely do support;-) or needs separate address arithmetic in an auxiliary register.

IOW, whether it's a good idea or not, it's definitely possible to define a CPU architecture which cannot load / store registers except from other registers or memory addresses defined by other registers or constant, and some such architectures exist (though they may not be all that popular at this time;-).

Alex Martelli
+3  A: 

The best thing to do in situations like this is to try it. Here's an example program:

int main(int argc, char **argv)
{
  int adad=12345;
  volatile char character;

  character=*((char *)(&adad)+2);

  return 0;
}

I added the volatile to avoid the assignment line being completely optimized away. Now, here's what the compiler came up with (for -Oz on my Mac):

_main:
    pushq   %rbp
    movq    %rsp,%rbp
    movl    $0x00003039,0xf8(%rbp)
    movb    0xfa(%rbp),%al
    movb    %al,0xff(%rbp)
    xorl    %eax,%eax
    leave
    ret

The only three lines that we care about are:

    movl    $0x00003039,0xf8(%rbp)
    movb    0xfa(%rbp),%al
    movb    %al,0xff(%rbp)

The movl is the initialization of adad. Then, as you can see, it reads out the 3rd byte of adad, and stores it back into memory (the volatile is forcing that store back).

I guess a good question is why does it matter to you what assembly gets generated? For example, just by changing my optimization flag to -O0, the assembly output for the interesting part of the code is:

    movl    $0x00003039,0xf8(%rbp)
    leaq    0xf8(%rbp),%rax
    addq    $0x02,%rax
    movzbl  (%rax),%eax
    movb    %al,0xff(%rbp)

Which is pretty straightforwardly seen as the exact logical operations of your code:

  1. Initialize adad
  2. Take the address of adad
  3. Add 2 to that address
  4. Load one byte by dereferencing the new address
  5. Store one byte into character

Various optimizations will change the output... if you really need some specific behaviour/addressing mode for some reason, you might have to write the assembly yourself.

Carl Norum
Good answer for the x86, but all the world is not (even today) an x86.
dmckee
@dmckee, the point is not to be x86, but that the OP should just look at his binary and find out. I happen to have an x86 machine handy. Want me to go make an ARM one?
Carl Norum