While writing some C code, I decided to compile it to assembly and read it. I do this from time to time, as an exercise to keep me thinking about what the machine is doing every time I write a statement in C.
Anyway, I wrote these two lines in C:
asm(";move old_string[i] to new_string[x]");
new_string[x] = old_string[i];
asm(";shift old_string[i+1] into new_string[x]");
new_string[x] |= old_string[i + 1] << 8;
(old_string is an array of char, and new_string is an array of unsigned short, so given two chars, 0x42 and 0x43, this will put 0x4342 into new_string[x].)
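To make the intent concrete, here is a minimal sketch of those two statements as a standalone function. The name pack_pair and its parameters are made up for illustration and are not part of my program; it assumes a 16-bit unsigned short and reads 42 and 43 as hex values.

```c
/* Hypothetical helper: mirrors the two C statements from the question.
 * Assumes unsigned short is 16 bits wide, as on typical x86 targets. */
static unsigned short pack_pair(char lo, char hi)
{
    unsigned short packed;

    packed = lo;        /* new_string[x] = old_string[i];           */
    packed |= hi << 8;  /* new_string[x] |= old_string[i + 1] << 8; */
    return packed;
}
```

Calling pack_pair(0x42, 0x43) gives 0x4342: the first char ends up in the low byte and the second char in the high byte of the 16-bit result.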
This produced the following output:
#move old_string[i] to new_string[x]
movl -20(%ebp), %esi #put address of first char of old_string in esi
movsbw (%edi,%esi),%dx #put first char into dx
movw %dx, (%ecx,%ebx,2) #put first char into new_string
#shift old_string[i+1] into new_string[x]
movsbl 1(%esi,%edi),%eax #put old_string[i+1] into eax
sall $8, %eax #shift it left by 8 bits
orl %edx, %eax #or edx into it
movw %ax, (%ecx,%ebx,2) #?
(I'm commenting it myself, so I can follow what's going on.) I compiled it with -O3, so I could also see how the compiler optimizes certain constructs. Anyway, I'm sure this is probably simple, but here's what I don't get:
The first section copies a char out of old_string[i], and then movw's it (from dx) to (%ecx,%ebx,2). Then the next section copies old_string[i+1], shifts it, ors it, and then puts it into the same place from ax. It puts two 16-bit values into the same place? Wouldn't this not work?
Also, it shifts old_string[i+1] to the high-order dword of eax, then ors edx (new_string[x]) into it... then puts ax into the memory! Wouldn't ax just contain what was already in new_string[x]? So it saves the same thing to the same place in memory twice?
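One way to check this reading would be a rough C model of the second section of the assembly. This is only a sketch: the function name model_second_store and its parameters are hypothetical stand-ins for the registers, and it assumes the upper 16 bits of edx don't matter, since only ax is written back by the final movw.

```c
#include <stdint.h>

/* Hypothetical model of the four instructions in the second section.
 * dx        : the 16-bit value the first section left in dx
 * next_char : old_string[i + 1], treated as a signed byte (movsbl) */
static uint16_t model_second_store(uint16_t dx, int8_t next_char)
{
    uint32_t edx = dx;  /* assumption: upper half of edx is irrelevant */
    uint32_t eax;

    eax = (uint32_t)(int32_t)next_char;  /* movsbl 1(%esi,%edi),%eax : sign-extend */
    eax <<= 8;                           /* sall $8, %eax                          */
    eax |= edx;                          /* orl %edx, %eax                         */
    return (uint16_t)eax;                /* movw %ax, (%ecx,%ebx,2) : low 16 bits  */
}
```

Feeding it concrete values, for example model_second_store(0x0042, 0x43), shows what ax would hold at the final movw and whether it differs from what the first movw stored.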
Is there something I'm missing? Also, I'm fairly certain that the rest of the compiled program isn't relevant to this snippet. I've read around before and after it to find where each array and variable is stored, and what the registers' values would be upon reaching that code, and I think this is the only piece of the assembly that matters for these lines of C.
(Oh, and it turns out GNU assembler comments start with a #.)