ansaurus

Question

Answer 1

+2 A:

The first mov copies from the offset goo relative to the segment register DS. The second mov writes to the offset foo, again relative to the DS register. If CS and DS coincide (that is, they refer to the same memory), the segment distinction can be ignored. Even then, you are next likely to run into various protection mechanisms that render code sections read-only.

RE followups:

1. A label isn't like a reference - you don't dereference it as such. The assembler substitutes in a number representing the location in the resulting code. You can load either the address, or the thing at the address. The [ and ] indicate dereferencing - I've fixed a confusing element in my first response to cover this. IOW, doing [goo] loads the thing at that address.
2. A CISC instruction set like x86 has [very] variable-length instructions - some are not even a multiple of the word length. RISC instruction sets generally try to restrict this to make decoding instructions simpler.
3. You are only modifying the first 4 bytes of the mov eax, 2 (which, due to the little-endian encoding, does get replaced with 4, but then gets overwritten by the next instruction, which hasn't been modified at all). 5 is never in the picture as a candidate. (I thought you were thinking the code gets reordered, going by the way you first asked the question[1], though you clearly know quite a bit more, as I should have guessed from your rep :P)
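
To make the label-versus-contents distinction in point 1 concrete, here is a minimal Python sketch that models memory as a byte array. The name some_label, the offset 0x10, the stored value 0x11223344 and the helper load32 are all assumptions made up for illustration; none of them come from the original question.

```python
# A label is just a number (an offset); [label] reads the bytes stored at that offset.
memory = bytearray(32)                  # hypothetical flat memory image
some_label = 0x10                       # hypothetical offset assigned by the assembler
memory[some_label:some_label + 4] = (0x11223344).to_bytes(4, "little")

def load32(mem, offset):
    """Read a 32-bit little-endian value, as a dword load from [offset] would."""
    return int.from_bytes(mem[offset:offset + 4], "little")

address = some_label                    # like: mov ebx, some_label   (the number itself)
contents = load32(memory, some_label)   # like: mov ebx, [some_label] (the dword stored there)

print(hex(address))                             # 0x10
print(memory[some_label:some_label + 4].hex())  # 44332211
print(hex(contents))                            # 0x11223344
```

The middle line of output shows the little-endian layout: the lowest byte of the value sits at the lowest address.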

Note that all of this assumes that CS = DS and DEP isn't stepping in.

Also, if you were using BX instead of EBX, the sort of things you were expecting would come into play (using xX instead of ExX accesses the low 2 bytes of the register [and xL accesses the lowest byte]).
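
As a rough illustration of the register naming, the sub-registers can be modelled as bit masks over the full 32-bit value. The value 0x12345678 is an arbitrary example, not something from the question.

```python
ebx = 0x12345678      # arbitrary example value for the full 32-bit register

bx = ebx & 0xFFFF     # BX: the low 2 bytes (16 bits) of EBX
bl = ebx & 0xFF       # BL: the lowest byte of EBX

print(hex(bx))  # 0x5678
print(hex(bl))  # 0x78
```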

[1] Remember that an assembler is purely a tool for writing opcodes - stuff like labels etc. all get boiled down to numbers etc. with very little magic or impressive transformations of the code - there's no closures or anything deep of that nature lurking in there. (This is slightly oversimplifying - code can be relocatable, and in many cases fixups get applied to usages of offsets by a combination of the linker and the loader)

Ruben Bartelink 2009-08-18 21:10:40

right - I was with RISC in mind... thanks for clearing up that point

Yuval A 2009-08-18 21:30:49

Answer 2

+7 A:

boo is the offset of the instruction mov eax, 3 inside section .data. mov ebx, [boo] means “fetch four bytes at the offset indicated by boo into ebx”. Likewise, mov [goo], ebx moves the content of ebx to the offset indicated by goo.

However, code is often read-only, so it wouldn't be surprising to see the code just crashing.

Here is how the instructions at boo are encoded:

boo:
b8 03 00 00 00          mov    eax,0x3
c3                      ret

So what you get in ebx is actually 4/5 of the mov eax, 3 instruction.
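
The "4/5" can be checked with a small Python sketch that works on the bytes from the listing above. The variable names are made up for illustration; only the byte values come from the listing.

```python
# Bytes at boo, taken from the listing: mov eax, 0x3 ; ret
boo_bytes = bytes.fromhex("b803000000c3")

mov_length = 5                           # opcode b8 plus a 32-bit immediate
loaded = boo_bytes[:4]                   # what a 4-byte load from [boo] covers
left_behind = boo_bytes[4:mov_length]    # the fifth byte of the mov, not loaded

ebx = int.from_bytes(loaded, "little")   # x86 loads are little-endian

print(loaded.hex())        # b8030000
print(left_behind.hex())   # 00
print(f"{ebx:#010x}")      # 0x000003b8
```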

Bastien Léonard 2009-08-18 21:18:13

It looks like this happens to work because they're not full 32-bit quantities and the last byte will always be 0. This code will fail if you try something like mov eax, 0xC000000.
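
A minimal sketch of that failure case, modelling only the bytes. It assumes the target instruction is mov eax, 2 (as mentioned in Answer 1) and that exactly four bytes are copied; the helper names encode_mov_eax and patched_immediate are hypothetical.

```python
import struct

def encode_mov_eax(imm32):
    """Encode mov eax, imm32: opcode 0xB8 followed by the immediate, little-endian."""
    return struct.pack("<BI", 0xB8, imm32)

def patched_immediate(source_imm, target_imm=2):
    """Copy the first 4 bytes of one mov eax over another; return the resulting immediate."""
    target = bytearray(encode_mov_eax(target_imm))   # assumed target: mov eax, 2
    target[:4] = encode_mov_eax(source_imm)[:4]      # only 4 of the 5 bytes are copied
    return struct.unpack("<BI", bytes(target))[1]

print(hex(patched_immediate(3)))           # 0x3
print(hex(patched_immediate(0x0C000000)))  # 0x0
```

With a small immediate the uncopied top byte is 0 in both instructions, so the result looks right. With 0xC000000 the only non-zero byte is the one that never gets copied.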

Michael 2009-08-18 21:21:27

"fetch the four bytes" was what I was looking for. thanks!

Yuval A 2009-08-18 21:26:17

Answer 3

+1 A:

Follow up answers:

1. It gives you the machine code starting at the address. How much of it depends on the length of your load; in this case it is 4 bytes.
2. It can be more than one command or only a fragment of a command. On this architecture (Intel x86), machine code commands are between 8 and 120 bits long.
3.
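
To illustrate point 2, here is a small sketch of a 4-byte load that covers two whole instructions and a fragment of a third. The instruction sequence is made up for illustration (it is not the code from the question), and covered_by_load is a hypothetical helper.

```python
# (mnemonic, encoding) pairs; the sequence itself is made up for illustration
stream = [
    ("nop",        bytes.fromhex("90")),          # 1 byte
    ("ret",        bytes.fromhex("c3")),          # 1 byte
    ("mov eax, 3", bytes.fromhex("b803000000")),  # 5 bytes
]

def covered_by_load(instructions, load_size=4):
    """List (mnemonic, bytes_covered, total_bytes) for each instruction a load from offset 0 touches."""
    result, start = [], 0
    for name, encoding in instructions:
        covered = min(start + len(encoding), load_size) - start
        if covered <= 0:
            break
        result.append((name, covered, len(encoding)))
        start += len(encoding)
    return result

for name, covered, total in covered_by_load(stream):
    print(f"{name}: {covered} of {total} bytes")
```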

drhirsch 2009-08-18 21:45:44

Dereferencing a label in x86 assembly