hi,

There is something I can't digest. I'm learning some assembler and right now I'm at the chapter on addressing. I understand the concept of brackets for dereferencing, but when I see them used I just can't grasp the point. To be a little more exact, here is where my confusion started:

mov al, [L1]

OK, here I suppose L1, as an example case, is some kind of macro which is later substituted with a real address in the machine code, right?

So what this instruction does is: dereferencing the al register (because you could hardly change a physical address) and changing its value to the one stored at L1.

If everything is OK so far:

mov [L1], al

That would analogously mean there must have been an address stored (so there was some point in doing this) and you change it to some other place in memory, right?

If you don't see any mistakes, please just tell me it's OK; that would let me continue learning.

Oh, and one last thing: NASM adds a bunch of 0xAA55 under my code (this sequence is supposed to end the program, right?). Why is it there so many times? Thank you very much for reading this far.

+2  A: 
mov al, [L1]

I wouldn't think of L1 as a macro. L1 is a register. The processor reads the contents of that register, and finds an address there. Then, it reads from THAT address to find the data it stores in al.

Think about if you were, say, adding 5 to every number in an array. Then you could keep your "index" in the array in the L1 register, retrieve the number from [L1], add 5, store it, then increment L1 and repeat.
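The loop described above could be sketched in NASM roughly as follows. This is only an illustration under some assumptions: it uses SI as the register that holds the position in the array (L1 in the question is a label, not a register), it assumes 16-bit code where DS already covers the data, and the names numbers and count are made up for the example.

```nasm
bits 16

        mov si, numbers     ; SI = address of the first array element
        mov cx, count       ; CX = number of elements left to process
next:
        mov al, [si]        ; read the byte at the address held in SI
        add al, 5           ; add 5 to it
        mov [si], al        ; write the result back to the same address
        inc si              ; advance SI to the next element
        loop next           ; decrement CX and repeat while CX is not zero
        jmp $               ; done: hang so execution never reaches the data

numbers: db 10, 20, 30, 40  ; hypothetical array of four bytes
count   equ $ - numbers     ; element count, computed by the assembler
```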

Disclaimer: I've never used NASM specifically... this is just working from what "dereferencing" meant on another platform...

ETA: As @Benoit Miller says below, L1 is not a register but a place in regular memory.. but the description of dereferencing still applies, I think.

rescdsk
+3  A: 

L1 is typically/probably a label, associated with one particular address in memory. The programmer defines various labels for his/her convenience, and such labels are used to symbolically represent a particular location in memory (L1 is a lousy name, but labels are typically indicative of the underlying purpose of the location: say, PingCounter, ErrorMessage, Login and the like).

L1 itself stands for an address in memory; it does not contain one. This can get a bit confusing because every label corresponds to an address in memory, but what is stored at that address need not be an address itself (here it is plain data). With [L1] (note the brackets) the read or write takes place at the memory location that L1 names, rather than using the address itself as the value. In NASM's assembly syntax, such a memory access is signified by putting the source or destination operand of an instruction in square brackets.

mov al, [L1]

uses the address that L1 stands for to locate a place in memory, reads one byte (= 8 bits = the size of the AL register) at this location, and stores it in the AL register.

  mov [L1], al

Does this in reverse: it takes the address that L1 stands for, uses this address to find a particular place in memory, and stores the contents of the AL register there.
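A minimal NASM sketch of the two instructions, assuming a flat 16-bit binary in which DS already covers the data. The label name L1 comes from the question; the initial value 0x2A is made up for illustration. In NASM syntax, L1 without brackets is the address itself, while [L1] is the byte stored at that address:

```nasm
bits 16

        mov al, [L1]    ; load the byte stored at address L1 into AL (0x2A here)
        inc al          ; change the copy in AL; memory is untouched so far
        mov [L1], al    ; store AL back into the byte at address L1
        mov bx, L1      ; no brackets: BX receives the address L1, not its contents
        jmp $           ; hang so execution never runs into the data

L1:     db 0x2A         ; one byte of data; the label marks its address
```

Neither instruction changes the address of anything: the first copies a byte from memory into the register, the second copies a byte from the register into memory.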


Provided that you understand the following information to be incomplete and somewhat outdated with regard to the newer processors in the x86 family, this primer on the 8086 architecture is probably very useful for getting started with assembly language for the x86 family.
The advantage of starting with this "antiquity of a CPU" (still in use, actually) is that the fundamental concepts are all there, unencumbered by the newer sets of registers, fancy addressing modes, modes of operation and other concepts. The bigger sizes, features and modes of the newer CPUs merely introduce a combinatorial explosion of options, all (most?) of them useful in their way, but essentially irrelevant for an initiation.

mjv
I think I get it now. There is just one "but": you said every label stands for an address, so in the second instruction what is actually stored to L1 is AL's address?
stupid_idiot
In the second instruction, the byte in memory at the address of label L1 is set to the value of the AL register.
Benoit Miller
@stupid_idiot (btw, you don't seem to be either). No, AL (and AH, AX and all register names) are _not_ memory locations, and don't have an address per se. They merely reference a particular location within the CPU itself. Indeed, I meant to "correct" you on the use of the expression "dereferencing al register" in your question; there is no dereferencing taking place with regard to AL. In this context, the addressing mode for AL is "register". (This can get confusing because registers can sometimes be used to produce an address in memory, which is then dereferenced.)
mjv
oh..that ....oh.. i just realized, you probably can't address registers like the rest of the memory space.. but how is then AL translated into machine code?
stupid_idiot
ah yea.. everything is clear now. thank you so much, you just spared me a lot of time i would waste with trial and error method :)
stupid_idiot
@stupid_idiot With regard to "translating" the instruction to machine code, "AL" corresponds to a few bits within the bit pattern of the instruction. (This is unlike, say, L1, which also appears in the assembly source but ends up as a stand-alone operand, directly visible when inspecting memory.) AL (or "references" to other registers) has to be "decoded out" of the instruction code, typically in the form of a 3-bit code selecting one of up to 8 registers, which is interpreted on the basis of other bits or context. See Intel's x86 Instruction Set reference for details on this.
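As a small illustration of the point about register bits, here is a NASM fragment annotated with the machine code it should assemble to in 16-bit mode. It is meant for inspecting a listing (for example with nasm's -l option), not for running, and the label L1 is only a placeholder. For the "move immediate byte to register" form, the opcode is B0 plus the 3-bit register code:

```nasm
bits 16

        mov al, 0x12    ; B0 12   (B0 + 0: AL has register code 000)
        mov cl, 0x12    ; B1 12   (B0 + 1: CL has register code 001)
        mov bh, 0x12    ; B7 12   (B0 + 7: BH has register code 111)
        mov al, [L1]    ; A0 lo hi   (lo hi = the 16-bit address of L1, low byte first)

L1:     db 0            ; placeholder data byte
```

The register never shows up as an address; only L1 does, as the two address bytes that follow the A0 opcode.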
mjv
Yep, I read the instruction set for the i8086 and I think I get it now. I suppose it is designed like this because registers serve the processor's own operation, in contrast to everything that is addressed regularly over the bus. Thanks a lot again for all the effort you have made to help me, I really appreciate it. You've got my respect :)
stupid_idiot
+1  A: 

It's hard to follow your question, but I'll try to help out.

In assembly, a symbol is just a name for an address. In your assembly source, L1 is a symbol defined elsewhere, which the assembler will resolve to an offset in memory.

When dereferencing (using the [] notation), you can dereference a register (as in "mov al, [esi]") or an address (as in "mov al, [L1]"). Both statements do the same thing; the only difference is where the address comes from.
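A short NASM sketch of the two forms side by side, assuming 32-bit code; the data byte behind L1 is hypothetical. Both loads read the same byte, and only the source of the address differs:

```nasm
bits 32

        mov al, [L1]    ; direct: the address is the constant L1, fixed when the code is built
        mov esi, L1     ; no brackets: copy the address itself into ESI
        mov al, [esi]   ; register-indirect: the address is whatever ESI holds at run time
        ret

L1:     db 7            ; hypothetical data byte
```

The register form is what makes loops over arrays possible, since ESI can be changed between accesses while L1 cannot.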

I recommend downloading the Intel CPU documentation and skimming through the instruction reference. If you don't want to be overwhelmed, start with the documentation for an older x86 processor (say, 486 or older). That documentation isn't exactly friendly, but it is quite useful to have on hand.

I don't know the specifics of NASM, I learned assembly 15 years ago with Turbo Assembler, and that knowledge is still useful today :)

Also, might I suggest you try Googling for "x86 assembly tutorial", you'll find plenty of relevant documentation that may be useful for you.

Benoit Miller
+1  A: 

Oh, and one last thing: NASM adds a bunch of 0xAA55 under my code (this sequence is supposed to end the program, right?). Why is it there so many times? Thank you very much for reading this far.

I'm pretty sure that's only applicable if you're creating a bootloader. It is the "boot signature." Say you write this code to a floppy (is your produced machine code also exactly 512 bytes?). When you want to start the computer with this bootloader code, the BIOS will look at the floppy and determine whether it holds an actual bootloader. To do that, it looks at the last two bytes of the first sector of the floppy, which should be 0xAA55 to indicate that it is bootable. (This works the same way if you're booting off a hard drive, a thumb drive, or whatever. It is slightly different for CDs because they have 2048-byte sectors.)

In your source code, is the last line something like $(times.. db 0xAA55 or something like that? If you're not intending to make a bootloader, you can safely remove that line.
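For reference, a common way to lay out a 512-byte boot sector in NASM looks roughly like the sketch below. This is a generic pattern, not the asker's actual file: the padding is zero bytes and the signature word appears exactly once, at the very end. One possible cause of seeing the value many times (a guess, since the source file is not shown) is a times line that repeats the 0xAA55 word as the fill value instead of a zero byte.

```nasm
bits 16
org 0x7C00                      ; the BIOS loads the boot sector at this address

start:
        jmp $                   ; placeholder code: loop forever

        times 510-($-$$) db 0   ; pad with zero bytes up to offset 510
        dw 0xAA55               ; boot signature: bytes 55 AA at offsets 510 and 511
```

Here $ is the current position and $$ is the start of the section, so 510-($-$$) is the number of padding bytes still needed.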

Earlz
Thank you Earlz. I was a bit confused because I thought it was a regular instruction, which is wrong. Now that I know it's just a NASM convenience for easier coding, everything is clear to me.
stupid_idiot