Suppose these two are essentially the same:

push 1

and

0x1231

That is, each assembly instruction maps to a machine code.

But must each machine code map to only one assembly instruction?

A: 

You could perfectly well define an assembler program that supports "synonyms" for instructions: no harm is done if you let the user write FOO to mean exactly the same as BAR. I don't know offhand of any assemblers that do that, but you can certainly achieve the same effect with a trivially simple macro in any macro-assembler ;-).
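A minimal sketch of that idea in Python, assuming a made-up instruction set: FOO and BAR are the placeholder names from the answer, while the opcode values and the helper names (`assemble_line`, `assemble`) are hypothetical. The alias is resolved before encoding, so both spellings produce the same byte, exactly as a one-line macro would.

```python
# Hypothetical toy instruction set: FOO and BAR are the placeholder names
# from the answer; the opcode values are made up for illustration.
OPCODES = {"BAR": 0x12, "NOP": 0x00}

# Synonym table: an alias is rewritten to its canonical mnemonic before
# encoding, which is all a one-line macro would do.
ALIASES = {"FOO": "BAR"}


def assemble_line(line):
    """Encode one operand-less mnemonic as a single opcode byte."""
    mnemonic = line.strip().upper()
    mnemonic = ALIASES.get(mnemonic, mnemonic)  # resolve synonyms first
    if mnemonic not in OPCODES:
        raise ValueError(f"unknown mnemonic: {line!r}")
    return OPCODES[mnemonic]


def assemble(source):
    """Assemble newline-separated mnemonics, skipping blank lines."""
    return bytes(assemble_line(ln) for ln in source.splitlines() if ln.strip())


print(assemble("FOO\nBAR\nNOP").hex(" "))  # 12 12 00
```

A disassembler working backwards from 0x12 can still print only one name, whichever one it prefers.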

Alex Martelli
How does the processor segment machine code? By two words per instruction?
Mask
The binary format of machine code, and the syntax of the assembly language which generates that machine code, are quite uncorrelated. x86, for example, has binary instructions of widely varying lengths, from one byte on up, but each is generated from a single assembly language instruction.
Alex Martelli
From the processor's point of view, it's all a sequence of bits. How does it know the starting and finishing bits of each instruction?
Mask
@Mask, all modern processors use sequences of words (possibly including bytes), not bits. Those with instructions of different lengths obviously have some extra logic such that, depending on the first byte or word, they know how many more they'll need. Again, the assembler (whose job is to read assembly code text and generate binary machine code) has nothing to do with the case.
Alex Martelli
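To make "depending on the first byte, they know how many more they'll need" concrete, here is a toy length decoder in Python. It is a sketch under strong assumptions, not a real x86 decoder: it knows only a handful of one-byte opcodes valid in 32-bit mode (NOP 90, RET C3, PUSH r32 50-57, PUSH imm8 6A, PUSH imm32 68, MOV r32, imm32 B8-BF), and the function names are hypothetical.

```python
# Total length in bytes for a handful of one-byte x86 opcodes (32-bit mode).
# Real decoding must also handle prefixes, ModRM/SIB bytes, displacements
# and multi-byte opcodes; none of that is modelled here.
def instruction_length(opcode):
    if opcode in (0x90, 0xC3) or 0x50 <= opcode <= 0x57:
        return 1  # NOP, RET, PUSH r32
    if opcode == 0x6A:
        return 2  # PUSH imm8
    if opcode == 0x68 or 0xB8 <= opcode <= 0xBF:
        return 5  # PUSH imm32, MOV r32, imm32
    raise ValueError(f"opcode {opcode:#04x} is not in this toy table")


def split_instructions(code, start=0):
    """Cut a byte stream into instructions, using only each first byte."""
    out = []
    pos = start
    while pos < len(code):
        length = instruction_length(code[pos])
        if pos + length > len(code):
            raise ValueError("truncated instruction at end of stream")
        out.append(code[pos:pos + length])
        pos += length
    return out


stream = bytes.fromhex("6a 01 b8 01 00 00 00 c3")  # push 1; mov eax, 1; ret
print([insn.hex(" ") for insn in split_instructions(stream)])
# ['6a 01', 'b8 01 00 00 00', 'c3']
```

The processor never needs explicit start/end markers: once it knows where the first instruction begins, each opcode determines where the next one starts.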
A: 

I don't see any conceptual reason why you couldn't design an assembly language wherein more than one assembly statement maps to the same opcode on the underlying processor.

I also don't immediately see any particularly good reason to do that, but it's late and maybe I'm missing something.

Syntactic
A: 

Even without synonyms, an assembly instruction can map to more than one machine code encoding.
E.g. add eax, ebx can be represented as either 03 C3 or 01 D8.
In fact, this can be useful, e.g. to identify which compiler produced a particular binary.
You can find more examples in this article.
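The two encodings exist because x86 has two forms of ADD, 01 /r (ADD r/m32, r32) and 03 /r (ADD r32, r/m32), so a register-to-register add can put the destination in either the rm or the reg field of the ModRM byte. A small Python sketch that reproduces both byte sequences; the helper names are hypothetical, and only the register-direct case (mod = 11) is covered.

```python
# 32-bit general-purpose register numbers as used in the ModRM byte.
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
       "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm(mod, reg, rm):
    """Pack the ModRM byte: mod (2 bits), reg (3 bits), rm (3 bits)."""
    return (mod << 6) | (reg << 3) | rm


def encode_add_reg_reg(dst, src):
    """Return both legal encodings of 'add dst, src' for two registers.

    01 /r is ADD r/m32, r32: the destination goes in the rm field.
    03 /r is ADD r32, r/m32: the destination goes in the reg field.
    mod = 0b11 selects register-direct addressing.
    """
    d, s = REG[dst], REG[src]
    return [
        bytes([0x01, modrm(0b11, s, d)]),
        bytes([0x03, modrm(0b11, d, s)]),
    ]


for enc in encode_add_reg_reg("eax", "ebx"):
    print(enc.hex(" ").upper())  # 01 D8, then 03 C3
```

Which of the two an assembler emits is purely its own choice, which is why the choice can act as a fingerprint.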

The reverse can also be true, in a way.
The example is a bit far-fetched, but the same machine code (F3 90) maps to either REP NOP or PAUSE on x86.
Which one is executed depends on the CPU the code runs on.
The same encoding was chosen deliberately, and as far as the processor state is concerned the two make no difference; however, the execution time, and the exact internal implementation, can differ on an HT (PAUSE) vs a non-HT (NOP) CPU.

Apart from the PAUSE vs REP NOP case, which makes little difference, it is possible to write machine code that is hard to disassemble statically.
E.g. one can carefully construct a machine code sequence that decodes to completely different assembly instructions depending on whether disassembly starts at, say, offset 0 or offset 1.
One can also write self-modifying assembly code to make static analysis harder.
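A minimal illustration of the offset 0 vs offset 1 trick, as a Python sketch with a hypothetical linear-sweep disassembler that knows only four 32-bit-mode x86 opcodes (PUSH imm32 68, PUSH imm8 6A, RET C3, NOP 90). The same five bytes read as one long push from offset 0, but as push 1, ret, nop from offset 1.

```python
# Toy table: first byte -> (total length, text template). Only the four
# 32-bit-mode x86 opcodes needed for this example are included; any other
# byte raises KeyError.
TABLE = {
    0x68: (5, "push 0x{imm:x}"),  # PUSH imm32, little-endian immediate
    0x6A: (2, "push 0x{imm:x}"),  # PUSH imm8 (the CPU sign-extends it)
    0xC3: (1, "ret"),
    0x90: (1, "nop"),
}


def disassemble(code, start):
    """Linear-sweep disassembly beginning at byte offset `start`."""
    out = []
    pos = start
    while pos < len(code):
        length, template = TABLE[code[pos]]
        if pos + length > len(code):
            raise ValueError("truncated instruction at end of stream")
        imm = int.from_bytes(code[pos + 1:pos + length], "little")
        out.append(template.format(imm=imm))
        pos += length
    return out


code = bytes([0x68, 0x6A, 0x01, 0xC3, 0x90])
print(disassemble(code, 0))  # ['push 0x90c3016a']
print(disassemble(code, 1))  # ['push 0x1', 'ret', 'nop']
```

A static disassembler has to guess which entry point is real; jumping into the middle of the first instruction executes the hidden sequence.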

andras
BTW, how does the processor segment machine code? By two words per instruction?
Mask
@Mask: If your question is whether there are one byte instructions in machine code, then yes, there are many of them.
andras
No. From the processor's point of view, it's all a sequence of bits. How does it know the starting and finishing bits of each instruction?
Mask
@Mask: Opcodes can encode the necessary information about the arguments and their lengths. An opcode table: http://www.sandpile.org/ia32/opc_1.htm An article about the redundancy of machine code: http://www.strchr.com/machine_code_redundancy
andras
@Mask: though a much better - albeit longer and more sophisticated - source is "Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2A: Instruction Set Reference, A-M", in which a whole chapter is dedicated to the instruction and opcode format. http://www.intel.com/Assets/PDF/manual/253666.pdf
andras
A: 

Generally, the point of assembly is to let you program the machine directly, without any ambiguity about what will be executed. That pretty much requires a 1:1 mapping.

I wouldn't be surprised if some assembler somewhere had indirect mappings, probably to deal with opcode changes within a line of processors. I don't know of any, though.

ScottS
A: 

What a particular machine code instruction does is dictated by the processor (or processor family) it is for. And the same machine code instruction will always do fundamentally the same thing.

Normally, a particular machine code instruction will disassemble to only one statement. In some more complex instruction sets, there are several ways to write the same expression in assembler; indexed lookups are a good example. Some statements can also have synonyms, but again, they still mean the same thing to the processor.

However, it is possible for multiple whole assembly syntaxes to exist for an architecture. This has happened with x86, where there is the standard syntax defined by Intel, and another one based on the syntax created by AT&T, which is the one used by GCC.
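Because both syntaxes describe the same instructions, converting between them is mostly mechanical. A toy Python sketch of the main differences, with hypothetical helper names: AT&T reverses the operand order, prefixes registers with % and immediates with $, and adds a size suffix such as l. It assumes 32-bit register and immediate operands only; memory operands, prefixes and other sizes are not handled.

```python
REGISTERS_32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "esp", "ebp"}


def operand_to_att(op):
    """Prefix registers with % and plain integer immediates with $."""
    op = op.strip()
    if op.lower() in REGISTERS_32:
        return "%" + op.lower()
    int(op, 0)  # raises ValueError for anything but a plain integer literal
    return "$" + op


def intel_to_att(line):
    """Rewrite a register/immediate-only Intel-syntax line in AT&T syntax.

    Assumes 32-bit operands (hence the 'l' suffix); memory operands,
    prefixes and other operand sizes are deliberately not handled.
    """
    mnemonic, _, rest = line.strip().partition(" ")
    if not rest.strip():
        return mnemonic.lower()  # e.g. 'ret' needs no rewriting here
    operands = [operand_to_att(op) for op in rest.split(",")]
    operands.reverse()  # AT&T lists the source first, the destination last
    return f"{mnemonic.lower()}l {', '.join(operands)}"


print(intel_to_att("mov eax, 1"))    # movl $1, %eax
print(intel_to_att("add eax, ebx"))  # addl %ebx, %eax
```

Only the spelling changes: mov eax, 1 and movl $1, %eax both assemble to B8 01 00 00 00.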

staticsan
A: 

Yes. A real-world example of this is 68k assembler, where

The official mnemonics BCC (branch on carry clear) and BCS (branch on carry set) can be renamed as BHS (branch on higher than or same) and BLO (branch on less than), respectively. Many 68000 assemblers support these alternative mnemonics.
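Since BHS and BLO are only alternative names, they must encode exactly like BCC and BCS. A Python sketch of the short (8-bit displacement) Bcc form on the 68000, where the first byte is 0110 cccc and the second is the displacement; the helper name is hypothetical and only the four mnemonics from the quote are included.

```python
# 68000 condition codes (the cccc field of Bcc). BHS/BLO are listed as
# aliases that share the codes of BCC/BCS.
CONDITIONS = {
    "BCC": 0x4, "BHS": 0x4,  # carry clear: unsigned higher or same
    "BCS": 0x5, "BLO": 0x5,  # carry set: unsigned lower
}


def encode_bcc_short(mnemonic, displacement):
    """Encode a short Bcc as two bytes: 0110 cccc, then an 8-bit displacement.

    A displacement byte of 0 means a 16-bit displacement word follows,
    so that form is rejected rather than handled here.
    """
    if displacement == 0 or not -128 <= displacement <= 127:
        raise ValueError("displacement must be a non-zero signed 8-bit value")
    cond = CONDITIONS[mnemonic.upper()]
    return bytes([0x60 | cond, displacement & 0xFF])


print(encode_bcc_short("BCC", 0x10).hex(" "))  # 64 10
print(encode_bcc_short("BHS", 0x10).hex(" "))  # 64 10
```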

David Gelhar
A: 

MIPS assembly language has several "pseudoinstructions". For example, "move" is internally just an "add" with an implicit $0 operand.
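A Python sketch of how an assembler might expand move before encoding, following this answer's description (add with $zero as the implicit operand). Some assemblers pick addu or or instead; the register subset and helper names here are hypothetical, and only add and move are supported.

```python
# MIPS register numbers for the names used below ($zero always reads as 0).
REGS = {"$zero": 0, "$t0": 8, "$t1": 9, "$t2": 10}
FUNCT_ADD = 0x20  # function field of the R-type 'add' instruction


def encode_rtype(rs, rt, rd, funct, shamt=0):
    """Pack opcode(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def assemble(line):
    """Assemble 'add rd, rs, rt' or the pseudoinstruction 'move rd, rs'."""
    mnemonic, _, rest = line.strip().partition(" ")
    ops = [op.strip() for op in rest.split(",")]
    if mnemonic == "move":
        rd, rs = ops
        # Pseudoinstruction: rewrite as a real instruction, then assemble it.
        return assemble(f"add {rd}, {rs}, $zero")
    if mnemonic == "add":
        rd, rs, rt = ops
        return encode_rtype(REGS[rs], REGS[rt], REGS[rd], FUNCT_ADD)
    raise ValueError(f"unsupported instruction: {line!r}")


print(f"{assemble('move $t0, $t1'):#010x}")        # 0x01204020
print(f"{assemble('add $t0, $t1, $zero'):#010x}")  # 0x01204020
```

Here two source spellings yield one machine word, so a disassembler has to decide whether to show that word as move or as add.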

dan04