I'm interested in writing an x86 assembler. I'm wondering what is a good way to map x86 assembly mnemonic instructions (using an Intel-like syntax) into the corresponding binary machine code instructions.
For x86, it's complicated as hell: instructions are variable-length, with optional prefixes and ModR/M and SIB bytes. It got a little less complicated once 32-bit processors took over, but yeah, it's still a pain.
You may want to take a look at NASM (http://www.nasm.us). It's an open-source x86 assembler (it handles 16-, 32- and 64-bit code), so you can see how they do it. Or just use it instead. :)
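It's essentially a table-driven mapping: the Intel documentation describes every instruction and its encodings. You'll need to build a giant lookup table, or something equivalent, to do the matching and code generation. It isn't strictly one-to-one, though: some instructions have more than one valid encoding (for example, a short form for small immediates), and the assembler has to pick one.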
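A minimal sketch of such a lookup table in Python, covering only a handful of 32-bit-mode instructions. The table keys on the mnemonic plus the operand *kinds*; `OPCODES`, `classify` and `encode` are hypothetical names, and a real table would also need memory operands, prefixes and far more entries. The opcodes themselves (`90`, `C3`, `50+rd`, `58+rd`, `B8+rd id`, `CD ib`) are the ones listed in the Intel manuals.

```python
from typing import List

# Register numbers used in x86 encodings for the 32-bit general-purpose registers.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

# (mnemonic, operand kinds) -> (base opcode, add register number to opcode?, immediate size in bytes)
OPCODES = {
    ("nop", ()): (0x90, False, 0),
    ("ret", ()): (0xC3, False, 0),
    ("push", ("r32",)): (0x50, True, 0),       # 50+rd
    ("pop", ("r32",)): (0x58, True, 0),        # 58+rd
    ("mov", ("r32", "imm")): (0xB8, True, 4),  # B8+rd id
    ("int", ("imm",)): (0xCD, False, 1),       # CD ib
}


def classify(op: str) -> str:
    """Return 'r32' for a 32-bit register name or 'imm' for an integer literal."""
    if op in REG32:
        return "r32"
    try:
        int(op, 0)  # accepts decimal, 0x.., 0o.. and 0b.. literals
    except ValueError:
        raise ValueError(f"unsupported operand: {op!r}") from None
    return "imm"


def encode(mnemonic: str, operands: List[str]) -> bytes:
    """Look up the encoding for one instruction and emit its bytes."""
    ops = [op.strip().lower() for op in operands]
    key = (mnemonic.lower(), tuple(classify(op) for op in ops))
    if key not in OPCODES:
        raise ValueError(f"no encoding for {mnemonic} {', '.join(operands)}")
    opcode, plus_reg, imm_size = OPCODES[key]
    if plus_reg:
        opcode += REG32[ops[0]]  # register number goes in the low 3 bits of the opcode
    out = bytearray([opcode])
    if imm_size:
        bits = 8 * imm_size
        value = int(ops[-1], 0)
        if not -(1 << (bits - 1)) <= value < (1 << bits):
            raise ValueError(f"immediate {value} does not fit in {bits} bits")
        # Two's-complement wrap, then little-endian byte order.
        out += (value & ((1 << bits) - 1)).to_bytes(imm_size, "little")
    return bytes(out)
```

For example, `encode("mov", ["eax", "1"])` yields `B8 01 00 00 00`. Keying on operand kinds rather than exact operands is what lets a single entry cover all eight registers.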
Do you want to understand the physical mapping of mnemonics to machine code? If so, volumes 2A and 2B (the Instruction Set Reference) of the Intel 64 and IA-32 Architectures Software Developer's Manuals describe the binary format of x86 machine code.
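The part of that format that trips most people up is the ModR/M byte. As a sketch in Python, here is how a register-to-register instruction in the `op r/m32, r32` form (written `/r` in the manual) is built: the opcode byte, then a ModR/M byte with `mod = 11b` (register operand), the source register in the `reg` field and the destination in `r/m`. Only four opcodes are included; `RM_R_OPCODES`, `modrm` and `encode_reg_reg` are hypothetical names.

```python
# Register numbers used in x86 encodings for the 32-bit general-purpose registers.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

# Opcodes of the "op r/m32, r32" forms (Intel notation: xx /r).
RM_R_OPCODES = {"add": 0x01, "sub": 0x29, "xor": 0x31, "mov": 0x89}


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack a ModR/M byte: mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("ModR/M field out of range")
    return (mod << 6) | (reg << 3) | rm


def encode_reg_reg(mnemonic: str, dst: str, src: str) -> bytes:
    """Encode 'mnemonic dst, src' for two 32-bit registers using the r/m32, r32 form."""
    opcode = RM_R_OPCODES[mnemonic.lower()]
    # mod = 0b11: the r/m field names a register rather than a memory operand.
    # In this form the destination goes in r/m and the source in reg.
    return bytes([opcode, modrm(0b11, REG32[src.lower()], REG32[dst.lower()])])
```

With this, `mov eax, ebx` comes out as `89 D8`. The same instruction can also be encoded through the `mov r32, r/m32` form as `8B C3`, with the register fields swapped, which is why an assembler's output is not the only valid encoding.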
The x86 instruction set page on Wikipedia has a compact listing of all the instructions categorized by when they were introduced, which might help you prioritize what to implement first.
However, if you're asking how to parse an assembly source file to the point where your program can start writing out machine code, then you basically need to understand how to write a compiler. The tools lex and yacc are good places to start, but if you don't know how to build a compiler you'll also need a book. I think the Dragon book is the best one out there, but there are any number of other books you could use; Stack Overflow has plenty of recommendations.
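For a line-oriented syntax like Intel-style assembly, a hand-written parser is often enough to get started before reaching for lex and yacc. A minimal sketch in Python, using a regular-expression line splitter rather than a lex/yacc grammar, assuming one instruction per line, `;` comments, an optional `label:` prefix, and no string literals containing commas or semicolons. `Line`, `LABEL_RE` and `parse_line` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional


class Line(NamedTuple):
    label: Optional[str]
    mnemonic: Optional[str]
    operands: List[str]


# A label is an identifier followed by ':' at the start of the line.
LABEL_RE = re.compile(r"\s*([A-Za-z_.][A-Za-z0-9_.]*)\s*:")


def parse_line(text: str) -> Line:
    """Split one line of Intel-syntax assembly into label, mnemonic and operands."""
    code = text.split(";", 1)[0]  # strip the comment (assumes no ';' inside strings)
    label = None
    m = LABEL_RE.match(code)  # re.match only matches at the start of the string
    if m:
        label = m.group(1)
        code = code[m.end():]
    code = code.strip()
    if not code:
        return Line(label, None, [])
    parts = code.split(None, 1)  # mnemonic, then the rest of the line
    mnemonic = parts[0].lower()
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return Line(label, mnemonic, operands)
```

Each `Line` can then be handed to the encoding stage. A grammar-based tool like yacc starts paying off once you add expressions, macros or assembler directives.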