I would recommend checking out some open-source disassemblers, preferably distorm, and especially "disOps (Instructions Sets DataBase)" (use Ctrl+F to find it on the page).
The documentation itself is full of juicy information about opcodes and instructions.
Quote from http://ragestorm.net/distorm/vol1.html:
80x86 Instruction:
A 80x86 instruction is divided to a number of elements:
- Instruction prefixes, affects the behaviour of the instruction's operation.
- Mandatory prefix used as an opcode byte for SSE instructions.
- Opcode bytes, could be one or more bytes (up to 3 whole bytes).
- ModR/M byte is optional and sometimes could contain a part of the opcode itself.
- SIB byte is optional and represents complex memory indirection forms.
- Displacement is optional and it is a value of a varying size of bytes (byte, word, long) and used as an offset.
- Immediate is optional and it is used as a general number value built from a varying size of bytes (byte, word, long).
The format looks as follows:
/-------------------------------------------------------------------------------------------------------------------------------------------\
|*Prefixes | *Mandatory Prefix | *REX Prefix | Opcode Bytes | *ModR/M | *SIB | *Displacement (1,2 or 4 bytes) | *Immediate (1,2 or 4 bytes) |
\-------------------------------------------------------------------------------------------------------------------------------------------/
* means the element is optional.
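To make the ModR/M and SIB elements of that layout more concrete, here is a minimal Python sketch of how the two bytes split into bit fields, and how the ModR/M byte tells a decoder whether a SIB byte and a displacement follow. It is not taken from distorm; the function names (split_modrm, split_sib, trailing_layout) are made up for illustration, and it assumes 32-bit addressing only (16-bit addressing uses different rules).

```python
def split_modrm(byte):
    """Split a ModR/M byte into (mod, reg, rm): bits 7-6, 5-3, 2-0."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def split_sib(byte):
    """Split a SIB byte into (scale, index, base): bits 7-6, 5-3, 2-0."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def trailing_layout(modrm, sib=None):
    """Return (has_sib, displacement_size_in_bytes).

    Hypothetical helper, 32-bit addressing mode assumed.
    """
    mod, _reg, rm = split_modrm(modrm)
    if mod == 0b11:
        # Register-direct operand: no memory access, so no SIB or displacement.
        return False, 0
    has_sib = rm == 0b100  # rm == 100 means "a SIB byte follows"
    if mod == 0b01:
        disp = 1  # 8-bit displacement
    elif mod == 0b10:
        disp = 4  # 32-bit displacement
    elif rm == 0b101:
        disp = 4  # mod == 00, rm == 101: plain disp32, no base register
    elif has_sib and sib is not None and split_sib(sib)[2] == 0b101:
        disp = 4  # mod == 00, SIB base == 101: disp32 replaces the base
    else:
        disp = 0
    return has_sib, disp


# Example: 8B 44 24 08 is "mov eax, [esp+8]" in 32-bit code.
# 0x44 is the ModR/M byte and 0x24 is the SIB byte.
print(split_modrm(0x44))
print(trailing_layout(0x44, 0x24))
```

For the example bytes, the ModR/M byte 0x44 has rm == 100, so a SIB byte follows, and mod == 01, so the instruction ends with a one-byte displacement (the 08).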
The data structures and decoding phases are explained in http://ragestorm.net/distorm/vol2.html
Quote:
Decoding Phases
- [Prefixes]
- [Fetch Opcode]
- [Filter Opcode]
- [Extract Operand(s)]
- [Text Formatting]
- [Hex Dump]
- [Decoded Instruction]
Each step is also explained there.
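As a rough illustration of the first two phases (prefixes and opcode fetch), here is a small Python sketch. It is not distorm's code: the names LEGACY_PREFIXES, scan_prefixes and fetch_opcode are invented for this example, and it deliberately skips the later phases (filtering the opcode against an instruction table, extracting operands, formatting). It also treats 66/F2/F3 as ordinary prefixes, whereas a real decoder has to decide later whether they act as a mandatory prefix.

```python
# Legacy prefix bytes and what they mean (labels are illustrative).
LEGACY_PREFIXES = {
    0xF0: "LOCK", 0xF2: "REPNE", 0xF3: "REP",
    0x2E: "CS", 0x36: "SS", 0x3E: "DS",
    0x26: "ES", 0x64: "FS", 0x65: "GS",
    0x66: "OPSIZE", 0x67: "ADDRSIZE",
}


def scan_prefixes(code, is_64bit=False):
    """Phase 1: consume legacy prefixes, then an optional REX byte.

    Returns (prefix_names, rex_byte_or_None, position_of_opcode).
    """
    pos = 0
    prefixes = []
    while pos < len(code) and code[pos] in LEGACY_PREFIXES:
        prefixes.append(LEGACY_PREFIXES[code[pos]])
        pos += 1
    rex = None
    # 40h-4Fh are REX prefixes only in 64-bit mode (INC/DEC otherwise).
    if is_64bit and pos < len(code) and 0x40 <= code[pos] <= 0x4F:
        rex = code[pos]
        pos += 1
    return prefixes, rex, pos


def fetch_opcode(code, pos):
    """Phase 2: read 1 to 3 opcode bytes using the 0F, 0F 38, 0F 3A escapes.

    Returns (opcode_bytes, position_after_opcode).
    """
    if code[pos] != 0x0F:
        return bytes(code[pos:pos + 1]), pos + 1
    if pos + 1 < len(code) and code[pos + 1] in (0x38, 0x3A):
        return bytes(code[pos:pos + 3]), pos + 3
    return bytes(code[pos:pos + 2]), pos + 2


# Example: 66 0F 38 00 C1 is "pshufb xmm0, xmm1".
code = bytes.fromhex("660f3800c1")
prefixes, rex, pos = scan_prefixes(code)
opcode, pos = fetch_opcode(code, pos)
print(prefixes, rex, opcode.hex(), "hex dump:", code.hex())
```

In this example the 66 byte is picked up in the prefix phase and the three opcode bytes 0F 38 00 are fetched next; the remaining byte, C1, is the ModR/M byte that the operand-extraction phase would consume.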