Hello,

Admittedly, this is a somewhat silly question. Basically, I am wondering whether Intel processors provide any special mechanism to efficiently execute a series of dummy instructions, i.e., NOPs. For instance, I could imagine some kind of pre-fetch mechanism that identifies NOPs, discards them, and tries to fetch useful instructions instead. Or are NOPs dispatched to the execution units like normal instructions, meaning that I can process roughly 5 NOPs per cycle (assuming there are 5 execution units)?

Thanks, Reinhard

+1  A: 

No. They are decoded and executed as normal instructions; there is hardware support to remove the false dependency that would otherwise be introduced on the EAX register by the single-byte NOP, 0x90 (which is really xchg eax, eax), but that's all.

Reference: Intel(R) 64 and IA-32 Architectures Optimization Reference Manual - section 3.5.1.8, "Using NOPs".

Matthew Slattery
+1  A: 

Discarding them would be a pretty bad idea: they are often used for busy-waiting. If you discard NOPs, you make your wait loop much tighter than it should be and potentially introduce considerable communications overhead.

If you feel that NOPs are inefficient, you could try HLT, which saves some energy. Or you could even send the CPU into a sleep state. However, these only make sense if you want to "do nothing" for a considerable amount of time, and they usually require supervisor privileges.

Jörg W Mittag
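To make the busy-waiting point concrete, here is a minimal sketch of a NOP-based delay loop. It assumes a GCC- or Clang-style compiler (for the `__asm__ volatile` syntax); the name `spin_nops` is hypothetical and not taken from any library. The delay depends on each NOP actually being issued, which is why a CPU that silently dropped NOPs would change the timing of code like this.

```c
#include <stdint.h>

/* Hypothetical helper: issue one NOP per loop iteration and return the
 * number of NOPs issued. The `volatile` qualifier keeps the compiler from
 * deleting the instruction or the loop around it. How long this takes in
 * wall-clock time depends entirely on the CPU; nothing is guaranteed here. */
static uint64_t spin_nops(uint64_t iterations)
{
    uint64_t done = 0;
    for (uint64_t i = 0; i < iterations; i++) {
        __asm__ volatile("nop");
        done++;
    }
    return done;
}
```

A caller would pick `iterations` by calibrating against a timer, since the cost of a NOP differs between processors.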
A: 

There's very little need for optimizing sequences of no-ops on the x86 architecture because it has no-op encodings of varying lengths. Instead of many one-byte no-ops, one can just use a single multi-byte no-op. Somewhat more work for the decoder, but the actual execution units only see a single instruction to execute.

Stephen Canon
Thanks for the answers. Does it make a difference, from a performance point of view, whether I use one multi-byte NOP or a sequence of single-byte NOPs? Or is this only interesting from a code size point of view?
reinhard
It's hard to say exactly what the performance effects are for multi-byte NOPs. I don't know if they can all go through the simple decoder path (you can probably look that up somewhere). If they require the complex decoder path, and it's already saturated, it might be preferred to use two smaller NOPs.
Stephen Canon
Actually, with long NOP support you can make a single NOP of any size from 1 to 15 bytes. If you need to skip a larger space, then JMP instead of NOP.
slacker
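
As an illustration of the multi-byte NOPs discussed in this thread, the following sketch pads a buffer using as few NOP instructions as possible. The byte sequences are the 1- to 9-byte forms listed in the NOP entry of the Intel instruction set reference; the names `NOP_TABLE`, `MAX_NOP_LEN`, and `fill_nops` are hypothetical. Longer single NOPs (up to the 15-byte instruction limit) need extra prefixes and are left out here, and the 0F 1F forms are assumed to be supported by the target CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NOP_LEN 9

/* Recommended multi-byte NOP encodings, indexed by (length - 1). */
static const unsigned char NOP_TABLE[MAX_NOP_LEN][MAX_NOP_LEN] = {
    {0x90},                                                  /* nop            */
    {0x66, 0x90},                                            /* 66 nop         */
    {0x0F, 0x1F, 0x00},                                      /* nop [eax]      */
    {0x0F, 0x1F, 0x40, 0x00},                                /* nop [eax+0]    */
    {0x0F, 0x1F, 0x44, 0x00, 0x00},                          /* nop [eax+eax+0] */
    {0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},
    {0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
};

/* Hypothetical helper: fill buf[0..len) with NOP instructions, always
 * taking the longest encoding that still fits. Returns the number of
 * instructions written, i.e. what the execution core would see. */
static size_t fill_nops(unsigned char *buf, size_t len)
{
    size_t count = 0;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        size_t n = len < MAX_NOP_LEN ? len : MAX_NOP_LEN;
        memcpy(buf, NOP_TABLE[n - 1], n);
        buf += n;
        len -= n;
        count++;
    }
    return count;
}
```

For example, padding 20 bytes this way produces three instructions (9 + 9 + 2 bytes) instead of twenty single-byte NOPs.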