
I'm reading "The Art of Assembly", Chapter 11: "The MMX Instruction Set".

After executing MMX instructions, you need to execute the EMMS instruction to reset the FPU state before using floating-point instructions again. The book also states that EMMS is quite slow.
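
For readers working in C rather than raw assembly, the pattern the book describes looks roughly like the sketch below. It assumes GCC or Clang on x86/x86-64 with the `<mmintrin.h>` intrinsics, where `_mm_empty()` is the intrinsic for EMMS; the function name is hypothetical. `long double` is used because GCC and Clang on x86 typically compute it with the x87 FPU, which is the unit whose registers MMX borrows.

```c
#include <assert.h>
#include <mmintrin.h>   /* MMX intrinsics: __m64, _mm_add_pi16, _mm_empty */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: adds four pairs of 16-bit integers with MMX,
   executes EMMS, then does x87 floating-point work on the result. */
static long double mmx_sum_then_fpu(const int16_t a[4], const int16_t b[4]) {
    assert(a != NULL && b != NULL);

    __m64 va, vb, vsum;
    int16_t out[4];

    memcpy(&va, a, sizeof va);          /* load 4 x int16 into an MMX value */
    memcpy(&vb, b, sizeof vb);
    vsum = _mm_add_pi16(va, vb);        /* PADDW: packed 16-bit add */
    memcpy(out, &vsum, sizeof out);

    _mm_empty();                        /* EMMS: hand the register stack back to the x87 FPU */

    /* x87 floating-point code must come after EMMS. */
    long double total = 0.0L;
    for (int i = 0; i < 4; i++)
        total += out[i] * 0.5L;
    return total;
}
```

With `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}` the packed sums are 11, 22, 33 and 44, so the function returns 55.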

However, when I profiled EMMS (using RDTSC to count clock cycles) to see just how slow it was, it appeared to execute in 0 cycles.

What's going on? Have I made a mistake somewhere, or is The Art of Assembly out of date?

A:

It was slow on the ancient Pentium MMX, but on more modern processors it is very fast.

Still, MMX is mostly obsolete today. Use SSE2 instead: its XMM registers are separate from the x87 FPU registers, so you'll have no problems mixing it with FPU code.
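
A minimal sketch of the SSE2 route in C, assuming GCC or Clang on x86-64 (where SSE2 is always available) and the `<emmintrin.h>` intrinsics. The function name is hypothetical. The packed add runs on XMM registers, so the `long double` arithmetic that follows (x87 under GCC/Clang on x86) needs no EMMS, and each instruction handles eight 16-bit lanes instead of MMX's four.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_add_epi16 */
#include <stdint.h>

/* Hypothetical helper: packed 16-bit add on XMM registers, followed
   directly by x87 floating-point work with no EMMS in between. */
static long double sse2_sum_then_fpu(const int16_t a[8], const int16_t b[8]) {
    assert(a != NULL && b != NULL);

    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* unaligned load of 8 x int16 */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vsum = _mm_add_epi16(va, vb);               /* PADDW on XMM registers */

    int16_t out[8];
    _mm_storeu_si128((__m128i *)out, vsum);

    /* XMM registers do not alias the x87 stack, so no _mm_empty() is needed. */
    long double total = 0.0L;
    for (int i = 0; i < 8; i++)
        total += out[i] * 0.5L;
    return total;
}
```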

Also, RDTSC can execute in parallel with other instructions, which explains your measurement: the CPU simply started executing both RDTSCs and the EMMS simultaneously, in the same clock cycle. To measure how long a piece of code takes, you must serialize both RDTSCs with respect to that code; the CPUID instruction is usually used for this. Because the serializing instructions take CPU cycles themselves, you also have to measure how many cycles your measurement rig takes with no code between the two reads, and subtract that from your result.
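
A minimal sketch of such a rig in C, assuming GCC or Clang on x86-64 (GNU extended inline asm for CPUID, `__rdtsc()` from `<x86intrin.h>`, and `_mm_empty()` as the intrinsic for EMMS). All function names are hypothetical. CPUID leaf 0 is an arbitrary choice, since only its serializing side effect matters, and taking the minimum over many runs is one common way to filter out interrupts and other noise, not the only one.

```c
#include <assert.h>
#include <mmintrin.h>    /* _mm_empty() -> EMMS */
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc() (GCC/Clang, x86 only) */

/* CPUID is a serializing instruction: earlier instructions must complete
   before it does, and later ones cannot start until it has finished. */
static inline void serialize(void) {
    unsigned int eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
}

/* CPUID before the read makes earlier instructions finish first;
   CPUID after it keeps later instructions from starting before the read. */
static inline uint64_t rdtsc_serialized(void) {
    serialize();
    uint64_t t = __rdtsc();
    serialize();
    return t;
}

/* Cost of the rig alone (nothing between the reads): minimum over runs. */
static uint64_t rig_overhead(int runs) {
    assert(runs > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t start = rdtsc_serialized();
        uint64_t end = rdtsc_serialized();
        uint64_t d = end - start;
        if (d < best)
            best = d;
    }
    return best;
}

/* Cost of one EMMS plus the rig: minimum over runs. */
static uint64_t emms_with_rig(int runs) {
    assert(runs > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t start = rdtsc_serialized();
        _mm_empty();                       /* the instruction being timed */
        uint64_t end = rdtsc_serialized();
        uint64_t d = end - start;
        if (d < best)
            best = d;
    }
    return best;
}

/* Subtract the rig's own cost; clamp at 0 because both figures are noisy. */
static uint64_t net_cycles(uint64_t measured, uint64_t overhead) {
    return measured > overhead ? measured - overhead : 0;
}
```

Usage: `net_cycles(emms_with_rig(1000), rig_overhead(1000))` gives an estimate for EMMS alone. Note that this times only the EMMS itself; a delay on the first FPU instruction after it would only show up if such an instruction were placed inside the timed region too.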

The last point is that, even on the Pentium MMX, the EMMS instruction itself finished quickly; it was the first FPU instruction after it that incurred a nasty delay.

stormsoul
Thanks, stormsoul.
A: 

You need a serializing instruction, such as CPUID, to ensure that RDTSC is not executed out of order. You can read more here.

zvrba