Below is an excerpt from a listing of two Pentium assembly sequences. An outer loop times the sequences and reaches each routine with a call through a table, so the outer call is made from the same location every time. The two sequences differ in that the first has one fewer instruction than the second.
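To make the setup concrete, here is a rough C sketch of this kind of harness. It is not our actual timing code: the names `routine_table` and `time_routine`, the 16-entry size of `gwr`, and the use of `clock()` are assumptions made for illustration, and the two C functions only mirror what the assembly sequences in the listing do.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Stand-ins for the globals referenced in the listing (sizes are assumed). */
static uint16_t gwr[16];
static uint32_t dvalue;
static uint8_t  ereg;

/* Mirrors the first sequence: zero-extend two words, AND them, store. */
static void alushift_and_e(uint32_t l, uint32_t r)
{
    dvalue = (uint32_t)gwr[l] & (uint32_t)gwr[r];
}

/* Mirrors the second sequence: same work, plus toggling the low bit of ereg. */
static void alushift_and_ne(uint32_t l, uint32_t r)
{
    uint32_t result = (uint32_t)gwr[l] & (uint32_t)gwr[r];
    ereg ^= 1;
    dvalue = result;
}

typedef void (*routine_fn)(uint32_t, uint32_t);

/* Hypothetical dispatch table; the real one is not shown in this post. */
static routine_fn const routine_table[] = { alushift_and_e, alushift_and_ne };

/* Times `iterations` calls of one table entry. The indirect call in the
   loop is the single call site shared by every routine being timed. */
static double time_routine(size_t index, unsigned long iterations)
{
    /* volatile keeps the compiler from turning this into a direct call */
    routine_fn volatile fn = routine_table[index];
    clock_t start = clock();
    for (unsigned long i = 0; i < iterations; ++i) {
        fn(i & 15u, (i >> 4) & 15u);
    }
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling `time_routine(0, n)` and `time_routine(1, n)` with a large `n` and comparing the two results corresponds to the comparison described here. `clock()` is coarse, so a cycle counter would be a better fit for differences of a few percent.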
The results we get on two Intel machines are very different.
The CPUID instruction reports the family, model, and stepping of each machine.
Machine 1: Family 6, Model 15, Stepping 11. CPU-Z reports "Intel Core 2 Duo E6750"
The two sequences execute at statistically the same speed.
Machine 2: Family 15, Model 3, Stepping 3. CPU-Z reports "Intel Pentium 4"
The first sequence takes about 8% longer than the second sequence.
We simply cannot explain the increase in time. As far as we can tell, there should be no difference in flag hold-off, branch prediction, or register usage between the two sequences.
Does anyone have an idea why the first sequence would take longer to execute on only one of the machines?
Edit: Adding "XOR BYTE PTR ereg, 0" to the first sequence does make its timing match the second sequence on the Pentium 4. Curious.
First Sequence:
00000040 ALUSHIFT_AND_C_V_E LABEL NEAR
00000040 0F B7 04 55 MOVZX EAX, gwr[(SIZEOF WORD) * EDX] ; EAX = 0000000000000000 LLLLLLLLLLLLLLLL
00000000 E
00000048 0F B7 14 4D MOVZX EDX, gwr[(SIZEOF WORD) * ECX] ; EDX = 0000000000000000 RRRRRRRRRRRRRRRR
00000000 E
00000050 23 C2 AND EAX, EDX ; AX = L&R (result)
00000052 A3 00000000 E MOV dvalue, EAX ; Save the temporary ALU/Shifter result
00000057 C3 RET ; Return
Second Sequence:
00000060 ALUSHIFT_AND_C_V_NE LABEL NEAR
00000060 0F B7 04 55 MOVZX EAX, gwr[(SIZEOF WORD) * EDX] ; EAX = 0000000000000000 LLLLLLLLLLLLLLLL
00000000 E
00000068 0F B7 14 4D MOVZX EDX, gwr[(SIZEOF WORD) * ECX] ; EDX = 0000000000000000 RRRRRRRRRRRRRRRR
00000000 E
00000070 23 C2 AND EAX, EDX ; AX = L&R (result)
00000072 80 35 00000000 E XOR BYTE PTR ereg, 1 ; E = ~E
01
00000079 A3 00000000 E MOV dvalue, EAX ; Save the temporary ALU/Shifter result
0000007E C3 RET ; Return