ansaurus

Question

Why do more Pentium assembly instructions take less time?

Answer 1

+3 A:

After the Pentium I or II, most optimizations performed by the compiler, were not as necessary. The chip will decompose these instructions into micro ops and then optimize for you. t could be the branch prediction differences between the chips or the fact that the XOR + RET is just as expensive as a plain RET. I'm not as familiar with what models of Pentiums you are looking at above to say. Another possibility is that it could also be a cache-line issue or hardware difference.

There may be something in the Intel docs or there may not.

Regardless. Experienced assembly coders know that the only truth is achieved via testing, which is what you are doing.

drudru 2009-07-08 16:57:08

Pentium and higher coding is as much voodoo as anything else. Sometimes adding more instructions makes things faster, etc. Actual testing and timing is the only route to go!

Brian Knoblauch 2009-07-08 16:59:44

Answer 2

A:

It turns out that there is some curious interaction with where the code is located that causes the increase. Even though everything is cache aligned, switching the blocks of code caused the increase in time on the Pentium-4

Thanks to all who took the time to investigate this or look at it.

piCookie 2009-07-10 14:35:47

Answer 3

A:

You can add one, two, etc nops in front of this code (and change nothing else) to move where this lands in the cache to see if there are cache effects (or just turn off the cache). Warning though as little as an extra nop can change an instruction elsewhere that can no longer reach something using relative to the pc addressing, causing possibly more instruction bytes causing both the code under test to move more than desired as well as possibly a chain reaction of other relatively addressed instructions to change.

Even if you play the cache game the nature of the beast here is the magic inside the chip that takes one stream of instructions and divides it up among the execution units.

Tweak and test is what really gets performance in the end even if you dont understand why. Although as soon as you move that code to an older chip or newer chip or different motherboard or same chip family but different stepping all your performance tweaks can turn on you.

dwelch 2009-07-13 19:41:28

Answer 4

A:

A few months ago, I had something similar occur to me. My project has a configure-switch for enabling the use of __thread for thread-local variables. Without it, it would use pthread_getspecific and the likes. The latter does every bit as much as the __thread version plus a function call plus some additional instructions for setting up arguments, saving registers, and so forth. Interestingly, the more laborious version was consistently faster. Only on Pentium 4, though. All other chips behaved sanely.

Ringding 2009-08-06 06:19:54

ansaurus

tags:

views:

answers:

Why do more Pentium assembly instructions take less time?

related questions