views:

92

answers:

4

Hello!

I have to approximate the execution time of PowerPC and x86 assembler code. I understand that I cannot compute it exactly; it depends on many factors (the current processor state, the fact that x86 processors decode instructions into micro-ops, memory access time varying with whether the code comes from cache or from slower memory, etc.).

I found some information in the Intel Optimization Reference Manual (Appendix C), but it does not cover all the general-purpose instructions. Is there a complete reference anywhere?

What about PowerPC processors? Where can I find such information?

+1  A: 

PowerPC is pretty well documented, but it depends which processor you're talking about. IBM did a pretty good manual for the 970 (G5). Intel is a little less forthcoming when it comes to details of micro-architecture.

Having said that though, what you want to do is quite tricky. Both x86 and PowerPC are superscalar - they have multiple execution units and pipelines, so it's not like the old days where you maybe executed one instruction per clock cycle. The PowerPC 970 for example can have up to 215 instructions "in flight" at any given time. Ideally you need a simulator if you want to measure exact cycle counts for small sections of code.

Paul R
A: 

Modern processors spend most of their time waiting for memory, or finding stuff to do while waiting for memory for their current thread.

I think you should probably just try optimising your memory usage.

Douglas Leeder
A: 

You'd have to do an extremely rigorous analysis. Take into account all the caches, alignment, pipelining, time slicing, and so on. Does x86 even have fixed clock cycle counts per instruction any more? You're better off just writing the code for speed the way the CPU's optimization manual suggests.

Longpoke
+1  A: 

This must be very hard to do on a modern, general-purpose OS without either controlling the execution environment extremely tightly, or making assumptions that won't be true at least some of the time.

For example: if some hardware resource is overloaded, either by one very hungry competing process or by multiple competing processes, then the elapsed time to execute a given piece of code will depend on how fairly the OS can share the overloaded resource between them. Even if the OS shares the resource perfectly fairly, you still have to be able to bound the number of competing processes to arrive at a finite time limit.

richj