With modern CPUs, there are no simple tables to look up how long an instruction will take to complete (although such tables exist for some older processors, e.g. the 486). Your best information on what each instruction does and how long it might take comes from the chip manufacturer. E.g. Intel's documentation manuals are quite good (there's also an optimisation manual on that page).
On pretty much all modern CPUs there's also the RDTSC instruction, which reads the time stamp counter of the processor on which the code is running into EDX:EAX. There are pitfalls with this too, but essentially, if the code you are profiling is representative of a real use situation and its execution doesn't get interrupted or shifted to another CPU core, you can use this instruction to get the timings you want. I.e. surround the code you are optimising with two RDTSC instructions and take the difference in TSC as the timing. (Timings can vary greatly between tests and situations; statistics is your friend.)