views: 221
answers: 4

Hi!

I am using an embedded RISC processor. There is one basic thing I have a problem figuring out.

The CPU manual clearly states that the instruction ld r1, [p1] (in C: r1 = *p1) takes one cycle. Register r1 is 32 bits wide, but the memory bus is only 16 bits wide. So how can it fetch all the data in one cycle?

A: 

Could there be a 32-bit cache within the processor itself? Which processor are you using?

Adam Liss
A: 

How about an L1 cache?

BTW, which architecture is this? Are you sure the bus is 16 bits wide?

elcuco
+1  A: 

My understanding is: when the manual says an instruction takes one cycle, it does not mean that instruction will finish in one cycle. We have to take the instruction pipeline into account. Suppose your CPU has a 5-stage pipeline: that instruction would take 5 cycles if it were executed in isolation.

pierr
+2  A: 

The quoted clock times assume full-width, zero-wait-state memory. The time it takes the core to execute that instruction is one clock cycle.

There was a time when each instruction took a different number of clock cycles. Memory was relatively fast then too, usually zero wait state. There was a time before pipelines as well where you had to burn a clock cycle fetching, then a clock cycle decoding, then a clock cycle executing, plus extra clock cycles for variable length instructions and extra clock cycles if the instruction had a memory operation.

Today clock speeds are high and chip real estate is relatively cheap, so a one-clock-cycle add or multiply is the norm, as are pipelines and caches. Processor clock speed is no longer the determining factor for performance. Memory is relatively expensive and slow. So the caches (configuration, number, and size), bus width, memory speed, and peripheral speed determine the overall performance of a system. Normally, increasing the processor clock speed but not the memory or peripherals will show minimal if any performance gain; on some occasions it can make things slower.

Memory width and wait states are not part of the clock-execution spec in the reference manual; it describes only what the core itself costs you, in clocks, for each instruction. If it is a Harvard architecture, where the instruction and data buses are separate, then one clock is possible including the memory cycle. The fetch of the instruction happens at least one clock cycle earlier, if not before that, so at the beginning of the clock cycle the instruction is ready; decode and execute (the memory read cycle) happen during that one clock, and at the end of it the result of the read is latched into the register. If the instruction and data buses are shared, then you could argue that the load still finishes in one clock cycle, but you do not get to fetch the next instruction, so there is a bit of a stall there; they might cheat and still call that one clock cycle.

dwelch