views:

242

answers:

5

This is perhaps more of a discussion question, but I thought Stack Overflow could be the right place to ask it. I am studying the concept of instruction pipelining. I have been taught that a pipeline's instruction throughput increases as the number of pipeline stages increases, but in some cases throughput might not change. Under what conditions does this happen? I am thinking that stalling and branching could be the answer, but I wonder if I am missing something crucial.

A: 

I would also think that increasing the pipeline depth beyond the time the longest instruction in a series takes to execute would not yield any increase in performance. I do think that stalling and branching are the fundamental issues, though.

WhirlWind
+1  A: 

Stalls/bubbles in long pipelines definitely cause a huge loss in throughput. And of course, the longer the pipeline, the more clock cycles are wasted.

I tried for a long time to think of other scenarios where longer pipelines could cause a loss in performance, but it all comes back to stalls. (And number of execution units and issue schemes, but those don't have much to do with pipeline length.)

SoapBox
+1  A: 

Agreed. The biggest problems are stalls (waiting for results from previous instructions) and incorrect branch prediction. If your pipeline is 20 stages deep and you stall waiting for the result of a condition or operation, you're going to wait longer than if your pipeline were only 5 stages deep. If you predict the wrong branch, you have to flush 20 instructions out of the pipeline, as opposed to 5.
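
As a rough illustration (a sketch, not a proper benchmark; the array size and the 128 threshold are arbitrary), here is a small C program where a branch that depends on random data is mispredicted roughly half the time, so the pipeline keeps getting flushed, while sorting the data first makes the same branch predictable. Note that at higher optimization levels the compiler may turn the branch into a conditional move, which would hide the effect.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 10000000

    static int cmp_int(const void *a, const void *b)
    {
        return *(const int *)a - *(const int *)b;
    }

    /* Sum every element >= 128; the data-dependent branch is the point. */
    static double timed_sum(const int *data, long long *out)
    {
        long long sum = 0;
        clock_t start = clock();
        for (int i = 0; i < N; i++)
            if (data[i] >= 128)        /* data-dependent branch */
                sum += data[i];
        *out = sum;
        return (double)(clock() - start) / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        int *data = malloc(N * sizeof *data);
        long long sum;
        double t;

        srand(42);
        for (int i = 0; i < N; i++)
            data[i] = rand() % 256;

        /* Random data: the branch is taken about half the time, so the
           predictor is frequently wrong and each miss flushes the pipeline. */
        t = timed_sum(data, &sum);
        printf("unsorted: %.3f s (sum=%lld)\n", t, sum);

        /* Sorted data: the same branch becomes almost perfectly predictable,
           so the identical loop runs much faster on a deeply pipelined CPU. */
        qsort(data, N, sizeof *data, cmp_int);
        t = timed_sum(data, &sum);
        printf("sorted:   %.3f s (sum=%lld)\n", t, sum);

        free(data);
        return 0;
    }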

Presumably you could also have a deep pipeline where multiple stages attempt to access the same hardware (an ALU, etc.), which would cause a performance hit, though hopefully you'd throw in enough additional units to support each stage.

awshepard
That's not 20 instructions, but 20 cycles' worth of instructions. On a heavily superscalar CPU, that may be MUCH more.
slacker
+2  A: 

Throughput can also stall when an instruction has to wait for a result from another instruction, or on cache misses. Pipelining doesn't itself guarantee that the operations are totally independent. Here is a great presentation about the intricacies of the Intel/AMD x86 architectures: http://www.infoq.com/presentations/click-crash-course-modern-hardware

It explains stuff like this in great detail, and covers some techniques for further improving throughput and hiding latency. JustJeff mentioned out-of-order execution, for one; you also have shadow registers that are not exposed in the programmer-visible model (more than 8 registers on x86), and you have branch prediction.
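
To make the cache-miss point concrete, here is a rough C sketch of my own (sizes are arbitrary, and it is not taken from the presentation): each load's address depends on the previous load's result, so even an out-of-order core cannot overlap the misses and the pipeline mostly sits waiting on memory, whereas the same loads done sequentially can be prefetched and overlapped.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1 << 22)   /* about 4M entries, larger than typical caches */

    int main(void)
    {
        size_t *next = malloc(N * sizeof *next);
        size_t i, j, tmp, p;
        clock_t start;

        /* Build one big random cycle (Sattolo's algorithm), so every access
           jumps to an unpredictable cache line. */
        for (i = 0; i < N; i++)
            next[i] = i;
        srand(1);
        for (i = N - 1; i > 0; i--) {
            j = (size_t)rand() % i;   /* j in [0, i) keeps it a single cycle */
            tmp = next[i];
            next[i] = next[j];
            next[j] = tmp;
        }

        /* Dependent loads: the next address isn't known until the current
           load completes, so cache misses serialize and the pipeline waits. */
        start = clock();
        p = 0;
        for (i = 0; i < N; i++)
            p = next[p];
        printf("dependent chase:  %.3f s (ended at %zu)\n",
               (double)(clock() - start) / CLOCKS_PER_SEC, p);

        /* Same number of loads, but independent and sequential: the hardware
           can prefetch and overlap the misses, so this is far faster. */
        start = clock();
        size_t acc = 0;
        for (i = 0; i < N; i++)
            acc += next[i];
        printf("sequential sweep: %.3f s (sum=%zu)\n",
               (double)(clock() - start) / CLOCKS_PER_SEC, acc);

        free(next);
        return 0;
    }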

Mads Elvheim
A: 

Instruction-level parallelism has diminishing returns. In particular, the data dependencies between instructions determine how much parallelism is actually possible.

Consider the case of Read after Write (known as RAW in textbooks).

Using the syntax where the first operand receives the result, consider this example.

10: add r1, r2, r3
20: add r1, r1, r1

The result of line 10 must be known by the time the computation of line 20 begins. Data forwarding mitigates this problem, but only up to the point where the data becomes known.
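
The same RAW chain can be shown from C with a rough sketch (the iteration count is arbitrary, and it assumes the compiler does not reassociate the floating-point adds, e.g. compile with -O2 but without -ffast-math): a single accumulator forms one long chain of dependent adds limited by the add latency, while splitting the sum across independent accumulators lets several adds be in flight in the pipeline at once.

    #include <stdio.h>
    #include <time.h>

    #define N 100000000L

    int main(void)
    {
        volatile double one = 1.0;   /* keeps the compiler from folding the loops */
        clock_t start;

        /* One accumulator: every add has a RAW dependency on the previous add,
           so throughput is bounded by the FP-add latency, not the issue width. */
        double a = 0.0;
        start = clock();
        for (long i = 0; i < N; i++)
            a += one;
        double t1 = (double)(clock() - start) / CLOCKS_PER_SEC;

        /* Four independent accumulators: four shorter chains, so several adds
           can be in flight in the pipeline at once. */
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        start = clock();
        for (long i = 0; i < N; i += 4) {
            s0 += one;
            s1 += one;
            s2 += one;
            s3 += one;
        }
        double b = (s0 + s1) + (s2 + s3);
        double t2 = (double)(clock() - start) / CLOCKS_PER_SEC;

        printf("single chain: %.0f in %.3f s\n", a, t1);
        printf("four chains:  %.0f in %.3f s\n", b, t2);
        return 0;
    }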

Paul Nathan