Saturday 3 May 2014

Chapter 3: Pipelining and Parallel Processing Keshab K. Parhi

Chapter 3: Pipelining and Parallel
Processing
Keshab K. Parhi

Outline
• Introduction
• Pipelining of FIR Digital Filters
• Parallel Processing
• Pipelining and Parallel Processing for Low Power
– Pipelining for Lower Power
– Parallel Processing for Lower Power
– Combining Pipelining and Parallel Processing for Lower Power

Introduction
• Pipelining
– Comes from the idea of a water pipe: continue sending water without
waiting the water in the pipe to be out
– leads to a reduction in the critical path
– Either increases the clock speed (or sampling speed) or reduces the power
consumption at same speed in a DSP system
• Parallel Processing
– Multiple outputs are computed in parallel in a clock period
– The effective sampling speed is increased by the level of parallelism
– Can also be used to reduce the power consumption
water pipe

Introduction (cont’d)
• Example 1: Consider a 3-tap FIR filter: y(n)=ax(n)+bx(n-1)+cx(n-2)
– The critical path (or the minimum time required for processing a new
sample) is limited by 1 multiply and 2 add times. Thus, the “sample period”
(or the “sample frequency” ) is given by:
D D
a b c
x(n)
y(n)
x(n-1) x(n-2)
T Addition time
T multiplica tion time
A
M


:
:
M A
sample
sample M A
T T
f
T T T
2
1
2
+

≥ +

Introduction (cont’d)
– If some real-time application requires a faster input rate (sample rate),
then this direct-form structure cannot be used! In this case, the critical
path can be reduced by either pipelining or parallel processing.
• Pipelining: reduce the effective critical path by introducing pipelining latches
along the critical data path
• Parallel Processing: increases the sampling rate by replicating hardware so
that several inputs can be processed in parallel and several outputs can be
produced at the same time
• Examples of Pipelining and Parallel Processing
– See the figures on the next page

Introduction (cont’d)
Figure (a): A data path
Figure (b): The 2-level pipelined structure of (a)
Figure (c): The 2-level parallel processing structure of (a)
Example 2
 

Pipelining of FIR Digital Filters
• The pipelined implementation: By introducing 2 additional latches in
Example 1, the critical path is reduced from TM+2TA to TM+TA.The schedule of
events for this pipelined system is shown in the following table. You can see
that, at any time, 2 consecutive outputs are computed in an interleaved manner.
D D
a b c
x(n)
y(n)
D
D
1 2
3
Clock Input Node 1 Node 2 Node 3 Output
0 x(0) ax(0)+bx(-1)   
1 x(1) ax(1)+bx(0) ax(0)+bx(-1) cx(-2) y(0)
2 x(2) ax(2)+bx(1) ax(1)+bx(0) cx(-1) y(1)
3 x(3) ax(3)+bx(2) ax(2)+bx(1) cx(0) y(2)

Pipelining of FIR Digital Filters (cont’d)
• In a pipelined system:
– In an M-level pipelined system, the number of delay elements in any path
from input to output is (M-1) greater than that in the same path in the
original sequential circuit
– Pipelining reduces the critical path, but leads to a penalty in terms of an
increased latency
– Latency: the difference in the availability of the first output data in the
pipelined system and the sequential system
– Two maindrawbacks: increase in the number of latches and in system
latency
• Important Notes:
– The speed of a DSP architecture ( or the clock period) is limited by the
longest path between any 2 latches, or between an input and a latch, or
between a latch and an output, or between the input and the output

Pipelining of FIR Digital Filters (cont’d)
– This longest path or the “critical path” can be reduced by suitably placing
the pipelining latches in the DSP architecture
– The pipelining latches can only be placed across any feed-forward cutset
of the graph
• Two important definitions
– Cutset: a cutset is a set of edges of a graph such that if these edges are
removed from the graph, the graph becomes disjoint
– Feed-forward cutset: a cutset is called a feed-forward cutset if the data
move in the forward direction on all the edge of the cutset
– Example 3: (P.66, Example 3.2.1, see the figures on the next page)
• (1) The critical path is A3→A5→A4→A6, its computation time: 4 u.t.
• (2) Figure (b) is not a valid pipelining because it’s not a feed-forward cutset
• (3) Figure (c) shows 2-stage pipelining, a valid feed-forward cutset. Its critical
path is 2 u.t.

Pipelining of FIR Digital Filters (cont’d)
Assume:
The computation time
for each node is assumed
to be 1 unit of time
Signal-flow graph representation of Cutset
• Original SFG
• A cutset
• A feed-forward cutset



Halaman berikut>>

0 komentar:

Post a Comment