Saturday 3 May 2014

Pipelining of FIR Digital Filters (cont’d)



• Transposed SFG and Data-broadcast structure of FIR filters
– Transposed SFG of FIR filters
– Data broadcast structure of FIR filters
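Figure: direct-form SFG, transposed SFG, and data-broadcast structure of a 3-tap FIR filter (input x(n), output y(n), coefficients a, b, c, two delay elements in each structure)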
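The equivalence of the two structures can be checked with a minimal Python sketch. The function names and the zero initial register state are assumptions made for illustration, not part of the slides. In the direct form the delays sit on the input side; in the data-broadcast form x(n) is broadcast to all three multipliers and the delays sit between the adders.

```python
def fir_direct(x, coeffs):
    """Direct form: y(n) = a*x(n) + b*x(n-1) + c*x(n-2), delays on the input line."""
    a, b, c = coeffs
    x1 = x2 = 0  # x(n-1) and x(n-2), assumed zero before the first sample
    y = []
    for xn in x:
        y.append(a * xn + b * x1 + c * x2)
        x2, x1 = x1, xn  # shift the input delay line
    return y


def fir_data_broadcast(x, coeffs):
    """Data-broadcast form: x(n) feeds every multiplier, delays sit between adders."""
    a, b, c = coeffs
    d1 = d2 = 0  # d2 holds c*x(n-1); d1 holds b*x(n-1) + c*x(n-2)
    y = []
    for xn in x:
        y.append(a * xn + d1)          # output adder
        d1, d2 = b * xn + d2, c * xn   # register updates use the old d2
    return y


samples = [1, 2, 3, 4]
print(fir_direct(samples, (1, 2, 3)))
print(fir_data_broadcast(samples, (1, 2, 3)))
```

Both calls produce the same output sequence. The difference is in the critical path: the data-broadcast form has only one multiplier and one adder between registers, independent of the number of taps.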


Pipelining of FIR Digital Filters (cont’d)
• Fine-Grain Pipelining
– Let TM=10 units and TA=2 units. If the multiplier is split into 2 smaller units
with processing times of 6 units and 4 units, respectively (by placing latches on
the horizontal cutset across the multiplier), then the critical path drops from
TM+TA=12 units to max(6, 4+2)=6 units, so the desired clock period of (TM+TA)/2
is achieved
– A fine-grain pipelined version of the 3-tap data-broadcast FIR filter is shown
below.
Figure: fine-grain pipelining of FIR filter
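The arithmetic behind this example can be sketched as follows. The helper name `clock_period_fine_grain` is hypothetical, and the sketch assumes that the second multiplier piece shares its pipeline stage with the adder, as in the data-broadcast structure.

```python
def clock_period_fine_grain(t_m, t_a, t_m_first):
    """Clock period after cutting the multiplier into two pipeline stages.

    t_m_first is the delay of the first multiplier piece; the second piece
    (t_m - t_m_first) is assumed to share its stage with the adder.
    """
    t_m_second = t_m - t_m_first
    return max(t_m_first, t_m_second + t_a)


T_M, T_A = 10, 2                 # values from the text
before = T_M + T_A               # one multiply plus one add between registers
after = clock_period_fine_grain(T_M, T_A, t_m_first=6)
print(before, after)             # prints: 12 6
```

The 6/4 split is what balances the two stages here; an even 5/5 split would leave 5 + 2 = 7 units in the second stage.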



Parallel Processing
• Parallel processing and pipelining techniques are duals of each other: if a
computation can be pipelined, it can also be processed in parallel. Both of
them exploit concurrency available in the computation in different ways.
• How to design a Parallel FIR system?
– Consider a single-input single-output (SISO) FIR filter:
• y(n)=a*x(n)+b*x(n-1)+c*x(n-2)
– Convert the SISO system into a MIMO (multiple-input multiple-output)
system in order to obtain a parallel processing structure
• For example, to get a parallel system with 3 inputs per clock cycle (i.e., level
of parallel processing L=3)
y(3k)=a*x(3k)+b*x(3k-1)+c*x(3k-2)
y(3k+1)=a*x(3k+1)+b*x(3k)+c*x(3k-1)
y(3k+2)=a*x(3k+2)+b*x(3k+1)+c*x(3k)
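The three block equations can be checked against the sequential filter with a short Python sketch. The function names, the zero initial conditions, and the requirement that the input length be a multiple of 3 are assumptions made for illustration.

```python
def fir_sequential(x, a, b, c):
    """SISO filter: one sample per iteration, y(n) = a*x(n) + b*x(n-1) + c*x(n-2)."""
    padded = [0, 0] + list(x)  # x(-1) = x(-2) = 0
    return [a * padded[n + 2] + b * padded[n + 1] + c * padded[n]
            for n in range(len(x))]


def fir_3_parallel(x, a, b, c):
    """MIMO filter with L = 3: three inputs in, three outputs out per iteration k."""
    assert len(x) % 3 == 0, "sketch assumes a whole number of blocks"
    prev1 = prev2 = 0  # x(3k-1) and x(3k-2), taken from the previous block
    y = []
    for k in range(len(x) // 3):
        x0, x1, x2 = x[3 * k: 3 * k + 3]      # x(3k), x(3k+1), x(3k+2)
        y.append(a * x0 + b * prev1 + c * prev2)  # y(3k)
        y.append(a * x1 + b * x0 + c * prev1)     # y(3k+1)
        y.append(a * x2 + b * x1 + c * x0)        # y(3k+2)
        prev1, prev2 = x2, x1                     # become x(3k-1), x(3k-2) next block
    return y


data = [1, 2, 3, 4, 5, 6]
print(fir_sequential(data, 1, 2, 3) == fir_3_parallel(data, 1, 2, 3))
```

Note that only two values are carried from one block to the next, and each is held for one block iteration, not for one sample period.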


Parallel Processing (cont’d)
– A parallel processing system is also called a block processing system, and the
number of inputs processed in a clock cycle is referred to as the block size
– In this parallel processing system, at the k-th clock cycle, 3 inputs x(3k),
x(3k+1) and x(3k+2) are processed and 3 samples y(3k), y(3k+1) and
y(3k+2) are generated at the output
• Note 1: In the MIMO structure, placing a latch at any line produces an
effective delay of L clock cycles at the sample rate (L: the block size). So,
each delay element is referred to as a block delay (also referred to as L-slow)
Figure: Sequential System (SISO, input x(n), output y(n)) and 3-Parallel System (MIMO, inputs x(3k), x(3k+1), x(3k+2), outputs y(3k), y(3k+1), y(3k+2))


Parallel Processing (cont’d)
– For example:
x(2k) → D → x(2k-2)
When block size is 2, 1 delay element = 2 sampling delays
x(10k) → D → x(10k-10)
When block size is 10, 1 delay element = 10 sampling delays
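The block-delay behaviour can be illustrated with a small Python sketch. The helper names `wire` and `block_delay` are hypothetical, and the latch content before the first clock cycle is modelled as `None`.

```python
def wire(x, block_size, i):
    """Samples carried by line i of an L-parallel system: x(L*k + i), k = 0, 1, ..."""
    return x[i::block_size]


def block_delay(stream, initial=None):
    """One delay element on a line: the output at clock k is the input at clock k-1."""
    return [initial] + stream[:-1]


x = list(range(40))  # x(n) = n, so each value shows its own sample index

for L in (2, 10):
    line0 = wire(x, L, 0)          # carries x(L*k)
    delayed = block_delay(line0)   # carries x(L*k - L) from clock 1 onwards
    print(L, line0[:4], delayed[:4])
```

Because each line only sees every L-th sample, a single latch on that line shifts the sample index by L, which matches the two cases above.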


Parallel Processing (cont’d)
Figure: Parallel processing architecture for a 3-tap FIR filter with block size 3


Parallel Processing (cont’d)
• Note 2: The critical path of the block (or parallel) processing system remains
unchanged. But since 3 samples are processed in 1 (not 3) clock cycle, the
iteration (or sample) period is given by the following equations:
$$T_{clock} \ge T_M + 2T_A$$
$$T_{iteration} = T_{sample} = \frac{1}{L}\,T_{clock} \ge \frac{T_M + 2T_A}{3}$$
– So, it is important to understand that in a parallel system Tsample ≠ Tclock, whereas in
a pipelined system Tsample = Tclock
• Example: A complete parallel processing system with block size 4 (including
serial-to-parallel and parallel-to-serial converters) (also see P.72, Fig. 3.11)
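As a numeric illustration of Note 2, the sketch below evaluates the clock-period bound TM + 2TA and the resulting sample period Tclock / L. Reusing TM=10 and TA=2 from the fine-grain pipelining example is an assumption made here, and `parallel_timing` is a hypothetical helper name.

```python
from fractions import Fraction


def parallel_timing(t_m, t_a, block_size):
    """Return (minimum clock period, minimum sample period) for an L-parallel 3-tap FIR."""
    t_clock = t_m + 2 * t_a                   # critical path: one multiply, two adds
    t_sample = Fraction(t_clock, block_size)  # L samples are produced per clock cycle
    return t_clock, t_sample


t_clock, t_sample = parallel_timing(10, 2, block_size=3)
print(t_clock, t_sample)  # prints: 14 14/3
```

The clock period stays at 14 units, yet a new sample is produced on average every 14/3 units, which is why Tsample and Tclock differ in a parallel system.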






