# A VLSI Architecture for High Speed Comb Decimation Filters with Power-of-two Decimation Ratios

YONGHONG GAO, LIHONG JIA AND HANNU TENHUNEN
Electronic System Design Laboratory, Royal Institute of Technology, Stockholm
Electrum 229, Isafjordsgatan 22, SE-164 40 Kista, Stockholm, SWEDEN

Abstract: - A non-recursive carry-save-adder-based structure for CIC (cascaded-integrator-comb) decimation filters is proposed in this paper. The main advantage of the proposed structure is that it can achieve higher speed than Hogenauer's CIC filters, since breaking up the recursive loop in Hogenauer's CIC filters reduces the critical path to a 1-bit full-adder delay between pipeline registers. This also implies that the decimation ratio, filter order and number of input bits do not affect the circuit speed. To demonstrate the feasibility of the proposed structure, a 3rd-order CIC decimation filter with a decimation factor of 8 has been designed in a 0.6 $\mu$m 3.3 V CMOS technology, and a sampling rate of 380 MHz has been achieved in simulation.

Key-Words: - High-speed, Comb decimation filters, Power-of-two

CSCC'99 Proceedings, Pages:5401-5404

### **1** Introduction

In digital radio receivers, digitizing IF signals at the highest possible frequency is preferred because of the many advantages of digital technology. In wide-band communications, signal bandwidths have increased to tens of MHz, so the oversampling frequency rises to hundreds of MHz [10]. High-speed decimation filters are therefore required.

For VLSI implementations of multistage decimators [11][12][13][14][15], a computationally efficient first stage is provided by Hogenauer's CIC filter [6][8], which has the following transfer function:

$$H(z) = \left(\frac{1-z^{-N}}{1-z^{-1}}\right)^k = \left(\sum_{i=0}^{N-1} z^{-i}\right)^k \tag{1}$$

where *N* is the decimation factor and *k* is the filter order. A 250 Msample/sec cascaded integrator-comb (CIC) decimation filter has been fabricated in a triple-metal 0.8 $\mu$m CMOS process [2]. To speed up the circuit, carry-save adders were used to implement the recursive integrators. However, because of the recursive loop, the critical path can only be reduced to two carry-save adder delays between pipeline registers, as shown in Fig. 1 [2].

Fig. 1. Carry-save implementation of the recursive integrator stages in Ref. [2].

To further speed up the CIC filter, a non-recursive carry-save-adder-based structure for CIC (cascaded-integrator-comb) decimation filters is proposed in this paper.

### **2** Non-Recursive Architecture

Usually, the decimation factor is chosen to be a power of two, $N = 2^M$. Then, referring to (1), we have:

$$H(z) = \left(\sum_{i=0}^{N-1} z^{-i}\right)^{k} = \left(\sum_{i=0}^{2^{M}-1} z^{-i}\right)^{k} = \prod_{i=0}^{M-1} \left(1 + z^{-2^{i}}\right)^{k} \tag{2}$$
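The factorization in (2) can be checked numerically by multiplying out both sides as polynomials in $z^{-1}$. The following is a minimal sketch in plain Python; the helper names `poly_mul`, `boxcar_power` and `factored_form` are illustrative and not part of the original design.

```python
def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def boxcar_power(N, k):
    """Coefficients of (1 + z^-1 + ... + z^-(N-1))^k, the middle form of (2)."""
    h = [1]
    for _ in range(k):
        h = poly_mul(h, [1] * N)
    return h


def factored_form(M, k):
    """Coefficients of prod_{i=0}^{M-1} (1 + z^-(2^i))^k, the right side of (2)."""
    h = [1]
    for i in range(M):
        factor = [0] * (2 ** i + 1)
        factor[0] = 1
        factor[-1] = 1          # 1 + z^-(2^i)
        for _ in range(k):
            h = poly_mul(h, factor)
    return h


# Example: k = 3 and M = 3 (N = 8), as in the design described in this paper.
M, k = 3, 3
assert boxcar_power(2 ** M, k) == factored_form(M, k)
```

Both coefficient lists have length $k(N-1)+1$, and their sum equals the DC gain $N^k$.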

By applying the commutative rule, each factor $(1+z^{-2^{i}})^k$ can be moved behind *i* decimate-by-2 operations, where it becomes $(1+z^{-1})^k$. The resulting non-recursive structure is shown in Fig. 2. The switches in the figure indicate the reduction of the sampling rate by a factor of 2. Every stage has the same low-order FIR filter but operates at a different sampling rate. The input x(n) has a word length $W_d$ of m bits. The word length increases by k bits through every stage, while the word rate (sampling rate) decreases by a factor of 2 through every stage, starting from the oversampling rate $f_{os}$. These observations are summarized in Table 1.
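The equivalence between the cascade of Fig. 2 and the single-rate filter of (1) followed by decimation by *N* can be illustrated with a small behavioural model. This is only a sketch under simple assumptions (zero initial state, decimation keeping samples 0, N, 2N, ...); the function names are hypothetical.

```python
def fir(x, h):
    """Causal FIR filter with zero initial state: y[n] = sum_i h[i] * x[n - i]."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def cic_reference(x, N, k):
    """Filter with (1 + z^-1 + ... + z^-(N-1))^k at the input rate, then keep every N-th sample."""
    y = list(x)
    for _ in range(k):
        y = fir(y, [1] * N)
    return y[::N]


def non_recursive(x, M, k):
    """M identical stages: (1 + z^-1)^k followed by decimation by 2."""
    y = list(x)
    for _ in range(M):
        for _ in range(k):
            y = fir(y, [1, 1])
        y = y[::2]
    return y


# Deterministic 1-bit test pattern (arbitrary choice for illustration).
x = [(n * n + 3 * n) % 7 % 2 for n in range(64)]
assert non_recursive(x, M=3, k=3) == cic_reference(x, N=8, k=3)
```

In the model, only the first stage runs at the full input rate; each later stage processes half as many samples as the one before it.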

Table 1: Sampling rate and word length of stage *i* in Fig. 2 (*i* = 1, 2, ..., *M*).

|               | Input                | Output           |
|---------------|----------------------|------------------|
| Sampling rate | $f_{os}$ / $2^{i-1}$ | $f_{os}$ / $2^i$ |
| Word length   | m + k(i - 1)         | m + ki           |

From Table 1 it is seen that the frequency limitation is relaxed in the non-recursive architecture: the word length is short where the sampling rate is high, and the sampling rate has already dropped where the word length is long. The filter speed is therefore limited only by the first stage, whose maximum word length is m + k bits. This means that the decimation factor has no impact on the circuit speed in the non-recursive structure. In addition, reducing the sampling rate as early as possible reduces the workload and thus the power consumption.
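As a worked instance of Table 1, the per-stage rates and word lengths can be tabulated for given parameters. The sketch below uses the values of the design example in this paper (1-bit input, k = 3, M = 3, 380 MHz input rate); the function name `stage_table` is illustrative.

```python
def stage_table(f_os, m, k, M):
    """Sampling rates (Hz) and word lengths (bits) of stages 1..M, following Table 1."""
    rows = []
    for i in range(1, M + 1):
        rows.append({
            "stage": i,
            "f_in": f_os / 2 ** (i - 1),
            "f_out": f_os / 2 ** i,
            "bits_in": m + k * (i - 1),
            "bits_out": m + k * i,
        })
    return rows


for row in stage_table(f_os=380e6, m=1, k=3, M=3):
    print(row)
```

For these parameters the first stage handles at most 4-bit words at 380 MHz, while the last stage delivers 10-bit words at 47.5 MHz.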

To speed up each stage, pipeline registers have been inserted between the $(1+z^{-1})$ computational elements to break up the long adder chain, as shown in Fig. 3.

### **3** Carry-save Implementation

Since the first stage dominates the speed of the comb filter, carry-save arithmetic is used to facilitate high-speed operation. Fig. 4(a) is a block diagram of the first stage with k $(1+z^{-1})$ computational elements. x(n) is the input of the comb filter. $S_1$ and $C_1$ are the output sum and carry vectors of the first $(1+z^{-1})$ computational element. $S_k$ and $C_k$ are the output sum and carry vectors of the first stage. Fig. 4(b) shows the $(1+z^{-1})$ computational element in detail, built from two carry-save adders. By adding a pipeline register between the two carry-save adders, the critical path is reduced to a 3-input carry-save adder delay.

Fig. 2 The non-recursive structure for CIC filters.

Fig. 3 Realization of $(1+z^{-1})^k$ with pipeline registers.

Fig. 4. Carry-save implementation of the first stage: (a) block diagram of the first stage; (b) $(1+z^{-1})$ computational element with carry-save adders.

Fig. 5 shows a block diagram of a 3-input carry-save adder with a word length of j+1 bits. The adder consists of j+1 1-bit full-adders. A1[j:0], A2[j:0] and A3[j:0] are the three bus inputs. S[j:0] and C[j:0] are the output sum vector and carry vector, respectively. The carry-save adder performs the function A1 + A2 + A3 = S + C. Note that C[0] is directly connected to the carry input of the carry-save adder, and that the carry output of the carry-save adder is not used in this paper. From Fig. 5 it is clearly seen that the critical path in a carry-save adder is a 1-bit full-adder delay, which implies that the speed of a carry-save adder is independent of its word length. Hence we conclude that, by employing the non-recursive structure, carry-save adders and pipelining, the recursive loop of Hogenauer's CIC filters is broken up and the critical path is reduced to a 1-bit full-adder delay between pipeline registers. This also implies that the decimation ratio, filter order and number of input bits do not affect the circuit speed.
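The carry-save relation A1 + A2 + A3 = S + C can be modelled bit-parallel with integer operations. The following is a minimal behavioural sketch, assuming a word length of `width` bits (j+1 in the text), a carry vector whose bit 0 is the carry input, and a discarded carry output, so the equality holds modulo $2^{width}$. The function name is hypothetical.

```python
def carry_save_add(a1, a2, a3, width, carry_in=0):
    """3-input carry-save adder made of `width` independent 1-bit full-adders.

    Returns (S, C) with S + C == a1 + a2 + a3 + carry_in (mod 2**width).
    """
    mask = (1 << width) - 1
    a1, a2, a3 = a1 & mask, a2 & mask, a3 & mask
    s = a1 ^ a2 ^ a3                              # per-bit sum outputs
    majority = (a1 & a2) | (a1 & a3) | (a2 & a3)  # per-bit carry outputs
    # Carry of bit i feeds C[i+1]; C[0] is the carry input; top carry is dropped.
    c = ((majority << 1) | (carry_in & 1)) & mask
    return s, c


# Exhaustive check for a 4-bit adder.
WIDTH = 4
for a1 in range(1 << WIDTH):
    for a2 in range(1 << WIDTH):
        for a3 in range(1 << WIDTH):
            s, c = carry_save_add(a1, a2, a3, WIDTH)
            assert (s + c) % (1 << WIDTH) == (a1 + a2 + a3) % (1 << WIDTH)
```

No bit position depends on a carry from its neighbour within the same adder, which is why the delay does not grow with the word length.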

At the output of the final stage (i.e. stage *M*), a full-adder is used to add the sum and carry vectors together at a frequency of $f_{os}/N$ to produce the final output of the CIC filter.
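The complete carry-save data path, including the final merge of the sum and carry vectors, can be sketched behaviourally as follows. The wiring of the two carry-save adders inside the $(1+z^{-1})$ element is an assumption made for illustration (the exact connections of Fig. 4(b) are not reproduced here), pipeline latency is ignored, and one common word length is used for all stages for simplicity. All function names are hypothetical.

```python
def csa(a, b, c, mask):
    """3-input carry-save adder: a + b + c == s + cy (mod mask + 1)."""
    s = (a ^ b ^ c) & mask
    cy = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return s, cy


def cs_element(s_seq, c_seq, mask):
    """(1 + z^-1) on a carry-save sequence using two CSAs (assumed wiring)."""
    s_prev = c_prev = 0
    s_out, c_out = [], []
    for s, c in zip(s_seq, c_seq):
        s1, c1 = csa(s, c, s_prev, mask)      # current sum, current carry, delayed sum
        s2, c2 = csa(s1, c1, c_prev, mask)    # add the delayed carry
        s_out.append(s2)
        c_out.append(c2)
        s_prev, c_prev = s, c
    return s_out, c_out


def cs_filter(x, M, k, width):
    """M stages of k carry-save elements plus decimation by 2, then the final merge."""
    mask = (1 << width) - 1
    s, c = list(x), [0] * len(x)
    for _ in range(M):
        for _ in range(k):
            s, c = cs_element(s, c, mask)
        s, c = s[::2], c[::2]
    return [(si + ci) & mask for si, ci in zip(s, c)]   # final adder at f_os / N


def plain_filter(x, M, k):
    """Same structure with ordinary integer additions, used as a reference."""
    y = list(x)
    for _ in range(M):
        for _ in range(k):
            y = [y[n] + (y[n - 1] if n > 0 else 0) for n in range(len(y))]
        y = y[::2]
    return y


x = [(n * n + 3 * n) % 7 % 2 for n in range(64)]      # arbitrary 1-bit pattern
assert cs_filter(x, M=3, k=3, width=10) == plain_filter(x, M=3, k=3)
```

Only the last line of `cs_filter` performs a carry-propagating addition, and it runs once per output sample.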

### **4** Design and Results

To demonstrate the feasibility of the proposed structure, a 3rd-order comb filter with a decimation factor of 8 and a 1-bit input has been designed in a 0.6 $\mu$m 3.3 V CMOS process. Fig. 6 is the block diagram of the filter core. A is the 1-bit input and CLK is the input clock signal. Since the decimation factor is 8, three stages are used, each performing the $(1+z^{-1})^3$ function. The full-adder block adds the sum vector and the carry vector together to produce the final 10-bit output. The clock generator generates clock signals of different frequencies for stage 1, stage 2, stage 3 and the full-adder block (which has pipeline registers at its input and output). All clock signals are shown in Fig. 7. CLK\_1 is derived directly from CLK by a clock buffer with large driving ability. Frequency dividers then generate the other clock signals. Note that, to avoid timing problems, clock buffers should be carefully inserted between the clock generator and the computational blocks.

For testing purposes, a frequency multiplier is used to raise the low-frequency input clock to a high-frequency clock, and a PN generator is included on-chip to provide high-speed inputs to the filter core (Fig. 8). The filter's output bits will be examined using an oscilloscope or a logic analyser and compared with the known output pattern for the given PN sequence. The input clock frequency will be increased until the output bit pattern fails; the frequency at which this occurs marks the highest input clock rate.

Fig. 5 Block diagram of a carry-save adder.

Fig. 7 Clock signals generated by the clock generator.

Fig. 9 is the chip layout of the filter. It was designed using the COMPSS EDA tools. A datapath compiler is used to generate the standard-cell netlists of all computational blocks. In the layout, the core area is split into 7 standard-cell areas so that the row length of each standard-cell area stays within 517$\lambda$ (here $\lambda = 0.3\ \mu m$, so $517\lambda \approx 155\ \mu m$) for high-speed operation. Two pairs of power pads are dedicated to supplying the internal core area; the other power pads supply the I/O pads. The chip contains 11470 transistors in a core area of 2.0 mm<sup>2</sup>. The simulated highest sampling rate is 380 MHz and the power dissipation is 215 mW. The chip data are summarized in Table 2.

Table 2. Chip data of the non-recursive CIC filter.

| Parameter             | Value               |
|-----------------------|---------------------|
| Process               | 0.6 µm 3.3 V CMOS   |
| Chip Size             | 8.3 mm<sup>2</sup>  |
| Core Size             | 2.0 mm<sup>2</sup>  |
| Number of Transistors | 11470               |
| Highest Sampling Rate | 380 MHz             |
| Power Consumption     | 215 mW (at 380 MHz) |

### **5** Conclusions

In this paper a non-recursive carry-save-adder-based structure for CIC (cascaded-integrator-comb) decimation filters is proposed. The advantages of this structure are: 1) by breaking up the recursive loop of Hogenauer's CIC filters, the proposed structure can achieve higher speed than Hogenauer's CIC filters, since the critical path is reduced to a 1-bit full-adder delay between pipeline registers, which also implies that the decimation ratio, filter order and number of input bits do not affect the circuit speed; 2) each stage in the non-recursive architecture is a simple FIR filter, which can easily be realized in silicon without stability problems; 3) power consumption is low because the sampling rate is reduced by a factor of 2 through each stage. To demonstrate the feasibility of the proposed structure, a 3rd-order CIC decimation filter with a decimation factor of 8 has been designed in a 0.6 µm 3.3 V CMOS technology; it contains 11470 transistors in a core area of 2.0 mm<sup>2</sup>. The simulated highest sampling rate is 380 MHz and the power dissipation is 215 mW.

Fig. 8 Test structure of the filter.

Fig. 9 Test chip layout of the filter.

#### Acknowledgments

This work is financially supported by SSF (Foundation for Strategic Research in Sweden).

#### References:

- [1] J. C. Candy and G. C. Temes, Oversampling Delta-Sigma Data Converters: Theory, Design and Simulation, IEEE Press, 1992.
- [2] A. Kwentus, O. Lee, and A. N. Willson, Jr., "A 250 Msample/sec programmable cascaded integrator-comb decimation filter," in *Proc. VLSI Signal Processing*, IX, pp. 231-240, Oct.-Nov. 1996.
- [3] Kei-Yong Khoo, Zhan Yu and Alan N. Willson, Jr., "Efficient High-Speed CIC Decimation Filter," in *Proc. 11th Annual IEEE International ASIC Conference*, pp. 251-254, Sep. 13-16, 1998, USA.
- [4] Yonghong Gao, Lihong Jia, Jouni Isoaho and Hannu Tenhunen, "A comparison design of comb decimators for delta-sigma ADC," Proc. of IEEE Norchip'98, Nov. 10-12, 1998, Sweden.
- [5] Nianxiong Tan, Switched-Current Design and Implementation of Oversampling A/D Converters, Kluwer International Series in Engineering and Computer Science, 1997.
- [6] E. Dijkstra, O. Nys, C. Piguet and M. Degrauwe, "On the use of modulo arithmetic comb filters in sigma delta modulators," *IEEE Proc. ICASSP*'88, pp.2001-2004, April 1988.
- [7] S. Chu and C. S. Burrus, "Multirate filter designs using comb filters," *IEEE Trans. Circuits and Sys.*, vol. CAS-31, pp. 913-924, Nov. 1984.
- [8] T. Saramäki and H. Tenhunen, "Efficient VLSI-realizable decimators for sigma-delta analog-to-digital converters," *IEEE Proc. ISCAS*'88, pp. 1525-1528, June 1988.
- [9] R. E. Crochiere and L. R. Rabiner, *Multirate Digital Signal Processing*. Prentice-Hall, Inc., 1983.
- [10] J. F. Jensen, G. Raghavan, A. E. Cosand and R. H. Walden, "A 3.2-GHz second-order delta-sigma modulator implemented in InP HBT technology," *IEEE Journal of Solid-State Circuits*, vol. 30, no. 10, pp. 1119-27, Oct. 1995.
- [11] Brian P. Brandt and Bruce A. Wooley, "A low-power, area-efficient digital filter for decimation and interpolation," *IEEE Journal of Solid-State Circuits*, vol. 29, no. 6, pp. 679-687, June 1994.
- [12] Tapio Saramäki, Hilkka Palomäki, and Hannu Tenhunen, "Multiplier-free decimators with efficient VLSI implementation for sigma-delta A/D converters," Robert W. Brodersen and Howard S. Moscovitz (Eds.), VLSI Signal Processing III, IEEE Press, pp. 523-534, 1988.
- [13] Tapio Saramäki, Teppo Karema, Tapani Ritoniemi, and Hannu Tenhunen, "Multiplier-free decimator algorithms for superresolution oversampled converters," *IEEE Proc. ISCAS'90*, pp. 3275-3278, May 1990.
- [14] Teppo Karema, Tapani Ritoniemi, and Hannu Tenhunen, "An oversampled sigma-delta A/D converter circuit using two-stage fourth order modulator," *IEEE Proc. ISCAS'90*, pp. 3279-3282, May 1990.
- [15] T. Karema, T. Husu, and H. Tenhunen, "A filter processor for interpolation and decimation," in *Proc. of IEEE Custom Integrated Circuit Conference*, Boston, USA, May 1992.