## Low Power Input/Output Port Design Using Clock Gating Technique

Hyeon-Mi Yang, Sea-Ho Kim, Keun-Sik Park, Hi-Seok Kim<sup>1</sup>

Department of Electronics Engineering, Cheongju University, Republic of Korea

nurcupid@parna.com kensean@paran.com kyesung702@nate.com khs8391@cju.ac.kr

*Abstract:* - Clock gating is a well-known technique for reducing chip dynamic power. This paper proposes a modified clock gating technique based on ACG (Adaptive Clock Gating) and instruction-level clock gating. The proposed technique reduces not only the switching activity of functional blocks in the IDLE state but also their dynamic power in the running state. Our modified ACG automatically enables or disables the clock of each functional block. Experimental results on several I/O port cores in an SoC show an average dynamic power reduction of 19.45% compared with the previous ACG technique.

Key-Words: - ACG, UART, I/O Port, Clock, Low-power

### **1** Introduction

The widespread adoption of portable devices such as cell phones, PDAs, and MP3 players has fueled much research into techniques for low-power SoCs. The continuous decrease in the minimum feature size of transistors has led to a significant increase in both device density and design complexity. Recent devices have become so complex that entire systems are implemented on a single chip, and this has come at the cost of extremely high power demand. A large fraction of the overall power dissipation on a chip is due to the clock and the data path. The largest power consumer in a synchronous system is the clock distribution network, which is typically responsible for 30% to 40% of total dynamic power dissipation [1][3]. To limit clock power, clock gating lowers not only the switching activity at the functional-unit level but also the switched capacitive load on the clock distribution network. For these reasons, clock gating is regarded as one of the most effective RTL and architectural techniques for reducing dynamic power [2][3]. Because IP usage varies within and across applications, not all IP cores are used all the time, which creates an opportunity to reduce the power of unused IP cores. By combining the clock with a gate-control signal through an AND gate, clock gating disables the clock to an IP core when that IP is not used, avoiding the power dissipated by unnecessary charging and discharging of the unused circuits. In this paper, a new clock gating technique for low-power IP core design is introduced to avoid the limitations of traditional techniques.
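The two effects named above can be made explicit with the standard first-order CMOS dynamic power model. This is textbook material rather than a formula from this paper: $\alpha$ is the switching activity (0-to-1 transitions per cycle), $C_L$ the switched capacitance, $V_{DD}$ the supply voltage, and $f_{\text{clk}}$ the clock frequency. A clock net toggles every cycle ($\alpha = 1$), so if a block's clock is gated off for a fraction $g$ of the cycles, the term due to its clock-net capacitance $C_{\text{clk}}$ scales roughly by $(1-g)$, ignoring the overhead of the gating cell itself:

$$
P_{\text{dyn}} = \alpha\, C_L\, V_{DD}^{2}\, f_{\text{clk}},
\qquad
P_{\text{clk}}^{\text{gated}} \approx (1-g)\, C_{\text{clk}}\, V_{DD}^{2}\, f_{\text{clk}}
$$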

The remainder of this paper is organized as follows. Section 2 describes clock gating techniques and the limitations of the ACG method. Section 3 describes the proposed clock gating technique together with its application and implementation details. Section 4 presents experimental results, and Section 5 concludes the paper.

### **2** Clock Gating Technique

Several clock gating methods have previously been applied in the design of microprocessors and synchronous systems. Deterministic clock gating (DCG) [1] exploits advance knowledge of which pipeline stages and functional units will be used in upcoming cycles to gate the clocks of unused circuits. However, it is a fine-grained technique that depends strictly on the processor architecture: designers must be thoroughly familiar with the architecture, especially the pipelines, which makes design verification more complex. Hence, DCG is difficult to apply to SoC design, which mainly integrates many separate IP cores through bus interconnections.

ACG (Adaptive Clock Gating) [2] first analyzes the IP model. Any IP core (except a purely combinational circuit) can be modeled as a finite state machine (FSM) with several states, such as Idle, Ready, and Run, as shown in the dashed box of Fig. 1. Each circle is a state, and each arrow is a transition from one state to another. Synthesis maps the states to sequential circuits and the transitions to combinational circuits. When an IP core finishes its work, it enters the Idle state and stays there until it accepts another request from the system bus. We call the Idle state the Idle State (IS) and every other state a Working State (WS). Hence, all states in an IP core are classified into two classes, IS and WS, and ACG disables the clock of the IP core while it is in IS.

<sup>1</sup> This research was financially supported by the Ministry of Education, Science and Technology (MEST) and the Korea Industrial Technology Foundation (KOTEF) through the Human Resource Training Project for Regional
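The IS/WS classification can be sketched as a per-cycle clock-enable rule. This is a minimal behavioral model, not the implementation from [2]: the state names follow the text, while the wake-up on a bus request (needed because a fully stopped FSM could never leave Idle) and the cycle trace are assumptions for illustration.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"    # the Idle State (IS)
    READY = "Ready"  # Working States (WS)
    RUN = "Run"


def acg_clock_enable(state: State, bus_request: bool = False) -> bool:
    # ACG rule: the clock runs in every WS and stops in IS.
    # Re-opening the clock on a bus request is an assumption of this sketch.
    return state is not State.IDLE or bus_request


def delivered_edges(trace: list[tuple[State, bool]]) -> int:
    # trace holds one (state, bus_request) pair per system-clock cycle;
    # the result is the number of clock edges that reach the IP core.
    return sum(acg_clock_enable(state, req) for state, req in trace)


# Hypothetical trace: idle, one bus request, a short job, then idle again.
trace = (
    [(State.IDLE, False)] * 2
    + [(State.IDLE, True)]
    + [(State.READY, False)]
    + [(State.RUN, False)] * 3
    + [(State.IDLE, False)] * 5
)
print(delivered_edges(trace), "of", len(trace))  # 5 of 12
```

Only the cycles spent in a Working State (plus the wake-up cycle) clock the core; every cycle in IS is saved.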



However, the original ACG gates the clock only when the FSM enters the IDLE state (Fig. 1). In the proposed method, output control signals generated in a running state are also combined with the main FSM clock through clock gating, so the clock is disabled automatically whenever it is not needed. That is, the modified ACG disables the IP clock while such an output signal is active high; otherwise, the clock is enabled. This reduces dynamic power consumption further than the previous ACG method, and it is the modification proposed in this paper. To demonstrate the proposed clock gating technique, Section 3 presents a sample synchronous IP designed for low power: an I/O port that includes a UART.
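The difference between the original and the modified ACG can be sketched as two clock-enable functions driving an AND-type gate. This is a behavioral sketch under stated assumptions: `lp_en` stands for an active-high output control signal such as the `TX_LP_EN` used later, `wake` is a hypothetical re-enable request, and the trace is invented for illustration.

```python
def acg_enable(in_idle: bool) -> bool:
    # Original ACG: the clock is gated only in the Idle State.
    return not in_idle


def modified_acg_enable(in_idle: bool, lp_en: bool, wake: bool = False) -> bool:
    # Modified ACG as sketched here: the clock is also gated in a running state
    # while the active-high output control signal lp_en is asserted.
    # `wake` is a hypothetical request that re-opens the clock.
    return wake or (not in_idle and not lp_en)


def gated_clock(clk: int, enable: bool) -> int:
    # Ideal AND gate: the clock reaches the block only while enable is 1.
    return clk & int(enable)


def delivered_edges(trace, enable_fn) -> int:
    # trace holds one (in_idle, lp_en) pair per clock cycle.
    return sum(gated_clock(1, enable_fn(idle, lp)) for idle, lp in trace)


# Hypothetical trace: 4 busy cycles, 2 running cycles with lp_en high, 4 idle cycles.
trace = [(False, False)] * 4 + [(False, True)] * 2 + [(True, False)] * 4
original = delivered_edges(trace, lambda idle, lp: acg_enable(idle))
modified = delivered_edges(trace, lambda idle, lp: modified_acg_enable(idle, lp))
print(original, modified)  # 6 4
```

The two extra gated cycles are exactly the running-state cycles in which the output control signal is high. A real design would normally use a latch-based integrated clock-gating cell rather than a bare AND gate to avoid glitches on the gated clock; the sketch ignores timing.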

### **3** Application of the Proposed Clock Gating Technique

Dynamic power management (DPM) has been very successful in low-power design. The main idea of DPM is to reduce switching activity as much as possible, and clock gating is used for this purpose. We use an instruction-level clock gating technique to control the clocks of the UART and the I/O port. The basic procedure is shown in Fig. 2, where PORT I/O is the I/O port operation, ID is instruction decode, IF is instruction fetch, and SIO is the UART operation. Some instructions, such as IN SIO and OUT SIO, do not need the SIO operation but do need the PORT operation. A power control logic block acts in the different states of the FSM of the PORT I/O functional block. It makes a decision for every instruction and every clock period, and the clock to each component is placed in active or inactive mode according to that decision. Clock gating thus efficiently reduces clock switching and register activity in the functional blocks, and the more time the functional blocks spend in the IDLE state, the more clock power is saved.

Hence, the main idea of our proposed clock gating technique is to reduce not only the switching activity of a functional block in the IDLE state but also its power consumption in the running state. The block diagram of a sample PORT I/O functional block is shown in Fig. 3. It is divided into three blocks: the SIO (UART) block, the PORT I/O block, and the AC (accumulator) block. All operations are synchronized by the system clock, and both clock edges are used. When the select input of the SIO MUX is set to 1, the PORT blocks are unused and only the SIO block is used. In that case, clock gating can be applied to the clock of the PORT blocks, disabling it even in the running state and thus reducing the power dissipated by the unused logic. Besides the power consumption of the PORT blocks, we also consider the power dissipation of the UART block in Fig. 3.
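The per-instruction decision made by the power control logic can be sketched as a decode-time table lookup. Everything concrete here is hypothetical: the opcode names and their block requirements are illustrative, not the paper's instruction set, and the sketch assumes one instruction per clock period with IF and ID always clocked.

```python
# Hypothetical opcode table: which gated blocks each instruction needs.
# The opcode names are illustrative, not the instruction set used in the paper.
BLOCKS_NEEDED: dict[str, set[str]] = {
    "IN_PORT": {"PORT"},
    "OUT_PORT": {"PORT"},
    "IN_SIO": {"SIO"},
    "OUT_SIO": {"SIO"},
    "NOP": set(),
}
GATED_BLOCKS = ("PORT", "SIO")  # IF and ID are assumed to stay clocked


def clock_enables(opcode: str) -> dict[str, bool]:
    """Decision made after ID: one clock enable per gated block."""
    needed = BLOCKS_NEEDED[opcode]
    return {block: block in needed for block in GATED_BLOCKS}


def gated_off_count(program: list[str]) -> dict[str, int]:
    """Number of instructions during which each block's clock is switched off."""
    off = {block: 0 for block in GATED_BLOCKS}
    for opcode in program:
        for block, enabled in clock_enables(opcode).items():
            off[block] += not enabled
    return off


program = ["IN_PORT", "OUT_PORT", "NOP", "OUT_SIO"]
print(gated_off_count(program))  # {'PORT': 2, 'SIO': 3}
```

A table lookup keeps the decision purely combinational on the decoded opcode, which is what lets it be made for every instruction without adding pipeline state.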





Figure 3. I/O port functional block
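Gating the PORT clock from the SIO MUX select, as described for Fig. 3, reduces to an AND of the system clock with the inverted select. This is a minimal sketch: the select trace is hypothetical, and it counts whole clock cycles even though the actual block uses both clock edges.

```python
def port_clock(clk: int, sio_sel: int) -> int:
    # AND-type gate: with sio_sel = 1 the MUX routes the SIO (UART) path,
    # the PORT blocks are unused, and their clock is held low.
    return clk & (1 - sio_sel)


def port_clock_activity(sel_trace: list[int]) -> float:
    """Fraction of system-clock cycles in which the PORT clock still runs."""
    running = sum(port_clock(1, sel) for sel in sel_trace)
    return running / len(sel_trace)


# Hypothetical MUX-select trace: a burst of UART traffic between two PORT accesses.
trace = [0, 1, 1, 1, 1, 1, 1, 0]
print(port_clock_activity(trace))  # 0.25
```

In this trace the PORT clock runs in only a quarter of the cycles, so its clock-net switching falls accordingly even though the chip as a whole is in a running state.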

The sample UART in Fig. 3 is a fully functional, synthesizable universal asynchronous receiver/transmitter core. The core is configurable and extremely compact. The receiver and the transmitter operate independently, and each can be selectively disabled for synthesis. Most UARTs use 8 data bits, no parity, and 1 stop bit; with the start bit, it therefore takes 10 bits to transmit one byte of data. In the UART protocol, the transmitter and the receiver do not share a clock signal, so a data transfer rate (baud rate) must be agreed upon before transmission; that is, the receiving UART must know the transmitting UART's baud rate. In almost all cases, the receiving and transmitting baud rates are the same. The transmitter shifts out the data starting with the LSB. The transmitter of the micro-UART consists of a bit-cell counter, a transmitted-bit counter, a serializer, and a state machine, as shown in Fig. 4 and Fig. 5.
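The 10-bit frame and the bit-cell counter described above can be sketched as follows. The frame layout (low start bit, 8 data bits LSB first, high stop bit) is standard 8N1 framing; the 1.8432 MHz clock and 115200 baud in the example are hypothetical values chosen because they divide evenly.

```python
def frame_8n1(byte: int) -> list[int]:
    """Serialize one byte as an 8N1 frame: start bit, 8 data bits LSB first, stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data = [(byte >> i) & 1 for i in range(8)]  # LSB first
    return [0] + data + [1]  # start bit pulls the idle-high line low; stop bit returns it high


def cycles_per_bit(clk_hz: int, baud: int) -> int:
    """Terminal count of the bit-cell counter: clock cycles per transmitted bit."""
    return round(clk_hz / baud)


print(frame_8n1(0x41))                    # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(cycles_per_bit(1_843_200, 115_200))  # 16
```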



Figure 4. UART Transmitter Block



Figure 5. FSM of UART Transmitter

In the STOP state of Fig. 5, the output signal TX\_total\_EN is set to 1 (high) once all data bits have been transmitted to the UART receiver. This control output gates the internal baud-rate clock (master clock) through an AND gate, as shown in Fig. 6. That is, when the active-high outputs TX\_total\_EN and TX\_LP\_EN are asserted, no further clock pulses are generated for the transmitter, and its dynamic power consumption drops because its clock is disabled. Hence, with the proposed clock gating technique, the transmitter block draws no clock-driven dynamic power while these output signals hold its main clock disabled through the AND gate, in the same way that ACG gates the clock in the IDLE state.
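The transmitter FSM of Fig. 5 with its gated clock can be sketched at bit level. Under this sketch's assumptions, one `step()` stands for one bit cell of the baud-rate clock, `TX_total_EN` and `TX_LP_EN` are folded into a single `tx_lp_en` flag, and a host write releases the gate; the paper does not describe the wake-up path, so that part is hypothetical.

```python
from enum import Enum, auto


class Tx(Enum):
    IDLE = auto()
    START = auto()
    DATA = auto()
    STOP = auto()


class LowPowerTx:
    """Bit-level sketch of a UART transmitter gated by TX_LP_EN."""

    def __init__(self) -> None:
        self.state = Tx.IDLE
        self.tx_lp_en = 1       # active high: transmitter clock gated
        self.shift = 0          # serializer contents
        self.bit_count = 0      # transmitted-bit counter
        self.line = 1           # TX line idles high
        self.clocked_cells = 0  # bit cells in which the gated clock ran

    def write(self, byte: int) -> None:
        # Host write (assumed wake-up path): load the serializer, open the gate.
        self.shift = byte & 0xFF
        self.bit_count = 0
        self.state = Tx.START
        self.tx_lp_en = 0

    def step(self) -> None:
        if self.tx_lp_en:       # gated clock held low: no register toggles
            return
        self.clocked_cells += 1
        if self.state is Tx.START:
            self.line = 0                   # start bit
            self.state = Tx.DATA
        elif self.state is Tx.DATA:
            self.line = self.shift & 1      # LSB first
            self.shift >>= 1
            self.bit_count += 1
            if self.bit_count == 8:
                self.state = Tx.STOP
        elif self.state is Tx.STOP:
            self.line = 1                   # stop bit
            self.state = Tx.IDLE
            self.tx_lp_en = 1               # frame done: gate the clock


tx = LowPowerTx()
tx.write(0x41)
line = []
for _ in range(20):
    tx.step()
    line.append(tx.line)
print(line[:10], tx.clocked_cells)  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1] 10
```

Over 20 bit cells the clock runs only for the 10 cells of the frame; once the STOP state raises the gating signal, the remaining cells cost no register switching in the transmitter.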

Fig. 6 shows one of the simulation results for the low-power UART transmitter: the gated clock CLK\_gen to the transmitter block is disabled when TX\_LP\_EN is set to 1, which reduces the dynamic power of the transmitter block. This result verifies the proposed clock gating technique in the IDLE state. The receiver of the micro-UART consists of a control state machine, a de-serializer, and support logic. Its main tasks are to detect the start bit, de-serialize the following bit stream, detect the stop bit, and make the data available to the host. The low-power functional block diagram of the receiver is shown in Fig. 8. Using the same approach in the receiver FSM, when the output signal RX\_LP\_EN is set to 1 in the STOP state, the gated clock CLK\_gen to the receiver block is disabled. This also reduces clock power dissipation while RX\_LP\_EN remains 1; during that period, the receiver block of the UART draws no clock-driven dynamic power, as shown in Fig. 9.
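The receiver's start-bit detection, de-serialization, stop-bit check, and `RX_LP_EN` gating can be sketched in the same bit-level style. Assumptions of this sketch: one sample per bit cell, a framing error simply drops the byte, and the receiver clock wakes on the falling edge of the start bit (the paper does not describe the wake-up path).

```python
from typing import List, Tuple


def low_power_rx(samples: List[int]) -> Tuple[List[int], int]:
    """Deserialize 8N1 frames from one sample per bit cell.

    Returns (received bytes, number of bit cells in which the receiver clock ran).
    """
    received: List[int] = []
    rx_lp_en = 1          # active high: receiver clock gated while the line idles
    clocked = 0
    i = 0
    while i < len(samples):
        if rx_lp_en:
            if samples[i] == 0:   # falling edge of the start bit: wake up (assumed)
                rx_lp_en = 0
            else:
                i += 1
                continue
        frame = samples[i:i + 10]  # start bit, 8 data bits, stop bit
        clocked += len(frame)
        if len(frame) < 10:
            break
        data = frame[1:9]
        if frame[9] == 1:          # valid stop bit
            received.append(sum(bit << k for k, bit in enumerate(data)))  # LSB first
        rx_lp_en = 1               # STOP state: gate the receiver clock again
        i += 10
    return received, clocked


line = [1, 1, 1] + [0, 1, 0, 0, 0, 0, 0, 1, 0, 1] + [1, 1] + [0, 0, 1, 0, 1, 1, 0, 1, 0, 1] + [1]
print(low_power_rx(line))  # ([65, 90], 20)
```

Practical receivers usually oversample the line (commonly at 16 times the baud rate) and sample near the middle of each bit; the single sample per bit here keeps the gating behavior in focus.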







Figure 8. Low-power receiver functional block of the UART



### **4** Experimental Results

The experimental environment is as follows. The tools, Synopsys Power Compiler and Design Compiler, run on SunOS Solaris 6.0. We use ModelSim for simulation and Synopsys Design Compiler for synthesis and timing analysis, and Synopsys PrimePower is used to estimate power consumption. Typical I/O port benchmarks, including the SIO (UART) block of Fig. 3, are used to evaluate the effect of different clock gating techniques on power. The designs are mapped to the Hynix 0.25-µm technology library.

We use the following flow to calculate the power of the I/O port design. First, we synthesize the RTL source code under the same timing constraints into two gate-level netlists: an original version and a modified-ACG version.

Table 1 shows that our proposed ACG reduces the dynamic power of the UART by 16.86% compared with Model 1, which uses no clock gating. Table 2 shows that Model 2, which applies both Synopsys Power Compiler and our proposed ACG, reduces dynamic power by 25.5% compared with Model 1, which applies Power Compiler only. In addition, Table 3 shows that our method reduces power dissipation by 13.4% compared with Model 1 of the PIC I/O port [3]. Hence, our modified ACG technique is effective for low dynamic power design when sub-micron technology is used in SoC design.

Table 1. Comparison of Model 1 and Model 2 for the UART

| Metric             | Model 1 (no clock gating) | Model 2 (ACG) | Change  |
|--------------------|---------------------------|---------------|---------|
| Cell area (um)     | 731.936                   | 755.957       | +3%     |
| Dynamic power (mW) | 100.1871                  | 83.2954       | -16.86% |

Table 2. Comparison of Model 1 and Model 2 for the I/O port

| Metric             | Model 1 (Synopsys) | Model 2 (Synopsys and our ACG) | Change  |
|--------------------|--------------------|--------------------------------|---------|
| Cell area (um)     | 1218.72            | 1225.88                        | +0.6%   |
| Dynamic power (mW) | 164.3255           | 122.3254                       | -25.5%  |

Table 3. Comparison of Model 1 and Model 2 for the PIC I/O port

| Metric                 | PIC Model 1 (Synopsys) | PIC Model 2 (Synopsys and our ACG) | Change  |
|------------------------|------------------------|------------------------------------|---------|
| Cell area (um)         | 2321.33                | 2355.67                            | +1.5%   |
| Dynamic power (mW)     | 311.244                | 269.45                             | -13.4%  |
| Average dynamic power  | -                      | -                                  | -19.45% |
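The per-table changes can be recomputed directly from the raw Model 1 and Model 2 values in Tables 1-3, as a quick consistency check. Recomputed this way, the dynamic-power changes match the reported -16.86%, -25.5%, and -13.4%, and the Table 2 cell-area overhead comes out at about +0.6%.

```python
# Values copied from Tables 1-3 as (Model 1, Model 2) pairs.
TABLES = {
    "UART (Table 1)": {"area": (731.936, 755.957), "power_mW": (100.1871, 83.2954)},
    "I/O port (Table 2)": {"area": (1218.72, 1225.88), "power_mW": (164.3255, 122.3254)},
    "PIC I/O port (Table 3)": {"area": (2321.33, 2355.67), "power_mW": (311.244, 269.45)},
}


def pct_change(before: float, after: float) -> float:
    """Relative change of Model 2 with respect to Model 1, in percent."""
    return (after - before) / before * 100.0


for name, row in TABLES.items():
    area = pct_change(*row["area"])
    power = pct_change(*row["power_mW"])
    print(f"{name}: area {area:+.2f}%, dynamic power {power:+.2f}%")
```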

### **5** Conclusion

We have proposed a modified clock gating technique for functional block design. The simulation results show that the sample UART functions and behaves correctly. By applying instruction-level clock gating and the modified ACG to the UART and the I/O port, we have also shown that our clock gating technique is efficient for low-power design in a real SoC.

#### References:

- [1] H. Li, S. Bhunia, Y. Chen, K. Roy, and T. N. Vijaykumar, "DCG: Deterministic clock-gating for low-power microprocessor design," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 12, no. 3, pp. 245-254, March 2004.
- [2] X. Chang, "Adaptive clock gating technique for low power IP core in SoC design," in *Proc. IEEE International Symposium on Circuits and Systems (ISCAS 2007)*, pp. 2120-2123, May 2007.
- [3] www.opencores.org