# An Area-efficient DLL based on a Merged Synchronous Mirror Delay Structure for Duty Cycle Correction

SEOK-YONG HONG, SEONG-IK CHO, HANG-GEUN JEONG

Department of Electronic Engineering Chonbuk National University 664-14, 1ga, DuckJin-Dong, Jeonju, Jeonbuk KOREA, SOUTH

*Abstract:* A DLL (Delay Locked Loop) with DCC (Duty Cycle Correction) has become an essential block in high-speed memory and digital circuits. An SMD (Synchronous Mirror Delay) structure is widely used both for skew reduction and for DCC. In this paper, an area-efficient DLL structure based on the merged dual SMD is proposed. The merged structure allows the forward delay array to be shared between the DLL and the DCC, yielding a 25% saving in the number of required delay cells. The designed chip was fabricated using a 0.25-$\mu$m one-poly, four-metal CMOS process. Measurement of the fabricated chip showed that the duty cycle of the output clock is corrected to within $\pm 3\%$ for an input duty variation of $\pm 30\%$ in the frequency range from 400MHz to 600MHz, with a lock time within three clock cycles.

Key-Words: SMD(Synchronous Mirror Delay), DCC(Duty Cycle Correction), DLL(Delay Locked Loop)

# 1 Introduction

In most digital integrated circuits, the clock signal is distributed across the entire chip, so clock synchronization is essential for proper operation of such circuits. DLLs are widely used in high-speed memory and digital circuits for clock synchronization. A DCC or DCCL (Duty Cycle Control Loop) is also frequently applied in high-speed circuits that use both the rising and falling edges of the clock[1]. The increasing operating frequency of memory and digital circuits places more stringent requirements on skew reduction and duty cycle correction. In addition, locking and duty correction must occur within a few clock periods in systems that adopt a power-saving mode[2].

A DLL and a DCC can be implemented with either analog or digital circuits. Digitally implemented DLLs and DCCs suffer from inherent quantization error. While the analog version avoids the quantization error, it can exhibit unstable transient behavior[3-4].

Recently, the SMD (Synchronous Mirror Delay) structure has been widely used both for skew reduction and for DCC. The SMD circuit is an open-loop circuit. It synchronizes the output clock to the input clock in only two clock cycles, so the SMD can be used when fast locking is needed. The simple structure of the SMD circuit also reduces the design effort. The maximum operating frequency of a DLL is limited by the delay time of the unit delay cell, while the minimum operating frequency is restricted by the length of the delay line. To realize a DLL with a wide operating frequency range, the delay time of the unit delay cell must be small and the length of the delay line must be increased.

In this paper, an area-efficient DLL structure is proposed by merging the two SMDs used for phase alignment and for duty cycle correction. The new DLL with DCC can be applied to high-speed memory such as XDR DRAMs (eXtreme Data Rate DRAM) that operate at a 500MHz operating frequency with an input duty cycle variation from 20% to 80%. Conventional DLLs using the SMD structure for phase lock consist of an FDA (Forward Delay Array), a BDA (Backward Delay Array), and an MCC (Mirror Control Circuit). DCCs using the SMD structure are usually made up of an FDA, an HCDL (Half Cycle Delay Line), and an MCC. Conventional DLLs with DCC capability using the SMD structure therefore require four delay lines and two control circuits. The proposed DLL with DCC, in contrast, can be implemented with one FDA, a BDA, an HCDL, and an MCC by sharing the FDA and the MCC between the phase alignment block and the duty cycle correction block. As a result, the proposed SMD can significantly reduce the chip area.

<sup>\*</sup> This research was supported by the IDEC (IC Design Education Center), IT-SoC (Information Technology - System on Chip) and the second stage of Brain Korea 21.

# 2 SMD Type DLL and DCC

The SMD-type DLL synchronizes the delayed clock to the external reference clock by inserting a delay within one clock period. The block diagram and circuits of the SMD-type DLL are shown in Fig. 1 and Fig. 2, respectively. The SMD-type DLL aligns the phase of the internal clock to that of the external clock. The SMD uses a measurement delay line to output a signal with the desired delay time during the first period. This output signal is sent to a mirror delay line, which generates another signal after the same delay time during the second clock period. The SMD-type DLL therefore produces a synchronous clock from the third clock period[5]. The SMD-type DLL consists of an FDA, a BDA, and an MCC. Let the delay time of a unit FDA stage be $t_{dF}$, the delay time of a unit BDA stage be $t_{dB}$, the phase difference between the external reference clock (CLK\_EXT) and the delayed clock (CLK\_DLY) be $t_{E-D}$, and the clock period be $t_{CLK}$. Then the number N of unit delay stages needed for phase alignment can be expressed by (1)

$$t_{E-D} + N \cdot t_{dF} \cong t_{CLK} \tag{1}$$



Therefore, the FDA is activated up to the N-th stage, and the activated signal is propagated to the BDA through the MCC. Assuming that $t_{dF}$ is equal to $t_{dB}$, the BDA is also activated up to the N-th stage, so the total delay time of the BDA is $t_{CLK} - t_{E-D}$. As a result, the total delay time of the external reference clock CLK\_EXT can be expressed by (2). The clock travels forward through the FDA for a time of $t_{CLK} - t_{E-D}$. The clock pulse is then propagated backward through the BDA for as long as it was propagated forward through the FDA. The clock skew is suppressed in two clock cycles. The maximum clock skew is the unit delay time of the FDA or BDA.
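The relation in (1) can be illustrated numerically. The following is a minimal sketch of an idealized SMD lock, assuming equal FDA and BDA unit delays and ignoring the MCC and buffer delays; the function name `smd_lock` and the example values (a 500MHz clock, a 1ns input delay, and a 103ps unit delay) are chosen here only for illustration.

```python
import math
from typing import Tuple


def smd_lock(t_clk_ps: float, t_ed_ps: float, t_d_ps: float) -> Tuple[int, float]:
    """Return (N, residual skew in ps) for an idealized SMD-type DLL.

    Assumes t_dF == t_dB == t_d_ps and neglects all other circuit delays.
    """
    if not 0 <= t_ed_ps < t_clk_ps:
        raise ValueError("t_E-D must be non-negative and shorter than one period")
    # Time the delayed clock travels in the FDA before the next reference edge.
    forward_ps = t_clk_ps - t_ed_ps
    # Only whole delay stages can be activated, as in (1).
    n = math.floor(forward_ps / t_d_ps)
    # The BDA mirrors N stages, so the unmeasured remainder appears as skew.
    residual_ps = forward_ps - n * t_d_ps
    return n, residual_ps


n, skew = smd_lock(t_clk_ps=2000.0, t_ed_ps=1000.0, t_d_ps=103.0)
print(f"N = {n}, residual skew = {skew:.0f} ps")
```

In this model the residual skew is always smaller than one unit delay, which agrees with the statement that the maximum clock skew is the unit delay time of the FDA or BDA.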



Fig. 2 The delay array circuits of SMD type DLL

A backward pulse in the N-th BDA stage is triggered when both the delayed clock input node and the N-th FDA node are high. In other words, the clock propagating inside the FDA block is turned into the backward clock when it passes through the MCC. The backward clock is then propagated through the BDA.

The diagram and circuits of the SMD-type DCC are shown in Fig. 3 and Fig. 4. The input clock, whose duty cycle is not 50%, goes into an HCDL and an MDL (Measured Delay Line) that consists of SMDs. Similar to the SMD, the HCDL is composed of an FDA, a BDA and an MCC. Unlike the SMD, however, the FDA is used for cycle time detection and the BDA is used for half cycle time mirroring. When the signal enters the FDA at the input node "A" in Fig. 3 or Fig. 4, it is delayed cell by cell. The cycle time measured by the FDA is the total gate delay of N delay cells, and the BDA reproduces half of the FDA's delay time. Therefore, the signal arrives at the output "B" after half of the cycle time has elapsed[3]. On the whole, the total delay time from the first clock entering the HCDL to the first clock leaving the HCDL is an integer multiple of a half cycle time. From then on, this circuit behaves as a half cycle time delay line. If the input clock and the half cycle delayed clock are used as the "B" and "C" inputs of a latch, respectively, the output of the latch is a clock with a 50% duty cycle. Because the clock period is generally not an exact integer multiple of the unit delay, the delay of the N unit delay stages that the clock passes through in the FDA and BDA cannot exactly equal the clock period in an SMD-type DCC. This is the inherent quantization error, as expressed in (3), (4), and (5).

$$N \cdot t_{dF} \le t_{CLK} < (N+1)t_{dF} \tag{3}$$

$$(N-1)t_{dF} \le t_{CLK} < N \cdot t_{dF} \tag{4}$$

$$E \le \frac{\left| (N-1)t_{dF} - N \cdot t_{dF} \right|}{2} = \frac{t_{dF}}{2} \tag{5}$$
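As a worked illustration of the bound in (5), assume the unit delay of 103ps quoted in Section 4.1 and assume that a timing error E in the half cycle delay translates into a duty cycle error of $E / t_{CLK}$. Under these assumptions the worst-case duty cycle error at the two ends of the operating range is approximately:

$$E \le \frac{t_{dF}}{2} = \frac{103\,\mathrm{ps}}{2} = 51.5\,\mathrm{ps}$$

$$\frac{E}{t_{CLK}}\bigg|_{600\,\mathrm{MHz}} \le \frac{51.5\,\mathrm{ps}}{1667\,\mathrm{ps}} \approx 3.1\%, \qquad \frac{E}{t_{CLK}}\bigg|_{400\,\mathrm{MHz}} \le \frac{51.5\,\mathrm{ps}}{2500\,\mathrm{ps}} \approx 2.1\%$$

This is only an estimate of the quantization contribution; it does not account for mismatch between delay cells or other circuit delays.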



Fig 3 The diagram of SMD type DCC



The total number of units in the FDA is determined by $t_{CLK}$, because the time during which the external reference clock propagates through the FDA should be $t_{CLK} - t_{E-D}$. As $t_{CLK}$ increases, the number of required unit delay cells in the FDA increases.

# 3 Merged SMD-based DLL with DCC

The conventional SMD-type DLL and DCC, consisting of FDAs, BDAs, and MCCs as shown in Fig. 5, require a significant chip area.



Fig. 5 The structure of SMD type DLL and DCC

The external reference clock is fed to the DLL and the DCC at the same time, and the DCC operation starts after the DLL operation. Note that the DCC block uses FDA and MCC blocks identical to those of the DLL. In order to reduce the chip area, a new SMD-based DLL with DCC is proposed, as shown in Fig. 6. The FDA and MCC are shared, so the DLL and the DCC operate simultaneously. After the clock signal goes through the DC (Duty Cycle Control Circuits), the output clock has a 50% duty cycle within the quantization error. The DC is implemented with TSPC (True Single Phase Clocked logic) latches[3]. The circuit of the delay line is shown in Fig. 7. In the first clock period, the delayed clock (CLK\_DLY) propagates through the FDA. Then, according to the external reference clock (CLK\_EXT), the MCC delivers the signal that has been activated up to the N-th stage to the BDA and the HCDL. The BDA generates a signal that is synchronized with the external reference clock. The HCDL sends a half cycle delayed signal in the second clock period. The output signals of the BDA and HCDL are supplied to the DC in the third clock period. In the fourth clock period, the DC inverts the output of the BDA at the rising edge of the HCDL's output, which has the half cycle delay time. From the fifth clock, the DC generates an output signal with a 50% duty cycle that is synchronized with the external reference clock.



Fig. 6 The diagram of SMD based DLL with DCC using reduced delay line



Fig. 7 The delay line circuit of SMD based DLL with DCC using reduced delay line

The Merged SMD-based DLL with DCC aligns the phase of the delayed clock to that of the external reference clock. The FDA serves as a measurement delay line that outputs a signal with the desired delay time, which is measured automatically by hardware. This output signal is sent to the BDA to generate another signal after the same desired delay time and the half cycle delay time. When a pulse enters the FDA, it is delayed cell by cell. The output of each delay cell is NAND-ed with the external reference signal. After one cycle time $t_{CLK}$ has passed, the second pulse rises. If the first pulse has arrived at the output of the N-th cell at this time, the output of the N-th NAND gate also has a pulse, while the outputs of the other NAND gates remain low. The cycle time is thus measured by hardware as the total gate delay of N delay cells, and the output of the N-th NAND gate indicates the measurement result. The single output pulse of the NAND-gate array is designed to traverse back through only N/2 delay cells in the BDA. Therefore, the pulse arrives at the DCC\_OUT node after another half of the cycle time. On the whole, the total delay time from the first pulse entering the FDA to the first pulse leaving the HCDL is 1.5 cycle times. From then on, a pulse occurs at DCC\_OUT periodically with a cycle time of $t_{CLK}$, and the circuit behaves as a half cycle time delay line. Because N may be an odd number, there will be a quantization error in the duty cycle.
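The effect of an odd N on the duty cycle can be sketched with a small behavioral model. This is an illustrative simplification, not the fabricated circuit: it assumes identical delay cells, integer division for the N/2 traversal, an output whose high time equals the half cycle delay, and a hypothetical unit delay of 100ps chosen only to keep the arithmetic simple.

```python
from typing import Tuple


def half_cycle_delay(t_clk_ps: int, t_d_ps: int) -> Tuple[int, int, float]:
    """Return (N, cells traversed on the way back, resulting duty cycle)."""
    # Number of whole delay cells the pulse passes in one clock period.
    n = t_clk_ps // t_d_ps
    # The return path uses only half of the measured cells (rounded down).
    half_cells = n // 2
    # The output is assumed high for the duration of the half cycle delay.
    delay_ps = half_cells * t_d_ps
    return n, half_cells, delay_ps / t_clk_ps


for t_clk in (2000, 2100):
    n, half, duty = half_cycle_delay(t_clk, t_d_ps=100)
    print(f"t_CLK = {t_clk} ps: N = {n}, N/2 = {half}, duty = {duty:.1%}")
```

With an even N (20 cells) the model yields exactly 50%, while an odd N (21 cells) loses half a unit delay and yields a duty cycle below 50%.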

# 4 Design Consideration and Simulation

#### 4.1 The delay line

The delay line consists of digital gates in a recursive structure. Every delay stage should have the same delay time, and the layout has to be done so that all stages operate identically. Each delay stage is designed to have a delay time of 103ps. The FDA and BDA each consist of 27 delay stages, and the HCDL consists of 14 delay stages.

#### 4.2 The delay line length

The delay line length is designed for operation at the minimum frequency, which has the longest cycle time. When a high-frequency clock is applied, the delay line may be longer than twice the cycle time; in that case more than two output pulses, separated by one cycle time, are produced and cause a harmonic-frequency error. The minimum frequency was set to 400MHz considering the chip area and the characteristics of the delay line.
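The harmonic condition described above can be checked against the design values of Section 4.1. The sketch below assumes an ideal line of 27 stages with 103ps each and applies the stated criterion (line delay longer than twice the cycle time) literally; the function names are introduced here for illustration only.

```python
N_STAGES = 27    # FDA/BDA length from Section 4.1
T_D_PS = 103.0   # unit delay in ps from Section 4.1


def line_delay_ps(n_stages: int = N_STAGES, t_d_ps: float = T_D_PS) -> float:
    """Total delay of an ideal delay line in picoseconds."""
    return n_stages * t_d_ps


def harmonic_risk(f_mhz: float) -> bool:
    """True if the line is longer than twice the clock period."""
    period_ps = 1e6 / f_mhz  # 1 MHz corresponds to a period of 1e6 ps
    return line_delay_ps() > 2.0 * period_ps


for f in (400.0, 600.0, 800.0):
    print(f"{f:.0f} MHz: harmonic risk = {harmonic_risk(f)}")
```

Under these assumptions the 2781ps line stays below twice the cycle time across the 400MHz to 600MHz range and exceeds it only at roughly 720MHz and above.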

#### 4.3 The operating frequency range

The operating frequency range is determined by the characteristics of the propagation delay and the quantization error. When the frequency is low, the required delay line length increases. When the frequency goes up, the duty cycle error grows, mainly due to the quantization error, and the delay time between the reference input and the output increases. For a given delay line length, the lower boundary of the operation frequency is defined. On the other hand, if the total clock skew is longer than the clock cycle time, no delay cell will be activated to compensate the clock skew, and the function of the SMD will fail. This fact sets the upper boundary of the operation frequency[6]. Consequently, if N represents the number of delay cells and $t_{dF}$ represents the delay time of a cell, the minimum operation frequency $f_{min}$ is determined by (6)

$$f_{\min} \approx \frac{1}{N \cdot t_{dF} + t_{E-D}} \tag{6}$$

And the maximum operation frequency  $f_{max}$  is determined by (7)

$$f_{\max} \approx \frac{1}{t_{E-D}} \tag{7}$$
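Equations (6) and (7) can be evaluated directly. The following sketch uses the 27-stage, 103ps delay line of Section 4.1 and an input delay of 400ps taken from Table 1; the function name `operating_range_mhz` is an assumption made for this example, and the result is an idealized estimate rather than a measured limit.

```python
from typing import Tuple


def operating_range_mhz(n_cells: int, t_d_ps: float, t_ed_ps: float) -> Tuple[float, float]:
    """Return (f_min, f_max) in MHz following (6) and (7)."""
    # A period of 1 ps corresponds to 1e6 MHz.
    f_min = 1e6 / (n_cells * t_d_ps + t_ed_ps)  # equation (6)
    f_max = 1e6 / t_ed_ps                       # equation (7)
    return f_min, f_max


f_min, f_max = operating_range_mhz(n_cells=27, t_d_ps=103.0, t_ed_ps=400.0)
print(f"f_min = {f_min:.0f} MHz, f_max = {f_max:.0f} MHz")
```

For these values the model gives a lower bound of about 314MHz, which is below the 400MHz minimum chosen for the design, and an upper bound set only by the input delay.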

#### 4.4 The acceptable duty cycle of the input

The duty cycle of a low-frequency clock, which goes through more delay stages, decreases severely. In the case of a high-frequency clock, the duty cycle correction error increases due to the quantization error.

#### 4.5 The speed and accuracy of the DCC

The synchronized signal appears from the third cycle due to the characteristics of the SMD-based DLL. This signal is input to the latch in the fourth cycle. Finally, the locked signal with the corrected duty cycle appears in the fifth cycle. The output signal appears from the fifth clock regardless of the frequency within the input frequency range. The duty cycle control error increases as the frequency goes higher.

#### 4.6 Simulations

When the 600MHz reference signal CLK\_EXT has a 50% duty cycle and the signal CLK\_DLY, delayed by 500ps, is applied, the lock function of the DLL is completed in the third clock cycle and the locked signal with a 49%-51% duty cycle appears from the fifth cycle, as shown in Fig. 8. When a 600MHz clock signal CLK\_DLY with an 80-20% duty cycle is applied, the delay time between the reference signal and the output signal is 51ps.



Fig. 8 The simulation waveform of SMD based DLL with DCC using reduced delay line. The duty cycle and delay time of delayed input clock (CLK\_DLY): 30%, 500ps

# 5 Experimental Results

The chip, fabricated in a 0.25-µm one-poly, four-metal CMOS process, consists of an FDA, a BDA, an MCC, an HCDL, a DC and test circuits (test), as shown in Fig. 9.



Fig. 9 Chip microphotograph.

The experimental results of the SMD-based DLL with DCC using the reduced delay line are shown in Fig. 10. The first row shows the 500MHz reference clock (CLK\_EXT), which has a 50% duty cycle. The second row shows the delayed input (CLK\_DLY), which has a delay time of 1ns and a duty cycle of 70%. The third row shows the locked signal (DLL\_out), which is synchronized to the reference signal (CLK\_EXT). The fourth row shows the duty-corrected signal (DCC\_out), which is synchronized to the reference signal (CLK\_EXT) with a delay time of 120ps relative to it. Table 1 shows the experimental results according to the frequency and the duty cycle (DC).



Table 1. The experimental results

| Frequency | Input DC | Input t<sub>E-D</sub> | Output DC | Output t<sub>I-O</sub> |
|-----------|----------|-----------------------|-----------|------------------------|
| 600MHz    | 80%      | 400ps                 | 47%       | 80ps                   |
| 600MHz    | 20%      | 1ns                   | 52%       | 120ps                  |
| 500MHz    | 80%      | 1ns                   | 51%       | 120ps                  |
| 500MHz    | 20%      | 400ps                 | 52%       | 140ps                  |
| 400MHz    | 80%      | 400ps                 | 53%       | 140ps                  |
| 400MHz    | 20%      | 2ns                   | 47%       | 170ps                  |

The reference signal has a duty cycle of 50% at each of the frequencies 400MHz, 500MHz, and 600MHz. The delayed input at these frequencies has a duty cycle of 80-20% and a time lag ($t_{E-D}$) with respect to the external reference input. The time lag ($t_{I-O}$) between the external reference input and the output signal, and the duty cycle of the output signal, are shown in Table 1.
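The measured values in Table 1 can be summarized with a short script. The sketch below simply transcribes the table rows and reports the deviation of each output duty cycle from 50%; the tuple layout and variable names are choices made for this example.

```python
# (frequency MHz, input duty %, t_E-D ps, output duty %, t_I-O ps) from Table 1
TABLE_1 = [
    (600, 80, 400, 47, 80),
    (600, 20, 1000, 52, 120),
    (500, 80, 1000, 51, 120),
    (500, 20, 400, 52, 140),
    (400, 80, 400, 53, 140),
    (400, 20, 2000, 47, 170),
]

# Deviation of the corrected duty cycle from the 50% target, in percentage points.
deviations = [out_dc - 50 for (_, _, _, out_dc, _) in TABLE_1]
worst = max(abs(d) for d in deviations)

for (freq, in_dc, _, out_dc, _), dev in zip(TABLE_1, deviations):
    print(f"{freq} MHz, input {in_dc}%: output {out_dc}% ({dev:+d})")
print(f"worst-case deviation: {worst} percentage points")
```

Every row stays within 3 percentage points of 50%, consistent with the corrected duty cycle range of 47% to 53% reported for the chip.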

# 6 Conclusion

A new SMD-based DLL with DCC using a smaller number of delay cells was proposed and verified through chip fabrication and measurements for applications in high-speed digital circuits and high-speed memory. The designed circuit consists of only digital gates. The locking and duty correction operations are achieved separately and require only five clock cycles. The designed SMD-based DLL with DCC was fabricated with a 0.25-µm one-poly, four-metal CMOS process. The test result is that the acceptable duty cycle of the input signal ranges from 20% to 80%, with the corrected duty cycle varying from 47% to 53%. The designed DLL had a duty cycle error of $\pm 1\%$, but the delay time of each unit delay cell may differ due to PVT (Process, Voltage, and Temperature) variations. Fig. 11 shows the experimental results for the duty cycle of the output clock when the Merged SMD-based DLL with DCC runs at 400MHz to 600MHz with input clocks of different frequencies and duty cycles. If the duty cycle of the input clock ranges from 20% to 80%, the Merged SMD-based DLL with DCC shows a deviation of nearly 3%.



Fig. 11 The experimental results at each frequency

The experimental results show that the duty cycle error is within $\pm 3\%$ when compared with the design. Nevertheless, these characteristics satisfy the requirements of the XDR DRAM, and the circuit can be used for high-speed circuits that need a short lock time. In particular, the area is 25% smaller than that of the conventional SMD-type DLL and DCC, and locking and duty cycle correction occur at the same time.

# References

- [1] Toru Ogawa and Kenji Taniguchi, A 50% Duty-Cycle Correction Circuit for PLL output, IEEE International Symposium on Circuits and Systems, Vol.4, May. 2002, pp. IV-21 -IV-24.
- [2] SAMSUNG Electronics, 256Mbit XDR DRAM, 2M x 16(/8/4) bit x 8s Banks, Version 1.0, 2005, pp. 55-61.
- [3] Yi-Ming Wang and Jinn-Shyan Wang, An ALL Digital 50% Duty-Cycle Corrector, ISCAS, 2004, pp. II925-II928.
- [4] J. M. Rabaey, Digital Integrated Circuits, Prentice Hall,
- [5] Takanori Saeki, et al, A 2.5ns Clock Access, 250 MHz, 256-Mb SDRAM with Synchronous Mirror Delay, IEEE Journal of Solid-State Circuits, Vol. 31, Nov. 1996, pp. 1656-1668.
- [6] Kuo-Hsing Cheng, Chen-Lung Wu, Yu-Lung Lo and Chia-Wei Su, A Phase-Detect Synchronous Mirror Delay for Clock Skew Compensation Circuits, Circuits and Systems, 2005. ISCAS 2005. IEEE International Symposium, Vol.2, May. 2005, pp. 1070-1073.