### Performance Analysis of Direct Digital Synthesizer Architecture with Amplitude Sequencing

I. JIVET, B.DRAGOI Electronics and Telecommunications Department University 'Politechnica' Timisoara V Parvan No. 2, Timisoara ROMANIA ioan.jivet@etc.uot.ro, beniamin.dragoi@etc.uot.ro, http://www.etc.upt.ro

*Abstract:* The paper presents an analysis of a novel DDS architecture based on direct algorithmic amplitude generation and of its capacity to generate pure, high frequency signals. The algorithm implementation requires very few resources, and its complexity is linear in the digital word length. In its simplest form it includes one compare, two additions and several increments per iteration step. A solution is presented to compensate for the known major drawback of the algorithm, the non uniformity of sample phase in time. The versatility of direct digital synthesizers in frequency switching and phase modulation applications is shown to be preserved in the new architecture. The paper also analyzes the performance potential of the proposed architecture as reflected in the low harmonic content of the quadrature output signals. An FPGA implementation of the architecture was synthesized and simulated using a VHDL description to validate the proposed solution. Synthesis reports are compared with other FPGA direct frequency synthesizer implementations reported in recent literature.

Key-Words: - amplitude centered DDS architecture, generated sine/cosine spectrum, FPGA implementation

### **1** Introduction

Waveform generation in industry and research has shifted to digital solutions in recent years. Sinusoidal signals of high frequency and spectral purity were among the last domains to resist digital generation. The latest developments of the DDS architecture have changed this domain as well.

The analog solution of sinusoidal signal generation using PLLs has been replaced by DDS based solutions in most application areas [1] [8].

The major problem of the DDS architecture is the need for phase truncation in order to keep the on-chip ROM to a reasonable size, although some alternative solutions have been reported to contain the problem [2][3][4].

The basic principle of the proposed architecture is the replacement of the phase to sine look up table (LUT) module with amplitude values calculated directly by an algorithm. The amplitude values are obtained from an extended variant of Jordan's nonparametric circle generator algorithm. The algorithm provides sine and cosine amplitude values as precise as necessary. The values are the coordinates of an approximating point that tracks the circle along a rectangular grid according to a distance minimization criterion [5] [6].

The x and y sequences of coordinates determined algorithmically for curve generation purposes can be used as sampled sine and cosine functions values.

Amplitude value sequences obtained using the proposed algorithm need to be positioned non uniformly in time, which requires a minimum period between samples to accommodate precise adjustments for minimal phase error.

A compensation method is presented to solve this inherent problem of the algorithm. Direct amplitude generation with phase compensation introduces a limitation on the maximum frequency of the generated signal.

The two main advantages of the novel architecture are the low gate count of the hardware implementation and the absence of the phase truncation issues of the phase centered architectures.

The performance of the proposed architecture in generating pure, high frequency signals is presented, and the efficiency of its FPGA implementation is compared with other recent DDS implementations reported in the literature [9] [10] [11].

### **2** Extended algorithm with phase timing compensation

According to Jordan's original algorithm, the coordinates of the tip of the rotating vector are determined sequentially in incremental steps. Each step is taken along one axis or the other, whichever gives the minimal distance to the circle based on the calculated value of the implicit function.

The drawback of Jordan's curve generating algorithm is that the approximating point tracking the circle does not move at a constant speed.

The non uniformity is given by the following formula as indicated in the original paper:

$$N_i/R = 1 + \sin(\sigma(t)) - \cos(\sigma(t))$$
 (1)

where $N_i$ is the i-th approximating step, $R$ is the generating vector length and $\sigma$ is the phase.
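Formula (1) can be checked numerically. The following is a minimal Python sketch, not the authors' code: it ports the axis-step generator described in this section to the first quadrant and compares the step index with the right-hand side of (1). The function names `quadrant_points` and `max_deviation_from_eq1` are hypothetical. The check rests on the observation that, in the first quadrant, every step either increments y or decrements x, so the step index equals y + (R - x).

```python
import math


def quadrant_points(radius):
    """First-quadrant points of the axis-step circle generator.

    Starts at (radius, 0) and tracks F = x^2 + y^2 - radius^2 incrementally.
    Hypothetical helper written for this sketch.
    """
    x, y = radius, 0
    f_last = 0
    points = []
    for _ in range(2 * radius):
        imp_fx = f_last - 2 * x + 1   # F after a step x -> x - 1
        imp_fy = f_last + 2 * y + 1   # F after a step y -> y + 1
        if abs(imp_fy) <= abs(imp_fx):
            f_last, y = imp_fy, y + 1
        else:
            f_last, x = imp_fx, x - 1
        points.append((x, y))
    return points


def max_deviation_from_eq1(radius):
    """Largest |i/R - (1 + sin(sigma) - cos(sigma))| over the quadrant."""
    worst = 0.0
    for i, (x, y) in enumerate(quadrant_points(radius), start=1):
        sigma = math.atan2(y, x)
        predicted = 1 + math.sin(sigma) - math.cos(sigma)
        worst = max(worst, abs(i / radius - predicted))
    return worst


if __name__ == "__main__":
    for radius in (64, 512):
        print(radius, max_deviation_from_eq1(radius))
```

The deviation shrinks as the radius grows, since it comes only from the grid point sitting slightly off the ideal circle.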

A MATLAB implementation of the algorithm used in the simulations is presented below:

```matlab
% One quadrant (counter clockwise, 1st quadrant) of the circle generator.
% Circle_Radius and the circle centre (xa, ya) are assumed set by the caller.
% START AT: x = Circle_Radius, y = 0
% INIT: Flast = 0, fx = 2*Circle_Radius, fy = 0
x = Circle_Radius; y = 0;
Flast = 0; fx = 2*Circle_Radius; fy = 0;
i = 1;
% Loop Position Approximating
while i <= 2*Circle_Radius
    % STEP: candidate increments for the 1st quadrant
    Incx = -1; Incy = 1;
    % Update implicit eq. values for possible steps in x and y
    % direction using Taylor series expansion
    ImpFx = Flast + fx*Incx + Incx*Incx;
    ImpFy = Flast + fy*Incy + Incy*Incy;
    % Determine step parameters: keep the step closest to the circle
    if abs(ImpFy) <= abs(ImpFx)
        Flast = ImpFy; Incx = 0;
    else
        Flast = ImpFx; Incy = 0;
    end
    % Update the directional derivatives (Taylor series expansion)
    fx = fx + 2*Incx; fy = fy + 2*Incy;
    % New position, relative and absolute coordinates
    x = x + Incx; y = y + Incy;
    xo(i) = xa + x; yo(i) = ya + y;
    i = i + 1;
end
```

The step increments Incx and Incy take the values $\{1, -1\}$ when selected for a move along the respective axis and zero when not selected. Variables fx and fy are the calculated directional derivative values of the circle implicit function. ImpFx and ImpFy are the implicit function values calculated on each iteration from the previous value Flast. The relative coordinates are the variables x, y and the absolute coordinates are xo, yo.

A manually instantiated block diagram for a hardware implementation of the algorithm is given in Fig. 1. The implementation is not minimal in silicon area but is balanced to have symmetric time latency with respect to phase advancements in both x and y directions. Also, some registers have been implemented using counters to save computation time when unity additions/subtractions are required.

Fig. 1 Block diagram of the hardware implementation of the generator algorithm.

Immediate use of the algorithmically determined values with uniform timing in the generation of sine and cosine functions results in phase distortions that need to be compensated to become acceptable.

The solution for compensating the phase distortions in signal generation is to preserve, at output time, the same non uniform time distribution of the samples as when the values were determined.

For the determination of the correct time distribution of samples at signal generation we propose a method based on an interpretation of formula (1).

The interleaved x and y coordinates of the point approximating the circle are proportional to the cosine and sine of the actual phase.



Fig. 2 Phase non uniformity compensation method: sample delay proportional to step chord lengths is approximated by cosine (current phase).

An alternative geometrical derivation of the same result is represented in Fig. 2. The phase angle advanced at each step of the algorithm along one axis is proportional to the sine or the cosine value at the corresponding current phase, as expressed by equations (2) - (4):

$$AC = AB \cos(\sigma)$$
 (2)

$$\cos(\sigma) = A_x/R$$
 (3)

$$AC \sim A_x$$
 (4)
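The effect of this timing rule can be sketched in Python. This is an illustration under stated assumptions, not the authors' simulation: the delay attached to a step along y is taken as x/R² and the delay attached to a step along x as y/R², so that the accumulated delay approximates the phase in radians. The names `quadrant_samples` and `amplitude_errors` are hypothetical.

```python
import math


def quadrant_samples(radius):
    """Return (x, compensated_phase) for each first-quadrant step.

    Assumption of this sketch: the delay is proportional to the coordinate
    orthogonal to the step (x for a y-step, y for an x-step), scaled by
    1/radius^2 so that the accumulated delay is a phase in radians.
    """
    x, y = radius, 0
    f_last = 0
    phase = 0.0
    samples = []
    for _ in range(2 * radius):
        imp_fx = f_last - 2 * x + 1   # F after a step x -> x - 1
        imp_fy = f_last + 2 * y + 1   # F after a step y -> y + 1
        if abs(imp_fy) <= abs(imp_fx):
            phase += x / radius**2    # y-step: delay ~ cos(phase)
            f_last, y = imp_fy, y + 1
        else:
            phase += y / radius**2    # x-step: delay ~ sin(phase)
            f_last, x = imp_fx, x - 1
        samples.append((x, phase))
    return samples


def amplitude_errors(radius):
    """Max |x - R cos(phase)| with uniform and with compensated timing."""
    samples = quadrant_samples(radius)
    uniform_step = (math.pi / 2) / len(samples)
    err_uniform = max(
        abs(x - radius * math.cos((k + 1) * uniform_step))
        for k, (x, _) in enumerate(samples)
    )
    err_compensated = max(
        abs(x - radius * math.cos(phase)) for x, phase in samples
    )
    return err_uniform, err_compensated


if __name__ == "__main__":
    print(amplitude_errors(512))
```

With uniform timing the error grows with the radius, while with the coordinate-proportional delay it stays within a few grid units.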



Fig. 3 A compensated 10 bit resolution sequence is not distinguishable from a calculated cosine function to the order of the quantization error.

The MATLAB simulation presented in Fig. 3 demonstrates the effectiveness of the proposed phase non uniformity compensation method.

The simulation shows that for a 10 bit amplitude signal generator implemented in an FPGA the proposed compensation method is adequate. For higher resolution signals a more elaborate sample timing distribution method must be developed [10].

### **3** Amplitude core DDS Architecture

#### **3.1** Proposed new architecture principle of operation

The work reported in this paper focuses on the study of a new class of DDS architecture that uses algorithmic generation of the trigonometric functions as the core module.

The main characteristic of the new class is the absence of the phase to sine look up table translator module. The phase to frequency transfer function core is replaced by the direct algorithmic amplitude generating core. The output of the module is a sequence of values that tracks in space the co-ordinates of the tip of the rotating vector rather than its phase. The phase can be tracked in a separate counter for reference in applications where phase information is needed.



Fig.4 Amplitude based DDS architecture block diagram illustrating the principle.

The block diagram in Fig. 4 presents the principle of the proposed DDS architecture. Because the algorithm computes both the sine and cosine values of successive central angles, the quadrature output is obtained directly. The co-ordinate generator core implementing the algorithm in hardware receives as input AT(n) and FT(m), the amplitude and frequency tuning words of lengths n and m. Register rt and counters ct implement amplitude storage and counting of the general system clock Ck. The phase compensation counters c can be set to any resolution smaller than n, trading accuracy for maximum signal frequency.

The supplementary phase counter, not strictly necessary in the proposed architecture, can be used for parallel phase accounting. The phase value is not used by the signal generating algorithm. It needs its own circuitry if phase dependent events must also be implemented in particular applications.

#### **3.2** Main advantages of the proposed architecture

The defining characteristic of the proposed architecture is the linear dependence of complexity on resolution. The exponential growth of ROM capacity typical of classical DDS does not occur in the new architecture. The only remaining main limiting factor is the resolution of the DAC, so the generation of very pure signals is possible with minimal resources. The spur frequencies in the output due to phase truncation are also eliminated entirely.

<u>Frequency tuning</u> is immediate, as seen in the diagram, using a period counter on the master clock that is loaded at cycle start with the tuning word. The frequency tuning word can change at any moment during the cycle, and all the other modules will follow without any precision loss in the generated signals while preserving phase accuracy.

The upper frequency limit is given by the maximum clock frequency available in the target technology scaled down by the phase compensation counters length depending on the desired phase accuracy.

The very accurate and versatile FSK capacity of DDS architecture is preserved with frequency hopping speed dependent on frequency tuning counter loading speed only.

<u>The quadrature output</u> with a very high match inherent to DDS is preserved, since the algorithm, and therefore the hardware implementation, determines both sine and cosine values simultaneously. In fact the two cannot be determined independently.

<u>Amplitude tuning</u> is simple to implement on a cycle basis by changing the central vector length. The amplitude of the generated signal and its output frequency can be adjusted to any value, but the two must be correlated when the amplitude changes. Most common PSK modulation methods are thus supported without supplemental circuitry.

Both the amplitude and the phase can be modified easily at the beginning of each octant. More complex phase and amplitude modulation schemes are possible with additional circuitry for the jumps.

#### **3.3** Spectrum of the generated signals

The characteristics of the output generated signal are very important for successful application of the architecture in practice.

It is known from the original paper that the sample values are non uniformly distributed with respect to phase [5]. Immediate use of the algorithmically determined values with uniform timing for the generation of sine/cosine functions results in phase distortions.

The FFT analysis of the signals generated using the original algorithm reveals a power spectrum as presented in Fig. 5. It can be seen that the first odd harmonics are about 35 dB below the central frequency component.

Fig. 5 The power spectrum of the generated signal using the original Jordan's algorithm with no phase timing compensation.
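The level of the odd harmonics for uniformly timed samples can be estimated with a short Python sketch. This is not the authors' FFT setup. It assumes that the other three quadrants follow from the first by 90 degree rotation, and it evaluates single DFT bins directly over one full turn instead of running an FFT. The names `quadrant_points`, `full_circle_x` and `harmonic_db` are hypothetical.

```python
import cmath
import math


def quadrant_points(radius):
    """First-quadrant points of the axis-step circle generator."""
    x, y = radius, 0
    f_last = 0
    points = []
    for _ in range(2 * radius):
        imp_fx = f_last - 2 * x + 1   # F after a step x -> x - 1
        imp_fy = f_last + 2 * y + 1   # F after a step y -> y + 1
        if abs(imp_fy) <= abs(imp_fx):
            f_last, y = imp_fy, y + 1
        else:
            f_last, x = imp_fx, x - 1
        points.append((x, y))
    return points


def full_circle_x(radius):
    """x samples for one full turn, played back with uniform timing.

    Assumption: quadrants 2 to 4 are 90 degree rotations of quadrant 1.
    """
    pts = quadrant_points(radius)
    xs = []
    for _ in range(4):
        xs.extend(x for x, _ in pts)
        pts = [(-y, x) for x, y in pts]   # rotate by 90 degrees
    return xs


def harmonic_db(samples, k):
    """Level of harmonic k relative to the fundamental, in dB."""
    n = len(samples)

    def magnitude(h):
        return abs(sum(s * cmath.exp(-2j * math.pi * h * i / n)
                       for i, s in enumerate(samples)))

    return 20 * math.log10(magnitude(k) / magnitude(1))


if __name__ == "__main__":
    xs = full_circle_x(128)
    for k in (3, 5):
        print(k, round(harmonic_db(xs, k), 1))
```

Under these assumptions the third and fifth harmonics are expected to land near the -35 dB level quoted above, because the phase error of the uncompensated sequence repeats four times per turn.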

High quality signals can be generated with the extended algorithm that includes phase timing compensation. The non uniformity of the signal distribution makes spectrum determination difficult.

The spectrum presented in Fig. 6 is an estimation based on a signal with the same form and uniform sampling. The first harmonics, which are the odd harmonics, are estimated at low power levels, about 50 dB or more below the central frequency component.

The estimation was done on a conservative, uniformly sampled wave interpolated from the original one.

Fig. 6 Estimated power spectrum of the signal generated with the extended algorithm as proposed using phase timing compensation.

The phase timing compensation is obtained by adjusting the sample timing. As a result, the maximum frequency of the generated signal is reduced accordingly. This is a drawback with respect to recent DDFS generators reported in the literature with advanced frequency range and precision [7], [9].

A trade off can be made between sample delay accuracy and maximum generated signal frequency. Using a sample delay truncated to a small word length results in an increased upper frequency limit for the generated signal.
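The trade off can be illustrated with a rough Python sketch. The delay rule below is an assumption made for the illustration and is not taken from the hardware description: each step waits `max(1, coordinate >> shift)` clock cycles, where `shift` is the number of low-order bits dropped from the delay word, and the resulting quadrant is mapped onto a quarter period. The function name `quadrant_run` is hypothetical.

```python
import math


def quadrant_run(radius, shift):
    """Return (clock cycles per quadrant, max amplitude error).

    Assumptions of this sketch: a y-step waits max(1, x >> shift) clocks,
    an x-step waits max(1, y >> shift) clocks, and the total quadrant time
    corresponds to a phase advance of pi/2.
    """
    x, y = radius, 0
    f_last = 0
    clocks = 0
    trace = []
    for _ in range(2 * radius):
        imp_fx = f_last - 2 * x + 1   # F after a step x -> x - 1
        imp_fy = f_last + 2 * y + 1   # F after a step y -> y + 1
        if abs(imp_fy) <= abs(imp_fx):
            clocks += max(1, x >> shift)
            f_last, y = imp_fy, y + 1
        else:
            clocks += max(1, y >> shift)
            f_last, x = imp_fx, x - 1
        trace.append((clocks, x))
    error = max(
        abs(xk - radius * math.cos(math.pi / 2 * t / clocks))
        for t, xk in trace
    )
    return clocks, error


if __name__ == "__main__":
    for shift in (0, 3, 6, 9):
        clocks, error = quadrant_run(512, shift)
        print(shift, clocks, round(error, 2))
```

Dropping more bits shortens the quadrant, which raises the achievable output frequency for a given clock, at the cost of a larger amplitude error. With all bits dropped the timing degenerates to the uniform case.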



Fig. 7 Simulation of a VHDL structural description of an implementation of the proposed DDS architecture in the Xilinx ISE development environment.

An implementation in a Xilinx FPGA was studied using a VHDL description of the proposed architecture. The motivation was a comparison, for calibration, with recent FPGA solutions for generator applications [10] [14].

The synthesis results for the 16 bit version confirm the main advantage of the proposed architecture. The implementation requires a low gate count compared with classical DDS implementations [11] [12] [13].



**VAR1 Project Status**

| | | | |
|---|---|---|---|
| Project File: | var1.ise | Current State: | Placed and I |
| Module Name: | et2 | Errors: | |
| Target Device: | xc2v500-6fg256 | Warnings: | |
| Product Version: | ISE 9.1.01i | Updated: | Wed Mar 28 |

**VAR1 Partition Summary**

No partition information was found.

**Device Utilization Summary**

| Logic Utilization | Used | Available | Utilization |
|---|---|---|---|
| Number of Slice Flip Flops | 26 | 6,144 | 1% |
| Number of 4 input LUTs | 136 | 6,144 | 2% |
| **Logic Distribution** | | | |
| Number of occupied Slices | 71 | 3,072 | 2% |
| Number of Slices containing only related logic | 71 | 71 | 100% |
| Number of Slices containing unrelated logic | 0 | 71 | 0% |
| Total Number of 4 input LUTs | 136 | 6,144 | 2% |
| Number of bonded IOBs | 17 | 172 | 9% |
| Number of GCLKs | 1 | 16 | 6% |
| Total equivalent gate count for design | 1,167 | | |
| Additional JTAG gate count for IOBs | 816 | | |

**Performance Summary**

| | | | |
|---|---|---|---|
| Final Timing Score: | 0 | Pinout Data: | Pinout Repo |
| Routing Results: | All Signals Completely Routed | Clock Data: | Clock Repor |
| Timing Constraints: | All Constraints Met | | |

Table 2

| Reference | Resolution | Slices/Cells | SFDR | Output frequency |
|---|---|---|---|---|
| Gonçalves [7] | 16 bit | 1618 | 64 dB | 12 MHz |
| Andraka [12] | 14 bit | 600 | - | 52 MHz |
| Altera [11] | 24 bit | 315+RAM | 100 dB | 1 MHz |
| Present work | 16 bit | 88 | 50 dB | 1 MHz |

Table 2 shows comparable performance with other DDS implementations as reported in the literature. These results are however modest in frequency range when compared with recent reports of dedicated ASIC solutions for signal generation and processing applications [15] [16].

### **4** Conclusion

A novel DDS architecture based on algorithmically generated amplitudes is proposed and analyzed in its capacity to produce accurate high speed sine/cosine signals.

The main advantage of the proposed architecture over the classical DDS architecture is the economy of resources in the implementation.

The principle of operation of the architecture is presented as well as its main advantages. It is also shown that most advantages of DDS architecture are preserved. The phase truncation problem inherent to phase core DDS architectures is eliminated.

The purity of the generated signal and the maximum output signal frequency are shown to be inversely related and remain a trade off option.

Numerical results of the simulations indicate that the proposed architecture, implemented in today's FPGAs, can generate low harmonic signals at frequencies up to the MHz range.

The resources necessary for an FPGA implementation with 16 bit resolution are fewer than 100 slices/cells, a factor of up to 10 below other FPGA implementations in use today.

Further work is necessary to include the phase accumulator into the architecture for extended support of phase based events.

References:

- [1] Analog Devices, A Technical Tutorial on Digital Signal Synthesis, Analog Devices, Inc., 1999.
- [2] E. Grayver and B. Daneshrad, "Direct Digital Frequency Synthesis Using a Modified CORDIC", Proceedings of the IEEE International Symposium on Circuits & Systems, Monterey, CA, USA, 31 May -- 3 June, 1998 pp.241-244.
- [3] R. Meitzler, W. Millard, "A direct digital frequency synthesizer prototype for space applications," Proceedings of the NASA Symposium on VLSI Design, 2003.
- [4] J. M. P. Langlois and D. Al-Khalili, "Novel approach to the design of direct digital frequency synthesizers based on linear interpolation," IEEE Transactions on Circuits and Systems II, vol. 50, September 2003, pp. 567–578.
- [5] B.W. Jordan, W.J. Lennon, and B.D Holm. An Improved Algorithm for the Generation of Nonparametric Curves. IEEE Transactions on Computers, Vol.C-22, No. 12, 1973, pp. 1052-1060.
- [6] M. M. Saleem, M. S. Saleem, Bresenham type fast algorithm for 3-D linear and helical movement in CNC machines, WSEAS Transactions on Systems, Issue 4, Volume 3, June 2004 pp. 1548-1554.
- [7] J. Gonçalves, J. R. Fernandes, M. M. Silva, A Reconfigurable Quadrature Oscillator Based on a Direct Digital Synthesis System, *Design of Circuits and Integrated Systems (DCIS'06)*, November 2006.
- [8] J. Whittington, J. Devlin, and T. Salime, Evaluation of Digital Generation and Phasing Techniques for Transmitter Signals of the TIGER N.Z. Radar, WARS'02 Workshop on Applications of Radio Science, 20-22 February 2002, Leura, NSW.

- [9] J. Vankka, M. Waltari, M. Kosunen, and K. A. I. Halonen, A Direct Digital Synthesizer with an On-Chip D/A-Converter, IEEE Journal of Solid-State Circuits, Vol. 33, No. 2, February 1998, pp.218-22.
- [10] W. Gemin, R. Rivera, R. Hidalgo and J. Fernandez, CPLD-based arbitrary waveform generator, WSEAS Transactions on Systems, Issue 4, Volume 3, June 2004 pp. 1570-1574.
- [11] NCO Megacore Function/User Guide, http://www.altera.com/literature/ug/ug\_nco.pdf, Altera, May 2007.
- [12] R. Andraka, A survey of the CORDIC algorithms for the FPGA computers, Proceedings of the ACM/SIGDA Sixth International Symposium on FPGA, 3/1998, pp. 191-200.
- [13] V. F. Kroupa (Ed.), Direct Digital Frequency Synthesizers, Wiley-IEEE Press, ISBN: 978-0-7803-3438-0, November 1998.
- [14] J. G. Mailloux, S. Simard, R. Beguenane, FPGA implementation of induction motor vector control using xilinx system generator, The 6th WSEAS Intern. Conf. on Circuits, Systems, Electronics, Control and Signal Processing (CSECS'07), Cairo, Egypt, Dec. 29-31, 2007, pp. 252-258.
- [15] S. Thuries, Conception et intégration d'un synthétiseur digital direct micro-onde en technologie silicium SiGe: C 0.25um, Master Thesis, Universite Paul-Sabatier de Toulouse, 2006.
- [16] M. Sherif, A Programmable ASIC Design of a Low Sensitivity Sampled Data Filter, The 6th WSEAS Intern. Conf. on Circuits, Systems, Electronics, Control and Signal Processing (CSECS'07), Cairo, Dec., 2007, pp. 52-57.