#### UNIVERSITY OF CALGARY

### Linearization of RF Power Amplifiers using Digital Predistortion Technique

Implemented on a DSP/FPGA Platform

by

Andrew Ka Chung Kwan

#### A THESIS

# SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

#### DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

#### CALGARY, ALBERTA

#### AUGUST, 2009

© Andrew Ka Chung Kwan 2009

## UNIVERSITY OF CALGARY FACULTY OF GRADUATE STUDIES

The undersigned certify that they have read, and recommend to the Faculty of Graduate Studies for acceptance, a thesis entitled "Linearization of RF Power Amplifiers using Digital Predistortion Technique Implemented on a DSP/FPGA Platform" submitted by Andrew Ka Chung Kwan in partial fulfilment of the requirements of the degree of Master of Science.

Supervisor, Dr. Fadhel Ghannouchi, Department of Electrical and Computer Engineering

Co-Supervisor, Dr. Michael Smith, Department of Electrical and Computer Engineering

Dr. Vahid Garousi, Department of Electrical and Computer Engineering

Dr. Richard Barton Hicks, Department of Physics and Astronomy

Dr. Yaoping Hu, Department of Electrical and Computer Engineering

\$ 23, 2009 Date

#### Abstract

The thesis deals with the design of a digital predistortion system. This was implemented on a digital signal processor/field programmable gate array (DSP/FPGA) to provide an environment suitable for characterizing and linearizing power amplifiers to improve their operating efficiency. The characterization of the power amplifier is first outlined in simulation, and two behavioural models are used to characterize the power amplifier nonlinearity: the memoryless, and the memory polynomial models. To decrease the computational complexity of the memory polynomial algorithm in the DSP, two adaptive filter algorithms were introduced to solve for the polynomial coefficients. Both were shown to reduce the number of processing cycles while being able to achieve the same performance compared to the singular value decomposition algorithm. The DSP/FPGA solution was able to achieve the 3GPP linearity requirements for both a mildly nonlinear class AB power amplifier, and a highly nonlinear Doherty power amplifier.


#### Acknowledgements

I would like to express my gratitude towards the people that have assisted and supported me during the course of this research project.

First, I would like to thank my supervisor, Dr. Fadhel Ghannouchi, and co-supervisor, Dr. Michael Smith, for allowing me to pursue my Master's degree at the University of Calgary. Their constant encouragement and valuable advice were essential in the completion of this thesis. I would also like to thank the examination committee, consisting of Dr. Vahid Garousi, Dr. Yaoping Hu, and Dr. Bart Hicks, for their time and effort in evaluating the thesis and for their feedback.

Next, I would like to thank the present and former members of the iRadio Lab at the University of Calgary for their technical assistance and moral support, especially Dr. Mohamed Helaoui and Dr. Oualid Hammi. I would also like to thank Dr. Slim Boumaiza and Albert Tran for their helpful technical expertise towards this project.

This work would not have been achieved without the aid of technicians Christopher Simon, Warren Flaman and John Shelley. I would like to thank them for taking their time to help me complete this project.

Finally, I would like to give special thanks to my family for their unconditional love and support.


## Dedication


To my parents

Sze Loong and Corazon Kwan

## **Table of Contents**

- Abstract
- Acknowledgements
- Dedication
- Table of Contents
- List of Tables
- List of Figures and Illustrations
- List of Symbols, Abbreviations and Nomenclature
- CHAPTER ONE: INTRODUCTION
  - 1.1 Wireless Transmitters
  - 1.2 PA Nonlinearity Effects
  - 1.3 Nonlinearity Compensation Techniques
    - 1.3.1 Feedforward Linearization (Analog Correction)
    - 1.3.2 Cartesian Feedback Linearization (Analog Correction)
    - 1.3.3 Digital Predistortion
    - 1.3.4 Comparison of Linearizers
  - 1.4 Thesis Organization
- CHAPTER TWO: CHARACTERIZATION AND BEHAVIOURAL MODELING OF TRANSMITTERS AND PAS
  - 2.1 Evaluation Metrics
  - 2.2 PA Characterization Setup
  - 2.3 Power Amplifier Behavioural Modeling
    - 2.3.1 Delay Compensation
    - 2.3.2 Memoryless Model
    - 2.3.3 Memory Polynomial Model
      - 2.3.3.1 Optimized Identification Techniques
  - 2.4 Results from Model Validation
  - 2.5 Conclusion
- CHAPTER THREE: BASEBAND DIGITAL PREDISTORTION
  - 3.1 Digital Predistortion
    - 3.1.1 Theory of Memoryless Digital Predistortion
      - 3.1.1.1 Model Validation for Memoryless Digital Predistortion
    - 3.1.2 Theory of Memory Polynomial Digital Predistortion
      - 3.1.2.1 Model Validation for Memory Polynomial Digital Predistortion
    - 3.1.3 Simulation of Digital Predistortion Model's Performance
  - 3.2 Digital Predistortion Experimental Results
  - 3.3 Conclusion
- CHAPTER FOUR: BASEBAND DPD IDENTIFICATION ON A DSP PLATFORM
  - 4.1 DSP Implementation
    - 4.1.1 Predistortion Identification
  - 4.2 Embedded Software Testing
  - 4.3 DSP and Simulation Model Accuracy Results
  - 4.4 Linearization Performance using DSP generated predistortion models
  - 4.5 Conclusion
- CHAPTER FIVE: ARBITRARY WAVEFORM GENERATORS AND LINEARIZED TRANSMITTERS FOR 3G APPLICATIONS
  - 5.1 FPGA Implementation
  - 5.2 DSP/FPGA Communication Link
  - 5.3 DSP Predistortion Synthesis Issues for FPGA Predistortion Correction
    - 5.3.1 Memoryless Predistortion
    - 5.3.2 Memory Polynomial Predistortion
  - 5.4 Baseband Digital Signal to Analog RF Waveform
    - 5.4.1 AD9779A Spectrum Analysis
  - 5.5 Linearization Results with Class AB PA at 1.96 GHz
  - 5.6 Linearization Results with Highly Nonlinear Doherty PA at 2.14 GHz
  - 5.7 Conclusion
- CHAPTER SIX: CONCLUSIONS AND FUTURE WORKS
  - 6.1 Summary and Conclusions
  - 6.2 Directions for Future Work
- REFERENCES
### List of Tables

- Table 1.1 Comparison of different linearization methods in terms of complexity, efficiency and bandwidth
- Table 2.1 List of NMSE values calculated between captured output and simulation models output
- Table 3.1 Mean and variance of the power amplifier output spectrum for 27 different WCDMA waveforms with memory polynomial digital predistortion applied
- Table 3.2 ACPR results for multicarrier WCDMA waveforms using a class AB PA, with and without digital predistortion. The memoryless DPD does not meet the 3GPP requirements for a four carrier signal at 10 MHz offset. The memory polynomial DPD meets all the 3GPP requirements for one to four carrier signals
- Table 4.1 Performance evaluation in simulation and DSP for power amplifier modelling of Class AB PA
- Table 4.2 Memory usage for each model in the DSP
- Table 4.3 Performance evaluation in simulation and DSP for digital predistortion of Doherty amplifier
- Table 4.4 ACPR results for multicarrier WCDMA waveforms using a Doherty PA, with and without digital predistortion. The memoryless DPD fails to meet the requirements for the three and four carrier signals, and the memory polynomial DPD fails to meet the requirements for the four carrier signal
- Table 5.1 Number representations resource utilization
- Table 5.2 Total resource utilization in the FPGA
- Table 5.3 ACPR results for multicarrier WCDMA waveforms using a class AB PA using DSP/FPGA transmitter, with and without digital predistortion. The memoryless DPD fails to meet the requirements for three and four carrier signals, while the memory polynomial DPD meets the requirements for one to four carrier signals
- Table 5.4 ACPR results for multicarrier WCDMA waveforms using a Doherty PA using DSP/FPGA transmitter, with and without digital predistortion. The memory polynomial DPD using the DSP/FPGA is able to meet the requirement for the four carrier signal, where it was unable to meet the requirement for the VSG/VSA setup

## List of Figures and Illustrations

- Figure 1.1 Block diagram of a conventional direct conversion wireless transmitter
- Figure 1.2 Baseband multi-carrier signal generation
- Figure 1.3 Gain (solid trace) and drain efficiency (circle markers) versus input drive level P<sub>in</sub> for a simulated class AB power amplifier
- Figure 1.4 Frequency spectra of the input signal (circle markers), and the output signal (square markers) are compared with the PA in a non-linear mode of operation
- Figure 1.5 System diagram of an analog feedforward linearizer (Rummery and Branner 1997)
- Figure 1.6 System diagram of an analog Cartesian feedback linearizer (Briffa and Faulkner 1996)
- Figure 1.7 System diagram of digital baseband predistortion (Cavers 1990)
- Figure 2.1 The Adjacent Channel Power Ratio (ACPR) can be calculated as the ratio of the main channel power over the adjacent channel power in the frequency domain
- Figure 2.2 Experimental power amplifier characterization setup
- Figure 2.3 In the memory polynomial model diagram, the output is constructed by multiplying delayed versions of the input signal by a delay dependent function
- Figure 2.4 Calculating the NMSE between the captured output, and the simulation model output
- Figure 2.5 (a) AM-AM Characteristics and (b) AM-PM characteristics of a class AB power amplifier using the memoryless model
- Figure 2.6 (a) AM-AM Characteristics and (b) AM-PM characteristics of a class AB power amplifier using the memoryless model
- Figure 3.2 (a) AM-AM Characteristics and (b) AM-PM characteristics for the memoryless digital predistortion model
- Figure 3.3 (a) AM-AM Characteristics and (b) AM-PM characteristics for the memory polynomial digital predistortion model
- Figure 3.4 (a) AM-AM Characteristics and (b) AM-PM characteristics for the cascade of the memoryless DPD/memoryless PA models and the cascade of the memory polynomial DPD/memory polynomial PA models compared with the PA models
- Figure 3.5 Output spectra of digital predistortion models, showing that the memoryless model has more spectral regrowth compared to memory polynomial models
- Figure 3.6 Validating digital predistortion using the experimental setup
- Figure 3.7 (a) One carrier (b) Two carrier (c) Three carrier and (d) Four carrier WCDMA linearization results using a class AB PA
- Figure 4.1 Graphical user interface developed for predistortion testing between the DSP and MATLAB software
- Figure 4.2 Example code listing used to control execution of different algorithms implemented in the DSP
- Figure 4.3 Calculating NMSE between the captured output, model output in simulation, and the DSP model output
- Figure 4.4 Validating DSP generated digital predistortion models using the experimental setup
- Figure 4.5 (a) AM-AM Characteristics and (b) AM-PM characteristics of a Doherty based Power Amplifier
- Figure 4.6 (a) One carrier (b) Two carrier (c) Three carrier and (d) Four carrier WCDMA linearization results using a Doherty PA
- Figure 5.1 Proposed system block diagram of standalone digital predistorter
- Figure 5.2 System diagram of a predistortion look up table implemented in the FPGA
- Figure 5.3 Relation scheme from polynomial coefficients to LUT
- Figure 5.4 Connectivity between FPGA and DAC evaluation board
- Figure 5.5 Texas Instruments DAC5667 output of a one carrier WCDMA signal
- Figure 5.6 Analog Devices AD9779A output of a one carrier WCDMA signal with 4x interpolation enabled
- Figure 5.7 DSP/FPGA system prototype used for baseband digital predistortion
- Figure 5.8 Validating the DSP/FPGA digital predistortion prototype
- Figure 5.9 (a) One carrier (b) Two carrier (c) Three carrier and (d) Four carrier WCDMA linearization results for Class AB PA using DSP/FPGA transmitter
- Figure 5.10 (a) One carrier (b) Two carrier (c) Three carrier and (d) Four carrier WCDMA linearization results for Doherty PA using DSP/FPGA transmitter

## List of Symbols, Abbreviations and Nomenclature


| Symbol | Definition                             |
|--------|----------------------------------------|
| 3G     | Third Generation                       |
| 3GPP   | Third Generation Partnership Project   |
| ACPR   | Adjacent Channel Power Ratio           |
| ADC    | Analog to Digital Converter            |
| AM     | Amplitude Modulation                   |
| AWG    | Arbitrary Waveform Generator           |
| DAC    | Digital to Analog Converter            |
| dB     | Decibels                               |
| dBc    | Decibels relative to carrier           |
| dBm    | Power referenced to 1 milliwatt        |
| DC     | Direct Current                         |
| DPD    | Digital Predistortion                  |
| DPCH   | Dedicated Physical Channel             |
| DSP    | Digital Signal Processor               |
| FIFO   | First In, First Out                    |
| FPGA   | Field Programmable Gate Array          |
| GPIB   | General Purpose Interface Bus          |
| GUI    | Graphical User Interface               |
| JTAG   | Joint Test Action Group                |
| LO     | Local Oscillator                       |
| LUT    | Look Up Table                          |
| ms     | Milliseconds                           |
| NMSE   | Normalized Mean Squared Error          |
| PA     | Power Amplifier                        |
| PAE    | Power Added Efficiency                 |
| PCB    | Printed Circuit Board                  |
| PLL    | Phase Locked Loop                      |
| PM     | Phase Modulation                       |
| QR-RLS | QR based Recursive Least Squares       |
| RAM    | Random Access Memory                   |
| RF     | Radio Frequency                        |
| RLS    | Recursive Least Squares                |
| SVD    | Singular Value Decomposition           |
| TI-EVM | Texas Instruments Evaluation Module    |
| VSA    | Vector Signal Analyzer                 |
| VSG    | Vector Signal Generator                |
| W      | Watts                                  |
| WCDMA  | Wideband Code Division Multiple Access |

#### **Chapter One: Introduction**

Wireless communications have become ubiquitous and mainstream in today's world. The transition from conventional voice communications to multimedia and Internet data access pushes the need for higher data rates on a cellular phone, while requiring the same convenience of mobility and signal quality (Haykin 2001). The latest third generation (3G) communication schemes, including Wideband Code Division Multiple Access (WCDMA), make use of complex modulation techniques in order to maximize data throughput, and consequently spectral efficiency, in a wireless channel (Holma, Toskala et al. 2004). However, these optimized waveforms have highly varying envelopes, which reduce the operating efficiency of the power amplifier (PA), translating into decreased battery life for mobile terminals and increased operating costs for base stations (Reynaert and Steyaert 2006). Driving the PA in a high efficiency mode requires linearization techniques to compensate for its nonlinear output. The digital predistortion technique provides a high PA linearization capability at the cost of a more complex implementation.

#### **1.1 Wireless Transmitters**

A simplified wireless communications transmitter is shown in Figure 1.1 (Glisic 2004). The digital in-phase (I) and quadrature-phase (Q) components of the information signal are passed into Digital to Analog Converters (DACs). The direct upconversion stage translates the baseband analog signal to the Radio Frequency (RF) carrier by multiplying the baseband signal with a local oscillator (LO) signal, and the result is applied to the input of the PA. The output of the PA is directed into an antenna for propagation into the wireless channel.



Figure 1.1 Block diagram of a conventional direct conversion wireless transmitter

A method of increasing the data communication rate is to modulate different information signals with several sub-carriers in baseband (Holma, Toskala et al. 2004). This multi-carrier technique requires multiplying an information signal with one or more complex sinusoids to shift the information's frequency spectrum to offset frequencies  $f_1$ to  $f_N$  (Figure 1.2). While this multi-carrier method increases the data rate, the cost is the additional processing power needed to modulate the signals, as well as the need for a wideband PA.



Figure 1.2 Baseband multi-carrier signal generation
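The frequency-shifting step described above can be sketched in a few lines of Python. This is a minimal illustration and not the thesis implementation: the function names, the sample rate and the carrier offsets are assumptions chosen for the example, and constant placeholder symbols stand in for real WCDMA data.

```python
import cmath
import math


def frequency_shift(samples, offset_hz, sample_rate_hz):
    """Shift a complex baseband sequence by offset_hz.

    Sample n is multiplied by exp(j*2*pi*offset_hz*n/sample_rate_hz).
    """
    step = 2.0 * math.pi * offset_hz / sample_rate_hz
    return [s * cmath.exp(1j * step * n) for n, s in enumerate(samples)]


def multicarrier(carriers, offsets_hz, sample_rate_hz):
    """Sum equally long carriers after shifting each one to its own offset."""
    if len(carriers) != len(offsets_hz):
        raise ValueError("need one offset per carrier")
    composite = [0j] * len(carriers[0])
    for samples, offset_hz in zip(carriers, offsets_hz):
        shifted = frequency_shift(samples, offset_hz, sample_rate_hz)
        composite = [c + s for c, s in zip(composite, shifted)]
    return composite


# Hypothetical two-carrier example with offsets of -2.5 MHz and +2.5 MHz.
SAMPLE_RATE_HZ = 20e6  # assumed sample rate, not taken from the thesis
carrier_a = [1 + 0j] * 8  # placeholder symbols
carrier_b = [0 + 1j] * 8  # placeholder symbols
composite = multicarrier([carrier_a, carrier_b], [-2.5e6, 2.5e6], SAMPLE_RATE_HZ)
```

Each additional carrier costs one complex multiplication per sample, and the composite signal spans the full range of offsets, which is why the text notes both the extra processing load and the need for a wideband PA.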

The power amplifier is typically the least efficient component of the wireless transmitter. When designing a power amplifier, the key criteria include the linear signal conversion quality between the input and output (linearity), and the amount of power consumed to deliver the required output power (efficiency) (Kenington 2000). There naturally exists a trade-off between linearity and efficiency: operating in the highly linear region results in low efficiency, while high efficiency requires operating in the nonlinear region of the PA. Figure 1.3 shows the gain and efficiency of a simulated class AB power amplifier. As the efficiency of the power amplifier increases with the input drive level, $P_{in}$, the PA moves into a nonlinear mode, where an input signal will show distortion at the output port of the PA.



Figure 1.3 Gain (solid trace) and drain efficiency (circle markers) versus input drive level P<sub>in</sub> for a simulated class AB power amplifier

#### **1.2 PA Nonlinearity Effects**

The power amplifier response experiences gain compression close to the saturation point. Operating in this nonlinear region of the power amplifier affects both the amplitude and phase information at the PA output. A baseband analysis may be conducted to view the power amplifier characteristics. This involves comparing the Amplitude Modulation of the input drive signal against the Amplitude Modulation of the amplified output signal (AM-AM), and the Amplitude Modulation of the input signal against the Phase Modulation of the amplified output signal (AM-PM).
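As a rough sketch of this baseband analysis, the AM-AM and AM-PM characteristics can be computed directly from pairs of complex input and output samples. The PA below is a hypothetical memoryless model with an arbitrarily chosen third-order coefficient; it is used only to produce data for the example and does not represent any amplifier measured in this work.

```python
import cmath
import math


def toy_pa(x):
    """Hypothetical memoryless PA: complex gain that compresses with |x|^2."""
    return x * (1.0 + (-0.15 + 0.05j) * abs(x) ** 2)


def am_am_am_pm(inputs, outputs):
    """Return (|x|, |y|, phase shift in degrees) for each input/output pair."""
    points = []
    for x, y in zip(inputs, outputs):
        if x == 0:
            continue  # the phase shift is undefined for a zero input
        gain = y / x
        points.append((abs(x), abs(y), math.degrees(cmath.phase(gain))))
    return points


# Sweep the drive level with an arbitrary, fixed input phase.
drive = [0.1 * k * cmath.exp(0.7j) for k in range(1, 11)]
curve = am_am_am_pm(drive, [toy_pa(x) for x in drive])
```

Plotting the second element of each tuple against the first gives the AM-AM curve, and plotting the third against the first gives the AM-PM curve. Because the phase is taken from the ratio of output to input, the result does not depend on the phase of the drive signal.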

The nonlinear output of a PA also produces significant out of band distortion, as demonstrated in Figure 1.4. The trace with square markers is the PA output signal when the input signal is driven into the nonlinear gain region of the PA. This spectral regrowth causes unwanted interference in other users' frequency channels, and its effect is more prominent when wider bandwidth signals are used.



Figure 1.4 Frequency spectra of the input signal (circle markers), and the output signal (square markers) are compared with the PA in a non-linear mode of operation

Wideband signals operating in the nonlinear region experience another undesired phenomenon known as memory effects (Boumaiza and Ghannouchi 2003). With memory effects, the instantaneous power gain of the PA at a given input power level is no longer constant over time, but varies with both the current and previous input signal values. These memory effects have a number of sources, including non-ideal electrical components and thermal variations in the transistor of the PA. Memory effects complicate the compensation techniques described below, and must be taken into consideration when large bandwidth signals are used.
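The dependence of the gain on previous input values can be illustrated with a memory polynomial, the model family treated in Chapter Two. The sketch below uses a form commonly found in the literature, $y(n) = \sum_{m}\sum_{k} a_{m,k}\, x(n-m)\,|x(n-m)|^{k-1}$; the exact formulation used later in this thesis may differ, and the coefficient values are hypothetical.

```python
def memory_polynomial(x, coeffs):
    """Evaluate y(n) = sum_m sum_k coeffs[m][k-1] * x(n-m) * |x(n-m)|**(k-1).

    coeffs[m] holds the polynomial coefficients for a delay of m samples.
    Samples before the start of the record are treated as zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0j
        for m, branch in enumerate(coeffs):
            if n - m < 0:
                continue
            delayed = x[n - m]
            for k, a in enumerate(branch, start=1):
                acc += a * delayed * abs(delayed) ** (k - 1)
        y.append(acc)
    return y


# Hypothetical coefficients: a compressive main branch plus a weak
# one-sample memory branch.
COEFFS = [
    [1.0, 0.0, -0.15 + 0.05j],  # delay 0: terms in x, x|x|, x|x|^2
    [0.05, 0.0, -0.02],         # delay 1
]

# Same instantaneous input (0.8), preceded by a small or a large sample.
after_small = memory_polynomial([0.2, 0.8], COEFFS)[1]
after_large = memory_polynomial([1.0, 0.8], COEFFS)[1]
gain_after_small = after_small / 0.8
gain_after_large = after_large / 0.8
```

The two gains differ even though the current input sample is identical, which is the behaviour a memoryless compensator cannot capture.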

#### **1.3 Nonlinearity Compensation Techniques**

Operating in the high efficiency region of the PA produces a nonlinear output, and there are a number of ways to compensate for this PA nonlinearity. The following subsections discuss three linearization techniques: (1) feedforward linearization, (2) Cartesian feedback linearization, and (3) digital predistortion.

#### 1.3.1 Feedforward Linearization (Analog Correction)

In feedforward linearization (Figure 1.5) (Rummery and Branner 1997), the signal is sent through a main power amplifier, and an error signal is computed by subtracting the original input signal from the attenuated output of the power amplifier. The error signal is passed through an error correcting PA, and added 180° out of phase to the original distorted output. The result is a system capable of cancelling distortion at the PA output. However, the drawbacks of this system include the need for the additional error correcting power amplifier, which increases the cost of the system, and the need for accurate, adaptive gain and phase alignment at the RF combiner of each of the two loops of the feedforward power amplifier.



Figure 1.5 System diagram of an analog feedforward linearizer (Rummery and Branner 1997)
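A sample-by-sample sketch of the two feedforward loops shows why gain alignment matters. It assumes an idealized, distortion-free error amplifier and a hypothetical main PA model, and it takes the error signal as the difference between the attenuated PA output and the input reference. The `mismatch` parameter is an illustrative stand-in for a gain misalignment in the second loop.

```python
def main_pa(x, gain=10.0):
    """Hypothetical main PA: nominal gain with third-order compression."""
    return gain * x * (1.0 + (-0.15 + 0.05j) * abs(x) ** 2)


def feedforward(x, gain=10.0, mismatch=0.0):
    """Idealized two-loop feedforward correction for one complex sample."""
    pa_out = main_pa(x, gain)
    # Loop 1: attenuate the PA output by the nominal gain and subtract the
    # input reference, leaving only the distortion.
    error = pa_out / gain - x
    # Loop 2: amplify the error (ideal error PA assumed) and add it in
    # anti-phase to the main PA output. A nonzero mismatch models a gain
    # misalignment in this loop.
    return pa_out - gain * (1.0 + mismatch) * error


x = 0.9 + 0.2j
ideal = feedforward(x)                     # equals 10 * x up to rounding
misaligned = feedforward(x, mismatch=0.05) # leaves a residual distortion term
```

With perfect alignment the distortion cancels, while a 5% gain mismatch leaves 5% of the amplified error at the output.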

#### 1.3.2 Cartesian Feedback Linearization (Analog Correction)

For the case of a Cartesian feedback linearizer (Briffa and Faulkner 1996), the nonlinear PA output signal is downconverted and recombined with the input at the baseband stage with proper gain for the in-phase and quadrature-phase components, as shown in Figure 1.6. The combination of the input and the negative feedback of the nonlinear PA output signal results in a complementary signal generated at the output of the subtraction stage that cancels out the PA distortion. The drawback is a potential compensation mismatch due to the delay introduced in the feedback path, which can cause instability in the system.



Figure 1.6 System diagram of an analog Cartesian feedback linearizer (Briffa and Faulkner 1996)

#### 1.3.3 Digital Predistortion

Digital predistortion (Cavers 1990) removes the compensation feedback path used in Cartesian feedback linearization, and instead attempts to compensate for the nonlinearity in the PA using baseband signal processing algorithms. The baseband complex inputs and outputs of the PA are sampled in the digital domain, where a high speed signal processor can compute the characteristics of the PA and numerically invert these characteristics to synthesize the predistortion complex gain function, which is cascaded upstream of the PA in the baseband domain. The system block diagram is shown in Figure 1.7.



Figure 1.7 System diagram of digital baseband predistortion (Cavers 1990)

#### 1.3.4 Comparison of Linearizers

A comparison of the aforementioned linearizers is shown in Table 1.1 (Vuolevi and Rahkonen 2003). Although the digital predistortion method has a high implementation complexity, it offers a moderate efficiency improvement and allows the use of higher bandwidth input signals compared with the Cartesian feedback linearizer.

 
 Table 1.1 Comparison of different linearization methods in terms of complexity, efficiency and bandwidth

| Linearization Method  | Complexity | Efficiency | Bandwidth |
|-----------------------|------------|------------|-----------|
| Feedforward           | High       | Moderate   | High      |
| Cartesian Feedback    | Moderate   | High       | Narrow    |
| Digital Predistortion | High       | Moderate   | Moderate  |

Based on the previous sections, it is important to maintain a linear amplified signal at the PA output to reduce the amount of spectral regrowth in adjacent frequency channels. However, the inverse relationship between PA efficiency and PA linearity requires a compensation method that favours high efficiency while maintaining linearity. Digital predistortion is well suited to this role: its software reconfigurability makes it easily adaptable to a variety of PAs, and it allows for compensation of wideband signals.

There are several challenges in implementing digital predistortion in an embedded environment. First, the system needs to be adaptive, and suitable for real-time predistortion of the signal with minimal latency. This is required because of critical synchronization and timing constraints posed by wireless communication standards (Glisic 2004). Second, the predistortion synthesis algorithm must be computed quickly and accurately enough to characterize the behaviour of the PA and ensure high performance in different operating conditions (Boumaiza, Helaoui et al. 2007). Finally, the DACs and ADCs that interface the digital domain to the analog environment must be of high precision to obtain a higher signal to noise ratio, and consequently require more accurate signal processing in the embedded processor.

#### **1.4 Thesis Organization**

The core of this thesis is divided into five parts.

Chapter 2 discusses the behavioural modeling approaches used to compensate for the nonlinearity effects of the PA. A characterization procedure for a PA is outlined using an experimental setup, and the memoryless, and memory polynomial models are introduced to model the PA behaviour. These models are analyzed in simulation to assess their capability for digital predistortion.

Chapter 3 describes the concept of digital predistortion for linearizing PAs. Both the memoryless and memory polynomial models are used to synthesize the digital predistortion function, and its linearization capability is assessed in simulation. Next, the experimental setup is used to generate the predistorted signal to validate linearization capability of a PA. The linearization results using both models with a class AB power amplifier are presented for different multi-carrier WCDMA waveforms.

Chapter 4 explains the process for developing digital predistortion synthesis algorithms on a DSP platform. The DSP model accuracy is compared to the simulation model's accuracy using an automated test framework. The DSP generated model is used to predistort the original signal, and the experimental setup is used to validate the linearization of a highly nonlinear Doherty PA.

Chapter 5 illustrates a hybrid DSP/FPGA platform used to linearize 3G transmitters. Issues regarding the real-time predistortion of the signal in the FPGA were identified. The DSP/FPGA platform was used to linearize the aforementioned class AB and Doherty power amplifiers to validate system operation.

Chapter 6 concludes with a summary of the research project and future work.

#### **1.5 Thesis Contributions**

The following is a summary of the contributions for this thesis.

- 1. A testing framework designed for validating MATLAB simulation and DSP/FPGA implemented signal processing algorithms was presented at the Signal Information and Processing Systems conference 2006 (SIPS '06)<sup>1</sup>.
- 2. The implementation and validation of adaptive filter based algorithms for optimization of digital predistortion algorithms were published in the Journal of Signal Processing Systems<sup>2</sup>.
- 3. The discussion of critical issues and effectiveness of digital predistortion was presented in the 38<sup>th</sup> European Microwave Conference 2008 workshop<sup>3</sup>.

<sup>1</sup> A. Kwan, S. Boumaiza, et al. (2006). Automating the Verification of SDR Base band Signal Processing Algorithms Developed on DSP/FPGA Platform. Signal Processing Systems Design and Implementation, 2006. SIPS '06. IEEE Workshop on.

<sup>2</sup> A. Kwan, M. Helaoui, et al. "Wireless Communications Transmitter Performance Enhancement Using Advanced Signal Processing Algorithms Running in a Hybrid DSP/FPGA Platform." Journal of Signal Processing Systems, from http://dx.doi.org/10.1007/s11265-008-0225-3

<sup>3</sup> Ghannouchi, F., Hammi, O., Liu, T., Kwan, A. (2008). Digital Predistortion: An Enabling Technology for 3G+/4G Transmitters Design. 38<sup>th</sup> IEEE European Microwave Conference Workshop, 2008. EuMC 2008.

#### Chapter Two: Characterization and Behavioural Modeling of Transmitters and PAs

Behavioural modeling is used to characterize the power amplifier's input-output behaviour. The power amplifier (PA) characterization procedure involves using a signal source to generate the test waveform for the PA input excitation, and a data acquisition unit to capture the waveform at the output port of the PA. Further post-processing techniques are used on a computer to compensate for the delay between the input and output, as well as to model the behaviour of the system. In this thesis, two models are presented: the memoryless model and the memory polynomial model. The memoryless model attempts to compensate for the static nonlinearity of the system, and was chosen because of its simple algorithm implementation. However, the accuracy of the memoryless model decreases when a wideband input waveform is used. The memory polynomial model is introduced to improve the accuracy by taking into account the memory effects exhibited by the transmitter, and was chosen because of its reasonable modeling accuracy and performance when modeling PAs that exhibit memory behaviour.

#### **2.1 Evaluation Metrics**

Many factors contribute to the performance evaluation of power amplifiers and transmitters; however, linearity and efficiency are the most important.

Linearity is achieved when the output signal is an amplified replica of the input signal with a constant gain factor. When nonlinearity is introduced into a system, the phenomenon can be viewed in the frequency domain as spectral regrowth into adjacent channels. In Figure 2.1, the Adjacent Channel Power Ratio (ACPR) is defined as the ratio of the integrated power in the main channel bandwidth, $P_{ch}$, to either the integrated power in the lower adjacent channel, $P_{adj,l}$, or the integrated power in the upper adjacent channel, $P_{adj,u}$. The lower ACPR is calculated as $\frac{P_{ch}}{P_{adj,l}}$, and the upper ACPR is calculated as $\frac{P_{ch}}{P_{adj,u}}$.






For Wideband Code Division Multiple Access (WCDMA) applications, the integration bandwidth of all channels is 3.84 MHz, while the adjacent channel offsets typically are at 5 MHz, 10 MHz, and 15 MHz from the carrier. In multicarrier WCDMA applications, the offsets are measured from the first carrier for lower adjacent channel measurements, and the last carrier for upper adjacent channel measurements.
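The ACPR definition and the WCDMA integration bandwidth above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `channel_power` and `acpr_db`, the 10 kHz frequency grid, and the synthetic flat spectrum are all hypothetical and are not taken from the thesis measurements.

```python
import math

def channel_power(freqs_hz, psd_lin, center_hz, bandwidth_hz=3.84e6):
    """Sum the linear PSD bins that fall inside the integration bandwidth."""
    half = bandwidth_hz / 2.0
    return sum(p for f, p in zip(freqs_hz, psd_lin) if abs(f - center_hz) <= half)

def acpr_db(freqs_hz, psd_lin, offset_hz, carrier_hz=0.0):
    """ACPR in dB: main-channel power over adjacent-channel power at one offset."""
    p_ch = channel_power(freqs_hz, psd_lin, carrier_hz)
    p_adj = channel_power(freqs_hz, psd_lin, carrier_hz + offset_hz)
    return 10.0 * math.log10(p_ch / p_adj)

# Synthetic baseband spectrum (assumption): flat carrier, flat regrowth 40 dB lower.
freqs = [k * 10e3 for k in range(-1000, 1001)]
psd = [1.0 if abs(f) <= 1.92e6 else 1e-4 for f in freqs]

lower_acpr = acpr_db(freqs, psd, -5e6)  # lower adjacent channel at -5 MHz
upper_acpr = acpr_db(freqs, psd, +5e6)  # upper adjacent channel at +5 MHz
```

For a multicarrier signal, `carrier_hz` would be set to the first carrier for the lower measurement and to the last carrier for the upper measurement, as described above.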

The efficiency of the power amplifier  $(\eta)$  can be calculated as the ratio of the RF output power from the PA  $(P_{out})$  to the DC power delivered to the PA  $(P_{DC})$  (Kenington 2000).

$$\eta = \frac{P_{out}}{P_{DC}} \tag{2.1}$$

The Power Added Efficiency (PAE) metric also takes into account the input drive level into the PA ( $P_{in}$ ), and can be described by

$$PAE = \frac{P_{out} - P_{in}}{P_{DC}} = \eta \left(1 - \frac{1}{G}\right) \tag{2.2}$$

where G is the gain of the power amplifier.
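As a quick numerical check of Equations 2.1 and 2.2, the sketch below evaluates both metrics for a made-up operating point. The function names and the power levels (40 dBm output, 30 dBm input, 20 W DC) are assumptions chosen for easy arithmetic; they are not measurements from this work.

```python
def dbm_to_watts(p_dbm):
    """Convert a power level in dBm to watts."""
    return 10.0 ** ((p_dbm - 30.0) / 10.0)

def efficiency(p_out_w, p_dc_w):
    """Equation 2.1: RF output power over DC power."""
    return p_out_w / p_dc_w

def power_added_efficiency(p_out_w, p_in_w, p_dc_w):
    """Equation 2.2: RF power added by the PA over DC power."""
    return (p_out_w - p_in_w) / p_dc_w

# Hypothetical operating point (assumption, for illustration only).
p_out = dbm_to_watts(40.0)   # 10 W
p_in = dbm_to_watts(30.0)    # 1 W
p_dc = 20.0                  # W

eta = efficiency(p_out, p_dc)
pae = power_added_efficiency(p_out, p_in, p_dc)
gain = p_out / p_in          # linear power gain G
```

With a gain of 10, the two forms of Equation 2.2 agree: the PAE is the efficiency scaled by (1 - 1/G), so the two metrics converge as the gain increases.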

#### 2.2 PA Characterization Setup

Figure 2.2 shows the power amplifier characterization setup, where an Agilent E4438C Vector Signal Generator (VSG) allows the generation of a digitally modulated baseband waveform, and an Agilent E4440A Vector Signal Analyzer (VSA) is used to capture and downconvert the RF PA output waveform into a baseband waveform for signal processing (Agilent Technologies 2000). Both the VSG and the VSA are connected to a computer using the General Purpose Interface Bus (GPIB). A trigger pulse is sent from the VSG to the VSA to synchronize the analyzer with the beginning of a waveform.



Figure 2.2 Experimental power amplifier characterization setup

A class AB amplifier suitable for WCDMA transmitters operating at 1.96 GHz was used for the initial validation of the extracted behavioural models of the amplifier. The characterization procedure is as follows. First, a one carrier, Test Model 1 type WCDMA baseband signal sampled at 92.16 MHz (oversampled by a factor of 24 relative to the sample rate of the original signal) is downloaded into the VSG. The time length of this signal is 2 milliseconds (ms), which equates to 3 slots of 1 WCDMA frame and results in 184,320 complex data points. Then, the waveform is upconverted to 1.96 GHz and is used to drive the PA into the nonlinear region by adjusting the output power of the VSG, making sure not to exceed the saturation point of the PA. Finally, the output of the PA is attenuated, downconverted to baseband, sampled at 92.16 MHz by the VSA instrument, and read into the Personal Computer (PC) for offline processing. It is important to note that no specialized training signals are used to excite the power amplifier; the actual transmitted WCDMA signal is used to characterize the PA. As a result, the linearization techniques described in the next chapter do not require a separate offline calibration mode.
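The waveform figures quoted in the procedure can be cross-checked with a short calculation. The 3.84 MHz rate is the WCDMA integration bandwidth given in Section 2.1, taken here as the original sample rate; the 10 ms frame of 15 slots is the standard WCDMA frame structure and is an assumption not stated in the text above.

$$N = f_s T = 92.16 \times 10^{6}\ \text{Hz} \times 2 \times 10^{-3}\ \text{s} = 184{,}320 \ \text{samples}$$

$$\frac{f_s}{3.84\ \text{MHz}} = \frac{92.16\ \text{MHz}}{3.84\ \text{MHz}} = 24$$

$$3\ \text{slots} \times \frac{10\ \text{ms}}{15\ \text{slots}} = 2\ \text{ms}$$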

#### 2.3 Power Amplifier Behavioural Modeling

The behaviour model of the power amplifier can be calculated in simulation using computing software such as MATLAB. There are three steps to be performed: delay compensation between the input waveform and the captured waveform at the output of the device, model characterization of the device, and validation of the model.

#### 2.3.1 Delay Compensation

The signal propagation delay between the input and output of the device under test prevents the correct AM-AM and AM-PM measurement waveforms from being generated for the device. Liu et al. (Liu, Yan et al. 2008) show that dispersion appears in the AM-AM and AM-PM characteristics when the delay is not accurately compensated. To compensate for the delay between the input and output of the device, interpolation and cross correlation signal processing techniques are used, as described below.

Interpolation is used to increase the sampling rate of the captured input and output waveforms, giving a finer time resolution when compared with the original signals. The Lagrange interpolation method is used to upsample the input and output waveforms (Berrut and Trefethen 2004). The following equations describe the Lagrange interpolation


$$p(x) = \sum_{j=0}^{N} y_j l_j(x) \tag{2.3}$$

$$l_{j}(x) = \frac{\prod_{k=0, k \neq j}^{N} (x - x_{k})}{\prod_{k=0, k \neq j}^{N} (x_{j} - x_{k})} \tag{2.4}$$

where p(x) is the interpolating polynomial defined for the interval, x is the point at which the interpolated value is required,  $y_j$  is the known value at node  $x_j$ , and  $x_k$  (k = 0, ..., N) are the known nodes.
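Equations 2.3 and 2.4 translate almost directly into code. The sketch below is a minimal pure-Python illustration; the sliding window of four samples (`order=3`) and the function names are assumptions, since the interpolation order used in this work is not stated here.

```python
def lagrange_interpolate(nodes, values, x):
    """Evaluate p(x) of Equation 2.3 using the basis polynomials of Equation 2.4."""
    total = 0
    for j, (xj, yj) in enumerate(zip(nodes, values)):
        basis = 1.0
        for k, xk in enumerate(nodes):
            if k != j:
                basis *= (x - xk) / (xj - xk)
        total += yj * basis
    return total

def upsample(samples, factor, order=3):
    """Upsample by an integer factor using a sliding window of order+1 samples.

    Works for real or complex samples; needs len(samples) >= order + 1.
    """
    n = len(samples)
    out = []
    for i in range((n - 1) * factor + 1):
        t = i / factor                                   # position in input-sample units
        start = max(0, min(int(t) - order // 2, n - order - 1))
        nodes = list(range(start, start + order + 1))
        out.append(lagrange_interpolate(nodes, [samples[k] for k in nodes], t))
    return out

# Example: samples of t^2 are reproduced exactly by a cubic interpolant.
coarse = [float(k * k) for k in range(6)]
fine = upsample(coarse, 2)
```

A higher `factor` gives the finer time resolution needed before the cross correlation step, at the cost of proportionally more samples to process.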

Performing a cross correlation between the interpolated input and output signals will give the estimated time delay (Vaseghi 2006). The complex cross-correlation can be described as

$$R_{xy}(m) = E\{x_{n+m}y^*_n\} \tag{2.5}$$

where  $E\{.\}$  is the expectation operator, and x and y are the input and output waveforms, respectively. The interpolated input and output waveforms are shifted relative to each other by the value of m at which the maximum magnitude of the cross correlation  $R_{xy}(m)$  occurs.
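A delay estimate based on Equation 2.5 could look like the following sketch. The expectation is approximated by a sample mean over the overlapping part of the two records; the function names, the search range `max_lag`, and the synthetic test signal are assumptions made for illustration.

```python
import random

def cross_correlation(x, y, max_lag):
    """Sample estimate of R_xy(m) = E{x[n+m] * conj(y[n])} for |m| <= max_lag."""
    n = len(x)
    result = {}
    for m in range(-max_lag, max_lag + 1):
        acc, count = 0j, 0
        for i in range(n):
            if 0 <= i + m < n:
                acc += x[i + m] * y[i].conjugate()
                count += 1
        result[m] = acc / count
    return result

def estimate_delay(x, y, max_lag):
    """Delay of y relative to x, in samples, from the peak of |R_xy(m)|."""
    r = cross_correlation(x, y, max_lag)
    m_peak = max(r, key=lambda m: abs(r[m]))
    return -m_peak   # with this definition, a delayed output peaks at negative m

# Synthetic check: y is x delayed by 3 samples and scaled by a complex gain.
rng = random.Random(0)
x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
true_delay = 3
y = [0j] * true_delay + [(2.0 + 1.0j) * v for v in x[:-true_delay]]
estimated = estimate_delay(x, y, max_lag=10)
```

Taking the magnitude of the correlation makes the estimate insensitive to the unknown phase rotation of the PA. Running this on the interpolated waveforms gives a delay resolution finer than one original sample period.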

#### 2.3.2 Memoryless Model

The memoryless model takes the instantaneous complex gain characteristics of the power amplifier and generates the static nonlinearity of the PA (Cripps 2006). The complex gain characteristics (AM-AM and AM-PM) can be calculated using the following equations:

$$P_{dBm}(z) = 10\log_{10} \frac{\left|\frac{z}{\sqrt{2}}\right|^2}{50} + 30 \tag{2.6}$$

$$P_{in,dBm}(n) = P_{dBm}(x(n)) \tag{2.7}$$

$$P_{out,dBm}(n) = P_{dBm}(y(n)) \tag{2.8}$$

$$G_{PA}(n) = P_{out,dBm}(n) - P_{in,dBm}(n) \tag{2.9}$$

$$\phi_{PA}(n) = \arctan\left(\frac{y(n)}{x(n)}\right) \tag{2.10}$$

where x(n) is the complex modulated input waveform driven into the PA, y(n) is the complex modulated output waveform at the output of the PA, and z is a complex waveform with units in Volts, with the power referred to a 50 Ω load. In Equation 2.10, the arctangent of the complex ratio denotes its phase angle.
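Equations 2.6 to 2.10 can be evaluated sample by sample as in the sketch below. The function names are hypothetical, the phase is computed as the angle of the complex ratio y(n)/x(n) (the interpretation of Equation 2.10 assumed here), and zero-valued samples are skipped because their power in dBm is undefined.

```python
import cmath
import math

def power_dbm(z, load_ohm=50.0):
    """Equation 2.6: power in dBm of a complex envelope sample z in volts."""
    return 10.0 * math.log10((abs(z) / math.sqrt(2.0)) ** 2 / load_ohm) + 30.0

def am_am_am_pm(x, y):
    """Return (P_in dBm, gain dB, phase deg) for each aligned sample pair."""
    points = []
    for xn, yn in zip(x, y):
        if xn == 0 or yn == 0:
            continue                                         # undefined in dB
        p_in = power_dbm(xn)                                 # Equation 2.7
        gain_db = power_dbm(yn) - p_in                       # Equations 2.8 and 2.9
        phase_deg = math.degrees(cmath.phase(yn / xn))       # Equation 2.10
        points.append((p_in, gain_db, phase_deg))
    return points

# Toy data (assumption): an ideal amplifier with 20 dB gain and 45 degree phase shift.
x = [1.0 + 0.0j, 0.0 + 2.0j]
y = [10.0 * cmath.exp(1j * math.pi / 4) * v for v in x]
curve = am_am_am_pm(x, y)
```

Plotting the gain and phase columns against the input power column gives the AM-AM and AM-PM scatter that the moving average described next is applied to.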

After calculation of the AM-AM and AM-PM characteristics, a moving average algorithm is performed to remove the residual dispersive behaviour (Ben Nasr, Boumaiza et al. 2005). The moving average can be performed by the following:

$$\tilde{g}(n) = \hat{g}(n-1) + \frac{x(n) - x(n-1)}{x(n+1) - x(n-1)} (g(n+1) - \hat{g}(n-1)) \tag{2.11}$$

$$\hat{g}(n) = \lambda(n)g(n) + (1 - \lambda(n))\tilde{g}(n) \tag{2.12}$$

where g represents the gain or phase before averaging,  $\hat{g}$  represents the gain or phase after averaging, x(n) represents the input power, and  $\lambda(n)$  is the regression factor, chosen to be a value between 0 and 1. A  $\lambda(n)$  close to 1 leaves the signal almost unchanged and results in an unsmoothed curve. Selecting a value of  $\lambda(n)$  proportional to the second derivative of the gain function leads to better averaging results (Kwan, Helaoui et al. 2008).
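The recursion of Equations 2.11 and 2.12 can be sketched as follows. Several choices here are assumptions rather than the method used in this work: the points are taken to be sorted by input power, a constant regression factor `lam` replaces the adaptive λ(n), the two end points are left untouched, and the function name is hypothetical.

```python
def smooth_characteristic(x, g, lam=0.5):
    """Apply Equations 2.11 and 2.12 to a gain or phase curve g sampled at powers x.

    x must be sorted in increasing input power; lam is a constant regression factor.
    """
    g_hat = list(g)                      # end points keep their measured values
    for n in range(1, len(g) - 1):
        span = x[n + 1] - x[n - 1]
        if span == 0:
            g_tilde = g_hat[n - 1]       # repeated power value: no interpolation
        else:
            # Equation 2.11: interpolate between the previous smoothed point
            # and the next measured point.
            g_tilde = g_hat[n - 1] + (x[n] - x[n - 1]) / span * (g[n + 1] - g_hat[n - 1])
        # Equation 2.12: blend the measurement with the interpolated value.
        g_hat[n] = lam * g[n] + (1.0 - lam) * g_tilde
    return g_hat

# A zig-zag curve is pulled toward its local trend.
powers = [0.0, 1.0, 2.0, 3.0, 4.0]
raw = [0.0, 2.0, 0.0, 2.0, 0.0]
smoothed = smooth_characteristic(powers, raw, lam=0.5)
```

With `lam=1.0` the function returns the measured curve unchanged, which matches the remark above that a regression factor close to 1 gives an unsmoothed curve.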

#### 2.3.3 Memory Polynomial Model

The memory polynomial model (Kim and Konstantinou 2001) is a simplified form of the Volterra series that takes into account the dynamic memory effects introduced by the PA. The model can be described by the following expression

$$y(n) = \sum_{m=0}^{M} \sum_{k=0}^{K} h_{m,k} x(n-m) \left| x(n-m) \right|^{k} \tag{2.13}$$

where x(n-m) is the complex modulated input waveform driven into the PA with delay m, y(n) is the complex modulated output waveform with the output attenuation applied, K is the polynomial order, M is the memory depth and  $h_{m,k}$  are the memory polynomial coefficients. Figure 2.3 shows the model diagram of the memory polynomial model.



Figure 2.3 In the memory polynomial model diagram, the output is constructed by multiplying delayed versions of the input signal by a delay dependent function

The memory polynomial coefficients  $h_{m,k}$  can be solved by taking the time aligned input and output waveforms, x(n) and y(n), and generating a system of linear equations. In this case, the following equation needs to be solved

$$\mathbf{A}\mathbf{b} = \mathbf{y} \tag{2.14}$$

where

$$\mathbf{A} = \begin{bmatrix} \boldsymbol{\beta}_0(n) & \boldsymbol{\beta}_1(n) & \cdots & \boldsymbol{\beta}_M(n) \\ \boldsymbol{\beta}_0(n-1) & \boldsymbol{\beta}_1(n-1) & \cdots & \boldsymbol{\beta}_M(n-1) \\ \vdots & \vdots & \ddots & \vdots \\ \boldsymbol{\beta}_0(n-N-1) & \boldsymbol{\beta}_1(n-N-1) & \cdots & \boldsymbol{\beta}_M(n-N-1) \end{bmatrix}$$

$$\boldsymbol{\beta}_m(n) = \begin{bmatrix} x(n-m) & x(n-m)|x(n-m)| & \cdots & x(n-m)|x(n-m)|^K \end{bmatrix}$$

$$\mathbf{b} = \begin{bmatrix} h_{0,0} & h_{0,1} & \cdots & h_{1,0} & h_{1,1} & \cdots & h_{M,K} \end{bmatrix}^T$$

$$\mathbf{y} = \begin{bmatrix} y(n) & y(n-1) & \cdots & y(n-N-1) \end{bmatrix}^T$$

An approximate solution for the coefficients  $h_{m,k}$  can be found by generating an overdetermined system of linear equations (where  $N > (M+1) \times (K+1)$ ) and minimizing the squared error, e, where

$$e = \left\| \mathbf{y} - \mathbf{A} \mathbf{b} \right\|^2 \tag{2.15}$$

The approach used to solve for the system of linear equations is to compute the Moore-Penrose pseudo-inverse of the matrix A (Haykin 2001), which is based on decomposing the matrix using Singular Value Decomposition (SVD) and multiplying with the output vector y as follows

$$\mathbf{b}_{opt} = pinv(\mathbf{A}) \times \mathbf{y} \tag{2.16}$$
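The identification in Equations 2.14 to 2.16 can be sketched with NumPy, whose `numpy.linalg.pinv` computes the Moore-Penrose pseudo-inverse through an SVD. This is an illustrative sketch, not the MATLAB code used in this work: the helper names, the small model size (M = 1, K = 2) and the synthetic noise-free data are assumptions. Rows are stacked in increasing time order, which does not change the least squares solution, and the first M samples are skipped because x(n-m) is not available for them.

```python
import numpy as np

def build_regression_matrix(x, M, K):
    """Matrix A of Equation 2.14: columns x(n-m)|x(n-m)|^k, ordered m-major then k."""
    n = np.arange(M, len(x))
    cols = [x[n - m] * np.abs(x[n - m]) ** k
            for m in range(M + 1) for k in range(K + 1)]
    return np.column_stack(cols)

def fit_memory_polynomial(x, y, M, K):
    """Equation 2.16: coefficients from the pseudo-inverse of A."""
    A = build_regression_matrix(x, M, K)
    return np.linalg.pinv(A) @ y[M:]

def apply_memory_polynomial(x, h, M, K):
    """Equation 2.13 evaluated for n = M .. len(x)-1."""
    return build_regression_matrix(x, M, K) @ h

# Synthetic check (assumption): data generated by a known memory polynomial.
rng = np.random.default_rng(0)
M, K = 1, 2
x = 0.5 * (rng.standard_normal(400) + 1j * rng.standard_normal(400))
h_true = rng.standard_normal((M + 1) * (K + 1)) + 1j * rng.standard_normal((M + 1) * (K + 1))
y = np.zeros_like(x)
y[M:] = build_regression_matrix(x, M, K) @ h_true
h_est = fit_memory_polynomial(x, y, M, K)
```

Note that A has one row per sample and (M+1)(K+1) columns, so its storage grows with the number of samples N. This is the memory cost that motivates the recursive techniques in the next section.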

#### 2.3.3.1 Optimized Identification Techniques

The polynomial coefficients can also be identified using adaptive filter techniques. Two techniques used are the Recursive Least Squares (RLS) and QR based Recursive Least Squares (QR-RLS) filter identification techniques (Muruganathan and Sesay 2006). The RLS algorithm reduces the data memory requirements of the system: the memory intensive matrix **A** no longer needs to be constructed, and is replaced with a smaller  $(M+1)(K+1) \times (M+1)(K+1)$  dimension matrix, plus some temporary matrices needed for the algorithm. The QR-RLS method further reduces the matrix requirements by eliminating the temporary matrices, and uses QR decomposition on the data matrix. Both are intended to reduce the memory usage and computation time associated with solving for the polynomial coefficients.
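To illustrate why the RLS approach needs less memory, the sketch below uses the textbook exponentially weighted RLS recursion, processing one regressor row at a time so that the full matrix **A** is never stored. It is not the DSP implementation from this work; the function names, the initialization constant `delta` and the forgetting factor are assumptions, and the QR-RLS variant is not shown.

```python
import numpy as np

def regressor_row(x, n, M, K):
    """One row of A: x(n-m)|x(n-m)|^k for m = 0..M, k = 0..K."""
    return np.array([x[n - m] * abs(x[n - m]) ** k
                     for m in range(M + 1) for k in range(K + 1)])

def rls_identify(x, y, M, K, forgetting=1.0, delta=1e6):
    """Recursive least squares estimate of the memory polynomial coefficients."""
    size = (M + 1) * (K + 1)
    h = np.zeros(size, dtype=complex)
    P = delta * np.eye(size, dtype=complex)    # running inverse correlation matrix
    for n in range(M, len(x)):
        u = regressor_row(x, n, M, K)
        Pu = P @ u.conj()
        k = Pu / (forgetting + u @ Pu)         # gain vector
        e = y[n] - u @ h                       # a priori model error
        h = h + k * e                          # coefficient update
        P = (P - np.outer(k, u @ P)) / forgetting
    return h

# Synthetic check (assumption): noise-free data from a known memory polynomial.
rng = np.random.default_rng(0)
M, K = 1, 2
x = 0.5 * (rng.standard_normal(400) + 1j * rng.standard_normal(400))
h_true = rng.standard_normal(6) + 1j * rng.standard_normal(6)
y = np.zeros_like(x)
for n in range(M, len(x)):
    y[n] = regressor_row(x, n, M, K) @ h_true
h_rls = rls_identify(x, y, M, K)
```

The only state carried between samples is the coefficient vector and the (M+1)(K+1) by (M+1)(K+1) matrix `P`, regardless of how many samples are processed.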

#### 2.4 Results from Model Validation

To validate the model's accuracy, the entire input waveform x(n) (approximately 184000 points gathered over a time period of 2 ms) is passed through the model to generate the model output. Because only 1000 samples are used for model characterization, generating the model output for the entire input signal tests the robustness of the model to signal characteristics that were not part of the characterization data. The Normalized Mean Squared Error (NMSE), illustrated in Figure 2.4, can be used as a metric to evaluate the performance of the models. The NMSE can be computed using the following equation

$$NMSE_{dB} = 10 \log_{10} \left( \frac{\sum_{n=1}^{N} |y(n) - \tilde{y}(n)|^2}{\sum_{n=1}^{N} |y(n)|^2} \right) \tag{2.17}$$

where  $\tilde{y}(n)$  is the output of the model and y(n) is the time aligned output waveform.
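Equation 2.17 is a one-line computation. The sketch below uses a hypothetical function name and made-up data: a model output that is off by 1% in amplitude, which corresponds to an NMSE of about -40 dB.

```python
import math

def nmse_db(y_measured, y_model):
    """Equation 2.17: normalized mean squared error in dB."""
    err = sum(abs(a - b) ** 2 for a, b in zip(y_measured, y_model))
    ref = sum(abs(a) ** 2 for a in y_measured)
    return 10.0 * math.log10(err / ref)

# Toy data (assumption): the model underestimates every sample by 1%.
y_meas = [1.0 + 1.0j, 2.0 - 1.0j, -0.5 + 0.25j]
y_mod = [0.99 * v for v in y_meas]
value = nmse_db(y_meas, y_mod)
```

Because the metric is normalized by the power of the measured signal, it does not depend on the absolute signal level, so models characterized at different drive levels can be compared directly.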



Figure 2.4 Calculating the NMSE between the captured output, and the simulation model output

Table 2.1 lists the NMSE for the memoryless model, and memory polynomial models using the different methods of solving for the polynomial coefficients. The parameters used for the memory polynomial model were polynomial order K = 12, memory depth order M = 2, and sample depth N = 1000. The memory polynomial methods more correctly characterize the behaviour of the power amplifier compared with the memoryless model.

| NMSE (dB) |
|-----------|
| -41.50    |
| -47.06    |
| -45.98    |
| -47.03    |

 Table 2.1 List of NMSE values calculated between captured output and simulation models output

The result of the memoryless model is a Look Up Table function which has a 1:1 relationship between the input power, and complex gain. Figure 2.5 shows the AM-AM and AM-PM characteristics of the memoryless model. The dispersion of the measurement waveforms can be attributed to the memory effects caused by the PA, and to a lesser extent, noise from the measurement instrument.



Figure 2.5 (a) AM-AM Characteristics and (b) AM-PM characteristics of a class AB power amplifier using the memoryless model

For the memory polynomial model, an analysis of the model performance can be determined by substituting the coefficients into the memory polynomial equation to compute the estimated output. Figure 2.6 shows the AM-AM and AM-PM curves for the memory polynomial model using the SVD coefficient solving method, along with the optimized coefficient solving approaches. Unlike the memoryless model, the memory polynomial model is able to replicate the scatter at the output, which is caused by memory effects (where the current output is dependent on current and previous values of the input signal).



Figure 2.6 (a) AM-AM Characteristics and (b) AM-PM characteristics of a class AB power amplifier using the memory polynomial model, where the NMSE of these models is between 4 and 6 dB lower (better) than that of the memoryless model

The time domain response of the output waveform is shown in Figure 2.7. Although the NMSE calculation shows good performance for the QR-RLS and RLS algorithms, this is not reflected in the figure. The figure is centered around the maximum of the input signal, where models tend to have difficulty tracking because the probability of the input signal taking this value is minimal (approximately 0.01%).



Figure 2.7 The measurement and model output in time domain centered around the peak of the signal. The SVD follows the measured PA output signal well; however, the QR-RLS and RLS algorithms do not follow the peak well even though their NMSE performance is high.

#### **2.5 Conclusion**

In this chapter, the power amplifier characterization system was introduced. Two behavioural models were presented, the memoryless, and memory polynomial model. The memory polynomial model using the SVD method solving technique showed a more accurate performance in the time domain compared with the memoryless model, and showed approximately a 6 dB improvement in NMSE. In the next chapter, both behavioural models will be used for the application of digital predistortion, and their linearization capabilities will be analyzed.

#### **Chapter Three: Baseband Digital Predistortion**

The Digital Predistortion (DPD) method can be considered as the identification of the normalized Power Amplifier's (PA's) inverse function, and its application on the original input waveform to produce a linear gain at the output of the PA. The capability of using the memoryless and memory polynomial models in a DPD scenario to compensate for distortion will be evaluated in the time and spectral domains for the same class AB PA as described in Chapter 2. Results show that a significant improvement in ACPR was achieved for all multi-carrier WCDMA waveforms when using a digital predistorter.

#### **3.1 Digital Predistortion**

The PA's nonlinear transfer function can be represented by an arbitrary function g(.). To correct for the nonlinearities, the input waveform ideally needs to be passed through the normalized inverse function of g(.), denoted f(.). The cascade (Figure 3.1) of the PA inverse (predistortion) function and the PA produces a linear relationship between x(n) and y(n), where  $x_a(n) = f(x(n))$  is an intermediate signal produced by the predistortion function. The cascade should satisfy the following equation

$$g(f(x(n))) = G \times x(n) \tag{3.1}$$

where G is the required constant gain.
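The cascade condition of Equation 3.1 can be demonstrated with a toy memoryless amplifier. Everything in this sketch is hypothetical: the compression and phase terms of `pa` are invented for illustration, and the predistorter inverts the amplitude response numerically by bisection, which is only one of several possible ways to build the inverse.

```python
import cmath

GAIN = 10.0   # required constant gain G (assumption)

def pa(xa):
    """Toy memoryless PA g(.): amplitude compression plus amplitude-dependent phase."""
    r = abs(xa)
    return GAIN * xa * (1.0 - 0.2 * r * r) * cmath.exp(1j * 0.3 * r * r)

def predistort(x, iterations=60):
    """Toy predistorter f(.): numerical inverse of pa, valid for |x| <= 0.8."""
    r = abs(x)
    if r == 0.0:
        return x
    lo, hi = 0.0, 1.0                      # the toy AM-AM curve is monotonic on [0, 1]
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid * (1.0 - 0.2 * mid * mid) < r:
            lo = mid
        else:
            hi = mid
    ra = 0.5 * (lo + hi)                   # amplitude the PA must be driven with
    # Expand the amplitude and pre-rotate the phase by the opposite of the PA shift.
    return x * (ra / r) * cmath.exp(-1j * 0.3 * ra * ra)

x = 0.5 * cmath.exp(1j * 1.0)
y_without = pa(x)                 # compressed and phase shifted
y_with = pa(predistort(x))        # approximately GAIN * x
```

The inverse only exists up to the saturated output level of the amplifier (|x| up to 0.8 in this toy case), which is why the PA must not be driven beyond its saturation point.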



Figure 3.1 The cascade of the predistortion function and power amplifier should result in a linear amplification of the input signal

#### 3.1.1 Theory of Memoryless Digital Predistortion

The digital predistortion necessary for the memoryless model can be directly computed from the PA model. First, a normalization of the gain is needed since the predistortion algorithm operates on the input PA power values. The normalization can be done by averaging the small signal gain and small signal phase of the PA.

The following equations describe the memoryless predistortion model. The function  $P_{dBm}(z)$  was defined in Equation 2.6.

$$P_{in,DPD}(n) = P_{dBm}(x_a(n)) - (P_{dBm}(y(n)) - P_{dBm}(x_a(n)) - G_{SS}) \tag{3.2}$$

$$G_{DPD}(n) = -(P_{dBm}(y(n)) - P_{dBm}(x_a(n)) - G_{SS}) \tag{3.3}$$

$$\phi_{DPD}(n) = -\left(\arctan\left(\frac{y(n)}{x_a(n)}\right) - \phi_{SS}\right) \tag{3.4}$$

where  $G_{SS}$  and  $\phi_{SS}$  are the magnitude and phase of the small signal complex gain.

#### 3.1.1.1 Model Validation for Memoryless Digital Predistortion

Figure 3.2 shows the AM-AM and AM-PM transfer characteristics of the normalized inverse PA function and the memoryless predistortion function versus the input drive level  $P_{in}$ .




Figure 3.2 (a) AM-AM Characteristics and (b) AM-PM characteristics for the memoryless digital predistortion model

For the memoryless digital predistortion model, the gain and phase are inverses of the memoryless power amplifier model, except the small signal gain and phase are normalized. Similar to the memoryless power amplifier model, the complex gain is averaged over input power, resulting in its inability to track the memory effects.

#### 3.1.2 Theory of Memory Polynomial Digital Predistortion

Computing the memory polynomial digital predistortion model also requires knowledge of the small signal gain level. The system of equations to solve (Equation 2.14) is changed to reflect the DPD characterization, with  $\boldsymbol{\beta}_m(n)$  and  $\mathbf{y}$  redefined as

$$\boldsymbol{\beta}_{m}(n) = \begin{bmatrix} y_{norm}(n-m) & y_{norm}(n-m)|y_{norm}(n-m)| & \cdots & y_{norm}(n-m)|y_{norm}(n-m)|^{K} \end{bmatrix}$$

$$\mathbf{y} = \begin{bmatrix} x_{a}(n) & x_{a}(n-1) & \cdots & x_{a}(n-N-1) \end{bmatrix}^{T}$$

where  $y_{norm}$  is the output normalized by the small signal gain,  $G_{SS}$ , and small signal phase shift,  $\phi_{SS}$ , and  $x_a$  is the waveform at the input port of the power amplifier.
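The modified system can be sketched with NumPy by swapping the roles of input and output in the least squares fit: the regressors are built from the normalized PA output and the target is the PA input. The amplifier `toy_pa`, its coefficients, the model size (M = 1, K = 4) and the random test signal are all assumptions made for illustration; they do not represent the class AB or Doherty amplifiers used in this work.

```python
import numpy as np

def regressors(v, M, K):
    """Columns v(n-m)|v(n-m)|^k for n = M .. len(v)-1, ordered m-major then k."""
    n = np.arange(M, len(v))
    return np.column_stack([v[n - m] * np.abs(v[n - m]) ** k
                            for m in range(M + 1) for k in range(K + 1)])

def toy_pa(xa, g_ss):
    """Hypothetical PA: small-signal gain, mild compression, one-sample memory."""
    y = g_ss * (xa - 0.05 * xa * np.abs(xa) ** 2)
    y[1:] += 0.02 * g_ss * xa[:-1]
    return y

def nmse_db(ref, test):
    return 10.0 * np.log10(np.sum(np.abs(ref - test) ** 2) / np.sum(np.abs(ref) ** 2))

M, K = 1, 4
g_ss = 10.0 * np.exp(1j * 0.2)            # small-signal complex gain, assumed known
rng = np.random.default_rng(1)
x = 0.3 * (rng.standard_normal(2000) + 1j * rng.standard_normal(2000))

# Characterization pass without predistortion, so x_a = x.
y_norm = toy_pa(x, g_ss) / g_ss
h_dpd = np.linalg.pinv(regressors(y_norm, M, K)) @ x[M:]

# Apply the fitted function to the input and drive the PA with the result.
x_a = regressors(x, M, K) @ h_dpd          # samples n = M .. end
y_lin = toy_pa(x_a, g_ss) / g_ss

# Compare normalized outputs with the ideal linear response, skipping edge samples.
nmse_without = nmse_db(x[2 * M:], y_norm[2 * M:])
nmse_with = nmse_db(x[2 * M:], y_lin[M:])
```

In this toy case `nmse_with` is much lower than `nmse_without`, meaning the normalized output of the cascade is much closer to the original input than the output of the PA alone.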

#### 3.1.2.1 Model Validation for Memory Polynomial Digital Predistortion

Figure 3.3 shows the power characteristics of the memory polynomial digital predistortion model with different solving techniques versus the input drive level  $P_{in}$ .



Figure 3.3 (a) AM-AM Characteristics and (b) AM-PM characteristics for the memory polynomial digital predistortion model

The simulation results for the memory polynomial digital predistortion model track the memory effects more accurately compared with the memoryless digital predistortion model.

#### 3.1.3 Simulation of Digital Predistortion Model's Performance

The cascade of the digital predistortion model upstream from the power amplifier should produce a linear response. In MATLAB, the linear response can be verified using a cascade of the digital predistortion model and the power amplifier model. This simulation of the cascade is only a preliminary check of whether the expected output is linear, since either the DPD model or the PA model may have modeling errors. Two cascades were simulated: one with the memoryless DPD model cascaded with the memoryless PA model, and one with the memory polynomial DPD model cascaded with the memory polynomial PA model using the SVD to solve for the coefficients. The AM-AM and AM-PM characteristics of both cascades are shown in Figure 3.4. The memoryless cascade is very linear since the formulation of the DPD model is directly associated with the PA model; however, the memory polynomial cascade has minor variations from the linear response. The AM-PM curves of the cascades are shifted by -30° to avoid overlap with the AM-PM of the PA model.



Figure 3.4 (a) AM-AM Characteristics and (b) AM-PM characteristics for the cascade of the memoryless DPD/memoryless PA models and the cascade of the memory polynomial DPD/memory polynomial PA models compared with the PA models

A plot of the model's output spectrum can be used to estimate the linearity performance for each of the DPD models. Figure 3.5 demonstrates the linearization capability of the memoryless model and of the memory polynomial models using SVD, RLS and QR-RLS. In this figure, the memoryless trace cannot compensate for the memory effects, as shown by the residual spectral regrowth in the bands adjacent to the signal of interest.



Figure 3.5 Output spectra of digital predistortion models, showing that the memoryless model has more spectral regrowth compared to memory polynomial models

#### 3.2 Digital Predistortion Experimental Results

To experimentally validate the linearization capability of digital predistortion algorithms, the predistorted signal,  $x_a(n)$ , is downloaded into the VSG, upconverted and sent into the power amplifier using the characterization setup defined in Chapter 2. The system diagram is shown in Figure 3.6.



Figure 3.6 Validating digital predistortion using the experimental setup

To examine the variance of the spectral measurements using the Agilent E4440A spectrum analyzer, 27 single carrier WCDMA signals using Test Model 1 were generated in Advanced Design System (ADS). These signals were generated with 16 Dedicated Physical Channels (DPCH), 32 DPCH, and 64 DPCH, and are 2 ms in length. A single memory polynomial digital predistortion model was characterized using one reference input signal, and the model was applied to the 26 other WCDMA signals for the entire 2 ms length (approximately 184000 points). This was done to test both the model under different signal excitation and the linearization capability over time. For each of the 27 different predistorted waveforms, the 2 ms signal was repeated for approximately 2 minutes and then the spectral measurements were recorded. The spectrum mean and standard deviation values are listed in Table 3.1.

Table 3.1 Mean and standard deviation of the power amplifier output spectrum for 27 different WCDMA waveforms with memory polynomial digital predistortion applied

| Carrier<br>Offset | Lower ACPR<br>Mean (dBc) | Lower ACPR<br>Standard<br>Deviation | Upper ACPR<br>Mean (dBc) | Upper ACPR<br>Standard<br>Deviation |
|-------------------|--------------------------|-------------------------------------|--------------------------|-------------------------------------|
| 5 MHz             | 60.89                    | 0.56                                | 60.29                    | 0.48                                |
| 10 MHz            | 62.45                    | 0.55                                | 62.63                    | 0.57                                |
| 15 MHz            | 62.64                    | 0.58                                | 62.04                    | 0.60                                |

The linearization results for the one carrier WCDMA waveform are shown in Figure 3.7 (a). Additionally, two carrier, three carrier, and four carrier WCDMA linearization results are shown in Figure 3.7 (b), (c), and (d) respectively to demonstrate the effectiveness of both the memoryless model and the memory polynomial model when attempting to compensate for nonlinearity in wide bandwidth scenarios. In the four spectrum diagrams, the trace with square markers shows the non-linearized output, the trace with circle markers shows the output with memoryless digital predistortion applied to the input waveform, and the solid trace shows the output with memory polynomial digital predistortion applied to the input waveform. For the three memory polynomial solving techniques (SVD, RLS, and QR-RLS), the results were nearly identical, thus only one trace is illustrated.



Figure 3.7 (a) One carrier (b) Two carrier (c) Three carrier and (d) Four carrier WCDMA linearization results using a class AB PA

In the experimental results, the memoryless algorithm linearization performance degrades as the signal bandwidth increases. Table 3.2 shows the ACPR performance achieved without predistortion and after linearization of the transmitter using both memoryless and memory polynomial predistortion. The ACPR requirements for the third generation partnership project (3GPP) WCDMA signals require that, for base station terminals, the first adjacent carrier must meet a 45 dBc limit, and the second and third adjacent carriers require over 50 dBc (European Telecommunications Standards Institute 2008).

Table 3.2 ACPR results for multicarrier WCDMA waveforms using a class AB PA, with and without digital predistortion. The memoryless DPD does not meet the 3GPP requirements for a four carrier signal at 10 MHz offset. The memory polynomial DPD meets all the 3GPP requirements for one to four carrier signals.

| Waveform Type                 | 5 MHz Offset<br>Lower (dBc) | 5 MHz Offset<br>Upper (dBc) | 10 MHz Offset<br>Lower (dBc) | 10 MHz Offset<br>Upper (dBc) | 15 MHz Offset<br>Lower (dBc) | 15 MHz Offset<br>Upper (dBc) |
|-------------------------------|----------------|----------------|----------------|----------------|----------------|----------------|
| One Carrier (No DPD)          | 40.78          | 40.36          | 57.97          | 57.85          | 63.69          | 63.90          |
| One Carrier (Memoryless)      | 57.86          | 58.37          | 60.68          | 60.82          | 61.08          | 61.41          |
| One Carrier (M. Polynomial)   | 60.02          | 60.22          | 60.95          | 61.49          | 61.41          | 61.32          |
| Two Carrier (No DPD)          | 35.17          | 34.66          | 45.05          | 44.42          | 52.72          | 52.69          |
| Two Carrier (Memoryless)      | 52.14          | 52.05          | 55.88          | 56.40          | 58.40          | 58.70          |
| Two Carrier (M. Polynomial)   | 58.28          | 58.48          | 59.01          | 59.30          | 59.43          | 59.60          |
| Three Carrier (No DPD)        | 34.70          | 34.16          | 39.31          | 38.35          | 47.01          | 46.97          |
| Three Carrier (Memoryless)    | 48.40          | 49.97          | 49.97          | 51.01          | 53.93          | 55.32          |
| Three Carrier (M. Polynomial) | 56.03          | 56.25          | 56.36          | 56.32          | 56.85          | 57.01          |
| Four Carrier (No DPD)         | 35.29          | 34.37          | 38.34          | 37.08          | 42.52          | 41.47          |
| Four Carrier (Memoryless)     | 47.97          | 48.58          | 48.87          | 49.50          | 51.25          | 52.19          |
| Four Carrier (M. Polynomial)  | 54.01          | 53.27          | 54.19          | 53.38          | 54.22          | 53.42          |
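The pass/fail reading of Table 3.2 can be sketched as a small check against the limits quoted above. The struct and function names are hypothetical, the mapping of the 5, 10, and 15 MHz offsets to the first, second, and third adjacent carriers is taken from the table caption, and treating a reading exactly equal to a limit as a pass is an assumption of this sketch.

```cpp
// Hypothetical container for one row of an ACPR table (all values in dBc).
struct AcprRow {
    double lower5, upper5;    // 5 MHz offset (first adjacent carrier)
    double lower10, upper10;  // 10 MHz offset (second adjacent carrier)
    double lower15, upper15;  // 15 MHz offset (third adjacent carrier)
};

// Limits as quoted in the text; ">=" at the boundary is an assumption.
constexpr double kFirstAdjacentLimitDbc = 45.0;
constexpr double kOuterAdjacentLimitDbc = 50.0;

// Returns true when every reading in the row satisfies its limit.
bool meetsAcprLimits(const AcprRow& row) {
    const bool firstOk = row.lower5 >= kFirstAdjacentLimitDbc &&
                         row.upper5 >= kFirstAdjacentLimitDbc;
    const bool secondOk = row.lower10 >= kOuterAdjacentLimitDbc &&
                          row.upper10 >= kOuterAdjacentLimitDbc;
    const bool thirdOk = row.lower15 >= kOuterAdjacentLimitDbc &&
                         row.upper15 >= kOuterAdjacentLimitDbc;
    return firstOk && secondOk && thirdOk;
}
```

Applied to the four carrier rows of Table 3.2, the memoryless row fails at the 10 MHz offset while the memory polynomial row passes at every offset.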

Operating the class AB power amplifier in its linear region without predistortion gives a mean power added efficiency of less than one percent (1%) and a peak output power of 21.4 dBm (0.13 W). However, for the same signal quality, where the ACPR levels meet the requirements of the 3GPP standard, using the memory polynomial DPD and driving the PA to its maximum level without clipping (the saturation point of the PA) results in a mean power added efficiency of 15% and a peak output power of 38.1 dBm (6.46 W), for a signal with a peak to average power ratio of approximately 9.5 dB.
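The dBm and watt figures quoted above are related by the standard conversion P[W] = 10^((P[dBm] - 30)/10). A minimal sketch of that arithmetic follows; the function names are hypothetical.

```cpp
#include <cmath>

// Convert a power level in dBm to watts: 0 dBm is 1 mW, 30 dBm is 1 W.
double dbmToWatts(double dbm) {
    return std::pow(10.0, (dbm - 30.0) / 10.0);
}

// Inverse conversion; the argument must be strictly positive.
double wattsToDbm(double watts) {
    return 10.0 * std::log10(watts) + 30.0;
}
```

With these helpers, 38.1 dBm evaluates to about 6.46 W and 21.4 dBm to about 0.138 W, which the text quotes as 0.13 W.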

## **3.3 Conclusion**

The digital predistortion linearization capabilities of both the memoryless model and the memory polynomial model were presented. The memoryless DPD model met the 3GPP ACPR requirements for all waveforms except the four carrier WCDMA waveform, while the memory polynomial DPD model met the linearity requirements for all single-carrier and multi-carrier waveforms. Both linearization techniques improved the maximum operating power and efficiency of the PA compared with operating the PA far into the back-off region to meet the linearity requirement of the 3GPP standard. The next chapter discusses the implementation of the digital predistortion algorithms on a digital signal processor platform, with testing and validation using a highly nonlinear PA.

## **Chapter Four: Baseband DPD Identification on a DSP Platform**

In Chapter 3, the experimental results showed the potential of digital predistortion in correcting for the nonlinearity of the transmitter. This chapter presents the design and implementation of digital predistortion identification algorithms on a Digital Signal Processor (DSP) platform capable of compensating for the distortion introduced by the PA. The accuracy of the identification is critical, so an automated testing framework is proposed to compare the accuracy of the model generated in simulation with that of the model generated in the DSP. Finally, the digital predistortion model generated in the DSP is used to linearize a highly nonlinear PA to demonstrate its linearization capability.

# **4.1 DSP Implementation**

The DSP selected was an Analog Devices TigerSHARC ADSP-TS201 EZ-Kit Lite board (Analog Devices Inc. 2007). The evaluation board consists of two 64-bit TS201 floating point processors running at a 600 MHz core clock rate, configured in a multiprocessor environment. For the prototype, only one ADSP-TS201 was used to simplify software debugging. The requirement for the DSP is high speed signal processing targeted at PA characterization. The predistortion algorithms are computationally intensive, especially when singular value decomposition is used for large matrices.

### 4.1.1 Predistortion Identification

The input, x(n), and output, y(n), waveforms are directly downloaded into the DSP using the VisualDSP++ development environment through the use of a Joint Test Action Group (JTAG) cable that allows the connection between a computer and the DSP for testing and debugging of software.

The time delay compensation technique and the memoryless and memory polynomial models described in Chapter 2 are implemented on the DSP in C++. For the memoryless algorithm, there was little performance gain between the 32-bit and 64-bit floating point numerical representations; therefore, a 32-bit floating point version of the memoryless algorithm was implemented for a better performance-to-accuracy trade-off. The Singular Value Decomposition (SVD) method needed to derive the pseudo-inverse of the matrix is part of the CLAPACK library (Anderson, Bai et al. 1999). The pseudo-inverse function contained in this library is very similar to the one in MATLAB, and by using 64-bit double precision floating point methods, similar results can be achieved in both the simulation and embedded hardware environments. However, the processing time of such a computationally intensive function calls for a different approach if the platform is to be used in real time applications. The Recursive Least Squares (RLS) algorithm and the QR decomposition based RLS (QR-RLS) algorithm introduced in Chapter 2 can alleviate these constraints, especially when a large matrix must be inverted.
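To illustrate why a recursive solver avoids forming and inverting a large matrix, the following is a minimal sketch of the textbook exponentially weighted RLS recursion (in the form given by Haykin) for a complex linear model d(n) = u(n)^T w. It is not the thesis implementation: the class name, the row-major storage, and the initialization P(0) = I/delta are assumptions made for illustration.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using Cplx = std::complex<double>;

// Hypothetical helper: textbook RLS for the model d(n) = u(n)^T w.
class RlsEstimator {
public:
    RlsEstimator(std::size_t order, double lambda, double delta)
        : n_(order), lambda_(lambda), w_(order, Cplx(0.0, 0.0)),
          P_(order * order, Cplx(0.0, 0.0)) {
        // P(0) = I / delta; a small delta gives weak initial regularization.
        for (std::size_t i = 0; i < n_; ++i) P_[i * n_ + i] = Cplx(1.0 / delta, 0.0);
    }

    // One recursion: consume regressor u (length = order) and desired sample d.
    void update(const std::vector<Cplx>& u, Cplx d) {
        std::vector<Cplx> Pu(n_), uP(n_);  // workspace: P u* and u^T P
        Cplx denom(lambda_, 0.0), estimate(0.0, 0.0);
        for (std::size_t i = 0; i < n_; ++i) {
            Cplx pu(0.0, 0.0), up(0.0, 0.0);
            for (std::size_t j = 0; j < n_; ++j) {
                pu += P_[i * n_ + j] * std::conj(u[j]);
                up += u[j] * P_[j * n_ + i];
            }
            Pu[i] = pu;
            uP[i] = up;
            estimate += u[i] * w_[i];
        }
        for (std::size_t i = 0; i < n_; ++i) denom += u[i] * Pu[i];  // lambda + u^T P u*
        const Cplx error = d - estimate;  // a priori error
        for (std::size_t i = 0; i < n_; ++i) {
            const Cplx gain = Pu[i] / denom;  // element i of the gain vector
            w_[i] += gain * error;
            for (std::size_t j = 0; j < n_; ++j)
                P_[i * n_ + j] = (P_[i * n_ + j] - gain * uP[j]) / lambda_;
        }
    }

    const std::vector<Cplx>& weights() const { return w_; }

private:
    std::size_t n_;
    double lambda_;
    std::vector<Cplx> w_;  // weight vector
    std::vector<Cplx> P_;  // inverse correlation matrix, row-major
};
```

Each update costs on the order of the square of the number of coefficients and touches only the inverse correlation matrix, the weight vector, and two temporary vectors, whereas a batch pseudo-inverse must hold the full data matrix.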

# 4.2 Embedded Software Testing

The embedded system design requires the same numerical and program flow accuracy that is present in the simulation based design, so the simulation results need to be accurately reproduced (Kwan, Boumaiza et al. 2006). Therefore, a system is proposed to semi-automate the testing facility using the same test vectors. Testing the software involves comparing the results produced in MATLAB with the results produced in the VisualDSP++ development environment for the TigerSHARC. An application was developed to semi-automate the testing done on both simulation and hardware by using the Component Object Model (COM) (Troelsen 2002) interfaces available for both MATLAB and VisualDSP++.

The Graphical User Interface (GUI) application developed for testing is shown in Figure 4.1. The user is able to select the model parameters, and the test vectors used for digital predistortion. The parameters and data are downloaded into the TigerSHARC DSP using the JTAG cable. To control the program flow of the software, jump tables are used to execute the correct algorithm for implementation. An example code listing is shown in Figure 4.2.

[Screenshot of the testing GUI: a control to link to or unlink from VisualDSP++ 4.0, an algorithm selector (Memoryless shown), model parameter fields including polynomial order and method, test vector file fields (Input Ip: I_In_PA.txt, Input Qp: Q_In_PA.txt, Output If: I_Out_PA.txt, Output Qf: Q_Out_PA.txt), and a VisualDSP++ actions log.]

Figure 4.1 Graphical user interface developed for predistortion testing between the DSP and MATLAB software


```
// Function prototypes for the algorithms under test
void execute_time_delay(void);
void execute_memoryless(void);
void execute_memory_polynomial(void);

// Algorithm selection variable, written by the GUI before each run
int cmd_run = 0;
bool testing_done = false;

// Jump table: cmd_run indexes the algorithm to execute
void (*fcn_cmd_table[3])(void) =
{
      execute_time_delay,
      execute_memoryless,
      execute_memory_polynomial
};

int main(void)
{
      // Initialize DSP
      while (testing_done == false)
            fcn_cmd_table[cmd_run]();
      return 0;
}
```

Figure 4.2 Example code listing used to control execution of different algorithms implemented in the DSP

The GUI sets a breakpoint at the beginning of the while loop statement to halt DSP execution. Once the user has set all the parameters in global memory space, the GUI sets the cmd\_run global variable to the algorithm targeted for testing, initiates a run command to the VisualDSP++ program to continue execution of the program, and waits until the algorithm is finished. Then, the coefficients are extracted from the DSP and transferred into MATLAB for post-processing and model validation.

# 4.3 DSP and Simulation Model Accuracy Results

The accuracy of the models was evaluated using the embedded testing software. In MATLAB, the NMSE between the simulation model, $y_{sim}(n)$, and the captured output, $y(n)$, was calculated. Then, using the testing software, the input and output waveforms were downloaded into the DSP using the JTAG cable, and the DSP calculated the model using the same parameters as in simulation. The model was then uploaded to the computer and the NMSE was calculated between the DSP generated model, $y_{dsp}(n)$, and the captured output, $y(n)$, as shown in Figure 4.3.



Figure 4.3 Calculating NMSE between the captured output, model output in simulation, and the DSP model output
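The comparison in Figure 4.3 can be sketched as follows, assuming the commonly used definition NMSE = 10 log10( sum |y(n) - y_model(n)|^2 / sum |y(n)|^2 ). The function name is hypothetical, and the definition should be checked against the one given earlier in the thesis.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// NMSE in dB between a captured waveform and a model output.
// Only the overlapping samples are compared; the captured waveform
// must contain non-zero energy, otherwise the ratio is undefined.
double nmseDb(const std::vector<std::complex<double>>& captured,
              const std::vector<std::complex<double>>& modelled) {
    const std::size_t count = std::min(captured.size(), modelled.size());
    double errorEnergy = 0.0;
    double referenceEnergy = 0.0;
    for (std::size_t n = 0; n < count; ++n) {
        errorEnergy += std::norm(captured[n] - modelled[n]);  // |y - y_model|^2
        referenceEnergy += std::norm(captured[n]);            // |y|^2
    }
    return 10.0 * std::log10(errorEnergy / referenceEnergy);
}
```

Calling the same function once with the simulation model output and once with the DSP model output gives the two NMSE columns that the testing framework compares.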

The parameters used for the memory polynomial model were polynomial order K=12, memory depth M=2, and sample depth N=1000. The signal waveform was the one carrier WCDMA signal used in Chapter 2, sent through the class AB power amplifier. The simulation NMSE, DSP NMSE, and the computation time in the DSP are listed in Table 4.1.

| Algorithm              | Simulation NMSE (dB) | DSP NMSE (dB) | DSP Computation Time (million clock cycles, approx.) | DSP Computation Time (seconds) |
|------------------------|----------------------|---------------|------------------------------------------------------|--------------------------------|
| Memoryless             | -41.50               | -41.50        | 71                                                   | 0.11                           |
| M. Polynomial (SVD)    | -47.06               | -46.92        | 4,600                                                | 7.67                           |
| M. Polynomial (RLS)    | -45.98               | -45.98        | 3,100                                                | 5.17                           |
| M. Polynomial (QR-RLS) | -47.03               | -45.98        | 2,800                                                | 4.67                           |

Table 4.1 Performance evaluation in simulation and DSP for power amplifier modelling of Class AB PA

The DSP NMSE values are close to the simulation NMSE values. In addition, of the three methods of solving for the memory polynomial coefficients, QR-RLS has the shortest computation time in the DSP, without a significant loss in accuracy compared to the RLS and SVD methods.

| Algorithm              | Initial Memory Allocation (32-bit words) | Workspace Memory, per iteration (32-bit words) |
|------------------------|------------------------------------------|------------------------------------------------|
| Memoryless             | 4,096                                    | 0                                              |
| M. Polynomial (SVD)    | 156,000                                  | 160,585                                        |
| M. Polynomial (RLS)    | 6,240                                    | 312                                            |
| M. Polynomial (QR-RLS) | 6,556                                    | 0                                              |

Table 4.2 Memory usage for each model in the DSP

The memory allocation for each of the models is listed in Table 4.2. The memoryless model uses the least memory on the DSP, needing only two arrays to store the AM-AM and AM-PM characteristics. The moving average algorithm used by the memoryless model can operate on the arrays directly; consequently, no workspace memory needs to be allocated. For the LAPACK SVD algorithm, the **A** matrix (of dimension $N \times (M+1)(K+1)$) needs to be pre-allocated, as well as several workspace variables. This causes a significant memory overhead for the SVD algorithm. The RLS algorithm requires significantly less memory, needing only pre-allocation of the inverse correlation matrix, the weight vector, and two workspace arrays (Haykin 2001). Finally, the QR-RLS algorithm does not need temporary workspace calculations, reducing workspace memory to zero per iteration. Note that for the memory polynomial algorithms, complex 64-bit double precision floating point numbers were used, and each complex number consumes four 32-bit words of memory.
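Several entries of Table 4.2 can be reproduced from the stated dimensions. The sketch below assumes that the coefficient count is (M+1)(K+1), that the inverse correlation matrix is square in that count, and that each of the two RLS workspace arrays has the length of the weight vector. The last assumption is not stated in the text but happens to match the tabulated value. All function names are hypothetical.

```cpp
#include <cstddef>

// Each complex double occupies four 32-bit words, as stated in the text.
constexpr std::size_t kWordsPerComplexDouble = 4;

// Number of memory polynomial coefficients for memory depth M and order K.
constexpr std::size_t coefficientCount(std::size_t M, std::size_t K) {
    return (M + 1) * (K + 1);
}

// Words needed to pre-allocate the N x (M+1)(K+1) matrix A for the SVD.
constexpr std::size_t svdMatrixWords(std::size_t N, std::size_t M, std::size_t K) {
    return N * coefficientCount(M, K) * kWordsPerComplexDouble;
}

// Words for the RLS inverse correlation matrix plus the weight vector.
constexpr std::size_t rlsInitialWords(std::size_t M, std::size_t K) {
    return (coefficientCount(M, K) * coefficientCount(M, K) + coefficientCount(M, K)) *
           kWordsPerComplexDouble;
}

// Words for two workspace vectors (assumed to be of coefficient length).
constexpr std::size_t rlsWorkspaceWords(std::size_t M, std::size_t K) {
    return 2 * coefficientCount(M, K) * kWordsPerComplexDouble;
}
```

With N = 1000, M = 2, and K = 12 there are 39 coefficients, giving 156,000 words for **A**, 6,240 words for the RLS state, and 312 words of RLS workspace.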

From Table 4.1 and Table 4.2, the SVD, RLS, and QR-RLS methods delivered comparable NMSE values, all better than the memoryless algorithm. In addition, the RLS and QR-RLS methods reduce the memory consumption and computation time in the DSP compared to the SVD algorithm.

## 4.4 Linearization Performance using DSP generated predistortion models

To evaluate the DSP accuracy, a high power, highly efficient LDMOS based Doherty power amplifier operating at 2.14 GHz was used. For initial validation, a two carrier WCDMA signal was used to excite the PA, with the Agilent E4438C vector signal generator (VSG) as the source and the Agilent E4440A vector signal analyzer as the signal acquisition instrument, following the same characterization flow as in Chapter 2. Then, the captured waveforms were downloaded into the DSP for time delay compensation and predistortion synthesis. Finally, the predistorted signal, $x_a(n)$, was generated using the DSP model and downloaded into the VSG for linearization verification. The setup is shown in Figure 4.4.





The Doherty PA's complex gain characteristics are shown in Figure 4.5 (a) and (b) to highlight the extreme nonlinear region. The NMSE of the digital predistorted output models, and the measured time to compute the models in the DSP are also listed in Table 4.3.



Figure 4.5 (a) AM-AM Characteristics and (b) AM-PM characteristics of a Doherty based Power Amplifier

| Algorithm              | Simulation NMSE (dB) | DSP NMSE (dB) | DSP Computation Time (million clock cycles, approx.) | DSP Computation Time (seconds) |
|------------------------|----------------------|---------------|------------------------------------------------------|--------------------------------|
| Memoryless             | -30.03               | -30.03        | 68                                                   | 0.11                           |
| M. Polynomial (SVD)    | -40.39               | -39.43        | 4,600                                                | 7.67                           |
| M. Polynomial (RLS)    | -39.60               | -38.93        | 3,100                                                | 5.17                           |
| M. Polynomial (QR-RLS) | -39.61               | -39.38        | 2,800                                                | 4.67                           |

Table 4.3 Performance evaluation in simulation and DSP for digital predistortion of Doherty amplifier

Again, the QR-RLS algorithm has the lowest computation time of the memory based algorithms while producing adequate NMSE performance in the time domain. The QR-RLS method is therefore used as the primary method of solving for the memory polynomial coefficients in the DSP, based on its accuracy, low computation time, and lower memory consumption. Four different waveforms (1, 2, 3, and 4 carrier WCDMA) were used, and the spectrum results are shown in Figure 4.6 (a), (b), (c), and (d). In addition, the ACPR performance achieved without predistortion and after linearization of the transmitter using both memoryless and memory polynomial predistortion is listed in Table 4.4.



Figure 4.6 (a) One carrier (b) Two carrier (c) Three carrier and (d) Four carrier WCDMA linearization results using a Doherty PA

Table 4.4 ACPR results for multicarrier WCDMA waveforms using a Doherty PA, with and without digital predistortion. The memoryless DPD fails to meet the requirements for the three and four carrier signals, and the memory polynomial DPD fails to meet the requirements for the four carrier signal.

|                               | 5 MHz Offset   |                | 10 MHz Offset  |                | 15 MHz Offset  |                |
|-------------------------------|----------------|----------------|----------------|----------------|----------------|----------------|
| Waveform Type                 | Lower<br>(dBc) | Upper<br>(dBc) | Lower<br>(dBc) | Upper<br>(dBc) | Lower<br>(dBc) | Upper<br>(dBc) |
| One Carrier (No DPD)          | 30.86 | 30.79        | 48.72        | 48.31        | 57.61 | 57.33    |
| One Carrier (Memoryless)      | 53.36 | 50.90        | 56.92        | 56.83        | 57.32 | 57.46    |
| One Carrier (M. Polynomial)   | 54.68 | 52.54        | 56.98        | 57.04        | 57.30 | 57.51    |
| Two Carrier (No DPD)          | 26.40 | 26.14        | 34.13        | 34.07        | 41.40 | 41.53    |
| Two Carrier (Memoryless)      | 49.10 | 47.27        | 53.05        | 52.87        | 54.86 | 54.70    |
| Two Carrier (M. Polynomial)   | 51.35 | 51.21        | 52.65        | 52.39        | 52.82 | 53.21    |
| Three Carrier (No DPD)        | 25.51 | 24.93        | 29.46        | 28.83        | 35.65 | 35.02    |
| Three Carrier (Memoryless)    | 46.38 | 46.88        | 47.08        | 48.87        | 49.64 | 51.56    |
| Three Carrier (M. Polynomial) | 49.07 | 49.36        | 50.97        | 51.22        | 51.84 | 51.46    |
| Four Carrier (No DPD)         | 25.70 | 25.26        | 28.71        | 28.07        | 32.31 | 31.58    |
| Four Carrier (Memoryless)     | 43.68 | 42.59        | 43.75        | 44.77        | 43.83 | 47.06    |
| Four Carrier (M. Polynomial)  | 47.62 | 46.17        | 47.69        | 47.65        | 47.89 | 47.45    |

Under normal operating conditions, to meet the linearity specifications, the peak input backoff power is approximately -25 dBm, producing a peak output power of approximately 42.693 dBm (18.6 W) and a mean PAE of 2.7%. However, using a memory based digital predistortion method, a peak output power of 54.395 dBm (275.1 W) and a mean PAE of 20.8% were attained using complex modulated waveforms with peak to average power ratios of 9.5 dB, although the linearity specifications were not met for the four carrier WCDMA waveform case.

# **4.5 Conclusion**

A digital baseband predistortion synthesis system was proposed and implemented on a digital signal processor. The time delay compensation, memoryless, and memory polynomial algorithms were implemented in C++. A framework was realized to facilitate testing of the algorithms designed and coded in both the MATLAB and VisualDSP++ environments. Then, the DSP generated predistortion models were retrieved, and the predistorted signal was downloaded into the VSG for linearization of the Doherty power amplifier. Using digital predistortion with the Doherty power amplifier, the peak output power increased by a factor of 14.78 compared to operating in the linear region of the Doherty PA without predistortion, and the mean PAE increased by 18.1 percentage points (from 2.7% to 20.8%). With the DSP used for predistortion synthesis, the next chapter discusses the use of an FPGA for real-time predistortion correction.

# Chapter Five: Arbitrary Waveform Generators and Linearized Transmitters for 3G Applications

In Chapter 4, the experimental results showed the potential of DSP based predistortion in correcting for the nonlinearity of the transmitter. This chapter presents the design and implementation of a hybrid Digital Signal Processor/Field Programmable Gate Array (DSP/FPGA) digital predistortion platform capable of compensating for the distortion introduced by the PA. The DSP identifies the inverse behaviour of the PA, while the FPGA applies that model in real time to digitally predistort the input signal.

The high-level system block diagram for the proposed baseband digital predistortion implementation is shown in Figure 5.1. The FPGA contains an arbitrary waveform generator that allows any baseband waveform to be downloaded. Predistortion Look Up Table (LUT) blocks implemented in the FPGA apply predistortion to the input waveform with minimal latency. The predistorted signal enters the Digital to Analog Converters (DACs), is upconverted to an RF signal, and is then passed to the PA input. The signal at the output of the PA then enters a feedback path, where it is downconverted to baseband and digitized using an Analog to Digital Converter (ADC). The digitized signals at the DAC input and in the feedback loop (output of the ADC) are both stored in First In, First Out (FIFO) memory blocks, as needed for the PA characterization and the identification of the predistortion function.

The DSP is responsible for the predistortion synthesis. The DSP reads the FIFO buffers of the FPGA, calculates the inverse model of the PA as described in Chapter 4, and uploads the required updates to the predistortion LUT on the FPGA.



Figure 5.1 Proposed system block diagram of standalone digital predistorter

The precision of the Digital to Analog Converter (DAC) analog output signal is directly related to the bit resolution of the digital signal. Increasing the number of bits primarily increases the analog resolution, which can improve the accuracy of the signal. An Analog Devices AD9779A 16-bit DAC is selected for the test system. Combining it with a reconfigurable FPGA allows for a high precision arbitrary waveform generator. Both the class AB power amplifier introduced in Chapter 2 and the Doherty power amplifier introduced in Chapter 4 are used to validate the linearization capability of the system prototype.

# **5.1 FPGA Implementation**

An Altera Stratix II EP2S180 FPGA hardware platform is selected for the online digital predistortion compensation and the digital baseband arbitrary waveform generator. The EP2S180 is a high performance, high logic cell density FPGA containing approximately 179000 logic elements and 9 megabits of onboard system memory (Altera Corporation 2007). It also contains 768 9x9 DSP blocks with embedded multipliers, which optimize multiplication performance in the FPGA. The main blocks contained in the system are the Arbitrary Waveform Generator (AWG), the digital predistortion block, and the FIFO buffers for storing the input and output waveforms of the PA.

FPGAs tend to be utilized for their high integer computational performance, and the base numerical representation is a 1.15 (16-bit wide) fractional number format (Altera Corporation 1997). Table 5.1 compares the latency and resource usage of 1.15 fractional arithmetic with the typical latency and resource usage of Altera's floating point megafunction (Altera Corporation 2007; Altera Corporation 2008). To reduce latency and chip area, the 1.15 fractional format is used as the principal number representation in the FPGA.

| Operation                             | 1.15 Fractional | 32-bit Floating Point  |
|---------------------------------------|-----------------|------------------------|
| Addition/Subtraction Latency (cycles) | 1               | 11                     |
| Addition/Subtraction Resource Usage   | 32 ALUTs        | 743 ALUTs              |
| Multiplication Latency (cycles)       | 3               | 5                      |
| Multiplication Resource Usage         | 2 dsp_9bit      | 8 dsp_9bit + 54 ALUTs  |

**Table 5.1 Number representations resource utilization** 
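For readers unfamiliar with the 1.15 format, the following is a minimal software sketch of how a real value in [-1, 1) maps to a 16-bit signed word with 15 fractional bits, and how two such words are multiplied. The function names are hypothetical, and round-to-nearest with saturation is an assumption; the FPGA design may truncate or handle overflow differently.

```cpp
#include <cmath>
#include <cstdint>

// Quantize a real value to 1.15 format, saturating at the format limits.
std::int16_t toQ15(double value) {
    double scaled = std::round(value * 32768.0);  // 2^15 steps per unit
    if (scaled > 32767.0) scaled = 32767.0;       // largest value: 1 - 2^-15
    if (scaled < -32768.0) scaled = -32768.0;     // smallest value: -1
    return static_cast<std::int16_t>(scaled);
}

// Convert a 1.15 word back to a real value.
double fromQ15(std::int16_t q) {
    return static_cast<double>(q) / 32768.0;
}

// Multiply two 1.15 words: the 32-bit product has 30 fractional bits and is
// rounded back to 15. The right shift of a negative value is arithmetic on
// mainstream compilers (and guaranteed from C++20).
std::int16_t mulQ15(std::int16_t a, std::int16_t b) {
    std::int32_t product = static_cast<std::int32_t>(a) * static_cast<std::int32_t>(b);
    std::int32_t rounded = (product + (1 << 14)) >> 15;
    if (rounded > 32767) rounded = 32767;  // only (-1) * (-1) overflows
    return static_cast<std::int16_t>(rounded);
}
```

Because every operation is an integer add, multiply, or shift, this representation maps onto the short latencies listed in the 1.15 column of Table 5.1.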

An arbitrary waveform generator block was designed in the FPGA to synthesize the baseband in-phase and quadrature components of the communication signal according to a specified communications modulation standard. Additional standards may be supported by using a Phase Locked Loop (PLL) to adjust the generator sampling rate. For the prototype, the AWG is a $2^{16}$ deep, 32-bit wide dual port RAM block, where the lower half of each 32-bit word contains the in-phase component and the upper half contains the quadrature component of the waveform. The complex waveform values are magnitude normalized to the maximum positive number ($1-2^{-15}$) to utilize the full bit resolution of the DAC.
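The word layout described above can be sketched in software as follows, with the in-phase sample in the lower 16 bits and the quadrature sample in the upper 16 bits. The function names are hypothetical, and the samples are assumed to be already quantized to 16-bit signed values.

```cpp
#include <cstdint>

// Pack one complex sample into a 32-bit AWG word: I in bits 15..0, Q in bits 31..16.
std::uint32_t packIq(std::int16_t inPhase, std::int16_t quadrature) {
    const std::uint32_t low = static_cast<std::uint16_t>(inPhase);
    const std::uint32_t high = static_cast<std::uint16_t>(quadrature);
    return (high << 16) | low;
}

// Recover the in-phase sample from the lower half of the word.
std::int16_t unpackI(std::uint32_t word) {
    return static_cast<std::int16_t>(static_cast<std::uint16_t>(word & 0xFFFFu));
}

// Recover the quadrature sample from the upper half of the word.
std::int16_t unpackQ(std::uint32_t word) {
    return static_cast<std::int16_t>(static_cast<std::uint16_t>(word >> 16));
}
```

Going through `std::uint16_t` before widening avoids sign extension of a negative in-phase sample into the quadrature half of the word.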

The digital predistortion block contains a LUT, delay taps, and complex multipliers. For the prototype, a $2^{10}$ deep, 32-bit wide LUT block is implemented. In Figure 5.2, the complex modulated input waveform is converted into magnitude squared form, $x_i^2 + x_q^2$, and this value is used to index the LUT. Once the complex predistortion coefficient has been read from the LUT, it is multiplied by the delayed version of the input sample that was used to index the LUT to produce the predistorted waveform, where $x_{ai}$ is the in-phase portion of the predistorted waveform and $x_{aq}$ is the quadrature-phase portion.



Figure 5.2 System diagram of a predistortion look up table implemented in the FPGA
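The data path of Figure 5.2 can be modelled behaviourally in floating point as shown below. This is a sketch rather than the FPGA logic: the function names are hypothetical, the linear mapping of magnitude squared in [0, 1) onto the table addresses is an assumption, and the delay taps are omitted because a software model has no LUT read latency to align.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using Cplx = std::complex<double>;

// Map a magnitude squared value in [0, 1) to a table address, clamping
// out-of-range inputs to the first or last entry.
std::size_t lutIndex(double magSquared, std::size_t depth) {
    if (magSquared < 0.0) magSquared = 0.0;
    const std::size_t index = static_cast<std::size_t>(magSquared * static_cast<double>(depth));
    return index < depth ? index : depth - 1;
}

// Predistort one sample: look up the complex coefficient addressed by
// x_i^2 + x_q^2 and apply it with a complex multiplication.
Cplx predistortSample(Cplx x, const std::vector<Cplx>& lut) {
    const double magSquared = std::norm(x);  // x_i^2 + x_q^2
    return x * lut[lutIndex(magSquared, lut.size())];
}
```

Indexing by magnitude squared avoids a square root in the data path, which is why the LUT contents have to be generated on a magnitude squared grid.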

The FIFOs are 32-bit wide RAM based blocks used to sample the digitized complex waveforms at the input and output of the PA ($x_a(n)$ and $y(n)$).

A summary of total resource utilization after compilation is shown in Table 5.2, showing that the resource usage is minimal for digital predistortion. The high memory usage can be attributed to the arbitrary waveform generator implementation in the FPGA.

| FPGA Element                   | Usage     | Total     | Percent Usage |
|--------------------------------|-----------|-----------|---------------|
| Logic Cells                    | 6,555     | 143,520   | 4.6%          |
| Memory Bits                    | 4,464,164 | 9,383,420 | 47.5%         |
| DSP Elements (9x9 Multipliers) | 64        | 768       | 8.3%          |

Table 5.2 Total resource utilization in the FPGA

### **5.2 DSP/FPGA Communication Link**

The connection between the FPGA and DSP is made possible by the Texas Instruments EValuation Module (TI-EVM) on the FPGA, and the External Bus on the Analog Devices TigerSHARC. The connections include the 32-bit Data Bus, 32-bit Address bus, and several control signals. A Printed Circuit Board (PCB) was manufactured to route the TI-EVM pins on the FPGA to the External Bus pins on the TigerSHARC. The FPGA contains glue logic coded in Verilog that allows it to be controlled by the TigerSHARC.

#### 5.3 DSP Predistortion Synthesis Issues for FPGA Predistortion Correction

#### 5.3.1 Memoryless Predistortion

The memoryless LUT model generated by the DSP has to be converted to a format suitable for processing in the FPGA. The $P_{in,DPD}$ LUT indexing values generated by the memoryless model are converted into magnitude squared (the predistortion LUT is indexed in magnitude squared, as shown in Figure 5.2), and the corresponding predistortion table values, $p_i$ and $p_q$, are computed from the memoryless model gain, $G_{DPD}$, and phase, $\phi_{DPD}$, by the following equations:

$$p_{i} = 10^{\frac{G_{DPD}}{20}} \cos(\phi_{DPD})$$
$$p_{q} = 10^{\frac{G_{DPD}}{20}} \sin(\phi_{DPD})$$

The addresses used to index the 2048 sized lookup table can be viewed as a 0.12 fractional number, with magnitudes ranging between 0 and $1-2^{-11}$. An interpolation algorithm is used to compute the proper $p_i$ and $p_q$ LUT values, and the resulting LUT values are quantized to the 1.15 fractional number format when downloaded into the FPGA.
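The conversion from the memoryless model to rectangular LUT values given by the two equations above can be sketched as follows. The struct and function names are hypothetical, and the phase is assumed to be in radians; a phase stored in degrees would need converting first.

```cpp
#include <cmath>

// Rectangular predistortion coefficient for one LUT entry.
struct LutEntry {
    double p_i;  // in-phase part
    double p_q;  // quadrature part
};

// Convert a gain in dB and a phase in radians (assumed unit) to p_i and p_q.
LutEntry gainPhaseToRect(double gainDb, double phaseRad) {
    const double magnitude = std::pow(10.0, gainDb / 20.0);  // dB to linear voltage gain
    return LutEntry{magnitude * std::cos(phaseRad), magnitude * std::sin(phaseRad)};
}
```

For example, a gain of 0 dB with zero phase gives $p_i = 1$ and $p_q = 0$, which lies just outside the 1.15 range and would saturate to $1-2^{-15}$ when quantized.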

# 5.3.2 Memory Polynomial Predistortion

The memory polynomial algorithm requires a different approach to derive its predistortion implementation in the FPGA. By analyzing Equation 2.13 and assuming that the magnitude of x is less than 1 due to the magnitude normalization of the AWG, the higher k-order terms $x |x|^k$ tend to be significantly smaller than x because of the order term, and their corresponding coefficients $h_{m,k}$ tend to be very large (Raich, Hua et al. 2004). This would not be a problem if floating point precision were used in the FPGA; however, as mentioned previously, the processing would experience severe latency and consume a large number of logic cells. Another alternative is to use a fractional representation for computation, but this requires a delicate balance between the number of bits used and the potential overflow caused by coefficients $h_{m,k}$ larger than unity.

To overcome the above problem, an alternative is the multiple LUT approach (Gilabert, Cesari et al. 2008), illustrated in Figure 5.3, which consumes less space while still allowing a fractional representation to be used for high processing throughput.



Figure 5.3 Relation scheme from polynomial coefficients to LUT

The approach requires the characterization signal x(n) to be passed through each memory branch of the memory polynomial algorithm, with the output samples before the summation, $x_{a0}(n)$ to $x_{aM}(n)$, being saved. A LUT can then be derived for each branch, because each branch depends only on the current sample of its input and exhibits a 1:1 input-output relationship, similar to that of a lookup table.
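The equivalence between one memory branch and one LUT can be sketched as below. Several things here are assumptions made for illustration: the memory polynomial is taken to have the form $y(n) = \sum_m \sum_k h_{m,k}\, x(n-m)|x(n-m)|^k$, the coefficients in `h` are made-up values, each table is filled by evaluating the branch gain on a grid of $|x|^2$ (rather than from saved branch outputs, as described above), and no interpolation between entries is applied.

```python
import cmath
import math

def branch_gain(coeffs, mag):
    """Complex gain of one memory branch: sum over k of h_k * |x|^k."""
    return sum(h_k * mag ** k for k, h_k in enumerate(coeffs))

def build_luts(h, size=2048):
    """One table per memory branch, indexed by |x|^2 in [0, 1)."""
    return [[branch_gain(row, math.sqrt(i / size)) for i in range(size)]
            for row in h]

def predistort_poly(x, h):
    """Direct evaluation of the memory polynomial."""
    out = []
    for n in range(len(x)):
        acc = 0j
        for m, row in enumerate(h):
            if n - m >= 0:
                s = x[n - m]
                acc += s * branch_gain(row, abs(s))
        out.append(acc)
    return out

def predistort_lut(x, luts):
    """Same output computed with one table lookup and one multiply per branch."""
    size = len(luts[0])
    out = []
    for n in range(len(x)):
        acc = 0j
        for m, lut in enumerate(luts):
            if n - m >= 0:
                s = x[n - m]
                idx = min(size - 1, int(abs(s) ** 2 * size))
                acc += s * lut[idx]
        out.append(acc)
    return out

# Hypothetical coefficients h[m][k] for M = 1, K = 2, and a test signal with |x| <= 0.9.
h = [[1.0 + 0j, 0.1 - 0.05j, -0.2 + 0.1j],
     [0.05 + 0.02j, -0.03j, 0.01 + 0j]]
x = [0.9 * cmath.rect(abs(math.sin(0.37 * n)), 0.11 * n) for n in range(200)]

luts = build_luts(h)
y_poly = predistort_poly(x, h)
y_lut = predistort_lut(x, luts)
max_err = max(abs(a - b) for a, b in zip(y_poly, y_lut))
print(max_err)
```

The tables hold bounded complex gains rather than the large raw coefficients, which is what allows a fractional representation to be used in the datapath.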

To assess the accuracy of the LUT method, the NMSE was calculated in simulation for a one carrier WCDMA waveform exciting the class AB power amplifier.

Using memory polynomial parameters M = 2 and K = 12 results in an NMSE of -47.06 dB, while using the multiple LUT method results in an NMSE of -47.05 dB.
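For reference, the NMSE figure of merit can be computed as sketched below, assuming the usual definition (error energy divided by reference energy, expressed in dB). The function name `nmse_db` and the sample values are illustrative only.

```python
import math

def nmse_db(reference, estimate):
    """Normalized mean square error in dB between two complex sequences.

    Raises ValueError if the two sequences are identical (zero error energy).
    """
    err = sum(abs(r - e) ** 2 for r, e in zip(reference, estimate))
    ref = sum(abs(r) ** 2 for r in reference)
    return 10.0 * math.log10(err / ref)

# A 1 % amplitude error on every sample corresponds to an NMSE of -40 dB.
reference = [1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j]
estimate = [1.01 * r for r in reference]
value = nmse_db(reference, estimate)
print(round(value, 2))
```

On this scale, the 0.01 dB gap between the two methods corresponds to a change of roughly 0.2 % in the error energy.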

# 5.4 Baseband Digital Signal to Analog RF Waveform

An Analog Devices AD9779A evaluation board is used as the interface between the digital and analog domains. The evaluation board consists of the AD9779A DAC, as well as an Analog Devices ADL5372 direct conversion quadrature modulator to upconvert the signal into RF as shown in Figure 5.4. An external clock synthesizer is used, where the DAC clock output is used to synchronize the data. A 4x interpolation is used to enhance the analog resolution between sampling points; therefore, the operational clock frequency is four times that of the data generated by the FPGA (368.64 MHz). An external Local Oscillator (LO) is used for the ADL5372 to upconvert the signal to a desired RF carrier (up to a frequency of 2.5 GHz). Finally, a PC is used to control the configuration of the evaluation board.
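The clock relationship can be written out as follows, reading 368.64 MHz as the DAC clock after interpolation (an interpretation of the sentence above) and using the standard WCDMA chip rate of 3.84 Mcps, which is not stated in this section:

$$f_{data} = \frac{f_{DAC}}{4} = \frac{368.64\ \text{MHz}}{4} = 92.16\ \text{MHz} = 24 \times 3.84\ \text{MHz}$$

Under this reading, the FPGA supplies samples at 24 times the chip rate and the DAC converts at 96 times the chip rate.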



Figure 5.4 Connectivity between FPGA and DAC evaluation board

# 5.4.1 AD9779A Spectrum Analysis

The Stratix II EP2S180 evaluation board also offers an on-board TI DAC5667 14-bit DAC. Its performance can be compared with that of the AD9779A by converting each output signal to the IF domain for spectrum analysis on the VSA. Comparing Figure 5.5 with Figure 5.6, the AD9779A shows a 6 dB improvement in the noise floor at a 5 MHz offset from the carrier for the same WCDMA signal. This gives a lower noise floor at the input of the PA, which in turn results in a lower noise floor at the output of the PA.



Figure 5.5 Texas Instruments DAC5667 output of a one carrier WCDMA signal


Figure 5.6 Analog Devices AD9779A output of a one carrier WCDMA signal with 4x interpolation enabled

## 5.5 Linearization Results with Class AB PA at 1.96 GHz

The complete DSP/FPGA platform is shown in Figure 5.7. The initial prototype involves using the VSA to capture the RF output signal of the power amplifier. Then, the signal is downloaded into the output FIFO in the FPGA, simulating the capture of the proposed system. Predistortion synthesis is achieved in the DSP, and then the predistortion LUTs are uploaded into the FPGA. Figure 5.8 shows the system diagram for validating the DSP/FPGA predistortion prototype.



Figure 5.7 DSP/FPGA system prototype used for baseband digital predistortion



Figure 5.8 Validating the DSP/FPGA digital predistortion prototype

Figure 5.9 (a) - (d) shows the linearization results for 1-4 carrier WCDMA signals using the DSP/FPGA platform for predistortion, and Table 5.3 lists the ACPR results achieved for the class AB PA operating at 1.96 GHz.



Figure 5.9 (a) One carrier (b) Two carrier (c) Three carrier and (d) Four carrier WCDMA linearization results for Class AB PA using DSP/FPGA transmitter

Table 5.3 ACPR results for multicarrier WCDMA waveforms using a class AB PA using DSP/FPGA transmitter, with and without digital predistortion. The memoryless DPD fails to meet the requirements for three and four carrier signals, while the memory polynomial DPD meets the requirements for one to four carrier signals.

|                               | 5 MHz Offset   |                | 10 MHz Offset  |                | 15 MHz Offset  |                |
|-------------------------------|----------------|----------------|----------------|----------------|----------------|----------------|
| Waveform Type                 | Lower<br>(dBc) | Upper<br>(dBc) | Lower<br>(dBc) | Upper<br>(dBc) | Lower<br>(dBc) | Upper<br>(dBc) |
| One Carrier (No DPD)          | 41.22          | 40.86          | 60.49          | 60.52          | 73.46          | 73.15          |
| One Carrier (Memoryless)      | 58.25          | 60.63          | 67.82          | 68.19          | 71.33          | 71.02          |
| One Carrier (M. Polynomial)   | 63.36          | 62.97          | 74.19          | 72.35          | 76.01          | 74.22          |
| Two Carrier (No DPD)          | 35.32          | 34.69          | 45.31          | 44.67          | 53.29          | 52.95          |
| Two Carrier (Memoryless)      | 50.60          | 52.50          | 55.92          | 58.45          | 62.05          | 63.25          |
| Two Carrier (M. Polynomial)   | 57.87          | 57.84          | 64.65          | 63.98          | 68.59          | 67.50          |
| Three Carrier (No DPD)        | 35.61          | 34.57          | 40.00          | 38.89          | 47.56          | 47.26          |
| Three Carrier (Memoryless)    | 45.61          | 50.60          | 47.27          | 51.90          | 51.45          | 57.00          |
| Three Carrier (M. Polynomial) | 54.67          | 55.43          | 57.87          | 59.40          | 62.24          | 63.09          |
| Four Carrier (No DPD)         | 36.37          | 35.51          | 39.29          | 38.13          | 43.69          | 42.23          |
| Four Carrier (Memoryless)     | 47.62          | 45.89          | 48.93          | 46.88          | 51.84          | 49.89          |
| Four Carrier (M. Polynomial)  | 52.88          | 52.51          | 54.25          | 54.20          | 56.05          | 55.81          |

The linearization results are comparable with the VSG/VSA linearization results shown in Chapter 3. Because of the low noise floor of the DACs, the ACPR values are higher at the farther offsets from the carrier (the 10 MHz and 15 MHz offsets).
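The pass/fail statement in the caption of Table 5.3 can be reproduced with a short check. The sketch assumes minimum ACPR limits of 45 dBc at the 5 MHz offset and 50 dBc at the 10 MHz offset; the 50 dBc figure is quoted in Chapter 6, while the 45 dBc figure is taken from 3GPP TS 25.104 and is not stated in this chapter. The data are the predistorted rows of Table 5.3 as (lower, upper) pairs.

```python
# Assumed minimum ACPR limits in dBc, keyed by offset in MHz.
LIMITS_DBC = {5: 45.0, 10: 50.0}

# (lower, upper) ACPR in dBc from Table 5.3, keyed by offset in MHz.
TABLE_5_3 = {
    ("One Carrier", "Memoryless"): {5: (58.25, 60.63), 10: (67.82, 68.19)},
    ("One Carrier", "M. Polynomial"): {5: (63.36, 62.97), 10: (74.19, 72.35)},
    ("Two Carrier", "Memoryless"): {5: (50.60, 52.50), 10: (55.92, 58.45)},
    ("Two Carrier", "M. Polynomial"): {5: (57.87, 57.84), 10: (64.65, 63.98)},
    ("Three Carrier", "Memoryless"): {5: (45.61, 50.60), 10: (47.27, 51.90)},
    ("Three Carrier", "M. Polynomial"): {5: (54.67, 55.43), 10: (57.87, 59.40)},
    ("Four Carrier", "Memoryless"): {5: (47.62, 45.89), 10: (48.93, 46.88)},
    ("Four Carrier", "M. Polynomial"): {5: (52.88, 52.51), 10: (54.25, 54.20)},
}

def meets_limits(acpr):
    """True if both adjacent channels satisfy the limit at every checked offset."""
    return all(min(acpr[offset]) >= limit for offset, limit in LIMITS_DBC.items())

failing = sorted(key for key, acpr in TABLE_5_3.items() if not meets_limits(acpr))
for waveform, method in failing:
    print(f"{waveform} ({method}) misses the assumed limits")
```

With these limits, only the three and four carrier memoryless cases fail, in both cases at the 10 MHz offset.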

## 5.6 Linearization Results with Highly Nonlinear Doherty PA at 2.14 GHz

For the Doherty PA, the system was reconfigured to have an RF center frequency of 2.14 GHz. Figure 5.10 shows the linearization results achieved for the highly nonlinear Doherty PA, and Table 5.4 lists the ACPR results obtained.



Figure 5.10 (a) One carrier (b) Two carrier (c) Three carrier and (d) Four carrier WCDMA linearization results for Doherty PA using DSP/FPGA transmitter

Table 5.4 ACPR results for multicarrier WCDMA waveforms using a Doherty PA using DSP/FPGA transmitter, with and without digital predistortion. The memory polynomial DPD using the DSP/FPGA is able to meet the requirement for the four carrier signal, whereas it was unable to meet the requirement with the VSG/VSA setup.

|                               | 5 MHz Offset |       | 10 MHz Offset |       | 15 MHz Offset |       |
|-------------------------------|--------------|-------|---------------|-------|---------------|-------|
| Waveform Type                 | Lower<br>(dBc) | Upper<br>(dBc) | Lower<br>(dBc) | Upper<br>(dBc) | Lower<br>(dBc) | Upper<br>(dBc) |
| One Carrier (No DPD)          | 30.97        | 30.97 | 49.47         | 48.78 | 62.46         | 61.89 |
| One Carrier (Memoryless)      | 52.39        | 51.61 | 63.04         | 62.51 | 68.28         | 66.97 |
| One Carrier (M. Polynomial)   | 55.03        | 54.95 | 64.90         | 62.97 | 68.17         | 66.00 |
| Two Carrier (No DPD)          | 26.16        | 25.57 | 33.73         | 33.31 | 40.66         | 40.25 |
| Two Carrier (Memoryless)      | 46.30        | 45.22 | 50.57         | 51.59 | 54.70         | 55.99 |
| Two Carrier (M. Polynomial)   | 50.18        | 50.23 | 57.56         | 56.89 | 61.61         | 61.25 |
| Three Carrier (No DPD)        | 25.43        | 24.82 | 29.55         | 28.75 | 34.99         | 34.61 |
| Three Carrier (Memoryless)    | 46.13        | 44.84 | 48.38         | 47.84 | 52.15         | 53.14 |
| Three Carrier (M. Polynomial) | 49.24        | 48.58 | 53.38         | 52.99 | 56.18         | 56.64 |
| Four Carrier (No DPD)         | 25.34        | 24.48 | 28.03         | 26.86 | 31.61         | 30.17 |
| Four Carrier (Memoryless)     | 44.92        | 43.55 | 46.32         | 45.24 | 48.89         | 47.73 |
| Four Carrier (M. Polynomial)  | 48.70        | 47.93 | 51.22         | 50.35 | 53.84         | 52.26 |

Again, the linearization results are close to the values obtained in Chapter 4, with the added benefit of higher ACPR values at the farther offsets.
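The improvement delivered by each predistorter can be read off Table 5.4 by subtracting the value without DPD from the value with DPD. The short sketch below does this for the four carrier rows at the 5 MHz offset; the helper name `improvement_db` is illustrative.

```python
# Four carrier rows of Table 5.4 at the 5 MHz offset: (lower, upper) ACPR in dBc.
NO_DPD = (25.34, 24.48)
WITH_DPD = {
    "Memoryless": (44.92, 43.55),
    "M. Polynomial": (48.70, 47.93),
}

def improvement_db(before, after):
    """ACPR improvement in dB for the lower and upper adjacent channels."""
    return tuple(round(a - b, 2) for b, a in zip(before, after))

gains = {name: improvement_db(NO_DPD, values) for name, values in WITH_DPD.items()}
for name, (lower, upper) in gains.items():
    print(f"{name}: lower {lower:.2f} dB, upper {upper:.2f} dB")
```

For this waveform the memory polynomial predistorter improves the ACPR by about 23.4 dB on both sides, roughly 4 dB more than the memoryless predistorter.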

# 5.7 Conclusion

A high precision digital predistortion transmitter was proposed in this chapter. By using the DSP/FPGA system to generate and apply the predistortion, similar or better performance can be achieved compared to using the experimental setup. In addition, the linearity specifications were achieved using this system for the four carrier WCDMA waveform, whereas they were not met using the experimental platform in Chapter 4.

#### **Chapter Six: Conclusions and Future Works**

#### **6.1 Summary and Conclusions**

In this thesis, a baseband digital predistortion system architecture implemented on a DSP/FPGA platform was proposed to compensate for the nonlinearity of the power amplifier. First, the performance of the memoryless and memory polynomial behavioural models was verified using a VSG/VSA experimental setup with a mildly nonlinear PA operating in class AB mode as the DUT. For the class AB PA, the memoryless model's NMSE for a one carrier WCDMA signal was shown to be -41.50 dB, while that of the memory polynomial algorithm using the SVD solution was shown to be -47.06 dB. Two adaptive filter algorithms were also introduced to solve for the memory polynomial coefficients: the RLS and QR-RLS algorithms. Both showed a better NMSE (-45.98 dB for RLS and -47.03 dB for QR-RLS) than the memoryless algorithm.

The NMSE values are related to the digital predistortion capability of the models when the PA exhibits memory effects. With the class AB PA and the one carrier WCDMA signal, the upper and lower ACPR improvement at a 5 MHz carrier offset was approximately 18 dB for the memoryless algorithm and 20 dB for the memory polynomial algorithm. Notably, the memory polynomial algorithm showed a lower noise floor in the out of band regions of the spectrum plots. The memoryless algorithm provided sufficient linearization to meet the ACPR requirements of the 3GPP standard for one, two, and three carrier signals. However, it was unable to meet the requirements for a wideband four carrier WCDMA signal (the ACPR at the 10 MHz offset was about 48 dBc, while the requirement is over 50 dBc), whereas the memory polynomial algorithm was able to meet them.

In the DSP, the memory polynomial behavioural models were implemented using double precision (64-bit) floating point arithmetic, while the memoryless model used single precision (32-bit) floating point arithmetic. The 32-bit format was chosen for the memoryless model to optimize computational performance, because there was only a small performance degradation between single and double precision. For the memory polynomial model, two additional methods were introduced to solve for the coefficients: RLS and QR-RLS. Both methods offered NMSE performance similar to that of the singular value decomposition method, but with reduced computational time in the DSP: the QR-RLS method needed approximately half the clock cycles of the SVD for polynomial order K = 12, memory depth M = 2, and sample depth N = 1000. In addition, the memory usage was decreased substantially using these two methods. The accuracy of the algorithms was verified using a testing framework that compared the NMSE of the simulation results with that of the DSP results. All the models showed similar calculated performance between the DSP implementation and its simulation counterpart. The models were also validated using the experimental setup and a highly nonlinear Doherty PA, where the DSP generated predistorted signal was downloaded into the VSG and the linearity of the PA output was analyzed. The results showed that the memory polynomial model was unable to meet the 3GPP requirements for the four carrier WCDMA signal.
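To give a sense of why a recursive solver needs less memory than a batch SVD, the sketch below fits memory polynomial coefficients with a conventional exponentially weighted RLS recursion, processing one sample at a time. It is not the DSP implementation and it is not the QR-RLS variant: the model form $d(n) = \sum_m \sum_k h_{m,k}\, x(n-m)|x(n-m)|^k$, the function names, the initial value `p0`, the small model size (M = 1, K = 1), and the synthetic noiseless data are all assumptions made for illustration.

```python
import random

def basis(x, n, M, K):
    """Regressor of terms x(n-m)|x(n-m)|^k for sample n (zeros before the start)."""
    u = []
    for m in range(M + 1):
        s = x[n - m] if n - m >= 0 else 0j
        u.extend(s * abs(s) ** k for k in range(K + 1))
    return u

def rls_fit(x, d, M, K, lam=1.0, p0=1e4):
    """Conventional RLS for the linear-in-coefficients model d(n) = u(n)^T w."""
    size = (M + 1) * (K + 1)
    w = [0j] * size
    # P approximates the inverse of the weighted correlation matrix.
    P = [[(p0 if i == j else 0.0) + 0j for j in range(size)] for i in range(size)]
    for n in range(len(x)):
        u = basis(x, n, M, K)
        uc = [v.conjugate() for v in u]
        Pu = [sum(P[i][j] * uc[j] for j in range(size)) for i in range(size)]
        denom = lam + sum(u[i] * Pu[i] for i in range(size))
        k = [v / denom for v in Pu]                          # gain vector
        err = d[n] - sum(u[i] * w[i] for i in range(size))   # a priori error
        w = [w[i] + k[i] * err for i in range(size)]
        uP = [sum(u[i] * P[i][j] for i in range(size)) for j in range(size)]
        P = [[(P[i][j] - k[i] * uP[j]) / lam for j in range(size)]
             for i in range(size)]
    return w

# Synthetic, noiseless example with hypothetical coefficients (M = 1, K = 1).
M, K = 1, 1
h_true = [1.0 + 0.2j, -0.3 + 0.1j, 0.05 - 0.02j, 0.01 + 0.03j]
rng = random.Random(0)
x = [complex(rng.uniform(-0.7, 0.7), rng.uniform(-0.7, 0.7)) for _ in range(300)]
d = [sum(u_i * h_i for u_i, h_i in zip(basis(x, n, M, K), h_true))
     for n in range(len(x))]

w_hat = rls_fit(x, d, M, K)
max_coeff_err = max(abs(a - b) for a, b in zip(w_hat, h_true))
print(max_coeff_err)
```

Only the coefficient vector and one square matrix of size (M + 1)(K + 1) are kept between samples, whereas a batch least squares solution has to store the full N-row observation matrix.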

The DSP/FPGA implementation of the predistortion scheme required partitioning the system into the characterization of the power amplifier and the application of the predistortion to the signal. An arbitrary waveform generator, a predistortion Look Up Table (LUT), and first in, first out buffers were implemented in the FPGA to facilitate real-time data acquisition and manipulation of the signal. Floating point arithmetic in the FPGA proved to be computationally and resource intensive. Therefore, for real-time digital predistortion using the memory polynomial model, a LUT was developed for each memory depth of the coefficients. In simulation, the NMSE of the multiple LUT memory polynomial method differed from that of the original method by only 0.01 dB, meaning that little accuracy is lost when using the multiple LUT method for predistortion. The end result is a low-latency predistortion system capable of linearizing power amplifiers exhibiting memory effects.

The hardware transmitter used in the system is a significant improvement over the VSG, mainly because of the components used. For the wideband four carrier WCDMA signal, the experimental setup was unable to meet the linearization requirements set by the 3GPP at the 10 MHz and 15 MHz offsets for the Doherty PA. However, because the AD9779A DACs have a higher dynamic range than the VSG's DACs, the initial waveform at the input of the PA had a lower noise floor, which translated into a greater improvement in ACPR at the output of the PA when using digital predistortion.

It was found that one digital predistortion characterization was able to linearize the PA over a period of 54 minutes (each of the 27 WCDMA predistortion waveforms was applied to the PA for 2 minutes). However, the experiments were done in a constant temperature environment. In reality, the PA may be installed in a base station, where it would be subjected to extreme temperature variations and would therefore need an adaptive predistortion scheme. Investigation is needed to determine whether the optimization of the memory polynomial algorithm needs to be performed in a real-time environment. It is possible that a lower cost, integer-based processor can handle the predistortion characterization, depending on how the power amplifier's behaviour fluctuates over time.

## **6.2 Directions for Future Work**

The proposed system is intended for standalone operation; however, the feedback loop used for the digital predistortion characterization of the PA was outside the scope of this project. By using a high precision ADC at the attenuated output of the PA, it will be possible to have a complete standalone baseband digital predistortion system capable of characterizing and linearizing PAs.

The thesis used predetermined settings for the memory polynomial algorithm (3 taps and 13 orders), which may or may not be optimal for a particular power amplifier. Further study is needed to determine the dimensions of the model (M, N), that is, the number of taps (M) and the order (N) needed for a given PA, in order to optimize the computational performance of the algorithm.

The linearization capability of the DSP/FPGA system resulted in an acceptable ACPR reduction for power amplifiers. However, the spectra for the memory polynomial algorithm did not show a flat noise floor in the out of band regions. Investigation is needed to determine whether this is due to a limitation of the algorithm or to system hardware impairments such as hysteresis.

## References

- Agilent Technologies (2000). <u>Connected Simulation and Test Solutions Using the Advanced Design System</u>, Application Notes, no. 1394.
- Altera Corporation (1997). <u>Application Note 83: Binary Numbering Systems</u>, Altera Corporation.
- Altera Corporation (2007). <u>altfp\_add\_sub Megafunction User Guide</u>, Altera Corporation.
- Altera Corporation (2007). <u>Stratix II Device Handbook, Volume 1</u>, Altera Corporation.
- Altera Corporation (2008). <u>Floating Point Multiplier (ALTFP\_MULT) Megafunction User Guide</u>, Altera Corporation.
- Analog Devices Inc. (2007). <u>ADSP-TS201S EZ-KIT Lite Evaluation System Manual</u>, Analog Devices, Inc.
- Anderson, E., Z. Bai, et al. (1999). <u>LAPACK Users' Guide</u>. Philadelphia, PA, Society for Industrial and Applied Mathematics.
- Ben Nasr, H., S. Boumaiza, et al. (2005). <u>On the critical issues of DSP/FPGA mixed digital predistorter implementation</u>. Microwave Conference Proceedings, 2005. APMC 2005. Asia-Pacific Conference Proceedings.
- Berrut, J. P. and L. Trefethen (2004). "Barycentric Lagrange Interpolation." <u>SIAM Review</u> 46(3): 501-517.
- Boumaiza, S. and F. M. Ghannouchi (2003). "Thermal memory effects modeling and compensation in RF power amplifiers and predistortion linearizers." <u>Microwave Theory and Techniques, IEEE Transactions on</u> 51(12): 2427-2433.

- Boumaiza, S., M. Helaoui, et al. (2007). "Systematic and Adaptive Characterization Approach for Behavior Modeling and Correction of Dynamic Nonlinear Transmitters." <u>Instrumentation and Measurement, IEEE Transactions on</u> 56(6): 2203-2211.
- Briffa, M. A. and M. Faulkner (1996). "Stability analysis of Cartesian feedback linearisation for amplifiers with weak nonlinearities." <u>Communications, IEE Proceedings-</u> 143(4): 212-218.
- Cavers, J. K. (1990). "Amplifier linearization using a digital predistorter with fast adaptation and low memory requirements." <u>Vehicular Technology, IEEE Transactions on</u> 39(4): 374-382.
- Cripps, S. C. (2006). <u>RF power amplifiers for wireless communications</u>. Boston, Artech House.
- European Telecommunications Standards Institute (2008). <u>3GPP TS 25.104 v8.2.0: Universal Mobile Telecommunications System (UMTS); Base Station (BS) radio transmission and reception (FDD)</u>.
- Gilabert, P. L., A. Cesari, et al. (2008). "Multi-Lookup Table FPGA Implementation of an Adaptive Digital Predistorter for Linearizing RF Power Amplifiers With Memory Effects." <u>Microwave Theory and Techniques, IEEE Transactions on</u> 56(2): 372-384.
- Glisic, S. G. (2004). <u>Advanced wireless communications : 4G technologies</u>. Chichester, England, Wiley.

- Haykin, S. (2001). <u>Adaptive Filter Theory</u>. Upper Saddle River, NJ, Prentice Hall.
- Haykin, S. S. (2001). <u>Communication systems</u>. New York, Wiley.
- Holma, H., A. Toskala, et al. (2004). <u>WCDMA for UMTS: radio access for third generation mobile communications</u>. Chichester, Wiley.
- Kenington, P. B. (2000). <u>High-linearity RF amplifier design</u>. Boston, Artech House.
- Kim, J. and K. Konstantinou (2001). "Digital predistortion of wideband signals based on power amplifier model with memory." <u>Electronics Letters</u> 37(23): 1417-1418.

- Kwan, A., S. Boumaiza, et al. (2006). <u>Automating the Verification of SDR Base band Signal Processing Algorithms Developed on DSP/FPGA Platform</u>. Signal Processing Systems Design and Implementation, 2006. SIPS '06. IEEE Workshop on.
- Kwan, A., M. Helaoui, et al. (2008). "Wireless Communications Transmitter Performance Enhancement Using Advanced Signal Processing Algorithms Running in a Hybrid DSP/FPGA Platform." <u>Journal of Signal Processing Systems</u>, from http://dx.doi.org/10.1007/s11265-008-0225-3
- Liu, T., Y. Yan, et al. (2008). <u>Accurate Time-Delay Estimation and Alignment for RF Power Amplifier/Transmitter Characterization</u>. Circuits and Systems for Communications, 2008. ICCSC 2008. 4th IEEE International Conference on.
- Muruganathan, S. D. and A. B. Sesay (2006). "A QRD-RLS-Based Predistortion Scheme for High-Power Amplifier Linearization." <u>Circuits and Systems II: Express Briefs, IEEE Transactions on</u> 53(10): 1108-1112.
- Raich, R., Q. Hua, et al. (2004). "Orthogonal polynomials for power amplifier modeling and predistorter design." <u>Vehicular Technology, IEEE Transactions on</u> 53(5): 1468-1479.

- Reynaert, P. and M. Steyaert (2006). <u>RF power amplifiers for mobile communications</u>. Dordrecht, Springer.
- Rummery, S. and G. R. Branner (1997). <u>Power amplifier design using feedforward linearization</u>. Circuits and Systems, 1997. Proceedings of the 40th Midwest Symposium on.
- Troelsen, A. W. (2002). <u>COM and .NET interoperability</u>. Berkeley, Calif., Apress.
- Vaseghi, S. V. (2006). <u>Advanced digital signal processing and noise reduction</u>. Chichester, West Sussex, England ; Hoboken, Wiley.
- Vuolevi, J. and T. Rahkonen (2003). <u>Distortion in RF power amplifiers</u>. Boston, Artech House.