CMOS digital transmitters for emerging applications

Title	CMOS digital transmitters for emerging applications
Publication Type	dissertation
School or College	College of Engineering
Department	Electrical & Computer Engineering
Author	Azam, Ali
Date	2019
Description	Mobile data traffic has grown at a staggering rate of 75% since 2014: monthly data traffic ramped up from 2.1 million TB in 2014 to 3.7 million TB in 2015, and it is expected to grow even faster. Many emerging applications related to the internet-of-things (IoT) and 5G communications pose new challenges for wireless circuits and sensor systems in terms of power consumption, flexibility, data throughput, and performance metrics such as linearity, signal-to-noise ratio (SNR), error vector magnitude (EVM), out-of-band (OOB) noise, and adjacent channel leakage ratio (ACLR). For instance, the output power of a wireless transmitter may vary from a few mW to a few W, the regulatory requirement for OOB noise may be as stringent as -160 dBc/Hz, and the operating frequency range may span 0.5-100 GHz, depending on the application. In addition, sensors should be low power (sub-μW), and they may need to operate in remote places and be adaptable to different complementary metal-oxide semiconductor (CMOS) processes. Novel circuit architectures and techniques to support these emerging applications are presented in this work. First, a high power (~2 W) fully digital transmitter architecture supporting enhanced license augmented access (eLAA) is presented in a state-of-the-art 16 nm CMOS process. The transmitter meets stringent linearity and OOB requirements for WiFi and cellular coexistence. eLAA is a promising protocol recently released by the third-generation partnership project (3GPP) to augment cellular access by utilizing the existing WiFi infrastructure. In this dissertation, I propose for the first time a multisegmented, fully unary switched-capacitor power amplifier (SCPA) architecture in conjunction with integrated high-order power combiners and a linear switching scheme to meet OOB noise requirements with minimal impact on energy efficiency, die area, and cost. 
Second, an additional efficiency and linearity enhancement technique using a delta-sigma modulated SCPA is presented. Third, a frequency-tunable multiband digital power amplifier (DPA) in 65 nm CMOS is proposed. It covers multiple fragmented frequency bands using a single narrowband DPA, ultimately reducing cost and area and increasing data throughput. This is the first time a single narrowband DPA has been used to cover a wide band; the design leverages the SCPA architecture. Finally, an ultra-low power (47.2 nW) pulse-width-modulated (PWM) temperature sensor suitable for wireless communication using a digital-friendly modulation and demodulation technique is presented. The architecture can be powered by energy scavenging, which makes it a suitable candidate for IoT and 5G. It consumes extremely low power (<50 nW) while operating from a supply voltage as low as 450 mV.
Type	Text
Publisher	University of Utah
Dissertation Name	Doctor of Philosophy
Language	eng
Rights Management	© Ali Azam
Format	application/pdf
Format Medium	application/pdf
ARK	ark:/87278/s6wb15dv
Setname	ir_etd
ID	1671473
OCR Text	CMOS DIGITAL TRANSMITTERS FOR EMERGING APPLICATIONS by Ali Azam

A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Department of Electrical and Computer Engineering, The University of Utah, May 2019.

Copyright © Ali Azam 2019. All Rights Reserved.

The University of Utah Graduate School
STATEMENT OF DISSERTATION APPROVAL
The dissertation of Ali Azam has been approved by the following supervisory committee members:
Jeffrey S. Walling, Chair, Date Approved 12/10/2018
Ross Walker, Member, Date Approved 01/23/2019
Kenneth Stevens, Member, Date Approved 12/10/2018
Pierre-Emmanuel Gaillardon, Member, Date Approved 12/10/2018
Sneha K. Kasera, Member, Date Approved 01/17/2019
Ashoke Ravi, Member
and by Florian Solzbacher, Chair/Dean of the Department/College/School of Electrical and Computer Engineering, and by David B. Kieda, Dean of The Graduate School.

ABSTRACT
Mobile data traffic has grown at a staggering rate of 75% since 2014: monthly data traffic ramped up from 2.1 million TB in 2014 to 3.7 million TB in 2015, and it is expected to grow even faster. Many emerging applications related to the internet-of-things (IoT) and 5G communications pose new challenges for wireless circuits and sensor systems in terms of power consumption, flexibility, data throughput, and performance metrics such as linearity, signal-to-noise ratio (SNR), error vector magnitude (EVM), out-of-band (OOB) noise, and adjacent channel leakage ratio (ACLR). For instance, the output power of a wireless transmitter may vary from a few mW to a few W, the regulatory requirement for OOB noise may be as stringent as -160 dBc/Hz, and the operating frequency range may span 0.5-100 GHz, depending on the application. 
In addition, sensors should be low power (sub-µW), and they may need to operate in remote places and be adaptable to different complementary metal-oxide semiconductor (CMOS) processes. Novel circuit architectures and techniques to support these emerging applications are presented in this work. First, a high power (~2 W) fully digital transmitter architecture supporting enhanced license augmented access (eLAA) is presented in a state-of-the-art 16 nm CMOS process. The transmitter meets stringent linearity and OOB requirements for WiFi and cellular coexistence. eLAA is a promising protocol recently released by the third-generation partnership project (3GPP) to augment cellular access by utilizing the existing WiFi infrastructure. In this dissertation, I propose for the first time a multisegmented, fully unary switched-capacitor power amplifier (SCPA) architecture in conjunction with integrated high-order power combiners and a linear switching scheme to meet OOB noise requirements with minimal impact on energy efficiency, die area, and cost. Second, an additional efficiency and linearity enhancement technique using a delta-sigma modulated SCPA is presented. Third, a frequency-tunable multiband digital power amplifier (DPA) in 65 nm CMOS is proposed. It covers multiple fragmented frequency bands using a single narrowband DPA, ultimately reducing cost and area and increasing data throughput. This is the first time a single narrowband DPA has been used to cover a wide band; the design leverages the SCPA architecture. Finally, an ultra-low power (47.2 nW) pulse-width-modulated (PWM) temperature sensor suitable for wireless communication using a digital-friendly modulation and demodulation technique is presented. The architecture can be powered by energy scavenging, which makes it a suitable candidate for IoT and 5G. It consumes extremely low power (<50 nW) while operating from a supply voltage as low as 450 mV.

To my parents.
“Ever tried. Ever failed. No matter. Try again. Fail again. Fail better.”
- Samuel Beckett, Worstward Ho (1983)

TABLE OF CONTENTS
ABSTRACT
LIST OF TABLES
ACKNOWLEDGMENTS
Chapters
1. INTRODUCTION
2. HIGH POWER HIGH RESOLUTION SWITCHED-CAPACITOR POWER AMPLIFIER ARCHITECTURE FOR ENHANCED LICENSE AUGMENTED ACCESS
   2.1 Introduction
   2.2 Motivation
   2.3 Enhanced License Augmented Access (eLAA)
      2.3.1 WLAN Offload
      2.3.2 Link Aggregation
      2.3.3 Carrier Aggregation
   2.4 Unary-Segmented Switched-Capacitor Power Amplifier (SCPA)
      2.4.1 Conventional SCPA
      2.4.2 Existing C-DAC Architectures
      2.4.3 Effect of Capacitive Mismatch on C-DAC
      2.4.4 Proposed C-DAC Architecture
      2.4.5 Phase Linearization in a Unary-Segmented Array
   2.5 Compact and Symmetrical Power Combiner Structures
      2.5.1 Four-Way Power Combiner Structure
      2.5.2 Eight-Way Power Combiner Structure
   2.6 Switching Schemes in Higher Order Power Combiners
      2.6.1 Four-Way Combiner
      2.6.2 Eight-Way Combiner
   2.7 Physical Design and Extracted Simulation Results
   2.8 Measurement and Troubleshooting
   2.9 Future Work and Summary
3. A HYBRID DUAL RATE (DSM/NYQUIST) SWITCHED-CAPACITOR POWER AMPLIFIER
   3.1 Introduction
   3.2 Hybrid DSM/Nyquist SCPA Architecture
   3.3 Hybrid SCPA Circuit Details
   3.4 Extracted Simulation Results
   3.5 Conclusion
4. TUNABLE MULTIBAND DIGITAL POWER AMPLIFIER
   4.1 Introduction
   4.2 Problem Statement and Motivation
   4.3 Operation of a Switched-Capacitor Power Amplifier (SCPA)
   4.4 Frequency Tunable Multiband SCPA
   4.5 Implementation
      4.5.1 Capacitor Selection and Sizing
      4.5.2 Programmable Capacitor (DPC) and Matching Network
      4.5.3 Switch, Driver, and Logic Design
   4.6 Experimental Results
      4.6.1 Measurement Test-Bench and Instrumentation
      4.6.2 Static Measurements
      4.6.3 Dynamic Measurements
   4.7 Future Directions
   4.8 Comparison and Summary
5. AN ULTRA-LOW POWER FULLY INTEGRATED PULSE-WIDTH MODULATED CMOS TEMPERATURE SENSOR FOR INTERNET-OF-THINGS
   5.1 Introduction
   5.2 Circuit Architecture and Theory of Operation
      5.2.1 CTAT Voltage Generation
      5.2.2 Reference Voltage Generation
      5.2.3 Sawtooth Wave Generation
      5.2.4 Comparator Design
   5.3 Simulation and Measurement Results
   5.4 Comparison and Summary
6. DISSERTATION SUMMARY
REFERENCES

LIST OF TABLES
2.1 Simulation results with extraction at different hierarchy levels
4.1 Comparison to prior art for recent wideband/multi-band/multi-standard power amplifiers
5.1 CTAT generator using different CMOS processes
5.2 Optimized device sizes for 65 nm, 130 nm, and 180 nm CMOS
5.3 Comparison of this work with the state of the art

ACKNOWLEDGMENTS
I truly thank my adviser, Dr. Jeffrey S. Walling, for his outstanding guidance and support during my graduate study at the University of Utah. I have learned a great deal from him, not only about technical problems but also about a way of living. He will continue to be a great source of inspiration for the rest of my life. I would also like to thank Dr. Ashoke Ravi, Dr. Bassam Khamaisi, and Dr. Ofir Degani for their continuous guidance and cordial support throughout my internship at Intel Labs. The serenity Ashoke possesses in his character truly amazes me. 
I would like to take this opportunity to thank my group mates (Zhidong Bai, Kyle Holzer, and Wen Yuan) and classmates (Taufiq Ahmed, Shakir-ul-Haque Khan, and Benozir Ahmed) at the University of Utah for their valuable contributions and support. Special thanks to my wife (Rakhi) for bearing with all of my idiosyncrasies. I would like to dedicate this work to my late mother, Renu. Lastly, Alhamdulillah (all praise to God) for giving me patience and endurance.

CHAPTER 1
INTRODUCTION

Power amplifiers continue to be an intense area of research for the radio-frequency integrated circuit (RFIC) design community. They are the dominant power-consuming block of the entire transceiver chain; hence, they dominate the global energy efficiency and many other relevant performance metrics. Mobile devices have limited battery capacity and lifetime, so it is critical that the available energy is used judiciously. Moreover, the internet-of-things (IoT) has opened up a gamut of new applications such as biomedical sensors, infrastructural health monitoring, energy scavenging, remote sensing, and microelectromechanical actuators. New facets of existing applications such as mobile computing, online gaming and gambling, wireless streaming of high-definition audio and video, and social media also contribute to the heightened data traffic. In addition, conventional cellular data traffic is increasing at an unprecedented pace and is expected to grow at least elevenfold by 2020. “Faster, cheaper for everyone, everywhere” has been the prime motivation for researchers to ceaselessly explore and innovate.

A practical design constraint placed on designers of modern wireless and sensor systems is the use of bulk silicon CMOS (complementary metal-oxide semiconductor) processes, because CMOS has the lowest cost per area among all common integrated circuit technologies, owing to robust fabrication methods. 
Hence, it is highly desirable for commercial products. In contrast, circuits based on discrete components or other integrated circuit technologies (e.g., III-V, SiGe) are either costly or area inefficient, making them less attractive for commercial adoption. Continued scaling of CMOS processes benefits digital circuits by reducing power and improving speed. At the same time, it creates difficult issues for analog circuit design in terms of voltage headroom, leakage current, and reduced intrinsic gain. Therefore, transmitter architectures that use devices as switches rather than as voltage-dependent current sources are highly desirable, because they provide better performance as CMOS feature sizes shrink.

Increasing data throughput, improving signal quality, and providing flexibility to adapt to different applications and processes are among the highest priorities for wireless transmitters. Spectrally efficient modulations such as orthogonal frequency division multiplexing (OFDM) pack the maximum amount of data into a given bandwidth, but this spectral efficiency comes at the expense of an increased peak-to-average power ratio (PAPR ~ 6 dB) for the transmitted signal. A higher PAPR means that the transmitter must support peak output power even though it operates at much lower power (back-off) on average. Power amplifiers (PAs) are most efficient when operating at peak output power; hence, PAs show reduced global efficiency when transmitting spectrally efficient modulations. PAs also offer higher efficiency when tuned to operate in a narrow band, owing to circuit optimization and the higher quality factor of the passive components. However, the available wireless spectrum exists in multiple discrete bands across a wide range of frequencies, which requires a wideband PA if flexibility in operation is to be provided.
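The efficiency penalty of operating in back-off can be quantified with the textbook ideal class-B amplifier, used here only as a familiar reference (the SCPA follows a different efficiency curve, so this is not the architecture studied in this work). Its drain efficiency is (π/4)·(Vout/Vdd), and output power scales with the square of the output amplitude, so efficiency halves for every 6 dB of back-off. The function name is illustrative.

```python
import math


def class_b_efficiency(backoff_db: float) -> float:
    """Ideal class-B drain efficiency at a given output back-off in dB.

    eta = (pi/4) * (Vout/Vdd); output power scales with Vout**2, so the
    amplitude ratio at a back-off of B dB is 10**(-B/20).
    """
    if backoff_db < 0:
        raise ValueError("back-off must be non-negative")
    return (math.pi / 4) * 10 ** (-backoff_db / 20)


for backoff_db in (0, 6, 12):
    print(f"{backoff_db:2d} dB back-off: {100 * class_b_efficiency(backoff_db):.1f} % efficiency")
```

The loop reports about 78.5% at peak, 39.4% at 6 dB, and 19.7% at 12 dB back-off, which illustrates why the average efficiency of a high-PAPR signal falls well below the peak figure.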
These contradictory requirements (efficiency in back-off and in narrowband operation versus wideband flexibility) have historically forced a trade-off among energy efficiency, spectral efficiency, output power, and the flatness of the frequency response across a wide bandwidth. In summary, new circuit and system architectures that break the present paradigms are desired so that a single PA can support multiple modes of operation over a wider frequency range with good efficiency.

Improved signal quality and reduced noise are fundamental to wireless communication. Dedicated frequency bands are allocated by the Federal Communications Commission (FCC) for different applications such as commercial, military, and personal use. The out-of-band (OOB) noise of a transmitter operating in one band falls within the in-band operation of other bands; hence, the FCC strictly regulates OOB emissions, because they affect other applications. In addition, the vision for future transceivers is to co-optimize the transmitter and receiver and share common communication resource blocks. Full-duplex, or even truly half-duplex, operation requires transmitters and receivers to coexist noninvasively. Because PAs transmit at high output power, their OOB noise can significantly degrade the noise performance of the receiver. Isolators and filters provide some suppression of the noise coupled from the transmitter to the receiver, but they are never a substitute for reducing OOB noise at the transmitter. Digital power amplifiers (DPAs) are quantized systems, and their quantization noise is commonly modeled as additive white noise that spreads across the spectrum, including out of band. This broadband noise inherently reduces the signal-to-noise ratio (SNR) at the receiver. Increasing the resolution of the DPA decreases the OOB noise; hence, a high-resolution DPA is important. Signal quality and linearity are also important because they dictate the signal fidelity at the transmitter output (e.g., error vector magnitude (EVM), bit error rate (BER), and adjacent channel leakage ratio (ACLR)).
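Nonlinearity that degrades these fidelity metrics is commonly corrected by digital predistortion (DPD). A minimal, memoryless sketch follows; it assumes a hypothetical PA whose AM-AM characteristic is a third-order compression, y = x - a3·x³ (the coefficient `a3` and both function names are illustrative, not taken from this work), and builds the predistorter by inverting that curve numerically with bisection, so that the cascade PA(DPD(x)) returns x. Practical DPD must also correct AM-PM distortion and memory effects, which this sketch ignores.

```python
def pa_model(x: float, a3: float = 0.1) -> float:
    """Hypothetical memoryless PA with third-order compression: y = x - a3 * x**3."""
    return x - a3 * x ** 3


def predistort(target: float, a3: float = 0.1, iters: int = 60) -> float:
    """Return the drive u for which pa_model(u) equals target, found by bisection.

    The cubic is monotonic (hence invertible) only for |u| <= 1/sqrt(3*a3),
    so targets beyond the PA's saturated output are rejected.
    """
    u_max = (1.0 / (3.0 * a3)) ** 0.5
    if abs(target) > pa_model(u_max, a3):
        raise ValueError("target exceeds the PA's saturated output")
    lo, hi = -u_max, u_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pa_model(mid, a3) < target:
            lo = mid  # output too small: the required drive lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for x in (0.2, 0.6, 1.0):
    u = predistort(x)
    print(f"input {x:.1f}: PA alone {pa_model(x):.4f}, DPD + PA {pa_model(u):.4f}")
```

Without predistortion the model compresses the larger inputs (an input of 1.0 comes out as 0.9), while the predistorted drive restores the requested amplitude as long as it stays within the PA's saturated output.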
A nonlinear system can be corrected using digital predistortion (DPD), in which the inverse of the nonlinear behavior is applied to the signal digitally so that the cascade of the predistorter and the system's distortion yields a linear response. However, stronger nonlinearity requires the DPD to run at a faster clock rate, eventually degrading the global efficiency. Therefore, improving linearity and enhancing the effective resolution are two of the fundamental bottlenecks to realizing future high-throughput communication systems in which multiple radios coexist on the same platform.

Another key challenge is to offer flexibility and adaptability to different processes and applications. Most IoT applications require circuits to be simple, low power, and tolerant to process-voltage-temperature (PVT) variations. They should also operate over a wide temperature and supply-voltage range. Some applications require placement in remote areas (infrastructural health monitoring) or inside the body (retinal and cochlear implants, electrode arrays, pacemakers, etc.); hence, the circuits should be able to operate from energy harvesters or otherwise consume very little power. They should also be able to communicate wirelessly using digital-friendly modulation schemes. Conversely, some emerging applications such as enhanced licensed augmented access (eLAA) require the transmitter to deliver extremely high power (32 dBm) in a standard bulk CMOS process. eLAA can improve service offerings and increase data throughput significantly by using the unlicensed band (5-6 GHz) for both WiFi and cellular data transmission. It is a promising protocol recently released by the third-generation partnership project (3GPP) that leverages unlicensed frequency bands to increase network capacity significantly and support more users. 
The power amplifiers for eLAA need to support both WiFi and cellular modes of operation in order to take advantage of the existing WLAN infrastructure. However, WiFi full power is ~26 dBm, while cellular peak power is ~32 dBm. Designing a 32 dBm PA implies a DC current of approximately 4 A through a chip occupying an area of less than 1 mm². This specification is extremely challenging in a fine-line CMOS process. Such PA architectures should also be adoptable by other applications that require a power dynamic range of more than 40 dB.

This dissertation addresses multiple challenges related to providing energy-efficient circuits for wireless communications and sensor systems. Chapter 2 presents a high-power, high-linearity, and high-resolution (16-bit) DPA for eLAA applications. Chapter 3 presents an additional resolution enhancement technique using sigma-delta modulation, which can extend the maximum achievable resolution even further. Chapter 4 presents a frequency-tunable digital power amplifier, which can cover a wide bandwidth using a single narrowband, efficient switched-capacitor power amplifier. Chapter 5 presents an ultra-low power (<50 nW) temperature sensor with simple transmission capability. Measurement results of the fabricated chips, along with theory and simulation, are presented throughout this dissertation.

CHAPTER 2
HIGH POWER HIGH RESOLUTION SWITCHED-CAPACITOR POWER AMPLIFIER ARCHITECTURE FOR ENHANCED LICENSE AUGMENTED ACCESS

2.1 Introduction

The unbridled growth of mobile data traffic has been mostly due to the increasing number of LTE subscribers, which had already reached 635 million by the first quarter of 2015 [1], [2]. LTE has successfully coped with the demand by efficiently utilizing the available bandwidth. For instance, carrier aggregation in LTE-Advanced (LTE-A) and LTE-A Pro has enhanced network capacity while providing a peak data rate of 450 Mbps during 2015 [3]. 
However, it is becoming extremely difficult to keep up with the increasing number of subscribers and the demand for higher data rates, given the limited spectral availability. Therefore, mobile operators are very interested in the unlicensed spectrum to enhance capacity and augment their service offering at minimum cost [4]. The recently released enhanced license assisted access (eLAA) is a promising third-generation partnership project (3GPP) protocol in which unlicensed WiFi spectrum can be used for LTE to augment cellular access. Among all frequency bands, 3GPP is particularly interested in 5-6 GHz, mostly for two reasons. Firstly, this band has more unlicensed spectrum available than any other frequency range in the L-band (1-2 GHz) or C-band (4-8 GHz); for instance, approximately 800 MHz of unlicensed spectrum is available in the United States. Secondly, it facilitates a larger coverage area by allowing the transmitter power to exceed 30 dBm in the frequency ranges 5.47-5.725 GHz and 5.725-5.825 GHz. This allows the cellular network to take full advantage of the existing 5 GHz WiFi infrastructure operating in the same unlicensed band by transmitting at higher output power [5], [6]. The power amplifier (PA) is the dominant energy consumer in the transmitter chain, so the performance and efficiency of the overall transmitter are mostly dictated by the PA itself; hence, the terms transmitter and PA are used synonymously in this chapter. The design and implementation of the transmitter/PA for the newly released eLAA pose some tough challenges. Firstly, the transmitter needs to be able to transmit at the cellular maximum output power of approximately 32 dBm over the 5-6 GHz frequency range (high band, HB) [7]. 
In other words, it needs to deliver the same performance as the global system for mobile communications (GSM)/GSM-EDGE, but at a six-fold/three-fold higher frequency [8], [9]. Therefore, it is desirable to take advantage of fine-line CMOS processes with a higher transit frequency to support good performance at such high frequencies. However, most advanced CMOS processes from 22 nm onward have a supply voltage of approximately 1 V or lower, which makes it very challenging to achieve the peak output power of 32 dBm with good efficiency, because the optimum impedance that must be presented to the power amplifier becomes very small. A reduced impedance implies that more direct current (DC) must be drawn from the power supply, while the higher radio-frequency (RF) operation incurs additional switching loss [9]–[11]. Secondly, the transmitter needs to meet stringent out-of-band (OOB) spectrum mask requirements to ensure seamless coexistence of WiFi and cellular, both operating in the same frequency band [7], [12], [13]. For instance, transmitters for connectivity are required to meet a noise floor better than -140 dBm/Hz while delivering an output of 20 dBm, i.e., a power spectral density of -160 dBc/Hz [14], [15]. This includes the impact of all impairments in the system, viz. phase noise/jitter, distortion, and quantization. The quantization noise contribution translates to a resolution of better than 12 bits. In addition, the transmitter will be required to support a power control dynamic range of ~60 dB while meeting the EVM specifications of the highest modulation and coding scheme (MCS) modes [16]. All these factors imply that the digital power amplifier (DPA) may need to support a resolution of ~16 bits. Although modern CMOS processes have better capacitive matching, achieving 65536 (2^16) quantized voltage levels at a precisely modulated phase angle is extremely challenging. 
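Several figures in this discussion can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes a 1 V supply and 40% drain efficiency (illustrative values, not reported results) to estimate the supply current at 32 dBm, including the ~4 A quoted in Chapter 1, and the load an ideal single-ended class-D stage would need, using the standard result that a 0-to-Vdd square wave has a fundamental amplitude of 2·Vdd/π. It then converts -140 dBm/Hz at 20 dBm into dBc/Hz and estimates the ideal quantizer resolution from SQNR = 6.02·N + 1.76 dB, assuming, purely for illustration, that the quantization noise is spread flat over 1 GHz; the actual noise bandwidth depends on the DPA's update rate and filtering, which are not specified here. All helper names are hypothetical.

```python
import math

# --- Output power, supply current, and load (assumed 1 V supply, 40 % efficiency) ---


def dbm_to_watts(p_dbm: float) -> float:
    """Convert a power level in dBm to watts."""
    return 1e-3 * 10 ** (p_dbm / 10)


def dc_supply_current(p_out_dbm: float, vdd: float, efficiency: float) -> float:
    """Supply current for a given output: I_dc = P_out / (efficiency * Vdd)."""
    return dbm_to_watts(p_out_dbm) / (efficiency * vdd)


def class_d_load(p_out_dbm: float, vdd: float) -> float:
    """Load resistance of an ideal single-ended class-D stage.

    The fundamental of a 0..Vdd square wave has amplitude 2*Vdd/pi, so
    P_out = (2*Vdd/pi)**2 / (2*R)  =>  R = (2*Vdd/pi)**2 / (2*P_out).
    """
    v_fundamental = 2 * vdd / math.pi
    return v_fundamental ** 2 / (2 * dbm_to_watts(p_out_dbm))


# --- Noise floor and ideal quantizer resolution (assumed flat noise over noise_bw_hz) ---


def dbm_hz_to_dbc_hz(noise_dbm_hz: float, carrier_dbm: float) -> float:
    """Noise PSD relative to the carrier: dBc/Hz = dBm/Hz - carrier power in dBm."""
    return noise_dbm_hz - carrier_dbm


def quantization_psd_dbc_hz(bits: int, noise_bw_hz: float) -> float:
    """Ideal quantization-noise PSD for a full-scale sine (no shaping or roll-off)."""
    sqnr_db = 6.02 * bits + 1.76
    return -sqnr_db - 10 * math.log10(noise_bw_hz)


def bits_for_psd(target_dbc_hz: float, noise_bw_hz: float) -> int:
    """Smallest resolution whose ideal quantization PSD meets the target."""
    bits = 1
    while quantization_psd_dbc_hz(bits, noise_bw_hz) > target_dbc_hz:
        bits += 1
    return bits


print(f"I_dc at 32 dBm: {dc_supply_current(32, vdd=1.0, efficiency=0.4):.2f} A")
print(f"class-D load at 32 dBm: {class_d_load(32, vdd=1.0):.3f} ohm")
spec = dbm_hz_to_dbc_hz(-140, 20)
print(f"noise spec: {spec} dBc/Hz -> {bits_for_psd(spec, noise_bw_hz=1e9)} bits (1 GHz assumed)")
```

Under these assumptions the script reports about 3.96 A, a load of about 0.128 Ω, and 12 bits, in line with the ~4 A and "better than 12 bits" figures in the text. A sub-ohm load is difficult to realize with low loss, which is one motivation for the on-chip power combining pursued in this chapter.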
Any CMOS process has inherent component mismatch, and the overall transmitter is subject to process-voltage-temperature (PVT) variation. Combined, these factors cause amplitude and phase nonlinearity, which degrades the overall error vector magnitude (EVM) and bit error rate. Lastly, modulation and coding schemes have evolved a great deal over the last decade in an effort to embed more information in a limited spectral bandwidth. As a consequence, the modulated signal has an average power much lower than its peak power, which is generally quantified by the peak-to-average power ratio (PAPR) [13]. A large PAPR is problematic for PA design [17], [18]: the PA must support peak power but will most often operate at a much deeper back-off, set by the PAPR. The PA offers peak efficiency at peak power, and its efficiency rolls off gradually at lower power, degrading the overall energy efficiency. Compared with a typical OFDM transmitter, this transmitter must maintain good efficiency over a wider range, up to at least 12 dB back-off, because it needs to support two modes of operation: WiFi and cellular. Therefore, it should have good efficiency at least at the WiFi average power (20 dBm) and the cellular average power (26 dBm) while supporting the peak cellular output power of 32 dBm, in order to achieve good average efficiency. To summarize, emerging applications such as eLAA demand that the transmitter achieve higher output power, better resolution, and better linearity at higher frequency using the ever-shrinking supply voltage of typical CMOS processes. Moreover, it needs to operate with good average efficiency even for modern MCS modes with higher PAPR [19]–[23]. The switched-capacitor power amplifier (SCPA) is a prime candidate for this application. Firstly, it uses a Class-D architecture at its core. 
A Class-D amplifier operates in voltage mode as a switch and hence does not need to maintain a saturation voltage like current-mode switching amplifiers, making it more adaptable to the ever-shrinking voltage headroom. Secondly, the SCPA is scalable and digital friendly. Amplitude modulation is set by the ratio of the switched capacitance to the total array capacitance, while phase modulation is performed by a phase-modulated digital clock [24]. This allows the SCPA to interface directly with the digital signal processing. High resolution is achievable with a switched-capacitor architecture because the amplitude is controlled by a ratio of capacitances rather than by absolute capacitance [25]; therefore, this architecture is less susceptible to PVT variations. However, eLAA requires the transmitter/PA to meet demanding requirements: a peak output power of 32 dBm over 5-6 GHz, a resolution of 16 bits, and good efficiency up to 12 dB back-off. Meeting these specifications with a conventional SCPA alone is energy and area inefficient, if not implausible. This chapter presents a novel SCPA architecture capable of delivering cellular power with good average efficiency and linearity using nominal devices and typical supply voltages, for the upcoming eLAA and future modes of operation with even tighter spectral mask regulations. This chapter presents four key concepts. Firstly, a segmented RF C-DAC architecture that enables the PA to meet a stringent spectral mask requirement with negligible impact on die area or efficiency. Secondly, a design technique for compact power-combining structures that enable the PA to deliver multiwatt-level power with good efficiency using a nominal CMOS supply voltage. Thirdly, a new switching scheme that achieves good linearity while providing good efficiency even at 12 dB back-off. 
Fourthly, a phase correction technique that enables the PA to meet a stringent EVM specification by improving overall linearity. This chapter is subdivided into six sections. The motivation, problem statement, and goals are discussed in detail in Section 2.2. Section 2.3 presents the novel multisegmented, fully unary switched-capacitor array architecture, which is indispensable for achieving high resolution at a high operating frequency. Section 2.4 describes the high-quality-factor, low-insertion-loss power combiner structures for achieving watt-level power using conventional CMOS technology. Section 2.5 illustrates the novel switching scheme for achieving good linearity even with higher-order power combiner structures. Lastly, Section 2.6 presents the results and performance of the RF transmitter designed in a 16 nm CMOS process.

2.2 Motivation

Mobile data traffic continues to increase at an unprecedented rate. This growth is driven primarily by ubiquitous cellular and WLAN networks and by cheaper mobile devices. New facets of communication also contribute, such as video streaming, live video, chatting and texting applications, and social networking. So does the craving, especially among the younger generation, to always be connected. Figure 2.1 shows the monthly data traffic for each year, while Figure 2.2 shows the monthly data traffic in different categories. It is evident from Figure 2.2 that new applications such as video streaming, live video, and file sharing have already overtaken voice in terms of data consumption. The trend suggests that this pattern will remain unchanged while the traffic continues to increase exponentially. Figure 2.3 presents the monthly data traffic by geography; North America has the second-highest demand for consumer data. Mobile service providers face a daunting task in keeping abreast of the sheer volume of data that must be handled every day via the LTE network.
“Faster and cheaper, data everywhere for everyone” has been the primary driving force from the operators’ point of view. However, it is becoming increasingly difficult to improve the service offering using the limited available spectrum. Operators are therefore very keen to take advantage of the unlicensed spectrum in addition to the licensed spectrum. The available unlicensed spectrum can greatly augment the service, as its bandwidth is comparable to the licensed bandwidth, as shown in Figure 2.4. Spectrum availability varies by region. In the United States, 800 MHz of bandwidth is available, and output power greater than 30 dBm is allowed in the 5.4 GHz and 5.8 GHz bands.

2.3 Enhanced License Augmented Access (eLAA)

eLAA uses carrier aggregation to let licensed and unlicensed spectrum coexist seamlessly, increasing data throughput and ensuring good indoor and outdoor coverage [22]. An unlicensed band is one in which no user holds exclusive rights; it can be used as long as existing services are not disrupted. The 5 GHz unlicensed band has been used for wireless local area networks (WLAN), commonly known as WiFi. This band can therefore also be used to augment LTE services as long as WiFi service can coexist unhindered. 3GPP has mostly focused on three techniques for traffic offload in the unlicensed spectrum: WLAN offload, link aggregation, and carrier aggregation, as shown in Figure 2.5 [3], [7], [26]. Recent 3GPP releases (Release 13) have defined the specifications under which the unlicensed carrier can be used as an auxiliary component carrier in the LTE aggregation framework to augment cellular access [20]. The eLAA framework works on the principle of “listen before talk” so that it does not adversely affect other existing services.
2.3.1 WLAN Offload

WLAN offload refers to delivering cellular traffic over WLAN, so that a fraction of the cellular traffic bypasses the cellular network. It takes advantage of the fact that most mobile devices have a built-in WLAN transceiver and that WLAN access points are ubiquitous. Offloading can be performed in various ways. These include delivering LTE data over the WLAN network and making a coarse static transition between WLAN and LTE based on the environment and signal strength. Another option is a smooth, seamless transition between WLAN and LTE when entering or exiting a WLAN or LTE coverage cell [3].

2.3.2 Link Aggregation

Reliable LTE is used to increase the coverage and quality of service (QoS) of the WLAN network. Operators can also take advantage of the existing WLAN infrastructure to improve data rate and throughput. This can be very useful in indoor environments or crowded public places such as airports or stadiums.

2.3.3 Carrier Aggregation

Carrier aggregation works on the principle of coexistence of the licensed band with the unlicensed band. The licensed band works as the primary band to maintain reliable service, while the unlicensed band works as the secondary band to enhance data throughput and network capacity.

2.4 Unary-Segmented Switched-Capacitor Power Amplifier (SCPA)

2.4.1 Conventional SCPA

The SCPA uses a Class-D topology at its core, but its superior performance is due to a different amplitude modulation technique. A Class-D transmitter works on the principle of envelope elimination and restoration [27]. A phase-modulated LO clock switches the PA inverter, while a low drop-out (LDO) regulator provides the amplitude-modulated supply to the inverter. Figure 2.6 shows a generic Class-D operation. Class-D offers better efficiency than linear power amplifiers (Class A, B, AB, and C) [28] but worse efficiency than other switching PAs, e.g., Class-E and Class-F [28]–[34].
The efficiency degradation is mostly due to the drain capacitance, especially at high frequency, and to the linear LDO supply modulator [35]. On the other hand, Class-D is a digital-friendly architecture and is inherently area efficient because it needs no choke/inductor, unlike Class-E and Class-F PAs. The SCPA improves efficiency by removing the LDO and replacing the capacitor in the matching network with an array of smaller capacitors, so that the output voltage is controlled by a capacitive voltage divider. Inverters switch the bottom plates of the capacitors between VDD and VGND. The output voltage is modulated by the ratio of capacitors that toggle to capacitors held at a fixed potential. The SCPA offers better efficiency than Class-D because it eliminates the inefficient linear voltage regulator. It also offers better matching across PVT variations because the output voltage is expressed as a ratio of capacitances rather than an absolute capacitance. Figure 2.7 shows a generic n-bit SCPA. Although the figure shows a single-ended circuit, the final implementation is differential for better linearity. Amplitude modulation is performed by controlling the ratio of capacitors that switch at the LO clock rate to those held at a fixed potential. For instance, Figure 2.8 shows two test cases (top left and bottom left). All four capacitors switch at the clock rate in Figure 2.8 (a), whereas only two of them switch in Figure 2.8 (b). Two equations can therefore be written, one for each half cycle of the local oscillator (LO). During the negative half cycle in test case (a), each capacitor C connects to VDD through the PMOS devices in the PA, and the bottom plates are charged with an equivalent charge of

$$Q' = 4CV_{DD}. \qquad (2.1)$$

During the positive half cycle, the bottom plates connect to VGND through the NMOS devices.
The charge is shared by all capacitors in the capacitive network as

$$Q'' = 4CV_{out}. \qquad (2.2)$$

Charge must be conserved; hence, Q' = Q'' and Vout = VDD. For the test case in Figure 2.8 (b), only two capacitors switch, and charge conservation implies Vout = 0.5 VDD:

$$Q' = 2CV_{DD}, \qquad (2.3)$$

$$Q'' = 4CV_{out}. \qquad (2.4)$$

Therefore, the output voltage can be controlled by the number of capacitors that are switching. Figure 2.8 (c) and Figure 2.8 (d) show the corresponding implementation with the matching network, which provides bandpass filtering to create a sinusoidal output. The major sources of nonlinearity are mismatch in the capacitor ratio and jitter in the LO clock. Fine-line CMOS processes usually offer superior linearity due to lower parasitic capacitance. As a result, the SCPA offers the best linearity performance among switching PAs. Consider an N-element SCPA array with equal capacitances, of which n capacitors are switching. Its output voltage amplitude can be expressed as

$$V_{out} = \frac{2}{\pi}\,\frac{n}{N}\,V_{DD}, \qquad (2.5)$$

where VDD is the supply voltage of the inverter, and the coefficient 2/π is the fundamental coefficient in the Fourier expansion of a square wave. The power consumption in an SCPA is due to the charging and discharging current through the PA. The input (switching) power and the output power can be expressed as

$$P_{in} = \frac{n(N-n)}{N^2}\,C_{tot}\,V_{DD}^2\,f_c, \qquad (2.6)$$

$$P_{out} = \frac{V_{out}^2}{2R_{opt}} = \frac{2}{\pi^2}\left(\frac{n}{N}\right)^2\frac{V_{DD}^2}{R_{opt}}, \qquad (2.7)$$

where Ctot is the total capacitance of the array, fc is the operating frequency of the SCPA, and Ropt is the optimum termination impedance presented to the PA. The efficiency of the SCPA is the ratio of the output power to the total power:

$$\eta_{SCPA} = \frac{P_{out}}{P_{in} + P_{out}} = \frac{4n^2}{4n^2 + \dfrac{\pi n(N-n)}{Q_{NW}}}, \qquad (2.8)$$

where QNW is the loaded network quality factor, which is related to Ropt and the total equivalent capacitance of the SCPA (Ctot) as

$$Q_{NW} = \frac{X_{opt}}{R_{opt}} = \frac{1}{2\pi f_c R_{opt} C_{tot}}. \qquad (2.9)$$

The matching network efficiency for a two-element LC down-conversion network can be calculated as

$$\eta_{match} = \frac{1 - Q_{NW}/Q_{cap}}{1 + Q_{NW}/Q_{ind}}. \qquad (2.10)$$

The total SCPA efficiency is the product of the matching network efficiency and the SCPA efficiency assuming an ideal matching network:

$$\eta_{total} = \eta_{SCPA}\,\eta_{match} = \frac{4n^2}{4n^2 + \dfrac{\pi n(N-n)}{Q_{NW}}}\cdot\frac{1 - Q_{NW}/Q_{cap}}{1 + Q_{NW}/Q_{ind}}, \qquad (2.11)$$

where Qcap and Qind are the quality factors of the capacitor and inductor used in the two-element matching network. The first step in designing an SCPA is to calculate the optimum termination impedance (Ropt) from the desired output power (Pout) using Equation 2.7. Figure 2.9 shows the SCPA efficiency in terms of QNW and the percentage of unit cells toggling in the array for a reasonable Qcap of ~50, using Equations 2.10 and 2.11. The efficiency of the SCPA has a convex optimum, peaking for QNW between 2 and 4. It is therefore reasonable to choose QNW ≈ 3 as a starting point before the final global optimization is performed using detailed model parameters. Once QNW has been defined, Equation 2.9 is used to calculate the total capacitance (Ctot) of the array. The total capacitance can be divided into ratiometric unit capacitors based on the switched-capacitor array architecture described in Section 2.4.2. The design procedure for the SCPA is also explained in Chapter 4 (Section 4.2 and Section 4.4.1).

2.4.2 Existing C-DAC Architectures

Emerging modes of operation such as eLAA have stringent mask requirements due to the higher transmitted power and the need for multiple operations to coexist in a tightly regulated band. For instance, the unlicensed 5-6 GHz band can be used for both WiFi and cellular. It is therefore desirable to support higher resolution in order to reduce the quantization noise. Moreover, half-duplex operation, as well as the coexistence of multiple radios on the same platform, requires dynamic digital power control (DPC) of 60 dB or more.
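Returning briefly to the SCPA relations of Section 2.4.1 (Equations 2.5 and 2.7 to 2.11), the design procedure described there can be sketched numerically. The sketch derives Ropt from a target output power, derives Ctot from a chosen QNW, and sweeps QNW at 6 dB back-off (half of the cells toggling). The example inputs are illustrative assumptions, not the design values of this work: fc = 5.5 GHz, Qcap = 50, Qind = 15, and a 20 dBm per-PA target. Only the ideal loss terms of those equations are modeled.

```python
import math

def v_out(n, N, vdd):
    """Fundamental output amplitude with n of N cells toggling (Eq. 2.5)."""
    return (2.0 / math.pi) * (n / N) * vdd

def r_opt_for_power(p_out_w, vdd):
    """Optimum load for full-scale output (n = N), rearranging Eq. 2.7."""
    return v_out(1, 1, vdd) ** 2 / (2.0 * p_out_w)

def c_tot_for_q(q_nw, f_c, r_opt):
    """Total array capacitance that yields the loaded network Q of Eq. 2.9."""
    return 1.0 / (2.0 * math.pi * f_c * r_opt * q_nw)

def eta_scpa(n, N, q_nw):
    """Ideal SCPA efficiency with only the switching (C*V^2*f) loss (Eq. 2.8)."""
    return 4 * n**2 / (4 * n**2 + math.pi * n * (N - n) / q_nw)

def eta_match(q_nw, q_cap, q_ind):
    """Two-element LC matching-network efficiency (Eq. 2.10)."""
    return (1 - q_nw / q_cap) / (1 + q_nw / q_ind)

def eta_total(n, N, q_nw, q_cap, q_ind):
    """Total efficiency: SCPA times matching network (Eq. 2.11)."""
    return eta_scpa(n, N, q_nw) * eta_match(q_nw, q_cap, q_ind)

# Illustrative (assumed) design inputs.
VDD, F_C, P_OUT_W = 1.05, 5.5e9, 0.1        # 0.1 W = 20 dBm
Q_CAP, Q_IND = 50.0, 15.0

r_opt = r_opt_for_power(P_OUT_W, VDD)
c_tot = c_tot_for_q(3.0, F_C, r_opt)

# Sweep QNW at 6 dB back-off (128 of 256 cells toggling) and keep the best.
q_grid = [0.5 + 0.01 * i for i in range(951)]   # 0.5 ... 10.0
q_best = max(q_grid, key=lambda q: eta_total(128, 256, q, Q_CAP, Q_IND))

print(f"R_opt = {r_opt:.3f} ohm, C_tot = {c_tot * 1e12:.2f} pF at Q_NW = 3")
print(f"best Q_NW at 6 dB back-off ~ {q_best:.2f}")
```

With these assumed quality factors, the optimum falls inside the 2 to 4 range quoted above. A different Qind shifts the optimum, which is why the text treats QNW ≈ 3 only as a starting point.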
Moreover, the transmitter requires good phase and amplitude linearity in order to meet stringent error vector magnitude (EVM) specifications. EVM measures the deviation (amplitude and phase error combined) of each data point from its ideal point in the constellation. It is also desirable to achieve higher resolution and linearity without hampering efficiency. In summary, the transmitter needs approximately 16 bits of resolution with good linearity and without a significant impact on efficiency. This section presents the existing C-DAC architectures.

2.4.2.1 Full Unary-Sized C-DAC Array

The most common approach to achieving higher resolution with a switched-capacitor circuit is to use all unary-sized capacitors driven by unit inverters. The unary C-DAC architecture uses the same capacitors and devices for all unit cells and hence provides better matching. For instance, a 16-bit RF C-DAC would have 2^16 (= 65536) unit capacitors, as shown in Figure 2.10. Theoretically, it should provide better linearity, but it has serious bottlenecks from a practical perspective. Firstly, the total capacitance, which is the sum of all unit capacitors, is set by the network quality factor and the output power. Therefore, a gigantic number of unit cells translates into a very small unit capacitor, which causes linearity issues due to parasitic capacitance. Moreover, most processes only allow discrete capacitor values due to the finite grid size for generating masks. This causes systematic errors due to mismatch in the capacitance ratio. Secondly, the supply and routing, along with the LO distribution, would have a serious detrimental effect on efficiency due to such a large number of unit cells.

2.4.2.2 Full Binary-Sized C-DAC Array

Another option is to use a binary-sized switched-capacitor C-DAC architecture, as shown in Figure 2.11. The DPA consists of a 16-bit switched-capacitor array.
This architecture offers 16-bit resolution with good efficiency due to the much smaller number of unit cells. Unlike the fully unary array, it has only 16 cells instead of 65536. However, this architecture is practically implausible, especially for resolutions above 8 bits [36]. Even state-of-the-art CMOS processes impose a minimum capacitance for both the metal-insulator-metal (MiM) capacitor and the vertical-natural (VN) capacitor. For instance, a minimum capacitor of 0.5 fF in the LSB would translate into a capacitor of 16.4 pF in the most significant bit (MSB). Moreover, it is not judicious to use a minimum-sized capacitor, as it suffers more from PVT variations. In addition, clock jitter and mismatch set the minimum capacitor value, as discussed in detail in subsequent sections. Therefore, this architecture either requires an impractically large capacitor or suffers from severe amplitude-amplitude (AM-AM) and amplitude-phase (AM-PM) nonlinearity.

2.4.2.3 Mixed Binary-Unary C-DAC Array

Another popular approach is to use a combined binary-unary switched-capacitor C-DAC architecture, as shown in Figure 2.12. The DPA consists of a 16-bit switched-capacitor array, subdivided into 8 bits of binary-sized capacitors and 8 bits of unary-sized capacitors [14]. This architecture offers 16-bit resolution with good efficiency due to a much smaller number of unit cells. It has only 264 cells instead of the 65536 of a fully unary array. This architecture is widely used in low-speed analog DACs. Even though it is good for efficiency, it offers poor linearity performance. Firstly, because each binary capacitor scales by a factor of 2, so does its inverter. The process often does not allow setting the sizes to the required precision due to the finite grid resolution in the physical layout, which causes systematic error.
Secondly, the minimum capacitor often translates into an infeasible capacitance value. For instance, a 100 fF unit capacitance in the unary cells would translate into a 0.39 fF (100 fF/2^8) capacitance for the LSB of an 8-bit binary array. This would seriously impact linearity due to systematic error as well as nodal parasitics. Thirdly, it poses a matching challenge from a layout point of view, as different unit cells are used for each bit of the binary array. Lastly, and most importantly, the admittance seen by each unit PA differs because the capacitances differ in size. This impacts linearity, especially the phase performance.

2.4.2.4 Combined C-2C and Unary C-DAC Array

A better solution is to replace the binary-sized capacitor array with a C-2C array, as shown in Figure 2.13. All C-2C capacitors combined are equivalent to a single unit cell of the unary array. To fulfill this relation, the unit capacitor in the C-2C segment should be 1.5 times the unit capacitor of the unary array (C' = 1.5C). The C-2C segmented array uses the same number of unit cells as the binary segmented array; hence, it preserves the efficiency benefit. Moreover, it uses the same unit cell throughout the C-2C array, which gives better matching in terms of layout complexity and parasitic capacitance. As a result, it provides better linearity than the binary segmented array. However, the main bottleneck of this architecture is the nodal parasitic capacitance associated with the series capacitors. An N-bit C-2C segment has N series capacitors and N metal-to-substrate bottom-plate capacitances. These parasitic capacitances have a nonlinear effect on both amplitude and phase, ultimately affecting the EVM. Figure 2.14 shows an N-bit C-2C array, where the unit capacitor is defined as C' and the nodal parasitic capacitance is defined as Cp.
The normalized output voltage Vout for a given code word can be expressed as

𝑉𝑜𝑢𝑡 = ∑𝑁 𝑛=0 3𝐶′ +𝐶𝑅,𝑛+1+ 𝐶𝑝 𝐶′+𝐶𝑅,𝑛+1+𝐶𝑝 𝑏𝑛 3𝐶′ +𝐶𝑅,𝑛+2+ 𝐶𝑝 𝐶′+𝐶𝑅,𝑛+2+ 𝐶𝑝 ……… 3𝐶′ +𝐶𝑅,𝑁 𝐶𝑝 𝐶′+𝐶𝑅,𝑁+ 𝐶𝑝 , (2.12)

where b0, b1, ..., bN are the Boolean values of the code word, and CR,n is the equivalent capacitance looking from bit n toward the output. CR,n can be derived by mathematical induction:

$$C_{R,N} = C', \qquad (2.13)$$

$$C_{R,N-1} = \left[C_{R,N} + (C' + C_p)\right] \parallel 2C', \qquad (2.14)$$

$$C_{R,n+1} = \left[C_{R,n+2} + (C' + C_p)\right] \parallel 2C', \qquad (2.15)$$

$$C_{R,n} = \left[C_{R,n+1} + (C' + C_p)\right] \parallel 2C'. \qquad (2.16)$$

A 10-bit C-2C array was simulated using ideal capacitors with added external nodal parasitic capacitors (6.5%) to evaluate the effect of parasitic capacitance on linearity. Figure 2.15 compares the simulation result with the theoretical values from Equations 2.12 to 2.16, and the two are in close agreement. Theoretical calculations can be performed in a similar manner to illustrate the effect of nodal parasitics on the INL for different resolutions of the C-2C array. Figure 2.16 shows the theoretical INL versus code word for a 13-bit C-2C array for different resolutions. For each bit position n, a discontinuity in the INL occurs at every code that is a multiple of 2^n. Notably, a 10% parasitic capacitance (relative to the unit capacitance) translates into a peak INL of ~125 LSB for a 13-bit C-2C array. An INL of 128 LSB (2^7) for a 13-bit C-DAC corresponds to an ENOB of ~6 bits. This INL only considers the ideal nodal parasitics. It is further aggravated once other nonlinearities are taken into account, such as supply inductance, routing and layout-trace parasitics, and capacitive mismatch. Figure 2.17 shows the peak INL versus Cp for different resolutions.
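The INL trend described above can be reproduced qualitatively with a small nodal-analysis sketch. The sketch assumes a textbook C-2C ladder. Each node has a shunt C' to its bit driver, adjacent nodes are joined by a series 2C', and a C' termination sits at the LSB end. The output is taken unloaded at the MSB node, and a parasitic Cp connects every node to ground. This topology, the unloaded output, and the endpoint-fit INL definition are assumptions that may differ from the exact circuit of Figure 2.14, so the numbers should not be expected to match Figures 2.15 to 2.17. Because the network is linear, each bit's weight is obtained by superposition (one tridiagonal charge-balance solve per bit), and the INL follows from summing weights code by code.

```python
def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm: solve a tridiagonal linear system."""
    n = len(rhs)
    c_mod, d_mod = [0.0] * n, [0.0] * n
    c_mod[0] = sup[0] / diag[0]
    d_mod[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - sub[i] * c_mod[i - 1]
        c_mod[i] = sup[i] / denom
        d_mod[i] = (rhs[i] - sub[i] * d_mod[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_mod[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_mod[i] - c_mod[i] * x[i + 1]
    return x

def c2c_bit_weights(n_bits, cp_ratio):
    """Output weight of each bit (LSB first) of the assumed C-2C ladder, C' = 1."""
    weights = []
    for k in range(n_bits):
        sub, diag, sup, rhs = [], [], [], []
        for i in range(n_bits):
            left = 2.0 if i > 0 else 1.0             # series 2C' to node i-1, or C' termination
            right = 2.0 if i < n_bits - 1 else 0.0   # series 2C' to node i+1
            sub.append(-2.0 if i > 0 else 0.0)
            sup.append(-right)
            diag.append(1.0 + cp_ratio + left + right)  # shunt C' + Cp + series caps
            rhs.append(1.0 if i == k else 0.0)           # only bit k driven high
        weights.append(solve_tridiagonal(sub, diag, sup, rhs)[-1])  # MSB node = output
    return weights

def c2c_peak_inl_lsb(n_bits, cp_ratio):
    """Peak |INL| in LSB against an endpoint-fit straight line."""
    w = c2c_bit_weights(n_bits, cp_ratio)
    full_scale = 2 ** n_bits - 1
    vout = [sum(w[i] for i in range(n_bits) if (code >> i) & 1)
            for code in range(full_scale + 1)]
    lsb = vout[-1] / full_scale
    return max(abs(v - code * lsb) / lsb for code, v in enumerate(vout))

for cp in (0.0, 0.05, 0.10):
    print(f"10-bit C-2C, Cp = {cp:.0%} of C': peak INL = {c2c_peak_inl_lsb(10, cp):.1f} LSB")
```

With Cp = 0 the weights come out exactly binary (1/8, 1/4, 1/2 for three bits) and the INL vanishes. Any nonzero Cp makes each stage attenuate by more than a factor of two, so the INL grows with both Cp and the number of C-2C bits.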
It is evident that a C-2C segment above 6 bits is not judicious, considering the ENOB implied by the INL for a reasonable parasitic capacitance of 6-8%. Amplitude nonlinearity is relatively easy to linearize using digital predistortion (DPD), but parasitic capacitance has a more detrimental effect on the phase performance. Figure 2.18 shows the phase versus code word simulated for a 16-bit switched-capacitor array with an 8-bit C-2C segment and an 8-bit unary segment. DPD can linearize any integral nonlinearity and phase error if the DPD runs at the RF local oscillator (LO) rate. However, the DPD power consumption depends on its clock rate. It is therefore useful to design the PA with higher resolution, so that it has extra states that digital predistortion can use for correction. Good linearity has the added benefit of allowing the DPD to run at a lower clock rate, which improves efficiency, especially at back-off. Since the C-2C segment forms the LSBs, it contributes transmitted power equivalent to just one unit unary cell out of 256. Therefore, linearity in the C-2C segment matters more from an EVM perspective than from a spectral mask point of view. The AM-PM shows discontinuity/nonmonotonicity due to admittance mismatch, similar to the AM-AM. This effect is popularly known as the dinosaur effect, and it degrades the EVM. Because the phase response is nonrepetitive, it is difficult to predistort using DPD.

2.4.3 Effect of Capacitive Mismatch on C-DAC

This section discusses the trade-off among the existing architectures in terms of capacitor matching and resolution. As discussed in Section 2.4.2, C-2C architectures limit the resolution due to nodal parasitics. However, they have the added benefit of using C (the unit capacitor) and 2C (a multiple of the unit capacitor) throughout the array.
In contrast, high-resolution binary and unary arrays translate into a very small unit capacitor. This section explores the effect of the minimum capacitance on the resolution of the array. Figure 2.19 shows the effect of RF clock jitter on the signal-to-noise ratio (SNR) and ENOB. Extremely low capacitance (sub-fF) increases the jitter, eventually degrading SNR and ENOB. This limits the minimum capacitor that can be used and hence the maximum achievable resolution. In addition, the matching between capacitors dictates the maximum achievable resolution through the integral nonlinearity (INL). INL depends on both the final DAC resolution, B, and the element matching, σE, as follows:

$$\sigma_{INL} = \frac{1}{2}\sqrt{2^B - 1}\;\sigma_E, \qquad (2.17)$$

where B is the total number of bits, and σE is the standard deviation of the capacitor mismatch. The mismatch also approximately follows the well-known area-scaling rule given by Equation 2.18 [25]:

$$\sigma_E \approx \frac{K}{\sqrt{Area}}, \qquad (2.18)$$

where K is a process parameter. The mismatch analysis is done in 65 nm CMOS instead of 16 nm CMOS because access to the 16 nm process parameters is restricted. For the 65 nm CMOS process, K is 0.86 %·µm. Choosing minimum-sized capacitors (e.g., 4 µm²), the mismatch is σE = 0.43%. As shown in Figure 2.20 and Figure 2.21, choosing the minimum capacitor size in the process and an INL < ±1 LSB results in a maximum resolution of ~19 bits. In practice, choosing a minimum-size capacitor would require a large inductor for resonance, degrading energy efficiency. Hence, the effect of mismatch within the capacitor array can be reduced by increasing the capacitor areas, which also reduces the loss in the resonant circuit.
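Equations 2.17 and 2.18 can be evaluated directly. The sketch below uses the 65 nm values quoted above (K = 0.86 %·µm and a 4 µm² unit capacitor) and prints σINL for a few resolutions. It is a direct evaluation of the two statistical relations only, not a reproduction of Figures 2.20 and 2.21.

```python
import math

def sigma_e_percent(k_percent_um, area_um2):
    """Unit-capacitor mismatch in percent from the area-scaling rule (Eq. 2.18)."""
    return k_percent_um / math.sqrt(area_um2)

def sigma_inl_lsb(bits, sigma_e_pct):
    """Standard deviation of the INL in LSB for a B-bit array (Eq. 2.17)."""
    return 0.5 * math.sqrt(2 ** bits - 1) * sigma_e_pct / 100.0

sigma_e = sigma_e_percent(0.86, 4.0)   # 65 nm values from the text -> 0.43 %
for bits in (12, 14, 16, 18):
    print(f"B = {bits}: sigma_INL = {sigma_inl_lsb(bits, sigma_e):.3f} LSB")
```

Quadrupling the capacitor area halves σE and therefore halves σINL at every resolution. This is the area-versus-matching trade-off that the last paragraph weighs against the resonant-circuit loss.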
2.4.4 Proposed C-DAC Architecture

It is evident from the discussion in Section 2.4.2 that none of the existing architectures can achieve high resolution (~16 bits) with good efficiency, especially at high frequency (5-6 GHz). To improve linearity in a high-resolution SCPA, a novel switched-capacitor architecture is proposed in this section. This architecture uses only unary cells. It therefore provides better linearity from a matching and parasitic point of view, because the same unit cell is used throughout the whole array. In addition, the parasitic effects and PVT variations are mitigated, as all bits suffer the same nonlinearity. However, unlike a fully unary array, this topology uses segmentation to reduce the number of unit cells in the array for a given resolution. Fewer cells improve efficiency and save area. Figure 2.22 shows the proposed unary-segmented RF C-DAC architecture. The number of segments (N+1) and the number of bits in each segment are completely generic. They are usually driven by the efficiency and linearity requirements. For instance, allocating more bits to the MSB segment would provide better linearity but at the cost of efficiency and area. However, the best performance is achieved when the array is segmented equally, as shown in Figure 2.23. Therefore, for best performance, a 16-bit array should be segmented as 8b-8b if segmented once, 4b-4b-8b if segmented twice, and 8b-4b-2b-1b if segmented thrice. Any kind of segmentation introduces nodal parasitic effects. Unlike in a C-2C array, however, the nodal parasitics in a unary-segmented array depend on the number of segments rather than on the number of bits. Therefore, the nonlinearity in phase and amplitude is mitigated considerably. Figure 2.24 shows a generic unary-segmented array architecture. The series parasitic of each segment still causes phase variation due to the admittance mismatch of each segment.
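The cell-count and capacitor-spread arguments of Sections 2.4.2 and 2.4.4 reduce to simple arithmetic, sketched below. The sketch counts a b-bit unary segment as 2^b cells and a b-bit binary segment as b cells, which is the convention implied by the 264-cell figure in Section 2.4.2.3. A thermometer-coded implementation might use 2^b − 1 cells instead. The 288-cell result for the 4b-4b-8b segmentation follows from that assumed convention; the text does not report it.

```python
# Segment descriptions: (kind, bits); cell-counting convention as stated above.
ARCHITECTURES = {
    "full_unary":      [("unary", 16)],
    "full_binary":     [("binary", 16)],
    "binary8_unary8":  [("binary", 8), ("unary", 8)],
    "segmented_4_4_8": [("unary", 4), ("unary", 4), ("unary", 8)],
}

def cell_count(segments):
    """2**b cells per unary segment, b cells per binary segment."""
    return sum(2 ** bits if kind == "unary" else bits for kind, bits in segments)

for name, segments in ARCHITECTURES.items():
    print(f"{name:16s} {cell_count(segments):6d} unit cells")

# Capacitor-spread examples from Section 2.4.2:
msb_pf = 0.5 * 2 ** 15 / 1000.0     # 16-bit binary, 0.5 fF LSB -> MSB in pF
lsb_ff = 100.0 / 2 ** 8             # 8-bit binary below a 100 fF unary cell -> LSB in fF
print(f"binary MSB = {msb_pf:.1f} pF, binary-unary LSB = {lsb_ff:.2f} fF")
```

The unary-segmented array keeps the cell count in the same range as the binary-unary array. Unlike the binary sections, however, it uses a single unit-cell size, so it avoids the 2^15:1 and 256:1 capacitor spreads computed at the end.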
The effect of parasitic capacitance can be explained from the equations of an N-bit unary SCPA as follows:

𝐶𝑡𝑜𝑡1 = (2(𝑁−𝑡) − 1)𝐶 + 𝐶𝑝 . 2𝑡 𝐶 (2.19)

𝐶𝑡𝑜𝑡2 = (2𝑡 𝐶 + 𝐶𝑝 ) || 2𝑡−1 (2.20)

𝐶𝑡𝑜𝑡 = 𝐶𝑡𝑜𝑡1 + 𝐶𝑡𝑜𝑡2 (2.21)

𝑉𝑜𝑢𝑡 = 𝐶𝑡𝑜𝑡2 𝐶𝑡𝑜𝑡 𝑁 ∑2𝑛=0 𝑢𝑛 𝑥 𝐶 2𝑡 𝐶+𝐶𝑝 , (2.22)

where u is the Boolean unary value corresponding to the binary code word, C is the unit capacitance, and Cp is the parasitic capacitance. Ctot1 is the equivalent capacitance of the MSB segment, Ctot2 is the equivalent capacitance of the LSB segment seen from the MSB side, and Ctot is the total equivalent capacitance of the array. The nodal parasitic adversely affects linearity only at the switchover between segments, so this architecture shows superior linearity. Moreover, the capacitance seen by each cell within a segment is equal; hence, the phase is smoother within each segment. Figure 2.25 and Figure 2.26 show the amplitude linearity in terms of INL and DNL, respectively, versus code word for a 16-bit unary-segmented SCPA in 16 nm CMOS, segmented as 4bit-4bit-8bit. The INL and DNL simulations were performed at the schematic level only. They therefore do not account for nonlinearities arising from layout mismatch or from trace and routing parasitics. Moreover, matching is much better in fine-line CMOS processes (i.e., 16 nm). Notably, the INL and DNL do not suffer from mismatch because the same unit cell is used throughout the array. Additionally, this architecture is less susceptible to nodal parasitics. Figure 2.27 shows the AM-PM glitches due to switchover between segments, which arise for two reasons. Firstly, the equivalent capacitance seen by segment-0 differs from that seen by segment-1 and segment-2, as shown in Figure 2.28; this effect is commonly known as the dinosaur effect. Secondly, the ESR and trace resistance contribute an RC delay between segments.
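The claim that the nodal parasitic matters only at segment switchover can be checked with a simplified two-segment model. This sketch is not a reconstruction of Equations 2.19 to 2.22; it assumes a hypothetical topology. In that topology, 2^(N−t) − 1 MSB cells connect directly to the output node. The 2^t LSB cells connect to an internal node that carries the parasitic Cp and couples to the output through a capacitor Cc = 2^t/(2^t − 1) · C. This Cc value is chosen so that the whole LSB segment equals one MSB cell when Cp = 0. A small 8-bit (4b + 4b) example keeps the code short.

```python
def segment_weights(n_bits, t_bits, cp_ratio):
    """Output weight of one MSB cell and of one LSB cell (unit C = 1)
    in the assumed two-segment unary array."""
    n_msb, n_lsb = 2 ** (n_bits - t_bits) - 1, 2 ** t_bits
    cc = n_lsb / (n_lsb - 1)                  # coupling capacitor (assumption)

    def v_out(m, l):
        # Charge balance at output O and internal node X:
        #   (n_msb + cc) V_O - cc V_X = m
        #   -cc V_O + (n_lsb + cp + cc) V_X = l
        a11, a12 = n_msb + cc, -cc
        a21, a22 = -cc, n_lsb + cp_ratio + cc
        det = a11 * a22 - a12 * a21
        return (m * a22 - a12 * l) / det       # Cramer's rule for V_O

    return v_out(1, 0), v_out(0, 1)

def dnl_lsb(n_bits, t_bits, cp_ratio):
    """DNL in LSB (endpoint fit) for each transition into codes 1 .. 2**n_bits - 1."""
    w_msb, w_lsb = segment_weights(n_bits, t_bits, cp_ratio)
    mask = 2 ** t_bits - 1
    vout = [(k >> t_bits) * w_msb + (k & mask) * w_lsb for k in range(2 ** n_bits)]
    lsb = vout[-1] / (2 ** n_bits - 1)
    return [(vout[k] - vout[k - 1]) / lsb - 1.0 for k in range(1, 2 ** n_bits)]

dnl = dnl_lsb(8, 4, 0.05)
worst = max(range(len(dnl)), key=lambda j: abs(dnl[j]))
print(f"worst DNL {dnl[worst]:+.4f} LSB at code {worst + 1}")
print(f"typical in-segment DNL {dnl[0]:+.4f} LSB")
```

In this model Cp slightly shrinks every LSB step by the same amount, so the DNL is constant inside a segment. The accumulated error appears only in the step where the LSB segment rolls over into the next MSB cell, i.e., at multiples of 2^t.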
Even though the phase response is not critical for spectral mask regulation because of the very low output power involved, it can be the decisive factor for EVM performance. Note that the available resolution is more important here than the effective number of bits (ENOB). Even if the SCPA is perfectly linear under a certain condition, it will suffer nonlinearity at other temperatures and process corners. Therefore, transmitters most often require DPD. DPD can linearize the system as long as the system has extra states for correction, which are defined by the resolution. Although DPD can linearize the system, its power consumption depends on the nonlinearity of the system. For instance, a highly nonlinear system requires the DPD algorithm to run at a much higher rate, eventually causing higher power consumption in the DPD itself. Figure 2.29 shows the minimum DPD rate required to achieve good spectral performance. The unary-segmented SCPA enables the DPD to be performed in a piecewise-linear pattern. This allows the DPD to run at a rate 408 times lower than the LO rate, reducing the DPD power by the same factor. The amplitude and phase data can be used to model any PA in order to visualize the spectral performance in the presence of an actual modulated data packet. An OFDM signal with a bandwidth of 20 MHz is applied to a model built using the AM-AM and AM-PM data shown in Figures 2.25 to 2.27 and Figure 2.29. The spectral performance is illustrated in Figure 2.30. The Polar-fix point curve shows the noise floor with an ideal PA, and the Polar-NL curve shows the spectral performance without DPD. The Polar-DPD curve shows the noise floor with the DPD algorithm (Figure 2.29) applied to the PA. The SCPA achieves satisfactory performance with DPD. However, it is still desirable to improve the phase performance of the unary-segmented SCPA in order to reduce the DPD complexity.
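Piecewise-linear predistortion of the kind mentioned above can be illustrated by inverting an AM-AM table with linear interpolation between breakpoints. Everything specific in this sketch is hypothetical: the compressive tanh curve, the 16-bit code range, and the 17 breakpoints. It corrects amplitude only, whereas a real DPD would also have to correct AM-PM.

```python
import bisect
import math

def build_inverse_lut(codes, measured_amp):
    """Map a desired amplitude to a (fractional) code by piecewise-linear
    interpolation of a strictly increasing AM-AM table."""
    def predistort(target):
        j = bisect.bisect_left(measured_amp, target)
        j = min(max(j, 1), len(codes) - 1)       # clamp to a valid segment
        a0, a1 = measured_amp[j - 1], measured_amp[j]
        c0, c1 = codes[j - 1], codes[j]
        return c0 + (c1 - c0) * (target - a0) / (a1 - a0)
    return predistort

FULL = 2 ** 16 - 1

def pa_am_am(code):
    """Hypothetical compressive AM-AM curve used only for illustration."""
    return math.tanh(1.5 * code / FULL) / math.tanh(1.5)

breakpoints = [FULL * i // 16 for i in range(17)]
table = [pa_am_am(c) for c in breakpoints]
predistort = build_inverse_lut(breakpoints, table)

for target in (0.1, 0.5, 0.9):
    code = predistort(target)
    print(f"target {target:.2f} -> code {code:8.1f} -> PA output {pa_am_am(code):.4f}")
```

Only the breakpoint table has to be updated when the PA drifts, and the residual error is set by the curvature between breakpoints. This illustrates why a PA that is already piecewise linear lets the DPD run with fewer updates.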
The primary reason for phase discontinuity in any segmented switched-capacitor array is the admittance mismatch, as illustrated in Figure 2.28. One way to fix this issue is to use a tunable phase-correction delay buffer for each segment. C-2C segments require one delay buffer for each C-2C bit, which poses design challenges and calibration complexity. A unary-segmented array, in contrast, requires only one delay buffer per segment. For instance, the proposed 16-bit SCPA is segmented as 4bit-4bit-8bit and therefore requires only three delay buffers to compensate for the phase.

2.4.5 Phase Linearization in a Unary-Segmented Array

Any kind of segmentation causes phase mismatch in the SCPA due to impedance mismatch on the PA side. On the other hand, a fully unary or fully binary capacitor array is either impractical or offers poor linearity and efficiency. An elegant solution is therefore to introduce tunable delay logic in the LO path to compensate for the delay mismatch due to admittance variation. In principle, (N−1) delays are sufficient for an N-segment unary-sized capacitor array, as long as the delays are properly defined and precisely controlled across process, voltage, and temperature. However, because the delays are in the critical RF path, it is judicious to rely on relative rather than absolute delay. Therefore, three delays are used in the 4bit-4bit-8bit segmented array, one each for the LSB, sub-MSB, and MSB phase paths, to eliminate any effect of the initial delay between segments. The delays are placed before the LO driver to reduce the additional power consumption in the delays. Figure 2.31 shows the 4-bit tunable delay logic, whose delay can be adjusted in steps of approximately 0.5°. A binary-to-thermometer decoder converts the binary bits to unary bits, which control the PMOS and NMOS gates through NAND and NOR gates. Figure 2.32 shows the pulse edge for different codes, and Figure 2.33 shows the relative delay in the RF clock (LO) for different codes at different RF frequencies. The delay shows linear behavior; hence, delay tuning can compensate for the phase mismatch if it is tuned separately for each segment of the C-DAC. The phase compensation network consumes only 1 mW and therefore does not affect the global efficiency.

2.5 Compact and Symmetrical Power Combiner Structures

The requirement for higher-order power combining is driven by the higher output power specification. For instance, the 16 nm process offers a maximum supply voltage of 1.05 V. The required optimum termination resistance for an output power of 32 dBm can be calculated from Equation 2.7 as

$$R_{opt} = \frac{V_{out}^2}{2P_{out}}, \qquad (2.23)$$

which yields an impedance below 1 Ω. An optimum impedance lower than 1 Ω is extremely problematic for several reasons [37], [38]. Firstly, the devices in the PA cell would be impractically large and would carry high current through the channel. Design rules usually force lower metal layers to be used inside the devices due to minimum metal width and spacing requirements. This causes higher conduction (IR) loss, degrading the overall efficiency. Secondly, the heat dissipated in each PA would cause reliability issues. In addition, the supply and routing layout would be very challenging due to the higher IR loss. Thirdly, the matching network would result in a very high inductance even at 5-6 GHz. The overall efficiency would therefore be significantly impacted by the lower quality factor of the matching network. On the other hand, higher-order power combining reduces conduction loss in the PA but increases the insertion loss in the combiner itself. Two-way power combining would require each PA to deliver 29 dBm, which would still cause significant efficiency degradation. Therefore, four-way and eight-way combining are two suitable solutions.
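The sub-ohm result of Equation 2.23 can be checked numerically. The sketch evaluates Equation 2.23 with the full-scale amplitude from Equation 2.5 (n = N) for the 1.05 V supply and the 32 dBm target. Treating a differential output as doubling the voltage swing is an added assumption for comparison, and all losses are ignored.

```python
import math

def r_opt(p_out_dbm, vdd, differential=False):
    """Optimum load from Eq. 2.23 with full-scale V_out = (2/pi)*VDD (Eq. 2.5).
    A differential output is assumed to double the swing."""
    p_out_w = 10 ** (p_out_dbm / 10) / 1000
    v_out = (2 / math.pi) * vdd * (2 if differential else 1)
    return v_out ** 2 / (2 * p_out_w)

print(f"single-ended: {r_opt(32, 1.05):.3f} ohm")
print(f"differential: {r_opt(32, 1.05, differential=True):.3f} ohm")
```

Both cases land well below 1 Ω, which is the regime the preceding paragraph identifies as impractical for a single PA.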
Four-way and eight-way combining increase Ropt by only four and eight times, respectively, but also increase the insertion loss. Therefore, the choice of combining order is a trade-off among conduction (IR) loss in the PA and supply, insertion loss in the combiner itself, and overall die area. Figure 2.34 shows the possible solutions.
2.5.1 Four-Way Power Combiner Structure
2.5.1.1 Existing Solutions and Their Problems
There are typically two solutions for power combining: the distributed transformer [39] and the compact transformer [40], [41]. The distributed transformer consists of multiple (N) transformers using slab inductors with a 1:1 turn ratio. However, this approach is area-inefficient, sensitive to nearby structures (e.g., metal, bumps), and offers a lower quality factor (Q) than compact transformers, especially in fine-line CMOS processes. In contrast, compact transformers generally have lower insertion loss (IL) and offer better linearity and efficiency. Therefore, only compact transformers are considered in this work. Power combining can be done by driving the primaries with separate SCPAs. Each unit cell of the SCPA consists of switched capacitors driven by an inverter, as shown in Figure 2.35. The simplest power combiner would be a full-turn primary with a 1:1 secondary. However, this architecture has linearity and efficiency problems. Firstly, the secondary requires vias at the crossing in the middle, which significantly degrades the quality factor of the combiner at high power. Secondly, the SCPA does not offer any efficiency improvement at 6 dB back-off (BO). Because an OFDM signal has an average power approximately 6 dB below its peak power, the efficiency at 6 dB BO dictates the average efficiency of the PA. Figure 2.35 (top-middle) and Figure 2.35 (top-right) show the conceptual circuit at peak power and at 6 dB BO, respectively.
It is evident from Figure 2.35 (top-right) that the PA does not take advantage of load modulation (Ropt) because it does not exercise the full primary [42]. This degrades the efficiency at 6 dB BO. Moreover, core-0 (0+ & 0-) and core-1 (1+ & 1-) have different coupling coefficients due to the physical asymmetry between primary and secondary; therefore, the PA suffers from nonlinearity at back-off. A modified two-way combiner (Figure 2.36) enables efficiency enhancement at 6 dB BO by exercising the full primary at any transmitted power, so that the combiner presents the same Ropt to the PA as at full power. Theoretically, the PA therefore achieves peak efficiency at both peak power and 6 dB BO, improving the average efficiency [42]. Although modified two-way power combining improves efficiency by load modulation (Doherty action), the combiner itself is very lossy [43]. The main reason for the insertion loss is the lower quality factor caused by multiple crossings, and higher insertion loss reduces the output power and degrades the efficiency. Figure 2.37 shows the quality factor and the insertion loss (1.2 dB) of the two-way power combiner in 16 nm CMOS. The combiner was simulated using the widely used EM simulator ADS Momentum and verified with the EM simulator Helic for consistent performance.
2.5.1.2 Pathway to Four-Way Combiner
A modified two-way combiner paves the way to a symmetrical higher-order combiner structure. A two-way combiner offers lower efficiency because of the lower Ropt needed to achieve the desired output power of 32 dBm, and its higher insertion loss also has a significant negative impact on efficiency. In contrast, higher-order combining increases Ropt and hence improves PA efficiency, at the cost of increased die area. Higher-order combining often increases the insertion loss due to finite coupling coefficient and quality factor.
Therefore, the global efficiency, which is the product of the PA efficiency and the combiner efficiency, determines the order of the combiner [41]. Figure 2.38 shows the four-way combiner architecture, including the cases for full power, 6 dB BO, and 12 dB BO. The full primary is exercised in all cases; hence, the combiner does not suffer efficiency degradation at back-off. Figure 2.39 shows the implementation of the proposed four-way combiner. The layout was simulated using ADS Momentum and verified using Helic. The four-way combiner shows a twofold improvement in quality factor owing to the absence of vias in the combiner. The insertion loss at midband is 0.72 dB, which corresponds to a combiner efficiency of approximately 85%. By contrast, a matching network used instead of a combiner would need a very large transformation ratio of 50:0.25 Ω, which would lead to poor efficiency in the matching network. Moreover, the devices would be absurdly large and would suffer huge IR loss, resulting in poor system efficiency.
2.5.2 Eight-Way Power Combiner Structure
The proposed four-way combiner is fully scalable and extendable to higher-order combining with minimal area penalty. For instance, the proposed eight-way combiner shown in Figure 2.41 shows similar performance with an area overhead of just 30%. The eight-way combiner has an insertion loss of 1 dB at midband, which corresponds to a combiner efficiency of approximately 80%. Higher-order combining provides a pathway to higher output power in CMOS for future modes of operation requiring 35 dBm output power or more.
2.6 Switching Schemes in Higher Order Power Combiners
2.6.1 Four-Way Combiner
The issue with the combiner structure shown in Figure 2.38 is that the differential PA cores (e.g., 0+ & 0-) are not placed together. This increases layout complexity and creates matching problems. It also degrades linearity, because the physical asymmetry produces different coupling coefficients.
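The insertion-loss figures quoted for the two-, four-, and eight-way combiners translate into combiner efficiency as 10^(−IL/10), and the global efficiency is the product of PA and combiner efficiencies. The sketch below also estimates the back-off obtained by disabling cores of the four-way combiner, under an assumption that is ours rather than the text's: ideal voltage combining, in which the output amplitude is proportional to the number of active cores. This reproduces the 6 dB and 12 dB BO cases of Figure 2.38. The 50% PA efficiency in the example is a hypothetical value, and the function names are illustrative.

```python
import math


def combiner_efficiency(il_db: float) -> float:
    """Fraction of input power delivered through a passive network with insertion loss il_db."""
    return 10 ** (-il_db / 10)


def global_efficiency(eta_pa: float, il_db: float) -> float:
    """Global efficiency: PA efficiency multiplied by combiner efficiency."""
    return eta_pa * combiner_efficiency(il_db)


def backoff_db(active_cores: int, total_cores: int = 4) -> float:
    """Ideal back-off (dB) when only `active_cores` of `total_cores` switch.

    Assumes ideal voltage combining: output amplitude scales with the
    number of active cores, so output power scales with its square.
    """
    return 20 * math.log10(total_cores / active_cores)


for name, il in (("two-way", 1.2), ("four-way", 0.72), ("eight-way", 1.0)):
    print(f"{name}: IL = {il} dB -> combiner efficiency {combiner_efficiency(il):.1%}")

print(f"global efficiency (hypothetical 50% PA, four-way): {global_efficiency(0.5, 0.72):.1%}")
for k in (4, 3, 2, 1):
    print(f"{k} active cores -> {backoff_db(k):.1f} dB back-off")
```

The four-way combiner's 0.72 dB corresponds to about 84.7%, i.e., approximately 85%, and the eight-way combiner's 1 dB to about 79.4%. Under the same idealization, three active cores out of four give roughly 2.5 dB of back-off, the region near the 3 dB BO point where the switching schemes of Section 2.6 differ in efficiency.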
Moreover, it aggravates the performance at the second harmonic because of differential-mode inductance at the virtual ground (shown in Figure 2.38 for the 6 dB and 12 dB modes). These two factors cause significant nonlinearity, especially at high resolutions such as the 16 bits targeted by this DPA. On the other hand, this architecture fully exercises the resolution of the combiner. For instance, the combiner shown in Figure 2.39 is a 2-bit combiner, because each of the four cores can be controlled individually when all unit cells in a core are fully ON or fully OFF. The switching scheme is explained in detail in Figure 2.40 (top). The proposed switching scheme shown in Figure 2.40 (bottom) offers much better linearity because the combiner is symmetric on the left and right sides of the output port. Therefore, the symmetric pairs (core-1 and core-3, core-0 and core-2) have equal coupling coefficients and deliver equal power at any output power level, which improves linearity significantly. Moreover, the differential pairs are grouped together, which is very convenient for matching from the layout point of view and helps cancel out any differential inductance in the supply and routing traces. The drawback of this scheme is that only two of the four cores can be controlled independently. For instance, the number of unit cells switching at the LO rate in core-1 and core-3 is always the same, and the same applies to core-0 and core-2. This implies that the proposed combiner is a 1-bit combiner: the superior linearity comes at the expense of 1 bit of resolution. The lost resolution can, however, be compensated by the technique described in Section 2.4.4. In addition, the DPA offers slightly lower efficiency (1-2%) at 3 dB BO than the conventional switching scheme because of higher power consumption in the LO distribution network, which is discussed in the next section.
However, this has negligible impact on the average efficiency, as most spectrally efficient modulations, such as OFDM, have a peak-to-average power ratio (PAPR) of approximately 6 dB.
2.6.2 Eight-Way Combiner
The switching scheme determines the overall linearity and efficiency, especially at back-off [50]. Figure 2.42 shows the current switching methods and the proposed method. The benefit of the proposed switching is that the combiner is exactly the same on the left and right of the axis of symmetry (center line) at all power levels. In addition, all the primaries are exercised at all power levels to take advantage of the efficiency enhancement by Doherty action. As a result, the combiner nonlinearity is minimized, which is crucial for reaching higher resolutions such as 16 bits and beyond. However, the proposed switching has an efficiency penalty. Figure 2.43 shows the overall efficiency of solution #1 and solution #2. The efficiency is exactly the same at critical points such as 6 dB BO and 12 dB BO, because the switching is identical at these points. However, the efficiency is reduced by 1% at around 3 dB BO due to the switching method, as also shown in Figure 2.43. In exchange, linearity improves significantly, because the left and right sides of the axis of symmetry remain identical. In the conventional switching scheme, the DPAs that are turned off are connected to a virtual ground. If this were a perfect ground, there would be no nonidealities; in reality, the differential inductance introduced by the metal traces and routing in the array itself causes nonlinearity.
2.7 Physical Design and Extracted Simulation Results
This section presents the layout details of the 32 dBm, 5-6 GHz, 16-bit SCPA designed in 16 nm CMOS. One obvious advantage of a unary-segmented array is that the same unit cell can be used throughout the whole array; therefore, size and performance optimization can be concentrated on the unit cell.
Figure 2.44 shows the layout of the unit cell. A metal capacitor is used because of its higher packing density and higher quality factor, at the cost of higher parasitic capacitance. The quality factor of the capacitor is crucial for overall efficiency because the PA is designed for high output power. There are two choices for the devices in the PA cell: RF devices or standard digital devices. Standard digital devices allow a more compact layout because the PMOS and NMOS devices can be placed together. RF devices, on the other hand, allow the differential pair to be placed together and hence offer better rejection of second-order effects. There are four switched-capacitor arrays in the proposed SCPA, and the size of each array dictates the power consumption in the LO distribution and hence the overall efficiency. Therefore, standard digital devices are chosen for their smaller array size. Figure 2.45 shows the differential unit cell. In the proposed SCPA, the array is segmented into LSB, sub-LSB, and MSB segments as 4bit-4bit-7bit. The unit cells are grouped as quadruples (groups of four unit cells) for convenient LO distribution, which also provides better scalability and debugging capability. Figure 2.46 shows the quadruple cell. The LSB segment contains 16 unit cells, the sub-LSB segment contains 16 unit cells, and the MSB segment contains 128 unit cells (32 quadruples). Because the unit cells are grouped as quadruples, the LSB and sub-LSB segments together contain 8 quadruples; hence, they occupy only 25% of the MSB array area. Without segmentation, the full area would be 512 times larger. Therefore, considering the four arrays in the DPA, unary segmentation reduces the area overhead by approximately 2048 times. Figure 2.47 shows one array of the proposed SCPA. Figure 2.48 shows the overall floorplan, including the four-way combiner and four identical unary-segmented cores. Two row decoders and two column decoders provide the enable signal and the LO, respectively. The row and column decoders not only provide a filtered LO signal but also perform binary-to-unary conversion. The decoder switching scheme is designed so that adjacent cells turn on sequentially for a linear input, resulting in a "snake-like" switching pattern. This improves phase linearity significantly because the effect of the RC delay in the routing traces is minimized.
The top-level test bench has been simulated in the analog design environment (ADE and ADE XL). The main benefit of the architecture is that it is fully scalable to any process and supply voltage simply by optimizing the device sizes and the unit capacitor of the unit cell. Another benefit is that it is a fully digital implementation: it takes digital data and a clock as inputs and provides a modulated RF output. Figure 2.49 shows output power versus efficiency for the two modes of operation of the eLAA application, from schematic-level simulation only. The ripples in the efficiency are due to the LO distribution power consumption in the LSB and sub-LSB segments. The LSB and sub-LSB segments together contribute output power equivalent to one unit cell out of the 128 cells in the MSB, yet they consist of 32 unit cells. Therefore, the power consumed in their digital logic is much higher than the RF output power they contribute, which reduces the efficiency at codes where the LSB and sub-LSB segments are active. The top-level SCPA has been simulated with different levels of extraction. The output power is above 30 dBm across the desired band of operation, which indicates that the combiner network has sufficient bandwidth and a good quality factor. Moreover, the efficiency peaks around midband, which enables good efficiency across the whole frequency range.
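The "snake-like" ordering can be illustrated on an abstract grid. This sketch is a hypothetical model, not the actual row/column decoder logic (which is not given here): cells are enumerated left-to-right on even rows and right-to-left on odd rows, so incrementing the code always turns on a cell physically adjacent to the previous one. The 4 × 8 grid in the example is an assumed arrangement of the 32 MSB quadruples.

```python
def snake_order(rows: int, cols: int) -> list:
    """(row, col) positions in serpentine order: even rows left-to-right, odd rows right-to-left."""
    order = []
    for r in range(rows):
        cols_in_row = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cols_in_row)
    return order


def cells_on(code: int, rows: int, cols: int) -> set:
    """Cells that are ON for a thermometer code equal to the number of active cells."""
    if not 0 <= code <= rows * cols:
        raise ValueError("code out of range")
    return set(snake_order(rows, cols)[:code])


order = snake_order(4, 8)
# Every step moves to a neighbouring cell (Manhattan distance 1): consecutive codes never jump across the array.
assert all(abs(r1 - r2) + abs(c1 - c2) == 1 for (r1, c1), (r2, c2) in zip(order, order[1:]))
print(order[:10])
```

A raster (row-by-row, always left-to-right) ordering would instead jump from the end of one row back to the start of the next, which is the kind of long routing excursion the snake pattern avoids.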
Figure 2.50 shows the output power and efficiency. Table 2.1 lists the output power and efficiency at full code and half code (6 dB BO) for different levels of extraction. Extracted simulations are performed to evaluate performance over the 5-6 GHz frequency range; Figure 2.51 shows efficiency and Pout versus frequency. Apart from frequency response, output power, and efficiency, the SCPA is simulated for linearity. The phase compensation circuit discussed in Section 2.4.5 has been integrated into the SCPA, as shown in Figure 2.52. The performance of the DPA is simulated with and without phase compensation, as shown in Figure 2.53. The phase response is plotted on both logarithmic and linear scales to illustrate the behavior at both low and high codewords. The linear-scale plot shows that the phase response is much more linear with compensation. This improves the EVM and the out-of-band (OOB) spectral mask, even when the DPD runs at a reduced clock rate.
2.8 Measurement and Troubleshooting
The DPA has been taped out in a 16 nm process as part of the complete transceiver chain. The die was flip-chip mounted to reduce bondwire inductance and then mounted on a PCB. Figure 2.54 shows the system-level block diagram, and Figure 2.55 shows the measurement setup. The transceiver has four distinct voltage levels. In the phase path, the LDO is supplied with 1.3 V. The LDO output (0.9 V) supplies the digital-to-time converter (DTC), which generates a modulated RF clock with a 0.9 V peak. The DTC output is filtered by a buffer chain in the LO distribution network (LOD), which provides the clock for the row and column decoders. The last stage of the decoder is supplied from the PA supply of 1.05 V. In the amplitude path, the LDOs supplying the digital code generator (DFE) and the binary-to-thermometric decoder logic are supplied with 1.3 V.
The last stage of the amplitude path is also supplied from the PA supply of 1.05 V. Figure 2.55 shows all the voltage supplies. Four sense probes sense the voltage near the PA bumps; because the PA is designed for high power, a large voltage drop is expected between the DC source and the Vsense probe. The whole setup is placed inside a temperature-controlled chamber (Thermotron) to help with heat dissipation. Unfortunately, the measured output power is much lower than the extracted simulation predicts. The peak output power achieved is 25 dBm with a supply voltage of 0.9 V at the DC source output, and there is an approximately 200 mV drop from the DC source to the Vsense probe. The output spectrum is shown in Figure 2.56. The output is measured single-ended, and the RF cable has a loss of approximately 4.5 dB. Therefore, the peak output power with an effective supply of 0.7 V is 25 dBm (17.5 dBm measured + 3 dB + 4.5 dB). The PA is very sensitive to supply voltage because of its multiple voltage domains. Moreover, ultra-low-threshold-voltage (ulvt) devices were used in the decoder. Therefore, the PA output began to collapse as the PA supply was increased toward the peak supply voltage of 1.05 V. The OOB noise indicates that the problem is related to the voltage domains: Figure 2.57 shows the noise degrading as soon as the PA supply from the DC source is increased beyond 0.9 V. This issue can be fixed by replacing the ulvt devices with standard-threshold-voltage (svt) devices. The reduced output power is due to IR drop. The PA draws a DC current of approximately 4 A; hence, even 100 mΩ of supply resistance produces a 0.4 V drop and can significantly degrade performance. The measured result is compared with simulation to estimate the IR loss. Figure 2.58 shows the measured and simulated DC current versus code at different supply voltages. The PA supply is estimated to be 0.67 V at the peak output power of 25 dBm.
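The measurement bookkeeping above can be reproduced in a few lines. This sketch adds 3 dB to refer the single-ended reading to the full differential output, adds back the 4.5 dB cable loss, and computes the IR drop for the stated 4 A and 100 mΩ. It also estimates the power penalty of running at 0.67 V instead of 1.05 V under an added assumption that is not from the text: output power scales with the square of the effective PA supply, as for an ideal switching PA. The function names are illustrative.

```python
import math


def deembed_pout_dbm(measured_dbm: float, cable_loss_db: float, single_ended: bool = True) -> float:
    """Refer a measured level to the PA output: add back the cable loss and,
    for a single-ended measurement of a differential output, add 3 dB."""
    differential_correction = 10 * math.log10(2) if single_ended else 0.0
    return measured_dbm + differential_correction + cable_loss_db


def ir_drop_v(current_a: float, resistance_ohm: float) -> float:
    """Ohmic supply drop V = I * R."""
    return current_a * resistance_ohm


def supply_penalty_db(v_eff: float, v_nom: float) -> float:
    """Output-power penalty assuming Pout is proportional to VDD**2."""
    return 20 * math.log10(v_nom / v_eff)


print(f"de-embedded peak output: {deembed_pout_dbm(17.5, 4.5):.1f} dBm")
print(f"IR drop for 4 A through 100 mOhm: {ir_drop_v(4.0, 0.1):.2f} V")
print(f"penalty at 0.67 V vs 1.05 V: {supply_penalty_db(0.67, 1.05):.1f} dB")
```

Under the VDD² assumption, the reduced supply alone accounts for roughly 3.9 dB of output power, which gives a sense of how strongly the IR drop limits the measured result.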
Voltus is a widely used IR-drop and reliability analysis tool in the Cadence environment, and it was used to trace the IR losses through the SCPA. The Voltus results differ significantly from the Spectre RC-extracted simulation: Voltus shows an IR drop of approximately 300 mV to 350 mV in the supply network, as shown in Figure 2.59. Voltus can be used effectively to improve the supply network for lower IR drop, and hence to increase output power by reducing conduction losses in the metal routing traces.
2.9 Future Work and Summary
A 16-bit, four-way-combining, unary-segmented switched-capacitor power amplifier for the unlicensed 5-6 GHz band has been proposed. The proposed SCPA achieves good linearity with a peak efficiency of around 43% at the maximum supply voltage of 1.05 V. This work also presents four-way and eight-way combiners that achieve an output power of approximately 2 W or more using conventional CMOS devices at nominal supply voltage in state-of-the-art CMOS processes. The proposed switching scheme enables high linearity in the DPA. Although the proposed transmitter meets all the specifications of the desired eLAA protocol, there are many opportunities for improvement. Firstly, co-design of the antenna and the combiner can significantly reduce the size of the combiner, paving the way to a higher quality factor and better efficiency. For instance, an electrically small antenna designed for a lower resonant frequency can reduce the size of the combiners or provide a path to higher output power at the same size. Alternatively, a nonresonant antenna can help filter out undesired harmonic content. Secondly, noise shaping can be implemented in the LSBs to improve performance. Thirdly, harmonic traps can be implemented at the output of the transmitter to achieve better harmonic suppression.
Fourthly, a harmonic termination network can be implemented at the output of the PA to sharpen the drain voltage, which may boost efficiency by reducing crowbar current. Lastly, multiphase techniques can be used to improve harmonic rejection. These techniques, combined with the proposed architecture, pave the way to a fully digital transmitter that meets stringent regulatory spectral-mask and EVM requirements.

Table 2.1. Simulation results with extraction at different levels of hierarchy.
Simulation type | Full power: Pout | Full power: η | 6 dB BO: Pout | 6 dB BO: η
Schematic | 32.77 dBm | 47.87 % | 26.7 dBm | 36.89 %
Unit cell (C extraction) | 32.52 dBm | 46.01 % | 26.46 dBm | 34.73 %
Unit cell (RC extraction) | 32.15 dBm | 41.73 % | 26.01 dBm | 27.78 %
Quadruple cell (RC extraction) | 31.48 dBm | 37.91 % | 25.26 dBm | 27.11 %
Full array (RC extraction) | 31.3 dBm | 36.3 % | 25.1 dBm | 26.42 %

Figure 2.1. Mobile data traffic per month by year (Exabytes/month versus year, 2014-2021).
Figure 2.2. Mobile data traffic per month by year in different categories (Exabytes/month versus year, 2016-2020; internet video, online gaming, web/email/data, file sharing).
Figure 2.3. Mobile data traffic per month by geography (Exabytes/month, 2016-2021; Asia Pacific, North America, Western Europe, Central and Eastern Europe, Middle East and Africa, Latin America).
Figure 2.4. Unlicensed spectrum availability at different frequency bands (bandwidth in MHz at 2.4, 5.1, 5.3, 5.4, 5.8, and 5.9 GHz).
Figure 2.5. Possible ways to improve quality-of-service (QoS) using the unlicensed band.
Figure 2.6. Conventional Class-D power amplifier.
Figure 2.7. Generic single-ended SCPA.
Figure 2.8. Amplitude modulation in a single-ended SCPA.
Figure 2.9. Dependency of SCPA efficiency on loaded quality factor.
Figure 2.10. Full unary C-DAC architecture.
Figure 2.11. Full binary C-DAC architecture.
Figure 2.12. Mixed binary-unary C-DAC array.
Figure 2.13. Mixed C-2C-unary C-DAC array.
Figure 2.14. N-bit C-2C array with nodal parasitic capacitance.
Figure 2.15. Comparison of theoretical and simulated INL for a 10-bit C-2C C-DAC.
Figure 2.16. Theoretical INL versus codeword for different Cp in a 13-bit C-2C.
Figure 2.17. Theoretical peak INL versus Cp for different C-2C resolutions.
Figure 2.18. Nonperiodic AM-PM (Dinosaur Effect) for C-2C array at the LSB.
Figure 2.19. Effect of clock jitter on SNR and ENOB.
Figure 2.20. C-DAC resolution versus capacitive mismatch.
Figure 2.21. C-DAC resolution versus the dimensions of capacitors.
Figure 2.22. Concept of unary-segmented C-DAC architecture.
Figure 2.23. Effect of bit position of segmentation on FoM.
Figure 2.24. A generic (N+1) times unary-segmented t-bit C-DAC.
Figure 2.25. INL of a 16-bit unary-segmented array in 16 nm (schematic only).
Figure 2.26. DNL of a 16-bit unary-segmented array in 16 nm (schematic only).
Figure 2.27. AM-PM of a 16-bit unary-segmented array in 16 nm (schematic only).
Figure 2.28. AM-PM glitches due to admittance mismatch in a segmented array.
Figure 2.29. Reduced-rate piecewise linear DPD for correction using DPD.
Figure 2.30. Spectrum of the proposed SCPA with the applied OFDM data packet.
Figure 2.31. Schematic diagram of the implemented phase correction circuit.
Figure 2.32. RF clock rising edge after delay compensation for different codes.
Figure 2.33. RF clock delay after delay compensation for different codes.
Figure 2.34. Possible choices for designing a high power PA: (left) increasing device size, (middle) two-way power combining, and (right) four-way power combining.
Figure 2.35. Existing two-way power combiner structure and efficiency degradation at 6 dB (top-right).
Figure 2.36. Modified two-way power combiner structure for efficiency improvement at 6 dB (top-right).
Figure 2.37. Two-way combiner structure and performance: (top) modified two-way combiner structure, (bottom-left) quality factor of the combiner, and (bottom-right) insertion loss of the combiner.
Figure 2.38. Four-way combiner structure and the cases for full power, 6 dB BO, and 12 dB BO.
Figure 2.39. Four-way combiner structure and performance: (top) four-way combiner structure, (bottom-left) quality factor of the combiner, and (bottom-right) insertion loss of the combiner.
Figure 2.40. Switching schemes: (top) conventional switching scheme and (bottom) modified switching scheme for linear operation.
Figure 2.41. Eight-way combiner structure and performance: (top) eight-way combiner structure, (bottom-left) quality factor of the combiner, and (bottom-right) insertion loss of the combiner.
Figure 2.42. Switching scheme in an eight-way power combiner.
Figure 2.43. Efficiency comparison between the solutions in Figure 2.42.
Figure 2.44. Implementation and physical design: (left) layout of the unit cell and (right) unit cell schematics.
Figure 2.45. Layout and schematics of the differential unit cell.
Figure 2.46. Layout and schematics of the quadruples of differential unit cells.
Figure 2.47. One C-DAC array of the proposed DPA.
Figure 2.48. Layout of the proposed SCPA.
Figure 2.49. Pout versus efficiency for WiFi and cellular modes of operation.
Figure 2.50. Pout versus efficiency with different levels of extraction.
Figure 2.51. Frequency versus Pout (dBm) and efficiency (%) with extraction.
Figure 2.52. Phase compensation integrated into the SCPA.
Figure 2.53. Phase versus code (AM-PM) with and without phase compensation.
Figure 2.54. System-level block diagram of the measurement setup.
Figure 2.55. Measurement of the fabricated DPA inside the temperature chamber.
Figure 2.56. Spectrum of the measured single-ended output.
Figure 2.57. Degradation of the noise after increasing the PA supply beyond 0.9 V.
Figure 2.58. Measured and simulated input current versus code at different VDD.
Figure 2.59. Voltus simulation for IR tracing.

CHAPTER 3
A HYBRID DUAL-RATE (ΔΣ/NYQUIST) SWITCHED-CAPACITOR POWER AMPLIFIER
3.1 Introduction
The power amplifier (PA) is the dominant block in most modern wireless transceivers; hence, it continues to be a vibrant field of research for RFIC designers. Switching digital power amplifiers (DPAs), such as Class-E, Class-F, and Class-D, offer significantly better energy efficiency than linear power amplifiers (Class-A, B, C). DPAs can operate in either voltage mode [24], [44]–[46] or current mode [47], [48]. In a current-mode DPA, each unit cell works as a constant current source. Even though both modes offer comparable efficiency, voltage-mode PAs adapt better to the shrinking supply voltage of fine-line CMOS processes. Current-mode switching DPAs suffer from tighter voltage headroom because the threshold voltage of CMOS devices has not shrunk in proportion to the maximum supply voltage. In contrast, the devices in a voltage-mode PA work as digital switches. As devices become faster with scaling, voltage-mode PAs continue to outperform current-mode PAs. Among all switching PA topologies, the switched-capacitor power amplifier (SCPA), which employs the Class-D topology at its core, shows superior performance and better adaptability to continued CMOS scaling. The SCPA is a high-efficiency voltage-mode PA in which a unit PA cell works more like a digital cell than an analog cell.
SCPAs use capacitors, along with transistors acting as switches, to achieve their linearity; therefore, they take maximum benefit from CMOS scaling. This allows the SCPA to offer higher linearity and efficiency at higher output power than current-mode DPAs [24]. SCPAs work on the principle of a switched-capacitor digital-to-analog converter (C-DAC), but an SCPA simultaneously acts as a frequency upconverter (mixer) and a power amplifier. Moreover, unlike an analog DAC, it can recombine the output at a precise phase. It is more area- and energy-efficient than linear PAs. The architecture inherently provides bandpass filtering and hence does not require large-area baseband filtering. One of the prime challenges of any DPA, including the SCPA, is that it is a quantized system whose out-of-band (OOB) noise is dominated by signal quantization. Because quantization noise is broadband, it greatly affects the OOB performance of DPAs, and it can only be reduced by additional filtering or by enhanced resolution [24], [25], [49]–[52]. Filters can be designed to create a notch at a particular frequency to reduce the OOB noise within a narrow frequency window [51]. However, a lower-resolution SCPA exhibits worse noise performance over a wider bandwidth, which cannot be compensated by a single narrowband filter. On the other hand, it is challenging to increase the resolution of an SCPA, owing to the desire to use metal-insulator-metal (MIM) capacitors, which can only be divided down to their minimum value; this limits either the resolution of the SCPA or the achievable quality factor. Hence, SCPAs have a trade-off between achievable resolution and quality factor when a single C-DAC array is used. The split-array technique can improve the resolution with good efficiency [25], [36].
It subdivides the switched-capacitor array into two or more segments with a series attenuation capacitor, where the MSB capacitors are unary-sized and the LSB arrays can be binary, unary, or a C-2C ladder. This technique allows the above linearity/Q paradigm to be broken [53], [54]. C-2C and unary-segmented arrays minimize the difference between the minimum and maximum capacitance in the array, improving capacitor matching and hence linearity. However, C-2C arrays cannot be extended indefinitely, because the internodal parasitic capacitance of the C-2C topology causes significant nonlinearity at high output resolutions [55]. Recent efforts have shown that a resolution of 16 bits is achievable even in a relatively old process such as 65 nm CMOS. In addition to the split-array technique with a C-2C LSB array, a hybrid C-DAC architecture in the LSB can further increase the effective resolution [56], [57]. In this chapter, we introduce a hybrid DSM/Nyquist SCPA that achieves an effective resolution of 9b using only a 6b capacitor array. The hybrid DAC keeps the MSB (Nyquist-rate) DAC unchanged but uses a delta-sigma modulated (DSM) DAC in parallel with the MSB [58], [59]. This increases the effective resolution by a few bits in the same process. This chapter is organized as follows. Section 3.2 introduces the hybrid DSM/Nyquist SCPA architecture. Section 3.3 gives the circuit design details. Section 3.4 provides simulation results, followed by conclusions in Section 3.5.
3.2 Hybrid DSM/Nyquist SCPA Architecture
The proposed architecture is a hybrid DAC in which an oversampled ΔΣ-modulated (DSM) LSB array operates in parallel with a conventional MSB array running at the Nyquist rate. The resulting architecture is a dual-rate hybrid SCPA (H-SCPA, Figure 3.1). Although the output shown in Figure 3.1 is single-ended, the final implementation is differential.
The output of the architecture consists of a high-speed dithered signal embedded on the low-speed, low-resolution Nyquist-rate signal [56]. The DSM in the LSB path compresses the bit width of its input signal, reducing the overall number of switched-capacitor cells in the SCPA. This offers two key benefits. First, the impact of nonidealities in the switched-capacitor array (e.g., capacitor mismatch) is reduced, shortening the design cycles needed to mitigate them. Second, the analog layout is more compact, since fewer cells are required to achieve a desired output resolution; this also simplifies routing and clock distribution. The first step in designing an H-SCPA is to decide how to partition the array into oversampled LSBs and Nyquist-rate MSBs. The segmentation ratio, r, can be defined as [56]
$$r = \frac{2^{N} - 2^{N-M}}{2^{N} - 1}, \qquad (3.1)$$
where N is the total number of bits of the array and M is the number of MSBs. The primary concerns when segmenting are the acceptable bandwidth and linearity of the system. The bandwidth, BW, of the proposed system is given by [56]
$$BW \approx \frac{f_{os}}{\pi} \sin^{-1}\left\{ 10^{-\frac{3\log_{2}\left[2^{N} - r\left(2^{N}-1\right)\right]}{10k}} \right\}, \qquad (3.2)$$
where $f_{os}$ is the oversampling rate of the DSM and k is the order of the noise shaper in the DSM. The other concern in segmentation is the achievable linearity of the hybrid system. For this purpose, Simulink models are used to compare segmented hybrid SCPAs with conventional SCPAs. The primary metric in choosing an overall resolution for SCPAs is the level of out-of-band (OOB) noise of the quantized signal. It has been shown that an overall resolution of 9b leads to acceptable OOB noise levels in digital PAs [60]. Hence, the aim of the proposed design is to achieve an effective number of bits (ENOB) ≥ 9b.
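Equations (3.1) and (3.2) are straightforward to evaluate. Substituting (3.1) shows that 2^N − r(2^N − 1) = 2^(N−M), so the logarithm in (3.2) reduces to N − M, the number of LSB bits handled by the DSM. The sketch below follows Eq. (3.2) as printed here; the oversampling rate f_os is left as a free parameter because its value for this design is not stated in this section, and the function names are hypothetical.

```python
import math


def segmentation_ratio(n_bits: int, m_msb: int) -> float:
    """Eq. (3.1): share of full scale carried by the Nyquist-rate MSB path."""
    return (2 ** n_bits - 2 ** (n_bits - m_msb)) / (2 ** n_bits - 1)


def dsm_bandwidth(f_os: float, n_bits: int, r: float, k: int) -> float:
    """Eq. (3.2): approximate bandwidth for oversampling rate f_os and noise-shaper order k."""
    lsb_levels = 2 ** n_bits - r * (2 ** n_bits - 1)  # equals 2**(N - M)
    x = 10 ** (-3 * math.log2(lsb_levels) / (10 * k))
    return f_os / math.pi * math.asin(x)


r = segmentation_ratio(12, 4)  # the 12b design with M = 4 MSBs chosen for the H-SCPA
print(f"r = {r:.4f}, LSB levels = {2 ** 12 - r * (2 ** 12 - 1):.0f}")
```

For the 12b, M = 4 split, r is about 0.9377 and the LSB path carries 256 levels; the bandwidth grows linearly with f_os and increases with the noise-shaper order k.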
The proposed H-SCPA uses a 12b design with a target bandwidth of 50 MHz, which meets the bandwidth and linearity requirements of a 20 MHz, 64 QAM OFDM signal (e.g., Wi-Fi and LTE). For validation purposes, a first-order DSM noise shaper is chosen (i.e., k = 1), since it is easy to design and synthesize from standard cells in the chosen 65 nm technology. Simulink and SPICE simulations show that a segmentation of N − M = 8 (the LSB path) and M = 4 (the MSB path) satisfies the bandwidth and linearity requirements. As noted in prior work, segmenting the array and applying DSM only to the LSB path gives lower OOB noise than a purely DSM system, because less signal energy flows through the LSB DSM path [56]. Note that spectral regulations in wireless communication are more stringent for OOB noise than for in-band noise. Delta-sigma modulation is fundamentally a noise-shaping technique that pushes quantization noise away from the carrier. Hence, it improves the effective number of bits within the band of interest. As a consequence, it degrades OOB noise performance unless the oversampling factor is very high. A DS-modulated DAC is therefore not a substitute for a high-resolution DAC. Rather, it is a technique that further enhances the effective resolution in conjunction with the split-array technique discussed in detail in Chapter 2 [25], [36].

3.3 Hybrid SCPA Circuit Details

This chapter focuses on the fundamental concept of resolution enhancement using DSM in a conventional DAC rather than on demonstrating the highest resolution achievable in a particular process. Therefore, a relatively low-resolution SCPA is implemented to illustrate the concept. The H-SCPA is divided into a 4b unary-weighted, Nyquist-rate MSB array and a 2b binary array controlled by an 8b oversampled DSM, as shown in Figure 3.2.
An off-chip phase modulator provides the RF clock to the H-SCPA through a low-voltage differential signaling (LVDS) clock receiver. The clock is then distributed through a differential buffer chain to all cells in the array. The re-timing circuit adjusts the delay through the unary/thermometric path to match the phase delay of the binary path. This gives a smoother code-versus-phase response and, ultimately, better EVM and ENOB. The unary encoder provides binary-to-thermometric conversion for the amplitude logic. The bandpass matching network presents the optimum impedance to the PA for the desired output power. The sizing of the capacitors and the matching network is explained in detail in Chapter 2. The SCPA output stage is a cascoded inverter operating from a 2.4 V supply [24], [44]. The SCPA is designed for an output power of 250 mW and operates differentially, which gives an equivalent Ropt = 18.7 Ω (Equation 2.23). The total capacitance of the array is set by the desired network quality factor (QNW2) and is 2.3 pF. The capacitor array is divided into 15 unary capacitors of 144 fF and two binary-weighted capacitors. To compensate for saturation of the DSM, the first binary-weighted capacitor equals the unary unit capacitor (144 fF), and the second binary-weighted capacitor is 72 fF. To balance the charge of the binary-weighted cells, a dummy 72 fF capacitor is added in parallel with the binary segment. This effectively yields equivalent binary capacitors of 36 fF and 72 fF in parallel with the unary array. The matching network transforms a 100 Ω differential output impedance to the required optimal termination impedance using inductors L1 = 3 nH and L2 = 720 pH and a shunt capacitor Csh = 32 fF. A 12-bit digital input is distributed to a fully synthesized, segmented decoder, where the 8 LSBs are interpolated before entering the DSM.
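Equation 2.23 is not reproduced in this chapter. The quoted Ropt can still be cross-checked with a short Python sketch, assuming that each side of the differential output is a full-swing square wave between 0 and VDD. Under that assumption the fundamental of the differential voltage is (2/π)·2·VDD, and Ropt follows from Pout = V²/(2R). The function name is hypothetical.

```python
import math

def r_opt_differential(p_out_w: float, vdd_v: float) -> float:
    """Full-scale R_opt of a differential SCPA (assumed differential swing of 2*VDD)."""
    v_fund = (2 / math.pi) * 2 * vdd_v   # fundamental of the differential square wave
    return v_fund**2 / (2 * p_out_w)

print(f"R_opt = {r_opt_differential(0.25, 2.4):.1f} ohm")   # R_opt = 18.7 ohm
```

With the 250 mW target and 2.4 V supply given above, this reproduces the 18.7 Ω value. That agreement suggests, but does not prove, that Equation 2.23 has this form.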
The interpolation is implemented with synthesized half-band filters. The block diagram of the first-order, 2b DSM is shown in Figure 3.3. The DSM takes an 8-bit input and outputs 2 modulated bits that drive the binary-weighted capacitors in the H-SCPA capacitor array. All internal operations are sign extended to 12b to prevent overflow. The MSBs are re-timed before entering a thermometer encoder. Re-timing is implemented in synthesis to synchronize the MSB and LSB output bits before they reach the SCPA capacitor array.

3.4 Extracted Simulation Results

The proposed hybrid SCPA has been implemented in a 65 nm CMOS process with metal-insulator-metal (MiM) capacitors and ultra-thick metal layers. Figure 3.4 shows the layout of the H-SCPA. The design is RC extracted with full parasitics and simulated across frequency and codeword. Figure 3.5 shows the full-power and 6 dB back-off (BO) performance of the RC-extracted design. The efficiency and the output power (at full power and at 6 dB BO) all peak at the same frequency (~2.8 GHz), which indicates a well-optimized design. The H-SCPA achieves a peak output power of 23.5 dBm at 2.8 GHz. The system efficiency (SE), defined as the ratio of the output power to all DC and RF power input to the chip, is 48% at 2.8 GHz. The design has a wide efficiency bandwidth, with > 30% SE from 2.2 to 3.6 GHz. Also shown are the output power and efficiency versus frequency when the H-SCPA operates at 6 dB power back-off. The DS modulator is inherently a noise-shaping technique, as the DSM spectrum in Figure 3.6 makes evident. The noise is pushed away from the carrier, which ultimately limits how much additional resolution this technique can add on top of other techniques such as the split array.
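The Python sketch below shows a generic first-order DSM in error-feedback form, with NTF(z) = 1 − z⁻¹, that maps an 8-bit LSB word to a 2-bit output. It is not the synthesized design. The quantizer step, the float arithmetic, and the function name are assumptions. The step splits the 8-bit full scale into three steps so the loop never clips, and the half-band interpolation and 12b sign-extended datapath are omitted.

```python
def first_order_dsm(codes, in_bits=8, out_bits=2):
    """Generic first-order error-feedback delta-sigma modulator (illustrative only)."""
    levels = 2**out_bits - 1              # output codes 0..levels
    step = 2**in_bits / levels            # assumed weight of one output code, in input LSBs
    err = 0.0                             # quantization error from the previous sample
    out = []
    for x in codes:
        v = x + err                       # add back last error -> y*step = x + e[n-1] - e[n]
        y = min(max(round(v / step), 0), levels)
        err = v - y * step                # new error, fed back on the next sample
        out.append(y)
    return out

bits = first_order_dsm([100] * 16)
print(bits)
```

Averaged over many samples, the 2-bit stream times the step reproduces the 8-bit input. The residual error e[n] − e[n−1] is first-difference (high-pass) shaped, which is the noise-shaping behavior described for Figure 3.6.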
To validate the linearity of the H-SCPA, the output voltage is simulated versus input codeword across the entire 12b codeword range at 2.8 GHz. The simulated integral nonlinearity (INL) and differential nonlinearity (DNL) are plotted against the digital code in Figure 3.7. At a resolution of 9b, the simulated INL is within ±1 LSB and the simulated DNL is within ±0.5 LSB. The hybrid approach therefore yields an effective number of bits (ENOB) of 9b, 3b higher than the designed resolution of the array, which validates the resolution enhancement provided by the DSM. The simulated SE is plotted against output power in Figure 3.8, and the simulated output power is plotted against input codeword in Figure 3.9. A 64-QAM OFDM signal with a 20 MHz channel bandwidth is applied to the H-SCPA. The simulated power spectral density and constellation are shown in Figure 3.10 and Figure 3.11, respectively. The spectrum shows good margin to the spectral mask, and the constellation shows an EVM of 1.9 %-rms with no digital predistortion applied. Noise shaping from the DSM does marginally raise the close-in noise in the power spectral density (PSD) estimate. This could be mitigated by using more MSBs or an advanced DSM with more output levels. The average transmit power is 17.4 dBm, with an average SE of 24.9%.

3.5 Conclusion

This chapter presents a new hybrid-mode SCPA that divides a traditional SCPA into two segments. An LSB C-2C subarray is modulated by a delta-sigma modulator (DSM) and connected in parallel with a Nyquist-rate unary MSB array. This technique yields a hybrid SCPA (H-SCPA) that achieves a higher bandwidth than a traditional DSM-based DAC and a higher resolution than a traditional Nyquist DAC.
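INL and DNL as plotted in Figure 3.7 can be computed from a simulated code-versus-output table. Below is a minimal Python sketch using the endpoint definition. The text does not state whether an endpoint or best-fit reference is used, so endpoint is an assumption, and the function name is hypothetical.

```python
def inl_dnl(levels):
    """Endpoint INL and DNL (in LSB) from one output level per input code."""
    n = len(levels)
    lsb = (levels[-1] - levels[0]) / (n - 1)                      # endpoint-fit LSB size
    dnl = [(levels[i + 1] - levels[i]) / lsb - 1 for i in range(n - 1)]
    inl = [(levels[i] - levels[0]) / lsb - i for i in range(n)]   # deviation from the line
    return inl, dnl

inl, dnl = inl_dnl([0.0, 1.0, 2.5, 3.0])
print(inl, dnl)   # [0.0, 0.0, 0.5, 0.0] [0.0, 0.5, -0.5]
```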
The H-SCPA breaks the well-known tradeoff between bandwidth and resolution that limits both fully DSM-based DACs and fully Nyquist-rate DACs. The H-SCPA is designed with a resolution of 6b and, through the hybrid approach, realizes an output ENOB of > 9b. A prototype is implemented in 65 nm CMOS, and post-extraction simulation results validate its performance. The H-SCPA achieves a peak output power of 23.5 dBm and a peak system efficiency of 48% at an operating frequency of 2.8 GHz. When amplifying a 20 MHz, 64 QAM OFDM signal at 2.8 GHz, the average output power and system efficiency are 17.4 dBm and 24.9%, respectively. It achieves an error vector magnitude (EVM) of 1.9 %-rms while meeting the spectral mask requirements.

Figure 3.1. Block diagram of the proposed hybrid Nyquist/DSM SCPA.
Figure 3.2. Schematic of the proposed H-SCPA, consisting of a 4b Nyquist-rate unary-sized MSB array and a 2b binary-sized LSB array controlled by an 8b oversampled DSM.
Figure 3.3. Block diagram of the first-order delta-sigma modulator.
Figure 3.4. Chip layout of the experimental prototype H-SCPA in 65 nm CMOS.
Figure 3.5. Output power (dB) and system efficiency (%) versus frequency from the RC-extracted model of the H-SCPA.
Figure 3.6. Spectrum of the delta-sigma modulator.
Figure 3.7. INL and DNL of the H-SCPA at 9b effective resolution.
Figure 3.8. System efficiency (including all DC and RF losses, pad drivers, and input power) versus output power.
Figure 3.9. Output power (dB) versus codeword.
Figure 3.10. Simulated PSD for a 20 MHz, 64 QAM OFDM signal.
Figure 3.11. Simulated constellation for a 20 MHz, 64 QAM OFDM signal.
CHAPTER 4
TUNABLE MULTIBAND DIGITAL POWER AMPLIFIER

4.1 Introduction

Wireless data traffic is increasing at an unprecedented pace. The drivers include faster processors, cloud and mobile computing, high-capacity storage, and high-performance graphics cards, together with the unbridled growth of mobile data from social networks, mobile gaming, and a gamut of texting, streaming, and sharing applications. In addition, "faster data, for everyone, everywhere" has driven a wide range of emerging applications popularly known as the Internet of Things (IoT). The transmitter has traditionally been the most critical block in the transceiver chain. As the most power-hungry block, it dictates overall performance, especially in mobile devices with limited battery capacity. Increasing data throughput and improving service offerings is therefore an area of active research. The aim is to use the limited, fragmented spectrum efficiently together with spectrally efficient modulation schemes. Schemes such as orthogonal frequency division multiplexing (OFDM) and quadrature amplitude modulation (QAM) increase data throughput, but at the expense of a higher peak-to-average power ratio (PAPR). A high PAPR means that the transmitter must be able to deliver the peak power even though it transmits at a much lower output power most of the time. Consequently, a DPA that is most efficient at peak power offers reduced efficiency as the output power backs off [61]. The transmitter/digital power amplifier (DPA) must therefore be designed for the peak power, but its average efficiency is mostly dictated by the efficiency at back-off (~6 dB). Significant efforts have been made to reduce the PAPR of OFDM while maintaining its spectral efficiency [17], [18].
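To make the PAPR argument concrete, here is a minimal pure-Python sketch that computes the PAPR of one OFDM symbol built by a direct inverse DFT. The 64 subcarriers, random 64-QAM data, 4x oversampling, and function names are illustrative assumptions, not parameters from this work. A single symbol is also not a CCDF, so the printed value is only indicative.

```python
import cmath
import math
import random

def papr_db(samples):
    """Peak-to-average power ratio of a complex baseband sequence, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))

def ofdm_symbol(subcarriers, oversample=4):
    """One OFDM symbol via a direct inverse DFT, zero-padded by `oversample`."""
    n = len(subcarriers) * oversample
    return [
        sum(x * cmath.exp(2j * math.pi * k * t / n) for k, x in enumerate(subcarriers)) / n
        for t in range(n)
    ]

random.seed(0)
points = range(-7, 8, 2)                       # 64-QAM levels per axis
qam64 = [complex(random.choice(points), random.choice(points)) for _ in range(64)]
print(f"PAPR of one random 64-QAM OFDM symbol: {papr_db(ofdm_symbol(qam64)):.1f} dB")
```

A single subcarrier gives 0 dB and two equal subcarriers give about 3 dB. Many independently modulated subcarriers add up to occasional large peaks, which is why the PA must be sized for peak power while spending most of its time in back-off.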
A lower PAPR means the transmitter can be designed for higher efficiency while carrying the same amount of data. New algorithms and modulation schemes have also been reported in the literature, but OFDM remains the only commercially viable modulation for high-speed wireless cellular and WLAN data transmission. In addition, the DPA must meet stringent noise and linearity requirements to comply with error vector magnitude (EVM), adjacent channel leakage ratio (ACLR), and out-of-band (OOB) spectral regulations [11], [16]. The situation is further aggravated because the limited available spectrum is not contiguous, owing to its diverse commercial, personal, unlicensed, and military uses. Hence, the PA needs to support multiple narrow-bandwidth channels over a wide frequency range with good average efficiency and linearity [62]. There are several conventional approaches to this fragmented spectrum. The most common is a wideband PA, which often sacrifices efficiency in favor of a flat, broadband frequency response [60], [63]–[65]. An alternative is to use distributed or balanced amplifiers [66]–[68]. Distributed amplifiers suffer from losses in the termination resistors, larger die area, and voltage stress across the devices. The in-phase voltages add along the output transmission line, so different stages see different voltage swings. The last stage limits the output power because it saturates first. This also degrades overall efficiency, since the PAs in different stages perform differently. Balanced amplifiers solve this issue but require 90° couplers at both the inputs and the outputs. Couplers usually have limited bandwidth, and their insertion loss increases when they are designed for a broadband response, which further degrades overall efficiency.
The average efficiency is degraded further because the PA must operate with a large PAPR. Another well-known approach is to use multiple narrowband DPAs, each optimized for a particular band of interest and combined through a switching/multiplexing mechanism to cover all frequency channels over the whole bandwidth [26], [33], [69]. Although each PA offers better efficiency, this comes at the expense of increased cost and design complexity. This approach requires independent design and optimization of each PA for its specific frequency band. Moreover, each PA needs its own calibration, digital predistortion, and dynamic power control to maintain seamless coexistence. This chapter presents a frequency-tunable transmitter architecture that covers a 1.1 GHz bandwidth (1.4-2.5 GHz), or 58.8% fractional bandwidth, using a single efficient narrowband switched-capacitor digital power amplifier (SCPA) [24]. The operating frequency is controlled by a digitally programmable series capacitor bank (DPC) combined with the cumulative equivalent capacitance of the SCPA. The series capacitance of the DPC shifts the resonant frequency by changing the reactive component of the termination impedance (Xopt). It has no impact on the real optimum termination impedance (Ropt) presented by the matching network. The slight change in output power across bands is due to the finite bandwidth of the LC balun. This energy- and area-efficient approach lets a single PA operate flexibly over a wider, contiguous band of frequencies without hindering efficiency or affecting output power. The SCPA inherently has superior linearity compared with other switching PAs (class E, class F, etc.). As a result, it meets the regulatory EVM and ACLR specifications without any digital predistortion (DPD) [29]. The chapter is organized as follows. The motivation and problem statement are discussed in Section 4.2.
A brief summary of the SCPA is provided in Section 4.3. The frequency-tunable SCPA architecture is discussed in Section 4.4. Section 4.5 provides implementation details, followed by measurement results in Section 4.6. A pathway for future work is discussed in Section 4.7. Finally, Section 4.8 concludes with a summary and comparison results.

4.2 Problem Statement and Motivation

IoT is the vision of universal connectivity, and 5G is the tool that can truly turn this vision into a success. IoT comprises diverse applications, so 5G must adapt to a wide range of operating specifications in terms of power, frequency, adaptability, and flexibility. One fundamental challenge is that 5G must cover a wide frequency range to support multiple modes of operation and multiple channels, especially on a mobile platform. The situation is aggravated because the available spectrum is heavily crowded and fragmented. Figure 4.1 shows the frequency spectrum allocation in the United States. The range targeted by 5G is among the most heavily crowded and fragmented, owing to military, commercial, industrial, and noncommercial applications. A 5G transmitter should therefore cover a wide frequency range with good energy efficiency, which reduces cost and area and supports multiple applications. This argument is even more critical on mobile platforms, because these metrics matter most in battery-powered devices. Several solutions for wideband PA operation exist in the literature [64], [65]. One common approach is to use distributed transmitters (Figure 4.2) [70], [71]. Distributed transmitters suffer from poor efficiency because the load impedance differs from stage to stage. The last stage saturates first because of its larger signal swing. The PA is therefore usually operated at a higher back-off, which further degrades overall efficiency.
One solution to this nonconstant load impedance is to use balanced amplifiers, in which each PA shares the load impedance equally. However, balanced amplifiers require couplers at both the input and the output. Wideband couplers have higher insertion loss, so they cannot improve efficiency significantly. Another solution is a single wideband amplifier designed for higher power (Figure 4.3). Wideband transmitters cover a wider bandwidth, but only at the expense of lower energy efficiency, for several reasons. First, passive components and matching networks must have a lower quality factor, because bandwidth is inversely proportional to quality factor. A lower quality factor increases insertion and conduction losses in the passive components and matching networks. Second, a narrowband bandpass filter is required for each individual band of operation. Third, the isolator adds insertion loss. Another popular solution is to use an individual narrowband PA and bandpass filter for each band (Figure 4.4). Because the PAs are narrowband, this architecture offers significantly better efficiency than a wideband PA. However, although it is energy efficient, it occupies a significantly larger area than the wideband solution. Moreover, each PA must be designed and optimized for its own band, which increases design complexity. A single narrowband PA with a tunable bandpass filter therefore combines the area efficiency of the wideband architecture with the energy efficiency of the multiple-narrowband architecture. This dissertation presents the first single narrowband voltage-mode SCPA architecture with a 58.8% fractional bandwidth. As a digital voltage-mode PA topology, the SCPA exhibits the best linearity among all switching-mode PAs (Class E, F, D).
Moreover, the equivalent Thevenin impedance of an SCPA is capacitive (excluding the turn-on resistance of the switches). Changing the equivalent capacitance therefore leaves the real part of the impedance, and hence the output power at the output port, unchanged. It does shift the resonant frequency, because it changes the imaginary part of the equivalent impedance. The SCPA is thus an ideal candidate for a frequency-tunable transmitter architecture.

4.3 Operation of a Switched-Capacitor Power Amplifier (SCPA)

The SCPA is a polar DPA capable of delivering high power with high average efficiency [24]. It has the best efficiency among all non-load-modulated PA topologies. It also offers better matching across process-voltage-temperature (PVT) variations, because the output is set by a ratio of capacitances rather than by absolute values. As a voltage-mode rather than a current-mode switching PA (e.g., class E), the architecture adapts better to ever-shrinking voltage headroom. Current-mode PAs exhibit poor AM-AM and AM-PM linearity due to load-dependent parasitic capacitances, whereas the SCPA offers the best linearity among all switching PAs. The SCPA is a scalable, digital-friendly PA architecture with a Class-D topology at its core, but its superior performance comes from a different amplitude modulation technique. The Class-D polar architecture works on the principle of envelope elimination and restoration: a phase-modulated clock (LO) switches the PA inverter cell while a low-dropout regulator (LDO) provides the amplitude-modulated supply to the inverter. A polar SCPA improves energy efficiency by removing the LDO and replacing the capacitor in the matching network with an array of smaller capacitors. The output voltage is then controlled by capacitive voltage division. The bottom plates of the capacitors are switched between VDD and VGND by the inverter cells, which are driven by a phase-modulated clock.
The output amplitude is modulated by the ratio of capacitors that are toggling to those held at a fixed potential. The concept of the SCPA is discussed in Chapter 2. Consider an N-element SCPA array with equal capacitances, of which n capacitors are switching. Its output voltage can be expressed as

$$ V_{out} = \frac{2}{\pi}\,\frac{n}{N}\,V_{DD}, \tag{4.1} $$

where VDD is the supply voltage and 2/π is the fundamental coefficient in the Fourier expansion of a square wave. The design process starts by calculating the optimum termination resistance (Ropt) from the desired output power Pout:

$$ P_{out} = \frac{V_{out}^2}{2R_{opt}} = \left(\frac{2}{\pi}\right)^2 \left(\frac{n}{N}\right)^2 \frac{V_{DD}^2}{2R_{opt}}. \tag{4.2} $$

The total capacitance is subdivided into unit cells based on the architecture, the required resolution, and the OOB noise requirement. Full unary, unary-binary, split-array unary-binary, and C-2C-unary are among the popular choices [53], and each comes with its own set of benefits and disadvantages. The required resolution is dictated by capacitor matching in the chosen process and by the OOB noise requirement. An 8-bit switched-capacitor architecture (4-bit binary LSB and 4-bit unary MSB) is chosen for this design (Figure 4.5). For an M-bit binary LSB and an L-bit unary MSB, the capacitors are sized as

$$ C_{U,0} = C_{U,1} = \cdots = C_{U,2^L-2} = \frac{C_{tot}}{2^L}, \qquad C_{B,0} = \frac{C_{B,1}}{2^1} = \frac{C_{B,2}}{2^2} = \cdots = \frac{C_{B,M-1}}{2^{M-1}} = \frac{C_{U,2^L-2}}{2^M}, \tag{4.3} $$

$$ C_{B,0} = \frac{C_{tot}}{2^{L+M}}, \tag{4.4} $$

where $C_{U,0}, \ldots, C_{U,2^L-2}$ are the $2^L - 1$ unary capacitances and $C_{B,0}, \ldots, C_{B,M-1}$ are the binary capacitances. A series reactive component placed in the array does not alter Ropt or Pout. It does change the resonant frequency, because it changes the equivalent Thevenin admittance seen by the matching network. This case is illustrated in Figure 4.6. The power consumption of an SCPA comes from the charging and discharging current through the PA.
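Equations 4.1-4.4 translate directly into a few helper functions. The minimal Python sketch below uses the 1.4 pF, 4b unary + 4b binary array of Section 4.5.1 as an example. The function names are hypothetical. The listed cells sum to Ctot − C_B,0, and the text does not say how that residual unit is handled (for example, by a dummy cell).

```python
import math

def scpa_vout(n, N, vdd):
    """Eq. (4.1): fundamental output amplitude with n of N equal cells switching."""
    return (2 / math.pi) * (n / N) * vdd

def scpa_pout(n, N, vdd, r_opt):
    """Eq. (4.2): output power delivered into R_opt."""
    return scpa_vout(n, N, vdd) ** 2 / (2 * r_opt)

def r_opt_for_power(p_out, vdd):
    """Eq. (4.2) solved for R_opt at full scale (n = N)."""
    return (2 / math.pi) ** 2 * vdd**2 / (2 * p_out)

def unit_caps(c_tot, L, M):
    """Eqs. (4.3)-(4.4): unary unit cap and the M binary caps of an (L+M)-bit array."""
    c_u = c_tot / 2**L                      # each of the 2**L - 1 unary cells
    c_b0 = c_tot / 2**(L + M)               # LSB binary cell
    c_b = [c_b0 * 2**i for i in range(M)]   # C_B,i = 2**i * C_B,0
    return c_u, c_b

c_u, c_b = unit_caps(1.4e-12, L=4, M=4)
print(f"C_U = {c_u*1e15:.1f} fF, C_B,M-1 = {c_b[-1]*1e15:.2f} fF")   # C_U = 87.5 fF, C_B,M-1 = 43.75 fF
```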
The input power can be expressed as

$$ P_{in} = \frac{n(N-n)}{N^2}\,C_{tot}\,V_{DD}^2\,f_c, \qquad N = 2^{L+M}, \tag{4.5} $$

where $f_c$ is the operating frequency of the SCPA. The output power is given by Equation 4.2, where Ropt is the optimum termination impedance presented to the PA. The efficiency of the SCPA is the ratio of the output power to the total input power:

$$ \eta_{SCPA} = \frac{P_{out}}{P_{in} + P_{out}} = \frac{4n^2}{4n^2 + \frac{\pi n (N-n)}{Q_{NW}}}, \tag{4.6} $$

where $Q_{NW}$ is the loaded quality factor of the matching network. Ideally, a high-Q matching network increases the average efficiency of the SCPA. However, the maximum usable loaded Q is limited for two practical reasons. First, with lossy on-chip passive components (e.g., spiral inductors and transformers), the losses in the matching network increase along with Q. Second, and most importantly, bandwidth is inversely proportional to Q, so high-Q matching narrows the operable frequency range unless the network is tuned.

4.4 Frequency Tunable Multiband SCPA

The SCPA can be modeled as a series-resonant RLC circuit. The resistance arises from the finite "ON" resistance of the CMOS switches. The series capacitance is the equivalent capacitance of the array, approximately the sum of all unit capacitances. For example, the $2^L - 1$ cells of an L-bit unary array with unit capacitor C present an equivalent capacitance of approximately $(2^L - 1)C$, assuming low switching resistance and negligible device and nodal parasitic capacitances. The capacitance CA of the DPC is placed in series with the total array capacitance Ctot. The combined capacitance, CT = CA·Ctot/(CA + Ctot), therefore does not change the linear operation of the SCPA or Ropt. It does change the resonant frequency of the PA by changing Xopt. DPCs are readily available from both monolithic microwave integrated circuit (MMIC) and microelectromechanical system (MEMS) manufacturers, and they offer a wide tuning range and moderate to high quality factor [72]–[75].
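The effect of the series DPC can be checked numerically. The Python sketch below combines the DPC end points given in Section 4.5.2 (0.47 pF and 13 pF) in series with the 1.4 pF array of Section 4.5.1. It also gives the ideal series-resonant frequency for a fixed inductance. The sketch ignores parasitic capacitance and the finite DPC Q, and the function names are hypothetical.

```python
import math

def series_cap(c_a, c_b):
    """Series combination of two capacitors (DPC and array)."""
    return c_a * c_b / (c_a + c_b)

def resonant_freq(l_h, c_f):
    """Ideal series-resonant frequency f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1 / (2 * math.pi * math.sqrt(l_h * c_f))

C_TOT = 1.4e-12                        # array capacitance, Section 4.5.1
for c_dpc in (0.47e-12, 13e-12):       # DPC end points, Section 4.5.2
    c_t = series_cap(c_dpc, C_TOT)
    print(f"C_DPC = {c_dpc*1e12:5.2f} pF -> C_T = {c_t*1e12:.2f} pF")
```

The upper end reproduces the 1.26 pF quoted in Section 4.5.2. The lower end comes out near 0.35 pF versus the quoted 0.37 pF, a gap that unmodeled parasitics could plausibly account for. Because f0 scales as 1/√C_T for a fixed inductance, this idealized model predicts a tuning ratio of roughly √(1.26/0.35), a little under 2.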
The schematic of the differential tunable DPA is shown in Figure 4.7. The bondwire inductance of the package is included in series with a surface-mount inductor to resonate with CT. The implementation is differential, and a surface-mount LC balun serves as the matching network. The balun converts the differential output of the SCPA to a single-ended output that drives the antenna port. It also performs the impedance step-down transformation, converting the 50 Ω antenna impedance to the Ropt required for the desired output power. An LC balun built from discrete passive components, like the one used here, offers lower insertion loss than a transmission-line-based wideband balun, but a relatively narrower bandwidth. Discrete LC baluns are also cheap and compact. They offer the flexibility to reoptimize if the bondwire inductance or nodal parasitic capacitance is over- or under-estimated. They also tolerate packaging and PCB manufacturing variations. The DPC is controlled through a serial peripheral interface (SPI) that sets CT in the series-resonant network and hence the center frequency of the output band.

4.5 Implementation

The SCPA implemented in this design is a polar SCPA, so it has an amplitude path and a phase path, as shown in Figure 4.7. A digital pattern generator produces the digital codeword corresponding to the amplitude envelope. The codeword sets how many unit PA cells toggle at the RF clock through the EN control pin of an AND logic gate. A vector signal generator produces the differential RF clock. The differential implementation improves linearity and suppresses second-order effects. The critical design choices and the sizing of the key components are discussed in the following subsections.
4.5.1 Capacitor Selection and Sizing

The SCPA used in this design is an 8-bit array segmented into 4-bit binary and 4-bit unary sections. The segmentation is based on a figure of merit (FoM), and segmenting the array equally gives a better FoM [53]. A full unary array would increase the number of unit cells. Because each unit cell would shrink accordingly, the switching loss would not change. However, the routing complexity and the power overhead of the binary-to-unary decoder would increase significantly. Moreover, the poorer matching of the smaller unit capacitors would degrade the integral/differential nonlinearity (INL/DNL). Ropt is calculated from Equation 4.2 for the desired Pout and required resolution. The network quality factor QNW dictates the value of Ctot. The efficiency of a two-element LC impedance down-conversion network can be calculated as

$$ \eta_{match} = \frac{1 - \frac{Q_{NW}}{Q_{cap}}}{1 + \frac{Q_{NW}}{Q_{ind}}}. \tag{4.7} $$

The total SCPA efficiency is the product of the matching-network efficiency and the efficiency of the SCPA without matching:

$$ \eta_{total} = \eta_{SCPA}\cdot\eta_{match} = \frac{4n^2}{4n^2 + \frac{\pi n (N-n)}{Q_{NW}}}\cdot\frac{1 - \frac{Q_{NW}}{Q_{cap}}}{1 + \frac{Q_{NW}}{Q_{ind}}}. \tag{4.8} $$

Figure 4.8 plots the SCPA efficiency against QNW and the percentage of unit cells toggling in the array, for a reasonable Qcap of ~50. It shows that the efficiency peaks for QNW between 2 and 4. Ctot can then be calculated from QNW ~ 3 and the desired Ropt:

$$ Q_{NW} = \frac{X_{opt}}{R_{opt}} = \frac{1}{2\pi f_c R_{opt} C_{tot}}. \tag{4.9} $$

The total capacitance Ctot is calculated with Equation 4.9 from the required Ropt (Equation 4.2) and QNW (2-3). Once Ctot is known, Equations 4.3 and 4.4 give the exact values of the unit capacitors, that is, the ratio of unit capacitors in a binary-unary switched-capacitor array that achieves the desired Ctot.
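The efficiency tradeoff in Equations 4.6-4.9 can be explored with the minimal Python sketch below. Qcap = 50 follows the text. The inductor Qind = 15, the half-scale operating point (n = N/2), and the search grid are assumptions chosen only to illustrate the shape of the optimum, not values from the design.

```python
import math

def eta_scpa(n, N, q_nw):
    """Eq. (4.6): SCPA efficiency without matching-network loss (1 <= n <= N)."""
    return 4 * n**2 / (4 * n**2 + math.pi * n * (N - n) / q_nw)

def eta_match(q_nw, q_cap, q_ind):
    """Eq. (4.7): efficiency of a two-element LC down-conversion network."""
    return (1 - q_nw / q_cap) / (1 + q_nw / q_ind)

def eta_total(n, N, q_nw, q_cap, q_ind):
    """Eq. (4.8): product of the SCPA and matching-network efficiencies."""
    return eta_scpa(n, N, q_nw) * eta_match(q_nw, q_cap, q_ind)

def c_tot_from_q(q_nw, f_c, r_opt):
    """Eq. (4.9) solved for C_tot."""
    return 1 / (2 * math.pi * f_c * q_nw * r_opt)

N, n = 256, 128                                   # half of the 8-bit array toggling (assumed)
q_grid = [0.5 + 0.1 * i for i in range(96)]       # Q_NW from 0.5 to 10
q_best = max(q_grid, key=lambda q: eta_total(n, N, q, 50.0, 15.0))   # Q_ind = 15 is assumed
print(f"best Q_NW on the grid: {q_best:.1f}")
```

With these assumed Q values, the grid optimum falls inside the 2-4 window described for Figure 4.8. A low Q_NW wastes switching power in the array, while a high Q_NW loses more in the passives. Substituting Equations 4.2, 4.5, and 4.9 into Pout/(Pin + Pout) gives exactly Equation 4.6, which the test below checks numerically.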
For this design, the unary capacitance is calculated as 87.5 fF, giving a total array capacitance Ctot of 1.4 pF. Metal-insulator-metal (MiM) capacitors were preferred over vertical natural (VN) metal capacitors because of their better matching.

4.5.2 Programmable Capacitor (DPC) and Matching Network

For experimental purposes, an off-chip DPC is used in this design because of its tuning flexibility. The future vision is to use a 3-D MEMS-based DPC, which is discussed in Section 4.7. DPCs are widely available from various manufacturers [72]–[75]. The DPC used here is a 6-bit programmable capacitor with capacitances from 0.47 to 13 pF and a component Q between 8 and 42. Hence, the total capacitance CT, when combined in series with the array, ranges from 0.37 to 1.26 pF. CT resonates with the combination of the wirebond inductance, the surface-mount inductor (Q > 50), and the package inductance of the DPC. A conventional LC balun converts the differential output to a single-ended one that drives the antenna port. The total array capacitance seen by the matching network is constant regardless of the input code, so the matching network is the same for every input code. A series inductor, combined with the LC balun, performs the down-conversion to the desired Ropt. An off-chip matching network is preferred in this design because, together with the DPC, it adds flexibility in frequency tuning.

4.5.3 Switch, Driver, and Logic Design

The SCPA consists of many slices of unit cells, so placement, floor-planning, and routing require careful design and layout. The PA unit cell is essentially a cascode inverter. Its top device (PMOS) is switched between VDD2 (2.4 V) and VDD (1.2 V), and its bottom device (NMOS) is switched between VDD and VGND.
The gates of the middle (cascode) devices are held at VDD, so the PA cell can operate from 2·VDD with nominal gate-source voltages. Operating from a higher supply reduces the DC current. Layout traces and supply routing can then be narrower, which reduces parasitic metal-to-metal coupling capacitance. A level shifter clamps the RF clock between VDD2 and VDD. The total DC current can be estimated roughly from the supply voltage and the required output power. The current through each unit slice follows from the total DC current and the number of unit cells, and the device sizes in the PA cell are estimated from the current through the unit PA. Load-pull simulation is then needed to optimize the PA cell sizes. The buffer chain that drives the PA stage is extremely critical. It must be sized to synchronize the timing of the PMOS and NMOS clocks. An overlap causes significant crowbar current through the PA stage and ultimately degrades the drain efficiency. On the other hand, a high-slew-rate pulse at the PA gate creates third harmonics at the output, which also hurts efficiency. Additionally, timing mismatch between the differential unit cells shows up as second harmonics. The driver should therefore be designed to optimize both power and timing. Ideally, each driver stage should have a drive-strength (fan-out) ratio of about 2.7. A lower ratio increases power consumption but sharpens the clock at each subsequent stage. A three-stage driver chain with a fan-out ratio of 2 is used in this design. The PMOS path has extra delay from level shifting, so the device sizes are optimized to synchronize the NMOS and PMOS timing. Moreover, the inverters after the level shifters are placed in isolation wells so that they can operate from the different supply rails. Each unit slice in the unary array is identical.
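The driver-chain choice can be sketched with classic geometric (tapered) buffer sizing. The quoted ≈2.7 ratio matches the textbook result that the delay-optimal taper is e (≈2.718) when self-loading is neglected. The Python sketch below returns the input capacitance of each stage for a given fan-out. The 10 fF input and 80 fF load are hypothetical values, chosen so that a fan-out of 2 gives three stages as in this design.

```python
import math

def tapered_chain(c_in, c_load, fanout):
    """Input capacitance of each stage of a geometrically tapered inverter chain.

    Each stage is `fanout` times larger than the previous one. The stage count is the
    smallest integer for which the last stage drives c_load with a ratio <= fanout.
    """
    ratio = c_load / c_in
    stages = max(1, math.ceil(math.log(ratio) / math.log(fanout) - 1e-9))
    return [c_in * fanout**i for i in range(stages)]

chain = tapered_chain(10e-15, 80e-15, 2)   # hypothetical 8x load, fan-out of 2
print([f"{c*1e15:.0f} fF" for c in chain])
```

A fan-out closer to e (about 2.7) uses fewer, larger stages for the same load, while a lower fan-out adds stages and power but gives sharper edges, which is the tradeoff described above.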
The physical design must be planned so that the identical unary cells are placed close together, reducing nodal parasitics and timing-delay mismatch. Unlike the unary cells, each successive binary cell scales by a factor of 2; hence, its device sizes are re-optimized to correct timing and mismatch. The layout of the test chip and the corresponding bondwire connections are shown in Figure 4.9 and Figure 4.10, respectively.

4.6 Experimental Results

4.6.1 Measurement Test-Bench and Instrumentation

The prototype of the frequency-tunable DPA is fabricated in a 65 nm CMOS process with nine metal layers, including an ultra-thick top metal for high-quality-factor passive components. The microphotograph of the chip is shown in Figure 4.11. The prototype die is chip-on-board (COB) bonded to a PCB, where it is interfaced with the I/O and the DPC. The area of the DPA, including all I/O pads, is 0.575 mm². However, the core circuits occupy only 0.12 mm² because the SCPA architecture fundamentally contains no inductor or transformer except in the matching network. On-chip capacitors are area efficient and offer a good quality factor. The cascoded output stage operates at 2.4 V, while all digital logic and the clock drivers and distribution operate at 1.2 V. Cascoding and level shifting allow all devices to operate within the rated voltage limit of 1.2 V across any two terminals, unlike other topologies such as the Class-E switching PA. The array capacitors, driver chain, and binary-to-unary decoder collectively occupy only a small fraction of the total area, as shown in Figure 4.9 and Figure 4.11. The decoupling capacitors placed on both the left and right of the SCPA are critical because they reduce the effective inductance of the supply traces. The voltage drop across the supply inductance is a nonlinear function of the DC current and the RF output amplitude. It therefore causes code- (amplitude-) dependent nonlinearity that would eventually require power-hungry digital predistortion.
The supply is routed in a grid-like pattern so that the inductances of the individual traces appear in parallel, reducing the overall inductance. The decoupling capacitors placed between the supply and ground nodes act as a filter that suppresses high-frequency supply noise. Figure 4.12 shows the instruments used for the measurements. The measurement setup (top left) includes all connections to the instruments. All DC and digital I/O inputs are connected to header pins, while the RF inputs use SMA connectors. The measurement setup for the PSD and the demodulated constellation of the OFDM signal is shown at the top right. The PCB with the chip-on-board (COB) bonded DUT is shown at the bottom of Figure 4.12. The measurement test-bench for the tunable SCPA is shown in Figure 4.13. Amplitude modulation is controlled by an NI-6555 (high-speed digital I/O). The modulated RF input is generated by an NI-5673 (vector signal generator). The supply voltages are generated by an NI-4113, and the output is measured with an NI-5663E (vector signal analyzer). The NI-6555 also provides the serial peripheral interface (SPI) that controls the DPC.

4.6.2 Static Measurements

The efficiency and output power of the tunable DPA are measured across frequency and output code with a static phase offset. All losses from the resonant inductors, DPC, LC balun, and pad drivers are included when calculating the system efficiency (SE). Figure 4.14 and Figure 4.15 show the output power and the peak SE (the ratio of total output power to total input power, including DC), respectively, across the whole tunable frequency range of 1.4-2.5 GHz and the output power range. The measured Pout versus frequency is shown for DPC code = {63, 32, 16, 8, 4, 2, 1, 0}, with the highest code corresponding to the lowest frequency. The complete PA draws a peak current of 200 mA from the 2.4 V supply and 7 mA from the 1.2 V supply when operating at the highest frequency.
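The tuning range can be cross-checked with a simple lumped model. The sketch below assumes that the DPC appears in series with the 1.4 pF array, which reproduces the 1.26 pF upper bound of CT quoted in Section 4.5.2 (the simple series model gives about 0.35 pF, rather than 0.37 pF, at the lower end, presumably because parasitics are not modeled), and that CT resonates with a single lumped inductance. The 10.5 nH value of L_EQ is hypothetical and chosen only for illustration; it is not a value reported for this design. The fractional bandwidth is referenced to the geometric-mean center frequency, an inferred convention that reproduces the 58.8% quoted for 1.4-2.5 GHz (an arithmetic-mean center would give about 56.4%).

```python
import math

def series_cap(c1: float, c2: float) -> float:
    """Series combination of two capacitances, in farads."""
    return c1 * c2 / (c1 + c2)

def resonant_freq(l_h: float, c_f: float) -> float:
    """Resonant frequency in Hz: f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_f))

def fractional_bw(f_lo: float, f_hi: float) -> float:
    """Fractional bandwidth relative to the geometric-mean center frequency."""
    return (f_hi - f_lo) / math.sqrt(f_lo * f_hi)

C_TOT = 1.4e-12                       # array capacitance from the text (16 x 87.5 fF)
DPC_MIN, DPC_MAX = 0.47e-12, 13e-12   # DPC capacitance range from the text
L_EQ = 10.5e-9                        # HYPOTHETICAL lumped inductance (illustration only)

ct_min = series_cap(C_TOT, DPC_MIN)   # smallest CT -> highest frequency
ct_max = series_cap(C_TOT, DPC_MAX)   # largest CT -> lowest frequency
f_hi = resonant_freq(L_EQ, ct_min)
f_lo = resonant_freq(L_EQ, ct_max)

print(f"CT range: {ct_min * 1e12:.2f}-{ct_max * 1e12:.2f} pF")
print(f"model f0 range: {f_lo / 1e9:.2f}-{f_hi / 1e9:.2f} GHz")
print(f"FBW for 1.4-2.5 GHz: {100 * fractional_bw(1.4e9, 2.5e9):.1f} %")
```

Because f0 scales as 1/sqrt(CT), the modeled tuning ratio f_hi/f_lo equals sqrt(ct_max/ct_min) regardless of the assumed inductance. The same geometric-mean convention also reproduces the 115.5% (2-6 GHz) and 78.4% (2-4.3 GHz) entries in Table 4.1.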
The tunable frequency range spans 1.4-2.5 GHz (eight frequency bands), corresponding to a fractional bandwidth of 58.8%, and Pout varies by only 1.7 dB (minimum = 20 dBm, maximum = 21.7 dBm). The peak SE, which includes all external losses and input power (DC and RF), ranges from 25% to 38.1%. The efficiency is highest at the lower end of the frequency range because of reduced switching loss. The efficiency increases again toward the upper end of the range because the LC balun has a lower insertion loss (IL) between 2.2 and 2.5 GHz, which is evident from the increased Pout in this range. Both the efficiency and the output power dip in the middle of the frequency range, mostly because of the finite bandwidth of the balun. Higher-order broadband matching can flatten Pout or efficiency, but at the cost of reduced peak efficiency [76]. Pout versus code-word (Figure 4.16) and SE versus Pout (Figure 4.17) are plotted at eight distinct center frequencies corresponding to the DPC states above. The output shows high linearity and good back-off efficiency as the power is varied across the dynamic range of the SCPA at all frequencies. Owing to low-loss switches and the superior capacitor matching available in CMOS processes, the SCPA output is linear with respect to the input code-word. This holds independently of the operating frequency, provided that at that frequency the output stage behaves approximately as a hard switch, which reduces the effect of the odd harmonics. Given the wide range of output frequencies of the proposed PA, digital predistortion (DPD) is undesirable because it would require large lookup tables and high power consumption. The linearity of a DPA can be characterized by its integral and differential nonlinearity (INL/DNL), which are plotted at four different frequencies across the range of operation in Figure 4.18, Figure 4.19, Figure 4.20, and Figure 4.21.
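INL and DNL can be extracted from a measured amplitude-versus-code sweep. The sketch below uses an end-point fit, which is one common convention (the text does not state which fit was used for Figures 4.18-4.21), and it expects output amplitude (proportional to voltage, e.g., the square root of power) rather than power in dBm, because the ideal SCPA output amplitude, not its power, is linear in the code-word. The input sweep is synthetic and only illustrates the calculation.

```python
import math

def amplitude_from_dbm(p_dbm: float) -> float:
    """Relative output amplitude (proportional to voltage) from power in dBm."""
    return math.sqrt(10.0 ** (p_dbm / 10.0))

def inl_dnl(amplitudes: list[float]) -> tuple[list[float], list[float]]:
    """End-point INL and DNL, in LSB, from the output amplitude at codes 0..N."""
    n = len(amplitudes) - 1
    lsb = (amplitudes[-1] - amplitudes[0]) / n          # average step size
    dnl = [(amplitudes[k + 1] - amplitudes[k]) / lsb - 1.0 for k in range(n)]
    inl = [(amplitudes[k] - (amplitudes[0] + k * lsb)) / lsb for k in range(n + 1)]
    return inl, dnl

# Synthetic 3-bit sweep in which code 4 comes out 0.2 LSB low.
amps = [0.0, 1.0, 2.0, 3.0, 3.8, 5.0, 6.0, 7.0]
inl, dnl = inl_dnl(amps)
print("INL:", [round(x, 2) for x in inl])
print("DNL:", [round(x, 2) for x in dnl])
```

A single low code shows up as a negative-then-positive DNL pair and a local INL dip; periodic patterns of such pairs are what binary-to-unary capacitor mismatch would produce.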
The peak INL ranges from approximately +2 LSB to -3 LSB at 2.23 GHz. The INL is mostly dictated by the supply and ground bondwire inductance, in conjunction with capacitive mismatch and the nodal parasitics arising from the physical design of the array. This effect can be minimized by opting for low-inductance packaging such as flip-chip, or by wafer probing. The peak DNL is below 1 LSB except for the high codes at 2.23 GHz. The spurs in the DNL are periodic and mostly due to mismatch between the binary- and unary-sized capacitances. Because of the good linearity achieved, the system can be operated without DPD. Additionally, the higher-Q output network filters the OOB noise more sharply, reducing the noise in the adjacent spectrum for alternate bands and enhancing coexistence.

4.6.3 Dynamic Measurements

To validate the in-band performance of the PA, an LTE 64-QAM, 20 MHz OFDM signal with a data rate of 100.8 Mbps is measured at the PA output port, without DPD, at frequencies across the band of operation. The measurement result is shown in Figure 4.22. The measured EVM at peak Pout is below 3.9%-rms. The PA also meets the E-UTRA adjacent channel leakage ratio (ACLR) specification of < -30 dBc for an uplink signal across the whole frequency range. At this ACLR, the average output power and SE across all frequency bands are above 14.1 dBm and 15.7%, respectively. The best-case dynamic average Pout and SE are 14.1 dBm and 23.7%, respectively, at an output frequency of 1.53 GHz (DPC state = 63), as shown in Figure 4.22. The SCPA is a quantized switching PA architecture; therefore, its OOB noise is mostly dictated by quantization noise. The PSD for the same LTE 64-QAM, 20 MHz OFDM signal (100.8 Mbps) is shown in Figure 4.23. The power spectral density is plotted for outputs centered at 1.53, 1.58, 1.69, 1.86, 2.04, 2.23, 2.34, and 2.45 GHz, corresponding to DPC states 63, 32, 16, 8, 4, 2, 1, and 0, respectively.
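The relation between the quantization noise mentioned above and the effective resolution of the SCPA can be estimated with the standard ideal-quantizer relation SQNR = 6.02·ENOB + 1.76 dB. Assuming the noise is white and spread uniformly over the Nyquist band fs/2, the noise density relative to the carrier is -SQNR - 10·log10(fs/2) dBc/Hz. This is a first-order sketch for a full-scale sine wave: it ignores the PAPR of the OFDM signal, the sinc shaping of the zero-order hold, and the filtering of the output network, and the 100 MS/s update rate FS is a hypothetical value, not one reported for this design.

```python
import math

def sqnr_db(enob: float) -> float:
    """Ideal SQNR of an ENOB-bit quantizer for a full-scale sine wave."""
    return 6.02 * enob + 1.76

def noise_floor_dbc_per_hz(enob: float, fs_hz: float) -> float:
    """Quantization-noise density relative to the carrier, assuming white
    noise spread uniformly over the Nyquist band fs/2."""
    return -sqnr_db(enob) - 10.0 * math.log10(fs_hz / 2.0)

ENOB = 7.3    # effective resolution inferred from the measured PSD
FS = 100e6    # HYPOTHETICAL amplitude-code update rate (not from the text)
print(f"SQNR = {sqnr_db(ENOB):.1f} dB")
print(f"noise floor = {noise_floor_dbc_per_hz(ENOB, FS):.1f} dBc/Hz")
```

In this model, each additional effective bit lowers the floor by about 6 dB and doubling the update rate lowers it by about 3 dB, which is consistent with the improvements expected from higher-resolution and oversampled C-DAC architectures.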
The PSD shows OOB noise consistent with a DAC having an effective number of bits (ENOB) of about 7.3. The OOB noise performance is consistent across the bands shown, although slight differences arise from the better filtering of the matching network toward the upper end of the frequency range. OOB noise can be further reduced with a high-resolution C-DAC architecture, as discussed in Chapter 2, or with the multirate delta-sigma C-DAC architecture discussed in Chapter 3 [25], [77].

4.7 Future Directions

The DPC used in this work is an off-chip component. Off-chip components are not well suited to mass production: they increase cost and design time and require additional assembly resources. Therefore, it is desirable either to integrate the tunable capacitor (DPC) on chip or to find a more compact and area-efficient integrated solution. Frequency tuning can be achieved by tuning the inductance, the capacitance, or both, because the resonant frequency, $\omega_0 = 1/\sqrt{LC}$, depends on both the equivalent inductance and capacitance. Tuning both gives a wider tuning range and the flexibility to tune in either direction (increase or decrease). The conventional approach to transmitter design is to design the PA separately, assuming that it will drive a 50 Ω antenna. The matching network converts the antenna impedance to a complex impedance Ropt + jXopt, determined by the desired output power and the equivalent admittance looking into the PA terminals, as illustrated in Figure 4.5 and Figure 4.6. There are several problems with this approach. Firstly, the system efficiency is the product of the PA efficiency and the matching-network efficiency; hence, the efficiency of the matching network directly affects the overall energy efficiency. Secondly, higher-quality-factor passive components offer better efficiency but at the price of reduced bandwidth.
Thirdly, the equivalent impedance of an SCPA is highly capacitive because of the switched-capacitor array; hence, the matching network must be highly inductive. Large inductors not only occupy significant die area but also contribute nonlinearity due to asymmetric metal filling, and metal fill and density checks are unavoidable fabrication-process requirements. In summary, removing the matching network can save area and cost, improve performance, and significantly increase system efficiency. Antennas with tunable impedance are a common approach in many energy-harvesting circuits and systems [78], [79], and this concept can be applied to a tunable-frequency transmitter architecture. A loop antenna is inherently inductive. Therefore, the antenna can be designed to present a complex impedance of Ropt + jXopt instead of 50 Ω. In a loop antenna, the size of the antenna determines Ropt, while the distance between the parallel slots determines Xopt [78]. Ropt can be designed to meet the desired output power, while Xopt can be designed to cancel the effective equivalent series capacitance of the switched-capacitor array [80], [81]. An off-chip DPC has associated bondwire inductance that causes nonlinearity. Flip-chip is a common technique for commercial SoCs; it significantly reduces bondwire inductance and parasitic capacitance and hence improves linearity. It is also mechanically more stable than wirebonding. A MEMS-based 3-D stacked digitally programmable capacitor can be attached on top of the chip, as shown in Figure 4.24. 3-D DPCs are commercially available from several manufacturers [72]–[75]. Therefore, co-designing the antenna and embedding the DPC on top of the chip can improve efficiency and provide an extra degree of freedom for frequency tuning.

4.8 Comparison and Summary

A narrowband tunable digital power amplifier with a fractional bandwidth of 58.8% has been introduced and implemented in 65 nm CMOS.
Instead of using wideband PAs or multiple narrowband PAs, a single narrowband PA operates in multiple bands within the frequency range of 1.4-2.5 GHz using a digitally programmable capacitor. It is a cost-effective and area-efficient solution for leveraging the fragmented available spectrum without sacrificing performance metrics such as output power, linearity, or efficiency. The DPA does not require digital predistortion (DPD), owing to the superior linearity of the SCPA architecture. The performance is validated by both static and dynamic measurements. The PA achieves a peak output power of 21.7 dBm and a peak system efficiency of 38.1%, including all DC and RF input power. Post-layout (extracted) simulation predicts an output power of around 22.5 dBm with an SE of 42%. The measured efficiency closely approximates the simulated value; the slight reduction is partly due to the reduced output power and an underestimation of the bondwire inductance. The output power varies by only 1.7 dB over the 1.1 GHz frequency span. The ACLR is below the -30 dBc required by the LTE standard, and the measured EVM at peak Pout is below 3.9%-rms. A comparison with prior multiband/multistandard power amplifiers is provided in Table 4.1. Compared with recent wideband PAs, the proposed solution offers the highest average efficiency and power for high-PAPR OFDM signals, even though all external measurement losses are included.

Table 4.1 Comparison to prior art for recent wideband/multiband/multistandard power amplifiers.

Attributes | This work | [14] | [16] | [22]
Broadband architecture | Programmable capacitor array | Wideband w/ 2nd harmonic match | Wideband multi-resonant match | Discrete multiband IPD
Frequency (GHz) | 1.4-2.5 | 2-6 | 2-4.3 | 0.835, 0.898, 1.88, 1.95
Fractional BW (%) | 58.8 | 115.5 | 78.4 | N/A
Pout (dBm) | 20.0-21.7 | 20.1-22.4 | 23.9-24.9 | 31.7-32
Efficiency (%) | 25-38.1+ | 19-28.4& | 27-42.7# | 46.3-58.5&
Modulation | OFDM 20 MHz 64QAM | OFDM 20 MHz 64QAM | OFDM 20 MHz 64QAM | OFDM 20 MHz 64QAM
Dynamic Pout (dBm) | 14.1-15.1 | 11.1-13.2 | 14.6$ | 27.5-28.1
Avg. efficiency (%) | 15.7-23.7+ | N/A | 15.6#$ | 33.3-37.3&
DPD | No | No | PM-free | No
EVM (%-rms) | 2.9-3.9 | <4 | 2.95$ | 3.5
ACLR (dBc) | <-30 | N/A | <-33.5$ | <-32
Supply (V) | 1.2/2.4 | 3.3 | 1.4 | 3.4
CMOS technology (nm) | 65 | 65 | 28 | 153
Matching network | Off-chip | On-chip LC/XFMR | On-chip XFMR | SIP-IPD
Measurement | PCB | Wafer probe | Wafer probe | PCB

+ System efficiency (SE); & Power-added efficiency (PAE); # Drain efficiency (DE); $ Modulation at 2.8 GHz.

Figure 4.1. United States frequency spectrum allocation (black box marks the 5G spectrum).
Figure 4.2. Dual-fed distributed amplifier.
Figure 4.3. Wideband transmitter architecture.
Figure 4.4. Multiband transmitter architecture using multiple discrete narrowband PAs.
Figure 4.5. Concept of M-bit binary, L-bit unary tunable SCPA.
Figure 4.6. Concept of tunable frequency SCPA.
Figure 4.7. Detailed schematic of the proposed tunable SCPA.
Figure 4.8. Dependency of SCPA efficiency on loaded quality factor.
Figure 4.9. Layout of the tunable SCPA in 65 nm CMOS.
Figure 4.10. Bondwire connection of the tunable SCPA.
Figure 4.11. Chip microphotograph of the tunable SCPA in 65 nm CMOS.
Figure 4.12. Instruments and the chip-on-board.
Figure 4.13. Measurement setup for the test chip.
Figure 4.14. Measured output power, Pout (dBm), versus frequency.
Figure 4.15. Measured system efficiency, SE (%), versus frequency.
Figure 4.16. Measured output power, Pout (dBm), versus input code-word.
Figure 4.17. Measured system efficiency, SE (%), versus output power, Pout (dBm).
Figure 4.18. Measured INL/DNL for DPC state = 1 and frequency = 2.34 GHz.
Figure 4.19. Measured INL/DNL for DPC state = 2 and frequency = 2.23 GHz.
Figure 4.20. Measured INL/DNL for DPC state = 32 and frequency = 1.58 GHz.
Figure 4.21. Measured INL/DNL for DPC state = 63 and frequency = 1.53 GHz.
Figure 4.22. Measured spectrum and constellation for an LTE 64-QAM, 20 MHz OFDM (100.8 Mbps) signal. All measurements are without DPD.
Figure 4.23. Measured PSD for an LTE 64-QAM, 20 MHz OFDM (100.8 Mbps) signal across the tunable frequency range. All measurements are without DPD.
Figure 4.24. A MEMS-based 3-D stacked capacitor and a custom antenna with the desired complex impedance.

CHAPTER 5

AN ULTRA-LOW POWER FULLY INTEGRATED PULSE-WIDTH MODULATED CMOS TEMPERATURE SENSOR FOR INTERNET-OF-THINGS

5.1 Introduction

CMOS scaling has enabled fine-line ICs to offer superior performance, but at the expense of increased design complexity, particularly when reliable performance is required over the expected range of process, voltage, and temperature (PVT) variations. Therefore, many modern microprocessors use multiple temperature sensors in critical blocks to maintain performance over a wide range of operating temperatures and process corners. On a modern multicore CPU, individual cores may be separated by large physical distances. Multiple temperature sensors placed in sensitive blocks can provide local and global heat maps and reveal IR drops (i.e., resistive voltage drops). This information is helpful for troubleshooting, reliability analysis, and fault analysis, and it provides extra leverage for improving the physical design. In addition, many ICs, such as wireless transceivers, use temperature-dependent look-up tables or polynomial curve fitting for reliability and improved performance [82]–[84]. Temperature sensors can therefore provide local feedback for calibration and hence improve linearity and efficiency.
Additionally, a diverse slate of emerging applications, such as infrastructure health monitoring, biomedical sensors and actuators, automotive electronics, wearable electronics, energy harvesting, self-driving vehicles, artificial intelligence, and brain-machine interfaces (BMI), all benefit from temperature sensing that consumes less power, operates from reduced supply voltages, occupies less chip area, and works over a wider range of temperatures and supply voltages [85]–[88]. In conventional temperature sensors, bandgap references, reference current generators, multiple operational transconductance amplifiers (OTAs), and one or more data converters (ADC or DAC) are required to enable accurate and reliable temperature sensing. These individual blocks typically consume power in the µW to mW range [82], [89]. Such high power consumption is unacceptable for low-power operation, particularly when the operating power is provided by energy scavengers. Moreover, these blocks typically require a large voltage headroom, which is gradually shrinking with CMOS process scaling. Bandgap voltage references use the temperature dependence of the turn-on voltage of BJT p-n junctions, together with proper scaling of device geometries, to generate temperature-independent reference voltages [90], [91]. A complementary-to-absolute-temperature (CTAT) voltage and a proportional-to-absolute-temperature (PTAT) voltage are scaled and summed so that their individual temperature dependences cancel, producing a net temperature-independent output reference voltage [92], [93]. This architecture has several inherent challenges when power and supply voltage are reduced. First, a bipolar transistor is required. Bipolar transistors in CMOS are laterally diffused, and it is difficult to change the doping profiles of their base-emitter and base-collector junctions [94].
BiCMOS processes require extra mask layers, which increases fabrication cost and reduces production yield [95]. Second, a relatively large voltage headroom (~0.7 V) is needed to turn on the diodes in the BJTs, making them unsuitable for many low-power applications, especially those that rely on energy harvesting [96]–[100]. Third, and most important, they consume relatively high power (> µW). Voltage references based on 2T and 4T topologies that operate MOS transistors in the subthreshold region have been reported and offer low power consumption (e.g., a few pW) [85], [101]. The 2T reference offers excellent performance over temperature, but its output reference voltage (~175.3 mV) is often too low compared with the threshold voltage of modern CMOS devices. The 4T reference offers a higher output voltage (343 mV) but at the expense of degraded performance over temperature (33 ppm/°C). Both the 2T and 4T references operate open loop, which makes them prone to PVT variations and less adaptable to different process nodes. In this chapter, bulk voltage compensation is leveraged to enable low-power operation, a higher output voltage, and reduced sensitivity [102]. It provides a higher output voltage by allowing operation at a lower drain-to-source voltage and offsetting the resulting temperature dependence through compensation of the bulk voltage. Bulk-voltage control is a common technique for reducing the threshold voltage of devices but is not commonly used for temperature compensation in voltage references. Reference currents are the primary consumers of power in conventional temperature sensors. Conventionally, a reference current is generated by maintaining a fixed voltage across a resistor using OTAs configured in negative feedback. OTAs require a large voltage headroom to achieve the desired open-loop gain and to minimize error. Moreover, the resistors used to create the reference current must be large (e.g., >100 MΩ) to reduce power consumption [101].
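To see why such large resistances arise, the sketch below computes the branch current and power of a resistor-defined reference current, I = V/R and P = VDD·I, together with the resistor body area at a given sheet resistance. The 0.3 V held across the resistor, the 1 kΩ/sq sheet resistance, and the 0.4 µm resistor width are hypothetical values chosen for illustration; the 0.45 V supply is the operating level reported for the proposed sensor.

```python
def reference_branch(v_res: float, r_ohm: float, vdd: float) -> tuple[float, float]:
    """Current (A) and supply power (W) of one resistor-defined reference branch."""
    i_ref = v_res / r_ohm
    return i_ref, vdd * i_ref

def resistor_area_um2(r_ohm: float, sheet_ohm_sq: float, width_um: float) -> float:
    """Resistor body area (um^2): squares * width^2, ignoring spacing and contacts."""
    squares = r_ohm / sheet_ohm_sq
    return squares * width_um ** 2

R = 100e6      # 100 Mohm, the order of magnitude cited in the text
V_RES = 0.3    # HYPOTHETICAL voltage held across the resistor
VDD = 0.45     # supply level reported for the proposed sensor
SHEET = 1e3    # HYPOTHETICAL high-resistance poly sheet resistance (ohm/sq)
WIDTH = 0.4    # HYPOTHETICAL resistor width (um)

i_ref, p_ref = reference_branch(V_RES, R, VDD)
print(f"I = {i_ref * 1e9:.1f} nA, P = {p_ref * 1e9:.2f} nW per branch")
print(f"area ~ {resistor_area_um2(R, SHEET, WIDTH):.0f} um^2")
```

Because branch power scales as 1/R, a tenfold smaller resistor raises the power of each branch tenfold, while a tenfold larger one multiplies the resistor area by ten.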
Such high resistance is infeasible in standard CMOS. Various circuit topologies have been reported in which CMOS devices operate in the subthreshold region in combination with relatively small resistors (e.g., <10 MΩ) to generate a temperature-independent current reference [103]–[106]. Large resistors require a large chip area. N-well diffusion resistors offer a higher resistance density; however, diffusion resistors are inherently nonlinear, so they degrade the overall sensitivity of the temperature sensor and require additional calibration or trimming steps for accurate operation. Even resistor-less current references typically require a relatively large minimum supply voltage and can consume hundreds of nanowatts by themselves [107]. Many temperature sensors employ power-hungry ADCs [82], [87], [89], [108]. ADCs typically require a stable supply voltage and a higher voltage headroom; hence, they are difficult to power using energy harvesters (RF, TEG, etc.). The voltage from a harvester can be stabilized using a DC-DC converter, but at the expense of higher power consumption [109]. Moreover, an ADC increases circuit complexity, making it unsuitable for certain applications, such as near-field communication (NFC), where the sensor does not have a dedicated supply voltage. ADC-less temperature sensors are therefore desirable. In this chapter, I propose a novel architecture that eliminates most of the traditional high-power building blocks used in conventional temperature sensors (e.g., reference currents, OTAs, and ADCs); hence, it significantly reduces the overall power consumption (<100 nW) and allows operation from the lowest reported supply voltage (~450 mV) [110]. This chapter is organized as follows. In Section 5.2, the circuit operation and the design methodology for the critical building blocks are explained.
In Section 5.3, simulation and measurement results of a prototype sensor are presented. Finally, conclusions and a summary of the key achievements are presented in Section 5.4.

5.2 Circuit Architecture and Theory of Operation

The schematic of the proposed temperature sensor is shown in Figure 5.1. In the proposed sensor, a stable reference voltage is generated using a 2T voltage reference circuit with bulk voltage compensation [111]. A 2T-based complementary-to-absolute-temperature (CTAT) generator also produces a voltage that depends on the ambient temperature of the sensor. Both voltages are compared with a locally generated sawtooth waveform to produce voltage pulses, which are combined using digital logic. The pulse width from the reference voltage path is fixed; hence, the ratio of the CTAT pulse width to the reference pulse width provides the temperature measurement. The design and operation of the individual blocks that make up the temperature sensor are described next, beginning with the CTAT voltage generator.

5.2.1 CTAT Voltage Generation

The two blocks most critical to temperature accuracy are the CTAT voltage generator and the reference voltage generator. The CTAT generator is based on the 2T bandgap cell and is shown in Figure 5.2 (left). Nonlinearities in the CTAT voltage translate directly into temperature inaccuracy. However, the slope and the voltage level are not critical as long as the CTAT shows a linear temperature-to-voltage characteristic. The output voltage of the CTAT generator is calculated by assuming that the same subthreshold current flows through both M1 and M2; equating the two currents and solving for VCTAT gives

$$V_{CTAT} = f(k)\,\Delta V_{th} + f_1(k)\,V_T \ln\left(\frac{W_1 L_2}{W_2 L_1}\right), \qquad (5.1)$$

where f(k) and f_1(k) depend on the work function, k; ΔVth is the threshold-voltage difference between the devices; V_T = kT/q is the thermal voltage (here k denotes Boltzmann's constant); and W and L are the width and length of the corresponding devices. The circuit shown in Figure 5.2 (left) can generate either a CTAT or a proportional-to-absolute-temperature (PTAT) voltage. Whether it operates as a CTAT or a PTAT depends only on the ratio of the numerator to the denominator inside the logarithm in Equation 5.1. CTAT operation is preferred because it provides a more linear output voltage characteristic with respect to temperature. Figure 5.2 (right) shows the CTAT voltage across the typical, slow-slow, and fast-fast process corners in a 65 nm CMOS process. To validate the linearity of the slope across process variations, a Monte-Carlo simulation is run, and a linear regression is calculated for each run. A histogram of the resulting coefficient of determination (R²) is shown in Figure 5.3. The presented CTAT is designed for a nominal temperature coefficient (TC) of -1.7 mV/°C. The TC ultimately determines the accuracy of the sensor, because a larger TC magnitude increases the sensitivity of the pulse generation. Most CMOS processes offer a variety of devices with different threshold and operating voltages (e.g., core logic, RF, and I/O devices). The first step in designing a CTAT is to choose the lowest-threshold-voltage NMOS device available as the upper transistor (M1) of the 2T stack and a higher-threshold-voltage NMOS device as the bottom transistor (M2). A large magnitude of the logarithmic term in Equation 5.1 is desired to increase the temperature sensitivity of the output voltage; that is, (W1L2)/(W2L1) should be made as small as possible. Hence, a minimum-width device is chosen for M1 and a minimum-length device for M2. The length of M1 is chosen to be the maximum allowed in the process, which not only increases the sensitivity but also reduces power consumption. The width of M2 is constrained by the area available for the design. Because the threshold voltage is process dependent, the optimal devices and device sizes must be found for each CMOS process used. The proposed CTAT has been simulated in three different CMOS processes (65 nm, 130 nm, and 180 nm).
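Two of the steps above lend themselves to a quick numerical check: the sign of the temperature slope contributed by the logarithmic term of Equation 5.1, and the linear-regression R² used to judge CTAT linearity. The sketch below is illustrative only: it evaluates the logarithmic term in isolation (the temperature dependence of the ΔVth term is ignored), sets the prefactor f1 to 1 as a placeholder, and uses hypothetical device dimensions and a synthetic CTAT curve with the nominal -1.7 mV/°C slope plus a small assumed curvature.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant (J/K)
Q_E = 1.602176634e-19  # elementary charge (C)

def log_term_slope(f1: float, w1: float, l1: float, w2: float, l2: float) -> float:
    """d/dT of f1 * VT * ln(W1*L2 / (W2*L1)) in V/K (log term of Eq. 5.1 only)."""
    return f1 * (K_B / Q_E) * math.log((w1 * l2) / (w2 * l1))

def linear_fit_r2(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Least-squares line y = a + b*x; returns (slope b, intercept a, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return b, a, 1.0 - ss_res / ss_tot

# HYPOTHETICAL sizes (m): minimum W1 and L2, long L1, wide W2 -> ratio << 1.
slope = log_term_slope(1.0, w1=0.2e-6, l1=20e-6, w2=10e-6, l2=0.1e-6)
print(f"log-term slope = {slope * 1e3:.3f} mV/K")  # negative -> CTAT-like

# Synthetic CTAT (HYPOTHETICAL 0.6 V at 25 degC) with a small assumed curvature.
temps = [float(t) for t in range(-20, 101, 10)]
v = [0.6 - 1.7e-3 * (t - 25) + 2e-7 * (t - 25) ** 2 for t in temps]
b, a, r2 = linear_fit_r2(temps, v)
print(f"fitted TC = {b * 1e3:.3f} mV/degC, R^2 = {r2:.5f}")
```

The logarithmic term's slope changes sign exactly where the size ratio crosses 1, which is why the size ratio alone selects CTAT or PTAT behavior in this simplified view.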
Note that the devices and sizes differ substantially depending on the devices available in each process; nevertheless, the design flow described above can be followed for any CMOS process that offers a few alternative devices, and it works even for processes with only native devices. Figure 5.4 shows the CTAT voltage characteristics calculated from device model parameters in three different processes and compares them with post-layout simulation results. The differences between the theoretical calculation and the simulation are due to the simplifications made in the subthreshold current equations. Because only the overall linearity of the curve is important, minor differences between theory and simulation can be ignored. Table 5.1 lists the optimized widths and lengths of M1 and M2, as well as the device types used to generate the CTAT voltages shown in Figure 5.4.

5.2.2 Reference Voltage Generation

The reference voltage generator is also based on the 2T topology, as shown in Figure 5.5 (left). Its operating principle is the same as that described by Equation 5.1, except for a constant scaling factor. However, unlike the CTAT generator, the reference must provide a constant output voltage across temperature. Hence, the devices must be sized so that the temperature dependence of the thermal-voltage term in Equation 5.1 cancels that of the threshold-voltage term. Additionally, the two NMOS devices (M1 and M2 in Figure 5.5) should be selected with a significant threshold voltage difference, which allows a higher output voltage to be achieved even with a low supply voltage, VDD. A design constraint is that the drain-to-source voltage (VDS) should be greater than 3VT to reduce the effect of the exponential VDS-dependent term in the well-known subthreshold MOS current equation:

$$I_D = k'(f-1)\,V_T^2\, e^{\frac{V_{GS}-V_{th}}{f V_T}} \left(1 - e^{-\frac{V_{DS}}{V_T}} + \frac{V_{DS}}{V_A}\right). \qquad (5.2)$$

In the traditional 2T voltage reference, this constraint makes it difficult to achieve a high output reference voltage while using a low supply voltage. Moreover, the source-to-bulk voltage differs between M1 and M2, which causes deviation from the calculated behavior. Consequently, the output voltage is prone to variation over process corners. Figure 5.5 (right) shows the output voltage versus temperature of the optimized 2T voltage reference for three process corners (slow-slow (ss), typical (tt), and fast-fast (ff)) in a 65 nm CMOS process. The device sizes are optimized for the tt corner. Although the output voltage is relatively flat in the tt corner, it varies significantly with temperature in the ff and particularly the ss corner. Any variation of the reference voltage with temperature translates directly into a temperature error in the proposed sensor architecture, because it leads to a temperature-dependent variation in the reference pulse width. To mitigate this deviation, bulk-terminal voltage compensation can be leveraged to correct the output voltage deviation across temperature, as shown in Figure 5.6. The proposed circuit controls the bulk (body) voltage using a replica of the same circuit to offset the temperature-dependent output variation, even at low VDS. Because VDS can be reduced, the supply voltage, VDD, can also be reduced, which lowers power consumption and allows operation from scavenged voltage sources. In the bulk-compensated 2T voltage reference, the primary compensation network, consisting of M3 and M4, provides bulk compensation for the main 2T reference stage (i.e., M1 and M2). To further reduce the temperature sensitivity of the overall reference voltage, an auxiliary compensation stage (M5 and M6) can be added. The auxiliary stage is not mandatory and can be removed to save power or die area, at the expense of slightly reduced accuracy.
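The VDS > 3VT guideline follows from the (1 - e^(-VDS/VT) + VDS/VA) factor in Equation 5.2. The sketch below evaluates that factor; it neglects the VDS/VA term by default (VA set to infinity, an assumption) and uses an arbitrary 50 mV drain-source voltage to show how, below the guideline, the factor itself becomes temperature dependent, which is the kind of gradient the bulk compensation described above is meant to offset.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant (J/K)
Q_E = 1.602176634e-19  # elementary charge (C)

def thermal_voltage(t_kelvin: float) -> float:
    """VT = kT/q in volts."""
    return K_B * t_kelvin / Q_E

def vds_factor(vds: float, t_kelvin: float, va: float = math.inf) -> float:
    """The (1 - exp(-VDS/VT) + VDS/VA) factor of Eq. (5.2); VA = inf drops the last term."""
    vt = thermal_voltage(t_kelvin)
    return 1.0 - math.exp(-vds / vt) + vds / va

vt300 = thermal_voltage(300.0)
print(f"VT(300 K) = {vt300 * 1e3:.2f} mV; factor at VDS = 3VT: {vds_factor(3 * vt300, 300.0):.3f}")

# At a fixed low VDS (arbitrary 50 mV), the factor drifts with temperature.
for t_c in (-20.0, 25.0, 100.0):
    print(f"{t_c:6.1f} degC: factor = {vds_factor(0.05, t_c + 273.15):.3f}")
```

At VDS = 3VT the factor is 1 - e^-3, so the exponential term perturbs the current by about 5%; at lower VDS the perturbation grows and varies with temperature.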
The devices in the reference use channel lengths near the maximum allowable, because a primary goal for the sensor system is low power consumption. Shorter channel lengths can be used if other optimization goals are desired (e.g., reduced area), at the expense of increased power consumption. For the optimized voltage reference in a 65 nm process, the width/length of each device in the schematic of Figure 5.6 is: M1: 3.6 µm/80 µm, M2: 3.65 µm/80 µm, M3: 500 nm/80 µm, M4: 100 µm/80 µm, and M5, M6: 400 nm/40 µm. The reference is simulated across temperature with and without automatic compensation, as shown in Figure 5.7 (left). After compensation, the reference achieves a temperature coefficient of 8.3 ppm/°C. A systematic design procedure is needed to enable design across different CMOS processes. First, the uncompensated 2T reference (Figure 5.5 (left)) is designed using the optimum W/L for devices M1 and M2. The devices should have approximately equal W/L, with the exact ratio determined by the power budget, the available area, and the minimum desired supply voltage. Note that reducing the supply voltage lowers VDS and hence introduces a temperature gradient. This temperature gradient, which results from low-voltage operation, is compensated by the proposed replica paths. Depending on whether the temperature gradient of the main stage is PTAT or CTAT, the compensation circuit is designed to control the bulk voltage of M2 through a replica stage optimized to provide an opposing action. If the main stage has a PTAT (CTAT) response, the compensation stages should also be designed to provide PTAT (CTAT) responses. This is because, through the body effect, raising (lowering) the bulk voltage of an NMOS device lowers (raises) its threshold voltage, which counteracts the temperature-dependent deviation in the output voltage.
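Temperature coefficients in ppm/°C, such as the 8.3 ppm/°C figure above, are commonly computed with the box method, TC = (Vmax - Vmin)/(Vmean·(Tmax - Tmin))·10^6. The text does not state which definition it uses, so the sketch below assumes the box method. Inverting it gives the absolute VREF spread implied by a given TC; the example combines the 65 nm figures (8.3 ppm/°C and the 285.4 mV mean) with the -20 to 100 °C range used for the other processes, which is itself an assumption for 65 nm. The sweep values are synthetic.

```python
def tc_ppm_per_c(v: list[float], t_min_c: float, t_max_c: float) -> float:
    """Box-method temperature coefficient in ppm/degC."""
    v_mean = sum(v) / len(v)
    return (max(v) - min(v)) / (v_mean * (t_max_c - t_min_c)) * 1e6

def spread_from_tc(tc_ppm: float, v_nom: float, t_span_c: float) -> float:
    """Absolute voltage spread (V) implied by a box-method TC."""
    return tc_ppm * 1e-6 * v_nom * t_span_c

# Synthetic reference sweep over -20..100 degC (illustrative values only).
sweep = [0.2852, 0.2854, 0.2855, 0.2854, 0.2853]
print(f"TC = {tc_ppm_per_c(sweep, -20.0, 100.0):.2f} ppm/degC")

# Spread implied by 8.3 ppm/degC at 285.4 mV over an assumed 120 degC span.
print(f"spread ~ {spread_from_tc(8.3, 0.2854, 120.0) * 1e6:.0f} uV")
```

Under these assumptions the implied spread is about 284 µV, the same order as the sub-300 µV deviation reported for the other processes.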
To validate the sensitivity of the voltage reference to process variations, Monte-Carlo simulations are run across the expected range of process variations in a 65 nm CMOS process. Figure 5.7 (right) shows a histogram of the expected reference voltages for the bulk-compensated voltage reference of Figure 5.6. The reference voltage has an average of 285.4 mV with a standard deviation of 11.6 mV across 500 simulations. To validate the versatility of the technique, the reference has been designed in three different CMOS processes using the design methodology described above. Corner simulations of the output voltage versus temperature for the automatically compensated voltage reference are shown for 130 nm CMOS in Figure 5.8 (left) and for 180 nm CMOS in Figure 5.8 (right). Note that the auxiliary compensation circuit (i.e., M5 and M6 in Figure 5.6) has been omitted in these simulations, as they serve only to validate the described technique. Similarly, Monte-Carlo simulations have been run to validate the performance across the range of expected PVT variations. Histograms of the expected reference voltages are shown for a 130 nm CMOS process in Figure 5.9 (left) and for a 180 nm CMOS process in Figure 5.9 (right). The histograms show different reference voltages (VREF) because the device threshold voltages differ between processes. Nevertheless, both processes produce consistent outputs, with a standard deviation on the order of 7.5 mV across more than 6500 run-sets in the 130 nm and 180 nm processes. Note that the deviation of the output voltage across corners does not directly determine the error of the overall sensor: it reduces the resolution of the sensor but does not translate directly into temperature error in the final output. In contrast, the gradient of VREF with respect to temperature contributes directly to the error.
This is because a temperature-dependent change in the voltage level produces a different reference pulse width at each temperature, whereas a static change in voltage level changes the pulse width equally at all temperatures. Monte-Carlo simulations have been performed to find the maximum deviation of VREF (ΔVREF) across the desired temperature range (T = −20 to 100 °C) and process variations. The histograms of the Monte-Carlo results are shown for 130 nm in Figure 5.10 (left) and for 180 nm in Figure 5.10 (right). The average temperature coefficient is 1.68 and 2.4 ppm/°C in 130 nm and 180 nm, respectively. The improvement relative to 65 nm is expected because of the longer-channel devices. Table 5.2 lists the optimized device sizes used in the simulations for all three processes. In 130 nm CMOS, M1 and M2 use unequal W/L to boost VREF, because the threshold-voltage difference among the available device options is smaller; the geometry difference is compensated using optimized bulk-compensation devices. The architecture is less susceptible to PVT variations because the metrics that change across process corners do not affect the overall performance directly. The output reference level can change significantly with process variations, but the bulk voltage compensation ensures that this offset does not vary with temperature; hence, it has little impact on the performance of the sensor in which the reference is embedded. The temperature gradient, which translates directly into sensor output error, is < 300 μV across the entire intended temperature range in all three processes simulated, as shown in Figure 5.10.
5.2.3 Sawtooth Wave Generation
To generate the PWM signals, the reference and CTAT voltages are compared with a sawtooth waveform.
An inverter-based ring oscillator, a frequency divider, and a capacitively loaded current-starved inverter (CSI) make up the sawtooth generator, as shown in Figure 5.1. There is a trade-off between area and power consumption in meeting the oscillation conditions at the desired frequency. A gate length of 20 μm is chosen to satisfy

$$\frac{1}{f} = 2N\tau, \qquad (5.3)$$

where N is the number of stages and τ is the delay per stage. Although it consumes more power than alternatives (e.g., relaxation oscillators), the ring oscillator is chosen for its reliable operation and relatively stable output period and duty cycle. The oscillator output frequency is approximately 1.07 kHz (24.38 kHz) at −20 °C (100 °C). This frequency is higher than needed for adequate temporal resolution; it is a compromise to keep the ring oscillator small. To save power in subsequent stages, the frequency is reduced with a frequency divider built from cascaded true-single-phase-clock (TSPC) flip-flops, which provides a divide ratio of 16. To keep the duty cycle of the sawtooth waveform consistent across temperature, the CSI's current source is biased using a replica CTAT voltage generator identical to the one used as the temperature sensor. This compensates the charging/discharging current of the load capacitor across temperature. The schematic of the sawtooth wave generator, along with the device sizes and types, is shown in Figure 5.11. As noted above, the oscillator frequency changes with temperature. This frequency variation is acceptable, provided that it does not affect the duty cycle of the generated voltage pulses; therefore, the choice of the capacitor in the sawtooth generator is critical.
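Equation (5.3) and the divide-by-16 stage determine how fast the sawtooth runs at each temperature. The sketch below evaluates them for the two quoted oscillator frequencies. The stage count N is not given in the text, so `stage_delay` takes N as a hypothetical parameter (N = 5 in the loop is illustrative only), and all function names are assumptions.

```python
DIVIDE_RATIO = 16  # divide ratio of the TSPC divider stated in the text


def total_stage_delay(f_osc_hz):
    """N * tau from Eq. (5.3): 1/f = 2 N tau."""
    return 1.0 / (2.0 * f_osc_hz)


def stage_delay(f_osc_hz, n_stages):
    """Per-stage delay tau for an assumed (hypothetical) odd stage count N."""
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("a single-ended inverter ring needs an odd N >= 3")
    return total_stage_delay(f_osc_hz) / n_stages


def divided_frequency(f_osc_hz, ratio=DIVIDE_RATIO):
    """Clock frequency after the frequency divider."""
    return f_osc_hz / ratio


for temp_c, f_osc in [(-20, 1.07e3), (100, 24.38e3)]:
    print(f"{temp_c:>4} °C: N*tau = {total_stage_delay(f_osc) * 1e6:.1f} us, "
          f"tau (N = 5) = {stage_delay(f_osc, 5) * 1e6:.1f} us, "
          f"divided clock = {divided_frequency(f_osc):.1f} Hz")
```

The roughly 23× change in oscillator frequency between −20 °C and 100 °C is why the design relies on the duty cycle, not the absolute period, to carry the temperature information.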
Because the temperature sensor architecture relies on linear operation, the capacitor value is selected so that the ramp remains linear across the entire temperature range. The frequency variation across temperature is plotted in Figure 5.12 (left). The transient output waveform of the sawtooth generator is shown in Figure 5.12 (right) for temperatures across the range, showing a consistent linear ramp and a consistent duty cycle.
5.2.4 Comparator Design
Two comparators are used to generate a reference pulse and a CTAT pulse. The sawtooth waveform drives one input of both comparators, while the reference voltage generator and the CTAT voltage generator drive the other inputs, respectively, as shown in Figure 5.1. The comparator has a PMOS input stage and a cross-coupled (latch-based) NMOS active load, as shown in Figure 5.13. The design is standard with one exception: the tail current source is a transistor operating in subthreshold conduction. Because rail-to-rail swing is not possible with this choice, the NMOS transistors are high-Vth devices and the PMOS transistors are low-Vth devices, to maximize the output voltage swing. To combine the reference voltage pulse with the CTAT voltage pulse, the comparators drive a MUX that is clocked by the ring oscillator; the MUX outputs are combined using a NAND gate to realize a pulse-width modulated (PWM) output waveform. The temperature is encoded in the ratio between the output pulse widths of the reference and CTAT comparators.
5.3 Simulation and Measurement Results
To validate the performance of the proposed sensor, it is fabricated in a 65 nm RF CMOS process. The circuit occupies an area of 220 × 305 μm², excluding I/O pads, as shown in Figure 5.14. The reference voltage generator dominates the chip area because of its long channel lengths, which minimize power consumption and mismatch.
All I/O pads include double-stacked diodes for ESD protection; stacking was used to reduce parasitic leakage current in the ESD circuits. ESD protection is necessary because of the wirebond-based assembly. Individual dies are mounted chip-on-board (COB) and bonded to a custom printed circuit board (PCB), allowing wired measurement inside a temperature-controlled chamber rather than wafer probing on a temperature-controlled chuck. Before temperature can be measured with the proposed system, a two-point temperature calibration must be performed for each die to estimate the slope and absolute level of the CTAT voltage source, since both its absolute level and its TC can change with PVT variations. This calibration is performed offline and only during validation after manufacture. The simulated time-domain output waveforms of the CTAT and reference voltage generators, as well as the outputs of the sawtooth generator and the PWM generator, are plotted in Figure 5.15 for an ambient temperature of −20 °C. Two distinct pulse widths of VPWM are observed: the first corresponds to the reference and the second to the CTAT. The system is designed so that the reference pulse is always longer than the CTAT pulse, by ensuring that the maximum CTAT voltage never exceeds the reference voltage. The two most critical blocks (i.e., the reference voltage and CTAT generators) have been simulated with extracted parasitic capacitance across the temperature and supply voltage ranges. Figure 5.16 (left) and Figure 5.16 (right) show the temperature coefficient and the line sensitivity of the reference voltage, respectively. The reference voltage achieves a line sensitivity of 0.012%/V and a TC of 8.3 ppm/°C. The temperature range is limited by the requirement that the CTAT voltage not fall in the nonlinear region of the sawtooth waveform. The CTAT generator offers a sensitivity of −1.7 mV/°C, as shown in Figure 5.17.
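The per-die two-point calibration amounts to fitting a straight line V_CTAT(T) = offset + slope · T through two known temperatures and inverting it at read-out time. A minimal sketch follows; the class name and the calibration voltages are hypothetical, chosen only so that the slope equals the simulated −1.7 mV/°C and both voltages stay below the roughly 285 mV reference level, as the design requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CtatCalibration:
    """Linear CTAT model V_CTAT(T) = offset_v + slope_v_per_c * T."""
    slope_v_per_c: float  # negative for a CTAT source
    offset_v: float       # V_CTAT extrapolated to 0 °C

    @classmethod
    def from_two_points(cls, t1_c, v1, t2_c, v2):
        """Fit slope and offset from two (temperature, voltage) points."""
        if t1_c == t2_c:
            raise ValueError("calibration temperatures must differ")
        slope = (v2 - v1) / (t2_c - t1_c)
        return cls(slope_v_per_c=slope, offset_v=v1 - slope * t1_c)

    def temperature(self, v_ctat):
        """Invert the linear model to get temperature in °C."""
        return (v_ctat - self.offset_v) / self.slope_v_per_c


# Hypothetical calibration points at the ends of the -20 to 80 °C range.
cal = CtatCalibration.from_two_points(-20.0, 0.250, 80.0, 0.080)
print(f"slope = {cal.slope_v_per_c * 1e3:.2f} mV/°C, offset = {cal.offset_v:.3f} V")
print(f"V_CTAT = 0.165 V -> {cal.temperature(0.165):.1f} °C")
```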
Note that the CTAT slope is uniform across supply voltages; hence, the calibration described above only needs to be performed at the supply voltage that will be used on the platform. The simulated maximum error is +0.2 °C to −0.22 °C, and the extracted power consumption ranges from 7.9 nW to 120 nW at −20 °C and 80 °C, respectively, as shown in Figure 5.18 (left) and Figure 5.18 (right), when using a supply voltage of 0.45 V. The power consumption of each block (Figure 5.1) is simulated to estimate the power breakdown of the entire system, which is shown in Figure 5.19. The dominant power consumers are the comparator and the sawtooth generator, which is expected because these blocks have switching-like behavior and operate with relatively high activity factors. Note that the reference and CTAT voltage generators consume only 8% of the total power budget. To measure the sensor across temperature, the COB PCB assembly was placed in a Thermotron temperature chamber and allowed to soak for 3 minutes at each measured temperature to ensure that the ambient temperature was stable. A high-impedance source measurement unit (SMU) was used to monitor the voltages of both the reference and CTAT generators on two different COB assemblies. The measured reference and CTAT voltages are shown in Figure 5.20 and Figure 5.21, respectively, across the full range of the temperature sensor. The reference voltage is flat, and the measured output voltages are in the range expected from the Monte-Carlo simulations. The CTAT voltages show a linear characteristic across the range of operation. The measured temperature coefficient of the reference voltage generator is 17.3 and 11.9 ppm/°C for chip 1 and chip 2, respectively, which is within the range expected from the Monte-Carlo simulations.
The measured TC of the CTAT generator is −0.87 and −1.33 mV/°C for chip 1 and chip 2, respectively, which compares well with the simulated TC of −1.7 mV/°C and is within the range predicted by the Monte-Carlo simulations. The temperature can be read using the following equation:

$$\frac{T_{REF}}{T_{CTAT}} = \frac{V_{DD} - V_{CTAT}}{V_{DD} - V_{REF}}, \qquad (5.4)$$

where TREF and TCTAT are the duty cycles of the reference and CTAT pulses in the output PWM waveform, respectively. VCTAT can be solved for from Equation 5.4, and the measured temperature can then be found from the linear relationship between VCTAT and temperature obtained from the two-point calibration described above. To measure the temperature, the duty cycles are monitored over five consecutive periods and averaged; the sample rate is not fixed, because the ring-oscillator clock frequency varies with temperature. The simulated average sample rate varies from 200/s at −20 °C to 950/s at 80 °C, which provides more than adequate temporal resolution across the operating range of the sensor. The sensor accuracy is measured across temperatures from −20 °C to 80 °C, and the error is plotted versus temperature in Figure 5.22. The measured temperature inaccuracy ranges between −2.5/+1 °C and −0.45/+0.5 °C for chips 1 and 2, respectively. The larger inaccuracy of chip 1 relative to chip 2 is attributed to the reduced sensitivity of the CTAT generator on chip 1, but both chips fall within the expectations predicted by the Monte-Carlo simulations. Figure 5.23 shows the average error of the two test chips at the minimum, nominal, and maximum temperatures of −20, 30, and 80 °C across the supply voltage range of 0.45 V to 0.9 V.
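Equation (5.4) can be inverted for the CTAT voltage, V_CTAT = V_DD − (T_REF/T_CTAT)(V_DD − V_REF), and combined with the two-point calibration to produce a reading. The sketch below averages five consecutive periods, as described above, before taking the ratio. The function names, the 0.285 V reference, the calibration constants, and the duty-cycle samples are hypothetical values chosen for illustration.

```python
from statistics import mean


def vctat_from_ratio(ratio_ref_over_ctat, vdd, vref):
    """Solve Eq. (5.4), T_REF/T_CTAT = (VDD - V_CTAT)/(VDD - V_REF), for V_CTAT."""
    return vdd - ratio_ref_over_ctat * (vdd - vref)


def read_temperature(duty_pairs, vdd, vref, slope_v_per_c, offset_v, n_avg=5):
    """Average the last n_avg (T_REF, T_CTAT) duty-cycle pairs, recover V_CTAT,
    and map it to °C with the linear calibration V_CTAT = offset + slope * T."""
    if len(duty_pairs) < n_avg:
        raise ValueError("not enough periods for the requested average")
    window = duty_pairs[-n_avg:]
    t_ref = mean(p[0] for p in window)
    t_ctat = mean(p[1] for p in window)
    ratio = t_ref / t_ctat
    if ratio <= 1.0:
        # The design keeps V_CTAT below V_REF, so T_REF should exceed T_CTAT.
        raise ValueError("expected the reference pulse to be longer than the CTAT pulse")
    v_ctat = vctat_from_ratio(ratio, vdd, vref)
    return (v_ctat - offset_v) / slope_v_per_c


# Five hypothetical consecutive periods (duty cycles as fractions of a period).
pairs = [(0.56, 0.33), (0.58, 0.33), (0.57, 0.32), (0.57, 0.34), (0.57, 0.33)]
temp = read_temperature(pairs, vdd=0.45, vref=0.285,
                        slope_v_per_c=-1.7e-3, offset_v=0.216)
print(f"{temp:.1f} °C")
```

Raising `n_avg` trades a longer conversion time for a lower error, which is the same trade-off as the 5-versus-50-reading comparison reported for the measured chips.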
The error in Figure 5.23 is minimal in the mid-voltage range of 0.6 V to 0.75 V and increases at both low and high temperatures, owing to degradation of the reference generator at low temperature and the increased operating frequency at high temperature. The power breakdown per block could not be measured because the blocks do not have individual supplies, but the measured total power consumption agrees closely with the extracted simulation. Hence, it is reasonable to assume that the measured power breakdown is similar to the simulated one. The measured reference and CTAT voltages (VREF and VCTAT) do not require any calibration other than the final two-point calibration that determines the slope of the CTAT voltage. Unlike most traditional sensors, this architecture has a variable temporal resolution because the operating frequency changes across temperature; hence, the conversion time (Tconv) also changes across temperature. The minimum and maximum measured sample rates are 21 samples/s and 280 samples/s at −20 °C and 80 °C, respectively, for a five-sample average. The measurement error can be reduced significantly with more averaging, at the expense of a longer Tconv. For instance, averaging 50 readings instead of 5 reduces the maximum error from −2.5/+1 °C to −0.35/+0.15 °C.
5.4 Comparison and Summary
The proposed sensor is well suited to low-power IoT applications because of its low minimum supply voltage (450 mV), wide operating voltage range (~0.5 V), and extremely low power consumption (< 50 nW). Its biggest benefit, however, is that the temperature information is embedded in a pulse-width modulated waveform rather than in an absolute analog output voltage. An analog output would require an additional ADC to convert it to a digital signal for wireless transmission through switching/digital PAs, and an analog PA would also require a level shifter, adding to the overall power consumption.
In contrast, the proposed sensor outputs a digital pulse, which can be directly up-converted using a NAND gate and transmitted using any low-power digital PA, such as a Class-C or Class-D PA. This case is illustrated in Figure 5.24. The proposed sensor consumes as little as 17.6 nW at −20 °C and 47.2 nW at 80 °C and can operate from supply voltages as low as 450 mV. The maximum measured temperature error is −2.5/+1 °C with two-point calibration. The sensor output is a pulse-width modulated digital signal that can be transmitted wirelessly using low-power techniques (e.g., on-off keying (OOK)). Because the temperature is measured as a ratio of output pulse widths, it is less susceptible to process-voltage-temperature (PVT) variations. The sensor is also suitable for being powered by energy harvesters. Hence, this architecture is well suited to low-power IoT applications. A comparison of figure of merit (FoM) and power consumption is provided in Figure 5.25, and a detailed comparison with state-of-the-art low-power CMOS temperature sensors is given in Table 5.3.

Table 5.1. CTAT generator using different CMOS processes.
Process | M1 | M2
65 nm | nch_lvt (Vth ~ 275 mV), W/L = 120n/20µ | nch_25 (Vth ~ 525 mV), W/L = 14µ/280n
130 nm | zvtnfet (Vth ~ 100 mV), W/L = 3µ/20µ | nfet (Vth ~ 300 mV), W/L = 12µ/120n
180 nm | zvtnfet (Vth ~ 90 mV), W/L = 2.72µ/20µ | nfet_25 (Vth ~ 580 mV), W/L = 40µ/320n

Table 5.2. Optimized device sizes for 65 nm, 130 nm, and 180 nm CMOS.
Process | M1 | M2 | M3 | M4
65 nm | nch_na, 3.6μ/80μ | nch_25, 3.65μ/80μ | nch_na, 500n/80μ | nch_25, 100μ/80μ
130 nm* | zvtnfet, 10μ/20μ | dgnfet, 3.6μ/20μ | zvtnfet, 2μ/5μ | dgnfet, 20μ/40μ
180 nm* | zvtnfet, 10μ/20μ | nfet_25x, 8.9μ/20μ | zvtnfet, 2μ/5μ | nfet_25x, 10μ/20μ
*M5 and M6 have been omitted in the 130 nm and 180 nm designs.

Table 5.3. Comparison of this work with the state of the art.
Attributes | This work | [82] | [83] | [84] | [86] | [87] | [89] | [101]
Tech. (nm) | 65 | 130 | 65 | 90 | 350 | 160 | 700 | 180
PDC @ 27 °C (μW) | 0.0472 | 1200 | 360 | 25 | 0.110 | 5.1 | 187.5 | 0.071
Area (mm²) | 0.067 | 0.12 | 0.0003 | 0.00005 | 0.084 | 0.08 | 4.5 | 0.09
Error (°C) | −2.5~1 | −4~4 | −3.4~3.6 | −1~0.8 | ±0.1 | ±0.15 | ±0.1 | −1.4~1.5
Sample rate (s⁻¹) | 21/280** | 5000 | 20000 | N/A | 10 | 188.9 | 10 | 33
Min. VDD (V) | 0.45 | 1.2 | 0.6 | 1 | N/A | 1.5 | 2.5 | 1.2
Temp. range (°C) | −20~80 | 0~100 | 0~100 | 50~125 | 35~45 | −55~125 | −55~125 | 0~100
FOM† (pJ) | 2.75/0.2* | 1536 | 88.2 | N/A | 4.4 | 0.075 | 23.15 | 1.79
Calibration | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
† Accuracy FOM = (Energy/Conversion) × (Total inaccuracy/Temperature range)² [87], [88]. *, ** Values separated by "/" are at −20 °C and 80 °C, respectively.

Figure 5.1. Block diagram of the proposed temperature sensor.
Figure 5.2. CTAT generator: (left) circuit schematic [85] and (right) output voltage versus temperature for different process corners (slow, typical, fast) in 65 nm CMOS.
Figure 5.3. Linear regression (R²) of the CTAT generator based on Monte-Carlo simulation.
Figure 5.4. CTAT generation using the CMOS processes (65 nm, 130 nm, and 180 nm) and devices listed in Table 5.1. The output voltage (VCTAT) is compared with the theoretical value from Equation 5.1.
Figure 5.5. Reference voltage generator: (left) circuit schematic and (right) output voltage versus temperature for different process corners (slow, typical, fast) in 65 nm CMOS.
Figure 5.6. Feedback-controlled reference voltage generator circuit schematic.
Figure 5.7. Reference voltage: (left) comparison with and without feedback compensation and (right) Monte-Carlo simulation of the feedback-compensated reference generator.
Figure 5.8. Corner simulation: (left) bulk-compensated voltage reference in 130 nm CMOS and (right) voltage reference in 180 nm CMOS.
Figure 5.9. Monte-Carlo simulation: (left) bulk-compensated voltage reference in 130 nm CMOS and (right) voltage reference in 180 nm CMOS.
Figure 5.10. Monte-Carlo simulation: (left) deviation of VREF across temperature in 130 nm CMOS and (right) deviation of VREF across temperature in 180 nm CMOS.
Figure 5.11. Sawtooth wave generator circuit.
Figure 5.12. Sawtooth generator: (left) frequency variation across temperature and (right) transient output waveform.
Figure 5.13. Schematic of the comparator circuit.
Figure 5.14. Microphotograph of the fabricated temperature sensor in 65 nm CMOS.
Figure 5.15. Transient waveform (extracted simulation) of the temperature sensor.
Figure 5.16. Extracted simulations: (left) temperature coefficient and (right) line sensitivity of the proposed reference voltage generator (VREF).
Figure 5.17. Temperature coefficient of the CTAT generator from extracted simulations.
Figure 5.18. Extracted simulation results: (left) error versus temperature at different supply voltages and (right) overall power consumption at different supply voltages.
Figure 5.19. Power consumption breakdown from the extracted simulation (blocks: CTAT, VREF, sawtooth, comparator, and output stage; slice values of 46%, 36%, 10%, 5%, and 3%).
Figure 5.20. Comparison of measured and simulated reference voltage (VREF) for multiple samples.
Figure 5.21. Comparison of measured and simulated VCTAT for multiple samples.
Figure 5.22. Measured and simulated temperature error versus temperature at nominal VDD.
Figure 5.23. Measured temperature error at minimum, ambient, and maximum temperature versus VDD.
Figure 5.24. Illustration of modulation and demodulation using ASK.
Figure 5.25. A survey of prior art on CMOS temperature sensors.

CHAPTER 6 DISSERTATION SUMMARY
Digital transmitters for emerging applications such as IoT, 5G, and enhanced licensed assisted access (eLAA) have been presented. Some of the fundamental issues of transmitter design in fine-line CMOS processes have been addressed, and new techniques have been discussed to increase efficiency, improve data throughput, enhance linearity, and reduce out-of-band noise. Chapter 2 presents a high-power (32 dBm) digital power amplifier that can operate in either WiFi mode or cellular mode in the unlicensed 5–6 GHz band, which can substantially increase data throughput and improve service offerings.
First, a high-linearity, unary-segmented switched-capacitor architecture has been presented that enables the DPA to achieve the required resolution of 16 bits. Second, compact and symmetrical 4-way and 8-way combiner structures have been presented; these are fundamental to designing high-power switched-capacitor PAs with good efficiency. Lastly, a new switching scheme has been presented that helps the DPA achieve good EVM performance without sacrificing other performance metrics. Chapter 3 presents a hybrid dual-rate switched-capacitor power amplifier, combining sigma-delta oversampling with Nyquist-rate operation, that extends the maximum resolution achievable in a given CMOS process. An SCPA comprising a 4-bit unary array and a 2-bit binary array is demonstrated to achieve an effective resolution of 9 bits. Combined with the unary-segmented array technique presented in Chapter 2, this technique can greatly improve the achievable resolution, reduce quantization noise, and push the out-of-band noise further down. Chapter 4 presents a frequency-tunable digital power amplifier using a single narrowband switched-capacitor power amplifier, which delivers similar output power and efficiency over a wide frequency range (~1.1 GHz). A programmable capacitor, controlled through a serial peripheral interface (SPI), sets the resonant frequency of operation. This technique allows a single amplifier to operate in multiple fragmented bands over a wider frequency range, effectively improving data throughput. Chapter 5 presents an ultra-low-power (< 50 nW), PVT-tolerant temperature sensor in 65 nm CMOS that provides the temperature information as a pulse-width modulated waveform. It is well suited to IoT applications because its output can be transmitted directly through simple amplitude modulation and demodulated using envelope detection. Its low minimum operating voltage (450 mV) allows it to be powered by energy scavengers.
Measurement results show excellent performance, with a maximum error of −2.5/+1 °C at a supply voltage as low as 450 mV.

REFERENCES
[1] Nokia, “Unlicensed band opportunities for mobile broadband,” Nokia, white paper, 2016. [Online]. Available: http://www.crplatform.nl/Documentation/Docs/15/Nokia_LTE_unlicensed_white_paper.pdf
[2] Cisco, “Cisco Visual Networking Index: Forecast and Methodology, 2015-2020,” Cisco, white paper, 2016. [Online]. Available: http://www.davidellis.ca/wp-content/uploads/2016/01/cisco-vni-june-2016481360.pdf
[3] M. Blanco, “LTE in the Unlicensed Spectrum,” Keysight Technologies, white paper, 2016. [Online]. Available: https://www.keysight.com/upload/cmc_upload/All/29March2016WebcastSlides.pdf
[4] P. Lancia and J. Sinnott, “Qualcomm and SK Telecom Announce First Enhanced Licensed Assisted Access (eLAA) Over-the-Air Trial,” prnewswire.com. https://www.prnewswire.com/news-releases/qualcomm-and-sk-telecomannounce-first-enhanced-licensed-assisted-access-elaa-over-the-air-trial300336210.html (accessed Mar. 3, 2019).
[5] Wireless Telecommunications Bureau, “The Mobile Broadband Spectrum Challenge: International Comparisons,” Federal Communications Commission, white paper, 2013. [Online]. Available: https://docs.fcc.gov/public/attachments/DOC-318485A1.pdf
[6] Intel, “Alternative LTE Solutions in Unlicensed Spectrum: Overview of LWA, LTE-LAA and Beyond,” Intel, white paper, 2016. [Online]. Available: https://www.thailand.intel.com/content/dam/www/public/us/en/documents/whitepapers/unlicensed-lte-paper.pdf
[7] R. Karaki et al., “Uplink performance of enhanced licensed assisted access (eLAA) in unlicensed spectrum,” IEEE Wireless Communications and Networking Conference (WCNC), pp. 1–6, 2017.
[8] N. Wongkomet, L. Tee, and P. R. Gray, “A 31.5 dBm CMOS RF Doherty Power Amplifier for Wireless Communications,” IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2852–2859, 2006.
[9] T. Sowlati, C. A. T. Salama, J. Sitch, G. Rabjohn, and D. Smith, “Low Voltage, High Efficiency GaAs Class E Power Amplifiers for Wireless Transmitters,” IEEE J. Solid-State Circuits, vol. 30, no. 10, pp. 1074–1080, 1995.
[10] D. Su and W. Mcfarland, “A 2.5-V, 1-W Monolithic CMOS RF Power Amplifier,” IEEE Custom Integrated Circuits Conference, pp. 189–192, 1997.
[11] F. H. Raab et al., “Power Amplifiers and Transmitters for RF and Microwave,” IEEE Trans. Microw. Theory Tech., vol. 50, no. 3, pp. 814–826, 2002.
[12] S. Chen and J. Zhao, “The Requirements, Challenges, and Technologies for 5G of Terrestrial Mobile Telecommunication,” IEEE Communications Magazine, pp. 36–43, 2014.
[13] Y. Li, R. Zhu, D. Prikhodko, and Y. Tkachenko, “LTE Power Amplifier Module Design: Challenges and Trends,” IEEE International Conference on Solid-State and Integrated Circuit Technology, no. 1, pp. 192–195, 2010.
[14] J. Choi, D. Kang, D. Kim, J. Park, B. Jin, and B. Kim, “Power Amplifiers and Transmitters for Next Generation Mobile Handsets,” J. Semicond. Technol. Sci., vol. 9, pp. 14–22, 2009.
[15] Y. Huang, Y. Chen, Y. T. Hou, W. Lou, and J. H. Reed, “Recent Advances of LTE/WiFi Coexistence in Unlicensed Spectrum,” IEEE Netw., vol. 32, pp. 107–113, 2018.
[16] S. Ramakrishnan, “Design of Integrated Full-Duplex Wireless Transceivers,” Ph.D. dissertation, Dept. Elect. Eng. and Comp. Sci., Univ. California, Berkeley, CA, USA, 2016.
[17] T. Jiang and Y. Wu, “An Overview: Peak-to-Average Power Ratio Reduction Techniques for OFDM Signals,” IEEE Trans. Broadcast., vol. 54, no. 2, pp. 257–268, 2008.
[18] H. Y. Sakran, M. Shokair, and A. A. Elazm, “An efficient technique for reducing PAPR of OFDM system in the presence of nonlinear high power amplifier,” Prog. Electromagn. Res., vol. 2, pp. 233–241, 2008.
[19] Y. Ye, D. Wu, Z. Shu, and Y. Qian, “Overview of LTE Spectrum Sharing Technologies,” IEEE Access, vol. 4, pp. 8105–8115, 2016.
[20] H. E. Kwon et al., “Licensed-Assisted Access to Unlicensed Spectrum in LTE Release 13,” IEEE Communications Magazine, vol. 55, pp. 201–207, 2017.
[21] A. K. Bairagi, N. H. Tran, and W. Saad, “A Game-Theoretic Approach for Fair Coexistence between LTE-U and Wi-Fi Systems,” IEEE Trans. Veh. Technol., vol. 68, no. 1, pp. 442–455, 2019.
[22] I. Selinis, K. Katsaros, M. Allayioti, S. Vahid, and R. Tafazolli, “The Race to 5G Era; LTE and Wi-Fi,” IEEE Access, vol. 6, pp. 56598–56636, 2018.
[23] M. Maule, “Enabling Fairness and QoS for LTE/Wi-Fi Coexistence in Unlicensed Spectrum,” M.S. thesis, Telecommunication Eng., Tampere University of Technology, Tampere, Finland, 2017.
[24] S.-M. Yoo, J. S. Walling, E. C. Woo, B. Jann, and D. J. Allstot, “A Switched-Capacitor RF Power Amplifier,” IEEE J. Solid-State Circuits, vol. 46, no. 12, pp. 2977–2987, 2011.
[25] Z. Bai et al., “Split-Array, C-2C Switched-Capacitor Power Amplifier,” IEEE J. Solid-State Circuits, vol. 53, no. 6, pp. 1666–1677, 2018.
[26] A. D. Multimode and P. Transmitter, “A Digital Multimode Polar Transmitter Supporting 40MHz LTE Carrier Aggregation in 28nm CMOS,” Dig. Tech. Pap. IEEE Int. Solid-State Circuits Conf., pp. 218–220, 2017.
[27] C. T. Chen, Y. C. Lin, T. S. Horng, K. C. Peng, and C. J. Li, “Kahn envelope elimination and restoration technique using injection-locked oscillators,” IEEE MTT-S International Microwave Symposium Digest, pp. 3–5, 2012.
[28] S. C. Cripps, RF Power Amplifiers for Wireless Communications, 2nd ed., Boston, MA: Artech House, 2006.
[29] J. S. Walling and D. J. Allstot, “Design considerations for supply modulated EER power amplifiers,” IEEE 14th Annu. Wirel. Microw. Technol. Conf. (WAMICON), no. 1, pp. 2–5, 2013.
[30] N. O. Sokal and A. D. Sokal, “Class E – A New Class of High-Efficiency Tuned Single-Ended Switching Power Amplifiers,” IEEE J. Solid-State Circuits, vol. 10, no. 3, pp. 168–176, 1975.
[31] F. H. Raab and N. O. Sokal, “Transistor Power Losses in the Class E Tuned Power Amplifier,” IEEE J. Solid-State Circuits, vol. 13, no. 6, pp. 912–914, 1978.
[32] F. H. Raab, “Class-F Power Amplifiers with Maximally Flat Waveforms,” IEEE Trans. Microw. Theory Tech., vol. 45, no. 11, pp. 2007–2012, 1997.
[33] J. Ko et al., “A high-efficiency multiband Class-F power amplifier in 0.153µm bulk CMOS for WCDMA/LTE applications,” Dig. Tech. Pap. IEEE Int. Solid-State Circuits Conf., pp. 40–41, 2017.
[34] H. Chireix, “High Power Outphasing Modulation,” Proc. Institute Radio Eng., vol. 23, no. 11, pp. 1370–1392, 1935.
[35] S. A. El-Hamamsy, “Design of High-Efficiency RF Class-D Power Amplifier,” IEEE Trans. Power Electron., vol. 9, no. 3, pp. 297–308, 1994.
[36] Y. Lian and Y. Li, “Improved binary-weighted split-capacitive-array DAC for high-resolution SAR ADCs,” Electron. Lett., vol. 50, no. 17, pp. 1194–1195, 2014.
[37] Y. H. Y. Han and D. J. Perreault, “Analysis and Design of High Efficiency Matching Networks,” IEEE Trans. Power Electron., vol. 21, no. 5, pp. 1484–1491, 2006.
[38] H. Wang, C. Sideris, and A. Hajimiri, “A CMOS broadband power amplifier with a transformer-based high-order output matching network,” IEEE J. Solid-State Circuits, vol. 45, no. 12, pp. 2709–2722, 2010.
[39] I. Aoki, S. D. Kee, D. B. Rutledge, and A. Hajimiri, “Distributed active transformer-a new power-combining and impedance-transformation technique,” IEEE Trans. Microw. Theory Tech., vol. 50, no. 1, pp. 316–331, 2002.
[40] H. Xu, Y. Palaskas, A. Ravi, M. Sajadieh, and M. A. El-tanani, “A Flip-Chip-Packaged 25.3 dBm Class-D Outphasing Power Amplifier in 32 nm CMOS for WLAN Application,” IEEE J. Solid-State Circuits, vol. 46, no. 7, pp. 1596–1605, 2011.
[41] W. Tai et al., “A transformer-combined 31.5 dBm outphasing power amplifier in 45 nm LP CMOS with dynamic power control for back-off power efficiency enhancement,” IEEE J. Solid-State Circuits, vol. 47, no. 7, pp. 1646–1658, 2012.
[42] P. Seddighrad, “Digitally-scalable Transformer-combining Power Amplifier Techniques,” Ph.D. dissertation, Dept. Elect. Eng., Univ. of Washington, WA, USA, 2012.
[43] W. H. Doherty, “A new high efficiency power amplifier for modulated waves,” Proc. Inst. Radio Eng., vol. 24, no. 9, pp. 1163–1182, 1936.
[44] V. Vorapipat, C. S. Levy, and P. M. Asbeck, “A Class-G Voltage-Mode Doherty Power Amplifier,” IEEE J. Solid-State Circuits, vol. 52, no. 12, pp. 3348–3360, 2017.
[45] W. Yuan and J. S. Walling, “A multiphase switched capacitor power amplifier,” IEEE J. Solid-State Circuits, vol. 52, no. 5, pp. 1320–1330, 2017.
[46] V. Aparin, J. Dunworth, L. Seward, W. Yuan, and J. S. Walling, “A transformer combined quadrature switched capacitor power amplifier in 65nm CMOS,” IEEE Int. NEWCAS Conf. (NEWCAS), pp. 135–138, 2016.
[47] H. Kobayashi, J. Hinrichs, and P. M. Asbeck, “Current mode Class-D power amplifiers for high efficiency RF applications,” IEEE MTT-S Int. Microw. Symp. Dig., vol. 2, no. 12, pp. 939–942, 2001.
[48] T. P. Hung, A. G. Metzger, P. J. Zampardi, M. Iwamoto, and P. M. Asbeck, “Design of high-efficiency current-mode Class-D amplifiers for wireless handsets,” IEEE Trans. Microw. Theory Tech., vol. 53, no. 1, pp. 144–150, 2005.
[49] J. A. Weldon et al., “A 1.75-GHz Highly Integrated Narrow-Band CMOS Transmitter With Harmonic-Rejection Mixers,” IEEE J. Solid-State Circuits, vol. 36, no. 12, pp. 2003–2015, 2002.
[50] R. Bhat, J. Zhou, and H. Krishnaswamy, “A >1W 2.2GHz Switched-Capacitor Digital Power Amplifier with Wideband Mixed-Domain Multi-Tap FIR Filtering of OOB Noise Floor,” IEEE International Solid-State Circuits Conference (ISSCC), pp. 234–235, 2017.
[51] R. Bhat and H. Krishnaswamy, “A Watt-Level 2.4 GHz RF I/Q Power DAC Transmitter with Integrated Mixed-Domain FIR Filtering of Quantization Noise in 65 nm CMOS,” IEEE Radio Frequency Integrated Circuits Symposium, pp. 413–416, 2014.
[52] P. Emiliano, P. Filho, M. Ingels, P. Wambacq, and J. Craninckx, “An Incremental-Charge-Based Digital Transmitter With Built-in Filtering,” IEEE J. Solid-State Circuits, vol. 50, no. 12, pp. 3065–3076, 2015.
[53] Z. Bai, D. Johnson, A. Azam, A. Saha, W. Yuan, and J. S. Walling, “A 12 bit split-array switched capacitor power amplifier in 130nm CMOS,” Int. Syst. Chip Conf., pp. 24–28, 2017.
[54] Z. Bai, D. Johnson, A. Azam, and J. S. Walling, “A 12 Bit Split-Array Switched Capacitor Power Amplifier in 130nm CMOS,” Proc. of IEEE SOC Conf., pp. 2428, 2016.
[55] Z. Bai, W. Yuan, A. Azam, and J. S. Walling, “A split-array, C-2C switched-capacitor power amplifier in 65nm CMOS,” Digest of Papers - IEEE Radio Frequency Integrated Circuits Symposium, pp. 336–339, 2017.
[56] S. Su, T.-I. Tsai, P. K. Sharma, and M. S.-W. Chen, “A 12 bit 1 GS/s Dual-Rate Hybrid DAC With an 8 GS/s Unrolled Pipeline Delta-Sigma Modulator Achieving > 75 dB SFDR Over the Nyquist Band,” IEEE J. Solid-State Circuits, vol. 50, no. 4, pp. 896–907, Apr. 2015.
[57] S. Su and M. S. Chen, “A 12-Bit 2 GS/s Dual-Rate Hybrid DAC With Pulse-Error Predistortion and In-Band Noise Cancellation Achieving >74 dBc SFDR and <−80 dBc IM3 up to 1 GHz in 65 nm CMOS,” IEEE J. Solid-State Circuits, vol. 51, no. 12, pp. 2963–2978, 2016.
[58] P. A. J. Keyzer, J. Hinrichs, A. Metzger, M. Iwamoto, and I. Galton, “Digital generation of RF signals for wireless communications,” IEEE MTT-S International Microwave Symposium Digest, pp. 2127–2130, 2001.
[59] J. T. Stauth and S. R. Sanders, “A 2.4GHz, 20dBm Class-D PA with Single-Bit Digital Polar Modulation in 90nm CMOS,” IEEE Custom Integrated Circuits Conference (CICC), pp. 737–740, 2008.
[60] L. Ye, J. Chen, L. Kong, E. Alon, and A. M. Niknejad, “Design considerations for a direct digitally modulated WLAN transmitter with integrated phase path and dynamic impedance modulation,” IEEE J. Solid-State Circuits, vol. 48, no. 12, pp. 3160–3177, 2013.
[61] Y. Tan and H. Xu, “CMOS power amplifier design for wireless connectivity applications: a highly linear WLAN power amplifier in advanced SoC CMOS,” RF and mm-Wave Power Generation in Silicon, pp. 61–88, 2016.
[62] M. A. McHenry, “NSF spectrum occupancy measurements project summary,” 2005. [Online]. Available: https://www.bibsonomy.org/bibtex/2729bde91d1d6eef5e550086697f4ad77/chsivic
[63] W. Ye, K. Ma, and K. S. Yeo, “A 2-to-6GHz Class-AB power amplifier with 28.4% PAE in 65nm CMOS supporting 256QAM,” Dig. Tech. Pap. IEEE Int. Solid-State Circuits Conf., vol. 58, pp. 38–39, 2015.
[64] J. Park, Y. Wang, S. Pellerano, C. Hull, and H. Wang, “A 24dBm 2-to-4.3GHz Wideband Digital Power Amplifier with Built-In AM-PM Distortion Self-Compensation,” Dig. Tech. Pap. IEEE Int. Solid-State Circuits Conf., pp. 230–232, 2017.
[65] H. Wang, C. Sideris, and A. Hajimiri, “A 5.2-to-13GHz class-AB CMOS power amplifier with a 25.2dBm peak output power at 21.6% PAE,” Dig. Tech. Pap. IEEE Int. Solid-State Circuits Conf., vol. 53, no. 7, pp. 44–45, 2010.
[66] J. C. Kao, P. Chen, P. C. Huang, and H. Wang, “A novel distributed amplifier with high gain, low noise, and high output power in 0.18-μm CMOS Technology,” IEEE Trans. Microw. Theory Tech., vol. 61, no. 4, pp. 1533–1542, 2013.
[67] C. Grewing, K. Winterberg, S. van Waasen, M. Friedrich, G. L. Puma, A. Wiesbauer, and C. Sandner, “Fully Integrated Distributed Power Amplifier in CMOS Technology, optimized for UWB Transmitters,” Radio Frequency Integrated Circuits Symposium (RFIC), pp. 87–90, 2004.
[68] O. Kyaw and K. W. Eccleston, “Class-B Balanced Single-Ended Dual-Fed Distributed Power Amplifier,” International Conference on Microwave and Millimeter Wave Technology, pp. 919–922, 2002.
[69] H. W. and H. Hashemi, “A 0.5-6 GHz 25.6 dBm Fully Integrated Digital Power Amplifier in 65-nm CMOS,” IEEE Radio Frequency Integrated Circuits Symposium, pp. 409–412, 2014.
[70] P. Chen, J. Kao, P.
Huang, and H. Wang, “A Novel Distributed Amplifier with 174 High Gain , Low Noise and High Output Power in 0.18- um CMOS Technology,” IEEE MTT-S International Microwave Symposium, pp. 1–4, 2011. [71] C. Grewing et al., “Fully Integrated Distributed Power Amplifier in CMOS Technology , optimized for UWB Transmitters,” IEEE Radio Frequency Integrated Circuits Symposium, pp. 87–90, 2004. [72] “Datasheet.” [Online]. Available: https://www.qorvo.com/products/controlproducts/programmable-capacitor-arrays. [73] “Datasheet.” [Online]. Available: http://wispry.com/solutions/tunable-digitalcapacitor-arrays/. [74] “Datasheet.” [Online]. Available: http://www.ixysic.com/Products/ProgCap.htm. [75] “Datasheet.” [Online]. Available: https://www.psemi.com/products/digitallytunable-capacitors-dtc. [76] J. K. Nai, Y. H. Hsiao, Y. S. Wang, Y. H. Lin, and H. Wang, “A 2.8-6 GHz highefficiency CMOS power amplifier with high-order harmonic matching network,” IEEE MTT-S Int. Microw. Symp. Dig., pp. 4–6, 2016. [77] Z. Bai, W. Yuan, A. Azam, and J. S. Walling, “A split-array, C-2C switchedcapacitor power amplifier in 65nm CMOS,” Digest of Papers - IEEE Radio Frequency Integrated Circuits Symposium, pp. 336–339, 2017. [78] M. Stoopman, S. Keyrouz, H. J. Visser, K. Philips, and W. A. Serdijn, “Co-design of a CMOS rectifier and small loop antenna for highly sensitive RF energy harvesters,” IEEE J. Solid-State Circuits, vol. 49, no. 3, pp. 622–634, 2014. [79] A. Azam, Z. Bai, and J. S. Walling, “A low-cost, dual-band RF loop antenna and energy harvester,” IEEE Topical Conference on Wireless Sensors and Sensor Networks WiSNet, pp. 33–36, 2017. [80] Y. Liu, Y. Zhao, and Y. Zhou, “Lumped dual-frequency impedance transformers for frequency-dependent complex loads,” Prog. Electromagn. Res., vol. 126, pp. 121–138, 2012. [81] S. Keyrouz, H. Visser, and A. Tijhuis, “Multi-band simultaneous radio frequency energy harvesting,” European Conference on Antennas and Propagation (EUCAP), pp. 3058–3061, 2013. 
[82] D. Ha, K. Woo, S. Meninger, T. Xanthopoulos, E. Crain, and D. Ham, “Time-domain CMOS temperature sensors with dual delay-locked loops for microprocessor thermal monitoring,” IEEE Trans. Very Large Scale Integr. Syst., vol. 20, no. 9, pp. 1590–1601, 2012.
[83] T. Yang, S. Kim, P. R. Kinget, and M. Seok, “Compact and Supply-Voltage-Scalable Temperature Sensors for Dense On-Chip Thermal Monitoring,” IEEE J. Solid-State Circuits, vol. 50, no. 11, pp. 2773–2785, 2015.
[84] M. Sasaki, M. Ikeda, and K. Asada, “A Temperature Sensor With an Inaccuracy of −1/+0.8 °C Using 90-nm 1-V CMOS for Online Thermal Monitoring of VLSI Circuits,” IEEE Trans. Semicond. Manuf., vol. 21, no. 2, pp. 201–208, 2008.
[85] S. Jeong, I. Lee, D. Blaauw, and D. Sylvester, “A 5.8 nW CMOS Wake-Up Timer for Ultra-Low-Power Wireless Applications,” IEEE J. Solid-State Circuits, vol. 50, no. 8, pp. 1754–1763, 2015.
[86] A. Vaz et al., “Full passive UHF tag with a temperature sensor suitable for human body temperature monitoring,” IEEE Trans. Circuits Syst. II Express Briefs, vol. 57, no. 2, pp. 95–99, 2010.
[87] K. Souri, Y. Chae, and K. Makinwa, “A CMOS temperature sensor with a voltage-calibrated inaccuracy of ±0.15°C (3σ) from -55 to 125°C,” IEEE J. Solid-State Circuits, vol. 48, no. 1, pp. 292–301, 2013.
[88] C. Azcona, B. Calvo, N. Medrano, S. Celma, and C. Gimeno, “A 1.2-V 1.35µW all MOS Temperature Sensor for Wireless Sensor Networks,” IEEE Int. Symp. Circuits Syst., pp. 365–368, 2015.
[89] M. A. P. Pertijs, K. A. A. Makinwa, and J. H. Huijsing, “A CMOS smart temperature sensor with a 3σ inaccuracy of ±0.1°C from -55°C to 125°C,” IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2805–2815, 2005.
[90] K. Sanborn, D. Ma, and V. Ivanov, “A Sub-1-V Low-Noise Bandgap Voltage Reference,” IEEE J. Solid-State Circuits, vol. 42, no. 11, pp. 2466–2481, 2007.
[91] K. N. Leung and P. K. T. Mok, “A sub-1-V 15-ppm/°C CMOS bandgap voltage reference without requiring low threshold voltage device,” IEEE J. Solid-State Circuits, vol. 37, no. 4, pp. 526–530, 2002.
[92] E. Kuijk, “A precision reference voltage source,” IEEE J. Solid-State Circuits, vol. 8, no. 3, pp. 222–226, 1973.
[93] A. P. Brokaw, “A Simple Three-Terminal IC Bandgap Reference,” IEEE J. Solid-State Circuits, vol. 9, no. 6, pp. 388–393, 1974.
[94] I. S. M. Sun et al., “Lateral high-speed bipolar transistors on SOI for RF SoC applications,” IEEE Trans. Electron Devices, vol. 52, no. 7, pp. 1376–1383, 2005.
[95] D. Harame et al., “High Performance BiCMOS Process Integration: Trends, Issues, and Future Directions,” Proceedings of the 1997 Bipolar/BiCMOS Circuits and Technology Meeting, pp. 36–43, 1997.
[96] L. P. Zhang Jun-an, Li Guangjun, Zhang Rui-tao, Yang Yu-jun, Li Xi, Yan Bo, and Fu Dong-bing, “Challenge of High Performance Bandgap Reference Design in Nanoscale CMOS Technology,” Outlook and Challenges of Nano Devices, Sensors, and MEMS, pp. 45–68, 2017.
[97] H. J. Visser and R. J. M. Vullers, “RF energy harvesting and transport for wireless sensor network applications: Principles and requirements,” Proceedings of the IEEE, vol. 101, no. 6, pp. 1410–1423, 2013.
[98] R. J. M. Vullers and R. Van Schaijk, “Energy Harvesting for Autonomous Wireless Sensor Networks,” IEEE Solid-State Circuits Mag., pp. 29–38, 2010.
[99] G. Park, T. Rosing, M. D. Todd, C. R. Farrar, and W. Hodgkiss, “Energy Harvesting for Structural Health Monitoring Sensor Networks,” J. Infrastruct. Syst., vol. 14, no. 1, pp. 64–79, 2008.
[100] J. Olivo, S. Carrara, and G. De Micheli, “Energy Harvesting and Remote Powering for Implantable Biosensors,” IEEE Sens. J., vol. 11, no. 7, pp. 1573–1586, 2011.
[101] S. Jeong, Z. Foo, Y. Lee, J. Y. Sim, D. Blaauw, and D. Sylvester, “A fully integrated 71 nW CMOS temperature sensor for low power wireless sensor nodes,” IEEE J. Solid-State Circuits, vol. 49, no. 8, pp. 1682–1693, 2014.
[102] A. Azam, Z. Bai, D. Korth, and J. S. Walling, “A 0.35V 12.9pW 8.3ppm/°C 0.012%/V Feedback-controlled Voltage Reference in 65 nm CMOS,” IEEE New Circuits Syst. Conf., pp. 3–6, 2018.
[103] A. Bendali and Y. Audet, “A 1-V CMOS current reference with temperature and process compensation,” IEEE Trans. Circuits Syst. I Regul. Pap., vol. 54, no. 7, pp. 1424–1429, 2007.
[104] C.-H. Lee and H.-J. Park, “All-CMOS temperature independent current reference,” Electron. Lett., vol. 32, no. 14, pp. 1280–1281, 1996.
[105] J. Georgiou and C. Toumazou, “A resistorless low current reference circuit for implantable devices,” IEEE Int. Symp. Circuits Syst. (ISCAS), pp. 193–196, 2002.
[106] G. De Vita and G. Iannaccone, “A 109 nW, 44 ppm/°C CMOS Current Reference with Low Sensitivity to Process Variations,” IEEE International Symposium on Circuits and Systems, pp. 3804–3807, 2007.
[107] H. J. Oguey and D. Aebischer, “CMOS current reference without resistance,” IEEE J. Solid-State Circuits, vol. 32, no. 7, pp. 1132–1135, 1997.
[108] K. Souri, Y. Chae, F. Thus, and K. Makinwa, “A 0.85V 600nW all-CMOS temperature sensor with an inaccuracy of ±0.4°C (3σ) from -40 to 125°C,” Dig. Tech. Pap. - IEEE Int. Solid-State Circuits Conf., vol. 57, pp. 222–223, 2014.
[109] A. P. Chandrakasan, N. Verma, and D. C. Daly, “Ultralow-Power Electronics for Biomedical Applications,” Annu. Rev. Biomed. Eng., vol. 10, no. 1, pp. 247–274, 2008.
[110] A. Azam, Z. Bai, and J. S. Walling, “An 11.2nW, 0.45V PVT-tolerant Pulse-width Modulated Temperature Sensor in 65 nm CMOS,” IEEE New Circuits Syst. Conf., pp. 117–120, 2018.
[111] M. Seok, G. Kim, D. Sylvester, and D. Blaauw, “A 0.5V 2.2pW 2-transistor voltage reference,” Proc. of the IEEE Custom Integrated Circuits Conference, pp. 577–580, 2009.
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6wb15dv