# A Single-Channel 10b 1GS/s ADC with 1-cycle Latency using Pipelined Cascaded Folding

Alireza Razzaghi, Sai-Wang Tam, Pejman Kalkhoran, Yu Wang, Chih-Yi Kuan, Brian Nissim, Lan Duy Vu, M.C. Frank Chang

UCLA, High Speed Electronics Lab, 64-127 Engr IV, Los Angeles, CA, 90095, USA

Abstract — A 10b 1GS/s ADC employing a single-channel cascaded folding architecture is presented. A conversion speed of 1GS/s is attained by incorporating low-power distributed track-and-hold amplifiers after each folder. This ADC achieves a record 55.6dB peak SNDR and a 64dB peak SFDR and sustains a latency of one clock cycle. The DNL and INL measured at a 1GS/s sampling rate are 0.4LSB and 1.1LSB, respectively. Fabricated in a 0.35$\mu$m BiCMOS process, the ADC consumes 2W from a 3.5V supply.

*Index Terms* — ADC, cascaded folding, folding, pipelined, SiGe BiCMOS process

## I. MOTIVATION

Military surveillance, airborne early warning, and target recognition systems must make swift decisions and therefore require high-speed, low-latency electronics. These systems employ high dynamic range Digital Radar Receivers (DRRs) that place the ADC as close to the antenna as possible. This mandates ADC architectures that achieve 8-10 bits of accuracy at GHz conversion rates.

## II. ADC ARCHITECTURE

Time-interleaved schemes achieve high conversion rates [1], [2], [3], [4]. However, they are susceptible to timing misalignments [2], [4] and to the limited linearity [1], [2], [3] prevalent in CMOS technologies. Moreover, the large latency associated with the conventional pipeline [1], [3], [4] and successive approximation [2] architectures hinders the use of such time-interleaved schemes in high dynamic range DRRs.

An ADC is proposed that eliminates the timing misalignment by employing a single-channel quantizer, as depicted in Fig. 1. Since conversion speed is at a premium, a cascaded folding scheme is adopted. Subranging and multi-step schemes employ some form of decision feedback loop, which limits how far their conversion speed can be scaled up. The superior matching of BJTs lends itself to the cascaded folding scheme; therefore, a BiCMOS process is used to implement the ADC.

## III. CIRCUIT DESIGN

Fig. 1: Proposed ADC architecture.

An open-loop track-and-hold amplifier (THA) with enhanced linearity is designed to meet the wide dynamic range DRR specifications. Shown in Fig. 2, this THA is based on the design introduced in [5]. The input signal $V_{in}^{+}$ is replicated at the collector of the diode-connected transistor $Q_d$ by experiencing an equal down-shift and up-shift while traversing the base-emitter junctions of $Q_{in}$ and $Q_d$, respectively. A switched emitter follower (SEF) consisting of $Q_{s1}$, $Q_{s2}$, and $Q_{out}$ is adopted as the switch element of the THA because of its superior speed and linearity. Moreover, by employing a constant bias current, the SEF does not contribute signal-dependent charge injection. The value of this bias current is chosen such that the third harmonic component HD<sub>3</sub> of the signal at the output of the SEF is bounded to -75dB at the Nyquist frequency of 500MHz.

Fig. 2: THA, half cell.

The boost in track-mode linearity is created by injecting an assisting current, provided by the auxiliary SEF, into the hold capacitor $C_h$ through the feedforward capacitor $C_{ff}$. Note that $C_{ff}$ and $C_h$ sustain the same signal excursion; therefore, the current of $C_{ff}$ is reused to provide that of the hold capacitor $C_h$ in the track mode. This significantly reduces the $V_{BE}$ modulation in $Q_{out}$, so that the THA attains a track-mode linearity of 9.4b at the Nyquist frequency of 500MHz.

$Q_{clp}$ clamps the voltage of node A in Fig. 2 to a replica of the held signal during the hold mode. This prevents the voltage of node A from falling to an unknown value during the hold mode, which would otherwise hinder the turn-on of $Q_{out}$ during the hold-to-track transition. As seen in the schematic of Fig. 2, $Q_d$ is turned off during the hold mode and $Q_{clp}$ controls the voltage of node A. The replica of the held signal, stored across $C_{ff}$, is shifted up by the PMOS transistor $M_{sh}$ to compensate for the down-shift it experiences while traversing the base-emitter junction of $Q_{clp}$. The value of the up-shift is slightly larger than one $V_{BE}$. Therefore, node A is clamped to a voltage slightly larger than the held value, faintly forward biasing $Q_{out}$ during the hold mode. This expedites the turn-on of $Q_{out}$ during the hold-to-track transition, preventing $C_h$ from being discharged by $Q_{s2}$. Crossed feedback capacitors $C_{fb}$ are added to further improve the hold-mode feedthrough and to equalize the common-mode levels at the outputs of the THA. These techniques enable the THA to maintain a linearity of 10.9b at low frequencies, with a negligible drop to 10.5b at the Nyquist frequency of 500MHz.

Shown in Fig. 3, a fully differential reference ladder obviates the need for dc references and eliminates "reference bowing" due to the input bias current of the following array of emitter followers. In the differential reference ladder, the INL requirement sets a lower limit on the number of taps for a given spread in tap resistance, as expressed in (1):

$$INL_{\max} = 2^{n-1} \frac{\sqrt{N-1}}{N}\,\sigma_{\delta R/R} \qquad (1)$$

where $n$ is the ADC resolution, $N$ is the number of taps, and $\sigma_{\delta R/R}$ denotes the spread in the tap resistance (less than 1%). On the other hand, the distributed nature of the loading places an upper limit on the number of stages that the reference ladder can drive. Here, 40 zero-crossings (32 in-range) are tapped from the reference ladder to bound the INL to less than 1LSB. This results in an excessive propagation delay through the reference ladder, which is compensated by pipelining the analog core. Padding 4 out-of-range zero-crossings (dummies) on each side of the full scale guarantees that the folding amplifiers exhibit a nonclipping folded characteristic across the entire input range. This is of crucial importance to the accuracy of the interpolating and averaging operations at the boundaries of the input range.

Fig. 3: Differential reference ladder.

The ratio of the tap voltage ($V_{Tap}$) to the thermal voltage ($V_{TH}$) determines the INL peak due to interpolation between the hyperbolic tangent transfer characteristics of the emitter-coupled pairs. The current design adopts a tap voltage of 25mV to confine the INL peak to less than 0.5LSB. This results in an LSB voltage of $V_{LSB} = (2 \times V_{Tap})/IF = 1.5625$mV, where $IF$ is the aggregate interpolation factor of 32. With a moderate aggregate quantizer gain of $A_v = 5.2$, the resulting LSB voltage is amplified to $\approx$8mV at the comparator inputs, enough to cover the input-referred offset of the comparators ($1\sigma = 2.7$mV, so $3\sigma \approx 8$mV). This also leads to a differential full-scale input voltage of $2^{10} \times V_{LSB} = 1.6V_{p-p}$.
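The ladder bound in (1) and the LSB arithmetic in this section can be checked numerically. The following is a minimal sketch rather than the authors' design flow: the function names are illustrative, and the tap-resistance spread is set to its stated upper limit of 1%.

```python
import math


def ladder_inl_max_lsb(n_bits: int, n_taps: int, sigma_r: float) -> float:
    """Evaluate (1): ladder INL bound in LSB for a relative tap-resistance spread."""
    return 2 ** (n_bits - 1) * math.sqrt(n_taps - 1) / n_taps * sigma_r


def lsb_voltage(v_tap: float, interp_factor: int) -> float:
    """V_LSB = 2 * V_Tap / IF, with voltages in volts."""
    return 2.0 * v_tap / interp_factor


# Values quoted in the text; sigma_r = 0.01 is the stated upper limit (assumed here).
inl_ladder = ladder_inl_max_lsb(n_bits=10, n_taps=40, sigma_r=0.01)
v_lsb = lsb_voltage(v_tap=25e-3, interp_factor=32)
full_scale = 2 ** 10 * v_lsb

print(f"ladder INL bound: {inl_ladder:.2f} LSB")
print(f"V_LSB: {v_lsb * 1e3:.4f} mV")
print(f"differential full scale: {full_scale:.2f} V")
```

Under these assumptions the 40-tap ladder gives roughly 0.8LSB, inside the 1LSB target, and the LSB and full-scale voltages reproduce the 1.5625mV and 1.6V quoted in the text.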

Folding degrees of 5 and 8 are implemented in the first and second folders, respectively. This choice optimizes the total power consumption of the analog core by requiring only 32 and 11 comparators in the fine and coarse quantizers, respectively. Shown in Fig. 4, the first folder comprises five emitter-coupled pairs whose outputs are connected in alternating fashion to two identical load resistors and whose inputs are connected to appropriately defined reference voltage levels. The INL peak resulting from mismatches of the tail currents in the $5\times$ folder can be derived as follows:

$$INL_{\max} = IF\left(\frac{V_{TH}}{V_{Tap}}\right)\sqrt{FF_1 - 1}\,\sigma_{\delta I/I} \qquad (2)$$

where $FF_1$ is the folding degree of 5 and $\sigma_{\delta I/I}$ is the spread in the tail current sources of the $5\times$ folder, which is kept below 1% through the superior matching of the degenerated BJT current sources. Equation (2) yields a peak INL of 0.8LSB, which is reduced to less than 0.5LSB through the averaging property of the resistive interpolation networks.
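Equation (2) can be explored with a short sketch. The helper names and the default junction temperature of 300K are assumptions made for illustration. The text does not state the temperature or the exact spread behind the quoted 0.8LSB, so the sketch is meant to show how the bound scales rather than to reproduce that figure.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin: float) -> float:
    """V_TH = kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def folder_inl_max_lsb(interp_factor: int, v_tap: float, folding_degree: int,
                       sigma_i: float, temp_kelvin: float = 300.0) -> float:
    """Evaluate (2): INL peak in LSB caused by tail-current mismatch in the folder."""
    ratio = thermal_voltage(temp_kelvin) / v_tap
    return interp_factor * ratio * math.sqrt(folding_degree - 1) * sigma_i


# IF = 32, V_Tap = 25 mV and FF_1 = 5 come from the text.
# sigma_i = 0.01 (the stated upper limit) and 300 K are assumed inputs.
inl_folder = folder_inl_max_lsb(32, 25e-3, 5, sigma_i=0.01)
print(f"folder INL peak at 300 K: {inl_folder:.2f} LSB")
```

The bound grows linearly with both the current spread and $V_{TH}/V_{Tap}$, so a warmer junction or a smaller tap voltage raises it in direct proportion.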

Fig. 4: 5× folder.

Since the second folder has an even degree (8) with folded signals as inputs, using the architecture of Fig. 4 for the second folder would result in a highly distorted folded signal that is not useful. Therefore, the folding degree of 8 is realized by cascading three stages of Gilbert-type multipliers [6], illustrated in Fig. 5. The inputs of the multiplier (at ports A and B) must hold a phase shift of 90° so that the zero-crossings of the product signal are equally spaced. This design exploits the 90° phase shift to operate the emitter-coupled pairs in the multipliers on the verge of clipping. The resulting benefit is twofold. First, in the vicinity of the zero-crossings, the linearity of the multiplier output is improved, since one of the pairs and, hence, its associated nonlinearity, captured by the tanh function, is decommissioned. Second, as one pair is nearly clipped (constant amplitude), the amplitude variations at the multiplier output due to the $V_{BE}$ mismatches between the transistors in the nonclipped pair are minimized. This improves the integral linearity at the zero-crossings generated by the 8× interpolator following the multiplier trees, so that the ADC achieves the record peak SNDR of 55.6dB.

As seen in Fig. 1, the differential reference ladder combined with the 3× folders, 4× interpolator, and 8× folders also provides the coarse encoder with "windows", each comprising two phase-shifted folded signals. The coarse encoder utilizes these windows to extract the most significant bits ($b_0$ $b_1$ $b_2$ $b_3$) synchronized with the least significant bits ($b_4$ $b_5$ $b_6$ $b_7$ $b_8$ $b_9$) with the aid of the bit-sync $b_4$, as illustrated in Fig. 1. The phase shift span in the windows determines the maximum offset between the fine and coarse comparators that can be tolerated. The choices of folding degrees and first interpolation factor result in a $32V_{LSB}$ offset tolerance between the fine and coarse quantizers, which is significantly larger than the $1\sigma$ offset of 2.7mV in the comparators.
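The 90° requirement on the multiplier inputs can be illustrated with an idealized sketch. It assumes each folded input crosses zero twice per 360° of its own period, and it uses the fact that a product is zero wherever either factor is zero; the function names are illustrative and no circuit nonidealities are modeled.

```python
def product_zero_crossings(phase_shift_deg: float) -> list[float]:
    """Zero-crossings (degrees, one period) of the product of two folded signals.

    Signal A crosses zero at 0 and 180 degrees; signal B is the same waveform
    advanced by phase_shift_deg. The product is zero where either input is zero.
    """
    zeros_a = [0.0, 180.0]
    zeros_b = [(z - phase_shift_deg) % 360.0 for z in zeros_a]
    return sorted(set(zeros_a + zeros_b))


def spacings(crossings: list[float]) -> list[float]:
    """Distance from each crossing to the next one, wrapping around 360 degrees."""
    count = len(crossings)
    return [(crossings[(i + 1) % count] - crossings[i]) % 360.0 for i in range(count)]


quadrature = product_zero_crossings(90.0)
skewed = product_zero_crossings(60.0)
print("90 deg shift:", quadrature, spacings(quadrature))
print("60 deg shift:", skewed, spacings(skewed))
```

With a 90° shift the four crossings are evenly spaced, whereas a 60° shift alternates between wide and narrow gaps, which would appear directly as nonlinearity. In this idealized picture each multiplier stage doubles the number of zero-crossings, so three cascaded stages give $2^3 = 8$, matching the folding degree of the second folder.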

Fig. 5: Gilbert-type multiplier cell.

When the THA enters the hold phase, settling transients at the output of the first folder create excessive dynamics at the output of the second folder due to its frequency multiplication property. Exacerbated by the loading of the RC network formed by the $8\times$ interpolation and the input capacitance of the fine comparators, the settling time at the output of the second folder becomes prohibitive for a conversion speed of 1GS/s. To overcome this settling burden, the design employs the pipelining principle [7] by incorporating distributed THAs after each folder. This creates an extra 1ns window for the signal transients at the input of the fine comparators to subside, so that the signal polarity is correctly detected. The distributed THAs are a simplified version of the front-end THA with a significantly lower power dissipation of 12mW. The interstage distributed THAs also disturb the timing relations among the windows derived from the intermediate stages of the analog core that assist the coarse encoder. Therefore, a "Window Synchronizer", shown in Fig. 1, is realized to synchronize these windows with the bit-sync $b_4$.

Considering the amplified LSB voltage of $\approx$8mV at the comparator inputs, the comparator must achieve a total input-referred offset voltage with $1\sigma \leq 2.7$mV at a 1GS/s conversion rate, so that $3\sigma \leq 8$mV. A full CMOS design leads to excessive offset and to excessive capacitive loading on the 8× interpolation network, prohibiting its use in this architecture. To avoid this loading while providing a rail-to-rail swing, a two-stage solution is proposed, as shown in Fig. 6.
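The offset budget above follows from numbers already given in the text. A minimal sketch of the arithmetic, with illustrative variable names and the unrounded LSB voltage of 1.5625mV, is:

```python
V_LSB = 1.5625e-3        # V, LSB voltage at the ADC input (from the text)
A_V = 5.2                # aggregate quantizer gain (from the text)
SIGMA_OFFSET = 2.7e-3    # V, 1-sigma input-referred comparator offset (from the text)

amplified_lsb = V_LSB * A_V          # LSB as seen at the comparator inputs
three_sigma = 3 * SIGMA_OFFSET       # 3-sigma comparator offset
window_tolerance = 32 * V_LSB        # fine/coarse offset tolerance of 32 V_LSB

print(f"amplified LSB: {amplified_lsb * 1e3:.3f} mV")
print(f"3-sigma offset: {three_sigma * 1e3:.1f} mV")
print(f"fine/coarse tolerance: {window_tolerance * 1e3:.1f} mV "
      f"({window_tolerance / SIGMA_OFFSET:.1f} sigma)")
```

The amplified LSB of about 8.1mV just covers the $3\sigma$ offset, so the fine comparators have little margin, while the 50mV fine/coarse tolerance sits many standard deviations above the comparator offset.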



Fig. 6: The first stage bipolar and the second stage CMOS comparator.

An on-chip clock generator is designed to shift the phase of the master clock in eight steps of 125ps each. This ensures that, under random process variations, sampling in each pipeline stage occurs after the dynamics at its input have subsided.

## IV. EXPERIMENTAL RESULTS

The ADC is implemented in a 0.35$\mu$m, four-metal, single-poly SiGe BiCMOS process with a transistor $f_t$ of 60GHz. The micrograph of the chip is shown in Fig. 7. It occupies an active area of $4.5 \times 1.2$mm<sup>2</sup> and consumes 2W of power.



Fig. 7: Chip micrograph.

SNDR and SFDR at a 1GS/s conversion rate are plotted against input frequency in Fig. 8. The SNDR corresponds to 8.9 ENOB at low input frequencies. To our knowledge, this is the highest ENOB reported for an ADC of similar accuracy and speed. At an input frequency of 100MHz, the SNDR corresponds to 8.3 ENOB, and it eventually rolls off to 6.5 ENOB at the Nyquist frequency. The ADC exhibits 64dB SFDR at 5MHz, while for a 100MHz input the SFDR measures 60dB. The SNDR degradation is mainly due to the insufficient linearity of the output buffers of the distributed THAs and to sampling aperture jitter (1.24ps). Fig. 9 shows the output spectrum for a 5.00488MHz sinusoidal input (1.6V<sub>p-p</sub>) quantized at 1GS/s. The peak DNL and INL at a sampling rate of 1GS/s measure 0.4LSB and 1.1LSB, respectively. The DNL and INL profiles are depicted in Fig. 10. This ADC achieves a latency of one clock cycle in the analog core, improving on the twelve and four cycles of latency reported in [2], [3] and [1], respectively. Table I summarizes the performance metrics at a 1GS/s conversion rate for 5MHz and 100.46MHz input frequencies.

Fig. 8: Measured SNDR/SFDR.

Fig. 9: Output spectrum.

Fig. 10: INL/DNL.

| Parameter                | $f_{in} = 5$ MHz                                   | $f_{in} = 100.46$ MHz |
|--------------------------|----------------------------------------------------|-----------------------|
| Sample Rate, $f_s$       | 1 GSample/s                                        |                       |
| Resolution               | 10 bits                                            |                       |
| Latency                  | 1 Clock Cycle                                      |                       |
| Max INL                  | 1.1 LSB                                            | 1.2 LSB               |
| Max DNL                  | 0.41 LSB                                           | 0.45 LSB              |
| SNDR                     | 55.6 dB                                            | 51.6 dB               |
| SFDR                     | 64.1 dB                                            | 59.8 dB               |
| THD                      | -60.7 dB                                           | -56 dB                |
| ENOB                     | 8.9                                                | 8.3                   |
| Aperture Jitter          | 1.24 ps @ $f_s = 1$ GS/s and $f_{in} = 200$ MHz    |                       |
| ERBW                     | 100 MHz                                            |                       |
| Input Range              | 1.6 V differential                                 |                       |
| Input Termination        | 50 $\Omega$ (100 $\Omega$ differential)            |                       |
| Supplies                 | 5.5 V (THA) and 3.5 V (ADC)                        |                       |
| THA Current              | 64 mA                                              |                       |
| Analog Core Current      | 354 mA                                             |                       |
| Digital Current          | 139 mA                                             |                       |
| LVDS Output Drivers      | 140 mA                                             |                       |
| ADC Area                 | 5.3 mm<sup>2</sup>                                 |                       |
| Die Area                 | 25 mm<sup>2</sup>                                  |                       |
| Technology               | 0.35 µm BiCMOS (1-poly, 4-metal)                   |                       |

TABLE I PERFORMANCE SUMMARY

As seen in Table I, the logic circuits and clock buffers consume one fourth of the power, while the analog core uses 60% of the total. A more advanced BiCMOS process with a higher transistor $f_t$ would enable lower power consumption.
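The reported figures can be cross-checked with the standard ENOB and jitter-limited SNR relations and with the supply currents in Table I. The sketch below is illustrative only: the function names are not from the paper, and the power split assumes that the THA alone runs from 5.5V, that the other blocks run from 3.5V, and that the LVDS drivers are not counted in the 2W figure.

```python
import math


def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits for a full-scale sine: (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02


def jitter_limited_snr_db(f_in_hz: float, sigma_jitter_s: float) -> float:
    """SNR ceiling set by rms aperture jitter alone: -20*log10(2*pi*f*sigma)."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_jitter_s)


# (supply in V, current in A) from Table I; the rail assignment is an assumption.
BLOCKS = {
    "THA": (5.5, 0.064),
    "analog core": (3.5, 0.354),
    "digital": (3.5, 0.139),
}


def power_shares(blocks: dict) -> tuple:
    """Return total power in W and each block's fraction of that total."""
    power = {name: volts * amps for name, (volts, amps) in blocks.items()}
    total = sum(power.values())
    return total, {name: p / total for name, p in power.items()}


total_w, shares = power_shares(BLOCKS)
nyquist_ceiling_db = jitter_limited_snr_db(500e6, 1.24e-12)

print(f"ENOB at 5 MHz: {enob_from_sndr(55.6):.2f}")
print(f"ENOB at 100.46 MHz: {enob_from_sndr(51.6):.2f}")
print(f"jitter-limited SNR at 500 MHz: {nyquist_ceiling_db:.1f} dB")
print(f"total power: {total_w:.2f} W")
for name, share in shares.items():
    print(f"  {name}: {share:.0%}")
```

Under these assumptions the sketch reproduces the 8.9 and 8.3 ENOB entries, a total close to 2W, an analog core share near 60%, and a digital share near one fourth. The jitter-only ceiling at Nyquist comes out near 48dB, above the SNDR implied by 6.5 ENOB, which is consistent with THA buffer linearity also contributing to the roll-off.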

## REFERENCES

- [1] S. Gupta, M. Choi, M. Inerfield, and J. Wang, "A 1GS/s 11b Time-Interleaved ADC in 0.13 µm CMOS," *ISSCC Dig. Tech. Papers*, pp. 576-577, Feb. 2006.
- [2] S. Louwsma, E. van Tuijl, M. Vertregt, and B. Nauta, "A 1.35 GS/s, 10b, 175 mW Time-Interleaved AD Converter in 0.13 µm CMOS," *Symp. VLSI Circuits*, pp. 62-63, June 2007.
- [3] C. Hsu, et al., "An 11b 800MS/s Time-Interleaved ADC with Digital Background Calibration," *ISSCC Dig. Tech. Papers*, pp. 464-465, Feb. 2007.
- [4] K. Poulton, et al., "A 20GS/s 8b ADC with a 1MB Memory in 0.18 µm CMOS," *ISSCC Dig. Tech. Papers*, pp. 318-319, Feb. 2003.
- [5] A. Razzaghi and M. C. F. Chang, "A 10-b, 1-GSample/s track-and-hold amplifier using SiGe BiCMOS technology," *CICC Dig. Tech. Papers*, pp. 433-436, Sept. 2003.
- [6] P. Vorenkamp and R. Roovers, "A 12-b, 60-MSample/s Cascaded Folding and Interpolating ADC," *IEEE J. Solid-State Circuits*, vol. 32, no. 12, pp. 1876-1886, Dec. 1997.
- [7] M. Choe, B. Song, and K. Bacrania, "An 8-b 100-MSample/s CMOS Pipelined Folding ADC," *IEEE J. Solid-State Circuits*, vol. 36, no. 2, pp. 184-194, Feb. 2001.