# Metadata of the Book that will be visualized online

| Book Title           | Low Power Networks-on-Chip |                                                |
|----------------------|-----------------------|-----------------------------------------------------|
| Book SubTitle        |                       |                                                     |
| Copyright Year       | 2011                  |                                                     |
| Copyright Holder     | Springer Science + Business Media, LLC |                                    |
| Corresponding Author | Family Name           | Tam                                                 |
|                      | Particle              |                                                     |
|                      | Given Name            | Sai-Wang                                            |
|                      | Suffix                |                                                     |
|                      | Division              | Electrical Engineering Department                   |
|                      | Organization          | University of California, Los Angeles               |
|                      | Address               | Engineering IV Building, CA 90095, Los Angeles, USA |
|                      | Email                 | roccotam@ee.ucla.edu                                |
| Author               | Family Name           | Socher                                              |
|                      | Particle              |                                                     |
|                      | Given Name            | Eran                                                |
|                      | Suffix                |                                                     |
|                      | Division              | Electrical Engineering Department                   |
|                      | Organization          | University of California, Los Angeles               |
|                      | Address               | Engineering IV Building, CA 90095, Los Angeles, USA |
|                      | Email                 | roccotam@ee.ucla.edu                                |
| Author               | Family Name           | Chang                                               |
|                      | Particle              |                                                     |
|                      | Given Name            | MC. Frank                                           |
|                      | Suffix                |                                                     |
|                      | Division              | Electrical Engineering Department                   |
|                      | Organization          | University of California, Los Angeles               |
|                      | Address               | Engineering IV Building, CA 90095, Los Angeles, USA |
|                      | Email                 | roccotam@ee.ucla.edu                                |
| Author               | Family Name           | Cong                                                |
|                      | Particle              |                                                     |
|                      | Given Name            | Jason                                               |
|                      | Suffix                |                                                     |
|                      | Division              | Electrical Engineering Department                   |
|                      | Organization          | University of California, Los Angeles               |
|                      | Address               | Engineering IV Building, CA 90095, Los Angeles, USA |
|                      | Email                 | roccotam@ee.ucla.edu                                |
| Author               | Family Name           | Reinman                                             |
|                      | Particle              |                                                     |
|                      | Given Name            | Glenn D                                             |
|                      | Suffix                |                                                     |
|                      | Division              | Electrical Engineering Department                   |
|                      | Organization          | University of California, Los Angeles               |
|                      | Address               | Engineering IV Building, CA 90095, Los Angeles, USA |
|                      | Email                 | roccotam@ee.ucla.edu                                |


# Chapter 10 RF-Interconnect for Future Network-On-Chip

Sai-Wang Tam, Eran Socher, M.-C. Frank Chang, Jason Cong, and Glenn D. Reinman

**Abstract** In the era of nanometer CMOS technology, stringent system requirements in power and performance are leading microprocessor manufacturers to rely more on chip multi-processor (CMP) designs. CMPs partition silicon real estate among a number of processor cores and on-chip caches, and these components are connected via an on-chip interconnection network (network-on-chip, NoC). It is projected that communication via the NoC is one of the primary limiters to both performance and power consumption. To mitigate such problems, we explore the use of multiband RF-interconnect (RF-I), which can communicate simultaneously through multiple frequency bands with low power signal transmission and reconfigurable bandwidth. At the same time, we investigate the CMOS mixed-signal circuit implementation challenges for improving the RF-I signaling integrity and efficiency. Furthermore, we propose a micro-architectural framework that can be used to facilitate the exploration of scalable low power NoC architectures based on physical planning and prototyping.

### **10.1 Introduction**


In the era of nanometer CMOS technology, stringent system requirements in power and performance are leading processor manufacturers to rely more on chip multi-processor (CMP) designs instead of single-core designs with high clock frequencies and deeply pipelined architectures. Recent studies [1] also project that heterogeneous many-core designs with massively parallel data processing and distributed caches will be the dominant mobile system architecture to satisfy future application needs. However, many-core computation requires the partitioning of silicon real estate among a large number of processor cores and memory caches. As a result, the power consumption and communication latencies observed among large numbers of cores will vastly impact the overall system performance. One commonly suggested communication scheme is to connect them through the NoC and send data using packet switching [2, 3]. Recent NoC design efforts include Intel's 80-core design [4] on a single chip and Tilera's 64-core microprocessor [5], where processing cores are homogeneous in both designs.

S.-W. Tam (⊠)
Electrical Engineering Department, University of California, Los Angeles, Engineering IV Building, Los Angeles, CA 90095, USA e-mail: roccotam@ee.ucla.edu

C. Silvano et al. (eds.), *Low Power Networks-on-Chip*, DOI 10.1007/978-1-4419-6911-8\_10, © Springer Science+Business Media, LLC 2011

The future trend for NoCs, however, will be heterogeneous in nature. We expect that some cores will be general purpose processors running at moderate clock rates with normal supply voltages for achieving higher data processing rates, while others will be application-specific processors running at near/sub-V<sub>th</sub> modes with much lower clock rates and lower supply voltages. For such heterogeneous many-core systems, the on-chip interconnect network has been projected as the primary performance bottleneck [6–9] to the nanometer processor in terms of power and latency. We advocate the use of reconfigurable interconnect as a means of providing power-efficient adaptation of the interconnect among various components in a heterogeneous many-core design.

In particular, we propose the use of low power multiband RF-interconnect (RF-I) that can concurrently communicate via multiple frequency bands using shared transmission lines to provide effective speed-of-light signal transmission, low power operation, and reconfigurable bandwidth. Effectively, RF-I provides a flexible set of low-latency communication channels that can be adaptively configured to the bandwidth demands of a particular architecture, providing a number of concurrent virtual communication channels out of a shared physical transmission medium, such as on-chip transmission lines. We also investigate the CMOS mixed-signal circuit design challenges to bring RF-I to fruition and offer physical design examples to ensure RF-I's signaling integrity and efficiency. Furthermore, we propose a micro-architectural exploration framework that can be used to facilitate the exploration of scalable architectures based on physical planning and prototyping, particularly for a large number of processing cores. Our previous work has considered an architecture that combines a mesh topology implemented with conventional interconnect that is overlaid with an RF-I transmission line bundle. The RF-I acts like a reconfigurable superhighway, providing flexible, accelerated communication channels for critical/sensitive communications. The conventional interconnect acts as a more general set of surface streets that extend communications to all components on a chip.

### **10.2 Interconnect Problem in Future Information Processor**

The contemporary solution to building many-core on-chip interconnects is the use of CMOS repeaters. However, despite improvements in transistor speed from one technology generation to the next, wire resistance and capacitance scale poorly [9, 10], if at all. Figures 10.1 and 10.2 project the performance of a 2-cm on-chip repeater buffer link (i.e., a modern $2 \times 2\ \text{cm}^2$ CMP inter-core interconnect) from 130 nm to 16 nm CMOS technology. These figures demonstrate that the link delay will grow worse with shrinking feature sizes, and that the scaling of energy per bit will saturate at about 10 pJ/bit. One possible solution is to use low-voltage swing interconnects [11–13], which inevitably require a power-hungry equalizer due to the severely dispersive channel characteristics of the on-chip wire across baseband frequencies. The signal bandwidth of the existing RC repeater buffer will not exceed 5 GHz in the foreseeable future, primarily due to severe thermal constraints.

As shown in Fig. 10.3, an RC repeater buffer utilizes less than 2% of the maximum available bandwidth, set by the cutoff frequency $f_T$ of CMOS, which is 240 GHz in 45 nm CMOS today and will eventually reach 600 GHz in 16 nm CMOS according to the ITRS [14]. Owens et al. [7] even predicted that at the 22 nm technology node, the total network power using repeater buffers will dominate chip-multiprocessor (CMP) power consumption. Consequently, future CMPs using the RC repeater buffer would encounter serious communication congestion and spend most of their time and energy in "talking" instead of "computing". Intel's 80-tile CMP [4] demonstrated that its NoC consumed 30% of the total 100 W power consumption for a 10 × 8 mesh NoC running at a 4 GHz clock to support the 256 GB/s bisection bandwidth that is crucial for massively parallel processing. The same CMP design also requires 75 clock cycles in the worst case for a data packet to travel between two opposite corners of the die. This clearly reveals the need to develop new on-chip interconnect schemes that are both scalable in energy consumption and efficient in inter-core communication.

Fig. 10.3 Given a data rate of 4 Gbit/s and $f_T$ of 240 GHz in 45 nm CMOS, the RC repeater buffer utilizes only 2% of the maximum available bandwidth

BookID 187644\_ChapID 010\_Proof# 1 - 20/07/10

### **10.3 How Can RF Help?**

According to the above analysis, the ideal interconnect architecture for future computing systems must not only deliver high performance at low power but also adapt to the needs of individual processing cores. As we have pointed out, the traditional repeated wire does not fulfill such requirements due to its poor performance scaling in general and poor noise immunity. It also cannot be reconfigured to perform multicast for network communications without a large overhead, and it cannot adapt dynamically to changing bandwidth needs. To circumvent the above deficiencies in the traditional baseband-only type of interconnect, we propose to use the multiband RF-interconnect for the reasons detailed as follows.

One of the key benefits of the scaling of CMOS is that the switching speed of the transistor improves with each technology generation. According to the ITRS [14], $f_T$ and $f_{max}$ will be 600 GHz and 1 THz, respectively, in 16 nm CMOS technology. A new record of a 324 GHz millimeter-wave CMOS oscillator [15] has also been demonstrated in a standard digital 90 nm CMOS process. With the advance in CMOS mm-wave circuits, hundreds of gigahertz of bandwidth will be available in the near future. In addition, compared with CMOS repeaters charging and discharging the wire, EM waves travel in a guided medium at the speed of light, which is about 10 ps/mm on a silicon substrate. The question here is: how can we use hundreds of GHz of bandwidth in a future mobile system through RF-I while concurrently achieving ultra-low power operation and dynamic allocation of bandwidth to meet the needs of future heterogeneously integrated mobile systems?


One of the possibilities is to use multiband RF-I, based on frequency-division multiple-access (FDMA) algorithms [16–20, 22, 23], to facilitate inter-core communications on-chip. In the past, we have demonstrated, both on-chip and in 3DIC (i.e., three-dimensional integrated circuit) settings, that RF-interconnects can achieve high speed (5–10 Gb/s in $0.18\,\mu$m CMOS), low BER ($10^{-14}$ without error correction) [17, 18], seamless re-configurability, and simultaneous communications between multiple I/O users via multiple frequency bands using shared physical transmission lines. The main advantages of RF-I include:

- Superior signal to noise ratio: Since all data streams modulate RF-carriers, which are at least 10 GHz above the baseband, the high speed RF-interconnect does not generate and/or suffer from any baseband switching noise. This reduces possible interference to the sensitive near/sub-V<sub>th</sub> operated circuits.
- High bandwidth: A multiband RF-interconnect link has a much higher aggregate data rate than a single repeater buffer link.
- Low power: Compared to a repeater buffer, a multiband RF-interconnect is able to operate at much better energy per bit in the NoC. Compared to normal repeated wire networks, which consume considerable amounts of power, a few RF-I nodes consume only a very small amount of power (see Sect. 10.4, benchmarked using pJ/bit as a metric).
- Low overhead: High data rate per wire, low area per gigabit, and low latency due to speed-of-light data transmission (see Sect. 10.4, benchmarked using area/(Gbit/s) as a metric).
- Re-configurability: Efficient simultaneous communications with adaptive bandwidths via shared on-chip transmission lines.
- Multicast support: Scalable means to communicate from one transmitter to a number of receivers on chip.
- Total compatibility and scalability: RF-I is implemented in mainstream digital CMOS technology and can directly benefit from the scaling of CMOS.

The concept of RF-I is based on transmission of waves, rather than voltage signaling. When using voltage signaling in conventional RC time constant dominated interconnects, the entire length of the wire has to be charged and discharged to signify either '1' or '0'. In the RF approach, an electromagnetic (EM) wave is continuously sent along the wire (treated as a transmission line). Data are modulated onto that carrier wave using amplitude and/or phase changes. One of the simple modulation schemes for this application is binary-phase-shift-keying (BPSK), where the binary data changes the phase of the wave between 0° and 180°. By expanding the idea of the single carrier RF-I, it is possible to improve bandwidth efficiency using *N*-channel multi-carrier RF-I. In multi-carrier RF-I, there are *N* mixers in the Tx. Each mixer up-converts an individual baseband data stream into a specific channel. Those *N* distinct channels transmit *N* different data streams onto the same transmission line. The total aggregate data rate is $R_{\text{Total}} = R_{\text{baseband}} \times N$, where $R_{\text{baseband}}$ is the data rate of each baseband stream and *N* is the number of channels. A conceptual illustration of a six-carrier FDMA RF-interconnect is shown in Fig. 10.4.

Fig. 10.5 Exemplary cross-section of the on-chip differential transmission line
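The multi-carrier scheme described above can be illustrated with a small numerical sketch. It is not the authors' circuit: the helper names, the sample counts, and the choice of carriers with an integer number of cycles per bit (which makes the channels exactly orthogonal over one bit interval) are all assumptions made for illustration. Each "mixer" multiplies a ±1 data value by its own carrier (BPSK, 0° or 180°), the channels are summed on one shared line, and each receiver recovers its stream by correlating with its carrier.

```python
import math


def bpsk_fdma_link(streams, cycles_per_bit, samples_per_bit=64):
    """Send N BPSK streams over one shared line and recover them.

    streams: list of N equal-length bit lists.
    cycles_per_bit: N distinct integers (hypothetical carrier choices),
    each below samples_per_bit / 2 so the carriers are sampled properly.
    """
    n_bits = len(streams[0])
    # Transmitter: one mixer per channel, all outputs summed on the line.
    line = []
    for b in range(n_bits):
        for n in range(samples_per_bit):
            sample = 0.0
            for bits, cycles in zip(streams, cycles_per_bit):
                sign = 1.0 if bits[b] else -1.0  # phase 0 deg or 180 deg
                sample += sign * math.cos(2 * math.pi * cycles * n / samples_per_bit)
            line.append(sample)
    # Receiver: correlate every bit interval with the wanted carrier.
    recovered = []
    for cycles in cycles_per_bit:
        out = []
        for b in range(n_bits):
            acc = 0.0
            for n in range(samples_per_bit):
                carrier = math.cos(2 * math.pi * cycles * n / samples_per_bit)
                acc += line[b * samples_per_bit + n] * carrier
            out.append(1 if acc > 0 else 0)
        recovered.append(out)
    return recovered


def aggregate_rate(rate_per_band, n_bands):
    """R_Total = R_baseband x N."""
    return rate_per_band * n_bands


# Six carriers, as in the six-carrier illustration; the bit patterns are arbitrary.
streams = [
    [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0],
    [0, 1, 0, 1], [1, 1, 1, 0], [0, 0, 0, 1],
]
carriers = [4, 8, 12, 16, 20, 24]
print(bpsk_fdma_link(streams, carriers) == streams)
print(aggregate_rate(5, len(streams)), "Gb/s aggregate for 5 Gb/s per band")
```

In a real link the channels are separated by filters and mixers rather than by ideal correlation, so this sketch only shows why distinct carriers can share one transmission line.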

Future on-chip RF-I will require on-chip transmission lines (TLs) that can achieve multiband communication with high aggregate data rates, low latency, low signal loss, low dispersion, and compact Si-area. One particular challenge is to simultaneously support both baseband and RF bands on a single TL without severe inter-channel interference. In this case, the two fundamental propagation modes of a wave in the TL, the odd and even modes, are used to support the RF band and the baseband, respectively. Since the odd mode and even mode are orthogonal, we design a new type of on-chip transmission line that can support dual-mode wave propagation. The new design combines both differential and coplanar transmission line structures; the cross-section of the TL is illustrated in Fig. 10.5. The top two thick metal layers (M7 and M8) act as a differential signal line to support high frequency RF-band data in the odd mode, while the M5 layer acts as a ground plane to support baseband data in the even mode. As verified using EM simulation, a simple side wall between two signal lines can reduce cross-coupling by 10 dB. The latency of such TLs is about 70 ps/cm and the loss is 15 dB/cm, both at 60 GHz.

### **10.4 Expected Performance of RF-I with Scaling**

Future CMPs require scalable interconnects to satisfy future needs in communication bandwidth, power budget, and Si-area. In RF-I, passive devices, such as inductors, are the dominant consumers of silicon area. Since the size of a passive device is inversely proportional to the operational frequency, the size of the passive device can be greatly reduced as higher carrier frequencies are used. At 20 GHz, the size of the inductor is approximately $50\,\mu\text{m} \times 50\,\mu\text{m}$. However, due to wavelength scaling, the size of the inductor at 400 GHz can be as small as $12\,\mu\text{m} \times 12\,\mu\text{m}$, roughly a 20× reduction in area. As long as the carrier frequency can increase with each new generation of technology, the transceiver area will also scale down. Switching as fast as 300 GHz (i.e., half of the $f_T$ of 16 nm CMOS [14], to deliver reasonable gain) in future generations of CMOS will allow us to implement a large number of high frequency channels for a physical RF-I bus. In each new technology generation, the number of channels available on a single TL can be expected to grow. Nonetheless, the average power consumption per communication band is expected to stay constant (about 4–5 mW as seen in Table 10.1). The logic behind this assumption is that although RF circuits at higher carrier frequencies require more power, this additional power is compensated by the power saved at lower frequency communication bands due to the faster transistors available with scaling. In addition to more frequency bands, the modulation speed of each frequency carrier will also increase, allowing a higher data rate per band. As a result, the aggregate data rate is expected to increase by about 40% with every CMOS technology generation, as shown in Table 10.1. In addition, the cost of the data rate, in terms of area/(Gb/s), and the energy consumption per transmitted bit are expected to scale down as well (Figs. 10.6 and 10.7).
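The scaling trend in Table 10.1 can be cross-checked with a few lines of arithmetic. The sketch below transcribes the table rows and assumes a constant 5 mW per band, the upper end of the 4–5 mW quoted above. That value and the helper name `scaling_row` are assumptions for illustration, not something the chapter states as the method behind the table. Under that assumption, the data rate per wire is the number of bands times the rate per band, and the energy per bit is the total power divided by the data rate per wire.

```python
# Rows transcribed from Table 10.1:
# (technology, RF bands, baseband bands, Gb/s per band, Gb/s per wire, pJ/bit)
TABLE_10_1 = [
    ("90 nm", 3, 1, 5, 20, 1.00),
    ("65 nm", 4, 1, 6, 30, 0.83),
    ("45 nm", 5, 1, 7, 42, 0.71),
    ("32 nm", 6, 1, 8, 56, 0.63),
    ("22 nm", 7, 1, 9, 72, 0.56),
    ("16 nm", 8, 1, 10, 90, 0.50),
]

# Assumption: every band draws 5 mW (upper end of the 4-5 mW quoted in the text).
POWER_PER_BAND_MW = 5.0


def scaling_row(rf_bands, bb_bands, rate_per_band_gbps,
                power_per_band_mw=POWER_PER_BAND_MW):
    """Return (data rate per wire in Gb/s, energy per bit in pJ)."""
    bands = rf_bands + bb_bands
    rate_per_wire_gbps = bands * rate_per_band_gbps
    total_power_mw = bands * power_per_band_mw
    # mW divided by Gb/s is 1e-3 J/s over 1e9 bit/s, i.e. pJ per bit.
    energy_per_bit_pj = total_power_mw / rate_per_wire_gbps
    return rate_per_wire_gbps, energy_per_bit_pj


for tech, rf, bb, per_band, table_rate, table_energy in TABLE_10_1:
    rate, energy = scaling_row(rf, bb, per_band)
    print(f"{tech}: {rate} Gb/s per wire, {energy:.3f} pJ/bit "
          f"(table: {table_rate} Gb/s, {table_energy:.2f} pJ/bit)")
```

With these inputs the computed values agree with the tabulated data rate per wire and, to within rounding, with the tabulated energy per bit, which is consistent with a roughly constant power per band.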

### **10.5 Implementation Examples**

#### **10.5.1 On-Chip Multi-Carrier Generation**

For the on-chip RF-I illustrated in Fig. 10.4, a multiband synthesizer enables transmitting multiple bands of modulated RF signal on transmission lines using FDMA between the transmitting and receiving units. A wide range on-chip frequency synthesis approach is thus required to enable the simultaneous generation of multiple carrier frequencies in the mm-wave range for multiband communications. Traditional approaches to on-chip frequency generation require dedicated VCOs and PLLs to cover multiple bands, thus consuming significant power and area. A new technique for generating multiple mm-wave carrier frequencies is proposed in our previous work [24] using simultaneous sub-harmonic injection locking to a single reference frequency. This concept is illustrated in Fig. 10.8. A master VCO generates a reference carrier at 10 GHz, which is fed into a differential pair. The differential pair generates the odd harmonics of the reference signal through its nonlinearity. The third harmonic of the reference carrier, 30 GHz, is then injected into the respective slave VCOs for them to lock on to the harmonic. The main advantages of this technique are reduction in power consumption, reduction in silicon area, and simpler carrier distribution networks. A prototype of 30 and 50 GHz sub-harmonic injection-locked VCOs was realized in a 90 nm digital CMOS process, as shown in Fig. 10.9, and is able to lock on from the second to eighth harmonics of the reference frequency with a locking range reaching 5.6 GHz. Simultaneous locking on to the third and fifth harmonics of a 10 GHz reference signal was also demonstrated, as shown in Fig. 10.10.

Table 10.1 Scaling trend of RF-I

| Technology | No. of bands | Data rate per band (Gb/s) | Data rate per wire (Gb/s) | Energy per bit (pJ) | Area per Gbit ($\mu m^2$/Gbit) |
|------------|--------------|---------------------------|---------------------------|---------------------|--------------------------------|
| 90 nm      | 3 RF + 1 BB  | 5                         | 20                        | 1.00                | 1640                           |
| 65 nm      | 4 RF + 1 BB  | 6                         | 30                        | 0.83                | 1183                           |
| 45 nm      | 5 RF + 1 BB  | 7                         | 42                        | 0.71                | 810                            |
| 32 nm      | 6 RF + 1 BB  | 8                         | 56                        | 0.63                | 562                            |
| 22 nm      | 7 RF + 1 BB  | 9                         | 72                        | 0.56                | 399                            |
| 16 nm      | 8 RF + 1 BB  | 10                        | 90                        | 0.50                | 325                            |

Fig. 10.6 RF-Interconnect scaling in terms of total energy per bit and total data rate per wire

Fig. 10.7 RF-Interconnect scaling in terms of total data rate per wire and area per Gbps

Fig. 10.9 Die photograph of the 30 GHz and 50 GHz sub-harmonic injection VCO
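The odd-harmonic generation that this sub-harmonic injection scheme relies on can be illustrated numerically. The sketch below drives an idealized odd-symmetric transfer characteristic with a sinusoid at the reference frequency and computes the Fourier magnitudes of the output. The `tanh` characteristic, the drive amplitude, and the function name are assumptions chosen for illustration only (a `tanh` shape is the textbook large-signal model of a bipolar differential pair, and the actual CMOS circuit will differ in detail). Any odd-symmetric characteristic produces a half-wave-symmetric output, so only odd harmonics appear.

```python
import cmath
import math


def harmonic_magnitudes(drive_amplitude, max_harmonic=8, samples=256):
    """Fourier magnitudes of tanh(a*cos(theta)) at harmonics 1..max_harmonic."""
    wave = [
        math.tanh(drive_amplitude * math.cos(2 * math.pi * n / samples))
        for n in range(samples)
    ]
    mags = {}
    for k in range(1, max_harmonic + 1):
        # Direct DFT bin k over exactly one period of the reference.
        coeff = sum(
            w * cmath.exp(-2j * math.pi * k * n / samples)
            for n, w in enumerate(wave)
        ) / samples
        mags[k] = 2 * abs(coeff)
    return mags


REFERENCE_GHZ = 10.0  # master VCO frequency quoted in the text
mags = harmonic_magnitudes(drive_amplitude=2.0)
for k, m in mags.items():
    print(f"harmonic {k}: {k * REFERENCE_GHZ:.0f} GHz, relative amplitude {m:.4f}")
```

The even-order terms come out at numerical noise level, while the third and fifth harmonics (30 and 50 GHz for a 10 GHz reference) carry usable amplitude, which is what lets a single reference feed several slave VCOs.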

#### **10.5.2 On-Chip RF-Interconnect**

In this section, we illustrate the implementation of a simultaneous tri-band on-chip RF-interconnect [25] to demonstrate the feasibility of multiband RF-interconnect for future network-on-chip. In this design, two RF bands at mm-wave frequencies, 30 and 50 GHz, are modulated using amplitude-shift keying, while the baseband uses a low swing capacitive coupling technique. Each RF band carries 4 Gb/s and the baseband carries 2 Gb/s. The three bands, carrying up to 10 Gb/s in total, are transmitted simultaneously across a shared 5 mm on-chip differential transmission line.

Fig. 10.10 Output spectrum of the 30 GHz and 50 GHz VCO simultaneously locked with the same reference source at 9.7 GHz

As in many other communication systems, the signal to noise ratio (SNR) of RF-I must first be estimated before making any major system design decisions, such as selecting the modulation scheme and designing the transceiver architecture. From the SNR, we can estimate the bit error rate of the overall system. There are three types of noise that we should consider in RF-I: thermal noise from passive and active devices, power supply noise, and inter-channel interference.

Thermal noise from passive and active devices is one of the major sources of noise, and many communication systems optimize the receiver front-end for low noise for this reason. RF-I, on the other hand, is not limited by thermal noise, as the following simple calculation shows.

We assume that the transmitter uses amplitude-shift-keying modulation with a carrier at 60 GHz, has an output efficiency of 10%, and consumes a total power of 3 mW. The average output power, $P_{\text{TX}}$, will be

$$P_{\rm TX} = 3\,\mathrm{mW} \times 10\% \times 0.5 = 0.15\,\mathrm{mW} = -8.24\,\mathrm{dBm}. \tag{10.1}$$

Based on full-wave EM simulation (measurement) of the transmission line, the average signal attenuation of the on-chip transmission line is 1.5 dB/mm at 60 GHz. Assuming the average length of an on-chip transmission line is 1 cm, the total signal loss will be 15 dB. Therefore, the signal power at the receiver front-end will be

$$P_{\rm RX} = -8.2\,\mathrm{dBm} - 15\,\mathrm{dB} = -23.2\,\mathrm{dBm}. \tag{10.2}$$


Assume that the channel bandwidth (BW) is 20 GHz and the noise figure of the receiver, NF, is 10 dB. With that information, we can calculate the noise power at the receiver as follows:

$$P_{\text{noise}} = 10\log_{10}\!\left(\frac{kT\,\mathrm{BW}}{1\,\mathrm{mW}}\right) + \mathrm{NF} = -174\,\mathrm{dBm/Hz} + 10\log_{10}(\mathrm{BW}) + \mathrm{NF} = -61\,\mathrm{dBm}. \tag{10.3}$$

With the signal power and the noise power at the receiver front-end, we are ready to calculate the signal to noise ratio (SNR):

$$\mathrm{SNR}_{\rm RX} = P_{\rm RX}\,(\mathrm{dBm}) - P_{\text{noise}}\,(\mathrm{dBm}) \approx 37\,\mathrm{dB}. \tag{10.4}$$

Since the SNR at the receiver front end is 37 dB, the bit error rate of the on-chip RF-I is not limited by thermal noise.
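The thermal-noise link budget above can be reproduced in a few lines. This is a minimal sketch using the numbers quoted in the text. The function and parameter names are hypothetical, and the 290 K reference temperature is an assumption (it is the usual convention behind the −174 dBm/Hz figure). Evaluated without intermediate rounding, the SNR comes out slightly under 38 dB, in line with the roughly 37 dB quoted.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def dbm(power_watts):
    """Convert a power in watts to dBm."""
    return 10 * math.log10(power_watts / 1e-3)


def rf_link_budget(dc_power_w=3e-3, efficiency=0.10, ask_factor=0.5,
                   loss_db_per_mm=1.5, length_mm=10.0,
                   bandwidth_hz=20e9, noise_figure_db=10.0,
                   temperature_k=290.0):  # 290 K is an assumed reference
    """Return (P_TX, P_RX, P_noise, SNR) in dBm, dBm, dBm, dB."""
    p_tx = dbm(dc_power_w * efficiency * ask_factor)          # Eq. (10.1)
    p_rx = p_tx - loss_db_per_mm * length_mm                  # Eq. (10.2)
    p_noise = dbm(BOLTZMANN * temperature_k * bandwidth_hz) + noise_figure_db  # Eq. (10.3)
    return p_tx, p_rx, p_noise, p_rx - p_noise                # Eq. (10.4)


p_tx, p_rx, p_noise, snr = rf_link_budget()
print(f"P_TX = {p_tx:.2f} dBm, P_RX = {p_rx:.2f} dBm")
print(f"P_noise = {p_noise:.2f} dBm, SNR = {snr:.1f} dB")
```

Changing `length_mm` or `bandwidth_hz` shows how much margin remains for longer lines or wider channels before thermal noise starts to matter.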

In future CMPs, there will be tens or even hundreds of processing cores on a single die, and noisy digital signals will easily couple to sensitive mixed-signal circuits through the power supply network and the low impedance deep-submicron CMOS substrate. Therefore, rejecting digital switching noise becomes one of the most important design considerations. Fortunately, in RF-I, all data streams modulate RF-carriers, which are at least 10 GHz above the baseband, and thus the high speed RF-interconnect does not generate and/or suffer from any baseband switching noise, which is usually below 10 GHz. Compared with low-swing signaling, a conventional on-chip interconnect technique that suffers directly from digital supply noise, RF-I clearly has superior power supply noise rejection.

One of the advantages of RF-I is that it provides a much higher aggregate data rate than conventional on-chip interconnect by sending multiple channels of data simultaneously over a single transmission line. Interference among multiple channels therefore becomes critical in RF-I design. One particular parameter to quantify channel interference is the signal to interference ratio (SIR). Assume that the minimum SIR is 20 dB and the modulation scheme is amplitude-shift-keying (ASK). The power spectrum of the ASK signal is

$$P(f) = (A/2)^2\, T \operatorname{sinc}^2\big((f - f_c)\,T\big) + (A/2)^2\, \delta(f - f_c), \tag{10.5}$$

where $A$ is the carrier amplitude, $f_c$ is the carrier frequency, and $T$ is the bit period.

From the power spectrum of the ASK signal, the separation between two adjacent channels must be at least $3\,BW_{\text{data}}$ to satisfy the 20 dB SIR, where $BW_{\text{data}}$ is the data rate of the data stream. For instance, the channel separation is 15 GHz for a data rate of 5 Gb/s in each channel.
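The channel-spacing rule can be sanity-checked against the sinc² shape of the ASK spectrum. The sketch below is only a rough indicator, not the authors' SIR derivation. It assumes the normalized frequency offset is $x = (f - f_c)\,T$ with $T$ the bit period, and it samples the sinc² envelope at half-integer offsets, which lie close to the sidelobe peaks. The function names are hypothetical.

```python
import math


def min_channel_separation_hz(data_rate_bps, factor=3.0):
    """Channel spacing rule from the text: at least 3 x the data rate."""
    return factor * data_rate_bps


def sinc2_level_db(offset_hz, bit_period_s):
    """Level of sinc^2((f - fc) * T) relative to its peak, in dB."""
    x = offset_hz * bit_period_s
    if x == 0:
        return 0.0
    return 20 * math.log10(abs(math.sin(math.pi * x) / (math.pi * x)))


data_rate = 5e9                 # 5 Gb/s per channel, as in the example above
bit_period = 1 / data_rate
print(min_channel_separation_hz(data_rate) / 1e9, "GHz separation")
for x in (1.5, 2.5, 3.5):
    level = sinc2_level_db(x * data_rate, bit_period)
    print(f"offset {x} x data rate: {level:.1f} dB")
```

In this simplified picture the sidelobes fall more than 20 dB below the main-lobe peak once the offset exceeds three times the data rate, which is in line with the $3\,BW_{\text{data}}$ spacing quoted above.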

The schematic of the proposed tri-band RF-I is shown in Fig. 10.11. The modulation scheme of each RF band is amplitude-shift keying (ASK), in which a pair of on–off switches directly modulates the RF carrier. Unlike other modulation schemes such as BPSK [18, 22, 23], the receiver of the ASK system only detects the changes in amplitude and not phase or frequency variations. Therefore, it operates asynchronously without a power hungry PLL. It also eliminates the need for coherent carrier regeneration at the receiver. Consequently, RF-I does not suffer from carrier variations between the transmitter and receiver due to process variation. Moreover, RF-I can also operate properly with conventional digital logic circuits placed directly under its passive structure, which gives better area utilization.

Fig. 10.12 Schematic of the transmitter of the RF band

For each RF band, the design uses a minimal configuration that includes a voltage-controlled oscillator (VCO) and a pair of ASK switches on the transmitter side, as well as a self-mixer and baseband amplifiers on the receiver side. As shown in Fig. 10.12, the VCO generates the RF-carrier and acts as a push-pull amplifier. The RF-carrier from the VCO is first inductively coupled to the ASK modulator through a 2:1 ratio transformer. After that, the input data stream modulates the RF carrier via a pair of ASK switches. In order to maximize the modulation depth of the ASK signal, the size of the switches is chosen to provide an optimal balance between the on-state loss and the off-state feedthrough. After the ASK modulation, the differential ASK signal is inductively coupled to the transmission line (TL) through the second frequency selective transformer. The impedance matching requirement is greatly relaxed because the reflected wave is attenuated significantly in the on-chip TL after reflection. By choosing the RF-carrier at mm-wave frequencies, the higher carrier to data rate ratio further minimizes the dispersion of the signal and removes the need for a power hungry equalization circuit. The receiver architecture in each RF band is shown in Fig. 10.13. The self-mixer acts as an envelope detector and demodulates the mm-wave ASK signal into a baseband signal, which is further amplified to a full-swing digital signal. The simulated voltage transfer curve of the self-mixer is plotted in Fig. 10.14. Ordinary techniques for measuring the frequency response of a linear circuit, such as small signal AC response, are not applicable to the self-mixer, which is inherently nonlinear. Figure 10.15 shows the simulated frequency response of the self-mixer, obtained by measuring the eye-opening at the output of the self-mixer for input ASK data rates as high as 10 Gbps. Figure 10.16 shows the transient simulation of the self-mixer running with a 5 Gbps ASK signal on a 60 GHz carrier.

Fig. 10.13 Schematics of the RF receiver
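The asynchronous ASK detection described in this section can be sketched behaviourally. The model below is an idealization, not the authors' circuit: the self-mixer is represented as squaring the received signal, the baseband amplifier as averaging over one bit and comparing against a threshold, and all names and sample counts are hypothetical. Twelve carrier cycles per bit corresponds to a 60 GHz carrier with 5 Gb/s data. Because squaring discards the carrier phase, the detector recovers the bits for any phase offset between transmitter and receiver, which is why no PLL or carrier regeneration is needed.

```python
import math


def ask_waveform(bits, cycles_per_bit=12, samples_per_bit=48,
                 phase=0.0, amplitude=1.0):
    """On-off keyed carrier: carrier present for 1, absent for 0."""
    wave = []
    for bit in bits:
        for n in range(samples_per_bit):
            carrier = math.cos(
                2 * math.pi * cycles_per_bit * n / samples_per_bit + phase
            )
            wave.append(amplitude * bit * carrier)
    return wave


def self_mixer_detect(wave, samples_per_bit=48, amplitude=1.0):
    """Square the signal (self-mixing), average per bit, then threshold."""
    threshold = amplitude ** 2 / 4  # halfway between 0 and A^2 / 2
    bits = []
    for start in range(0, len(wave), samples_per_bit):
        chunk = wave[start:start + samples_per_bit]
        mean_square = sum(v * v for v in chunk) / len(chunk)
        bits.append(1 if mean_square > threshold else 0)
    return bits


data = [1, 0, 1, 1, 0, 0, 1, 0]
for phase in (0.0, 0.7, 2.0, 3.1):
    received = self_mixer_detect(ask_waveform(data, phase=phase))
    print(f"carrier phase {phase:.1f} rad -> recovered correctly: {received == data}")
```

A BPSK receiver, by contrast, would need a phase reference, since its information is carried entirely in the carrier phase that this detector ignores.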

The baseband (BB) uses a low-swing interconnect technique based on capacitive coupling [12]. As shown in Fig. 10.17, the baseband data is transmitted and received using the common mode of the differential TL. At low frequencies, the transformer

BookID 187644\_ChapID 010\_Proof# 1 - 20/07/10

S.-W. Tam et al.




Fig. 10.15 Large Signal voltage transfer curve of the self-mixer



**Fig. 10.16** Transient simulation of the self-mixer (*left*) input ASK modulated signal with carrier at 60 GHz and (*right*) the demodulated 10Gbps ASK signal



Fig. 10.17 Equivalent circuit of the base band in common mode

becomes a short circuit, and a pair of low-swing capacitive coupling buffers transmits and receives the baseband data at the center tap of the transformer.

The transmitter and the receiver are connected by an on-chip 5-mm-long differential TL. To support simultaneous multiband RF-I on a shared TL, RF and BB are transmitted in differential mode and common mode, respectively. These two propagation modes are naturally orthogonal to each other, which suppresses the inter-channel interference (ICI) between RF and BB. Even with finite coupling between the differential mode (RF) and the common mode (BB), the low-pass characteristic of the BB receiver and the band-pass characteristic of the RF receiver provide further rejection of any ICI between RF and BB. The remaining challenge is the ICI between the different RF channels. In the transmitter, the frequency selectivity of the second transformer in each RF band reduces ICI due to





Fig. 10.18 Die photograph of the tri-band on-chip RF-I based



Fig. 10.19 Data output of the tri-band waveform 30 GHz, 50 GHz and base band

signal leakage to the adjacent RF band's ASK modulator. In the receiver of each RF band, the transformer at the input of the self-mixer acts as a band-pass filter.
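The orthogonality of the two propagation modes can be made concrete with a small sketch. It assumes a perfectly balanced line pair with no mode conversion, which is an idealization; the function names `drive_lines` and `split_modes` are hypothetical and used only for this illustration.

```python
def drive_lines(rf_diff, bb_common):
    """Superimpose a differential RF signal and a common-mode BB signal.

    Each wire carries the BB level plus or minus half of the RF signal,
    so the RF appears only in the difference and the BB only in the mean.
    """
    v_p = [bb + rf / 2 for rf, bb in zip(rf_diff, bb_common)]
    v_n = [bb - rf / 2 for rf, bb in zip(rf_diff, bb_common)]
    return v_p, v_n


def split_modes(v_p, v_n):
    """Recover the differential and common-mode components of a wire pair."""
    diff = [p - n for p, n in zip(v_p, v_n)]
    common = [(p + n) / 2 for p, n in zip(v_p, v_n)]
    return diff, common
```

In this ideal case, `split_modes(*drive_lines(rf, bb))` returns `rf` and `bb` unchanged. In a real layout any imbalance couples the modes, which is where the low-pass and band-pass characteristics of the receivers described above provide additional rejection.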

The tri-band on-chip RF-I is implemented in the IBM 90 nm digital CMOS process. The die size is $1 \text{ mm} \times 2 \text{ mm}$, as shown in Fig. 10.18. Figure 10.19 shows the recovered data waveforms of the three bands: 30 GHz, 50 GHz and BB. The maximum data rates for each RF band and for BB are 4 Gb/s and 2 Gb/s, respectively, for a total aggregate data rate of 10 Gb/s. An RF-I with TX and TL only was also implemented for measuring the spectrum of the tri-band RF-I signals. A 67-GS Cascade Micro-Probe directly probes the on-chip differential TL (only the differential mode can be measured). Figure 10.20a shows the free-running VCO spectrum without input data modulation on the two RF bands, at 28.8 and 49.5 GHz, respectively. When two uncorrelated 4 Gb/s random data streams are applied to the RF bands, as shown in Fig. 10.20b, the spectrum of each band broadens and spreads over 10 GHz of bandwidth. The tri-band RF-I achieves superior aggregate data rate (10 Gb/s), latency (6 ps/mm), and energy per bit (0.45 pJ/bit for RF and 0.625 pJ/bit for BB over the 5 mm link), as summarized in Table 10.2.



Fig. 10.20 Spectrum of the differential-mode RF-I signal with (a) no data input and (b) 4 Gbps data input in each band

**Table 10.2** Performance summary of the tri-band RF-I. \*VCO power (5 mW) can be shared by all (many tens) parallel RF-I links in the NoC and does not burden an individual link significantly

| Parameter                        | Tri-band RF-I                 |
|----------------------------------|-------------------------------|
| Interconnect technique           | RF-I                          |
| Bands                            | 30 GHz, 50 GHz, base band     |
| Data rate in RF channel (Gbps)   | 4                             |
| Data rate in BB channel (Gbps)   | 2                             |
| Total aggregate data rate (Gbps) | 10                            |
| BER                              | $10^{-9}$ across all channels |
| Latency (ps/mm)                  | 6                             |
| Energy per bit, RF (pJ/bit)      | 0.45 (5 mm)\*                 |
| Energy per bit, BB (pJ/bit)      | 0.63 (5 mm)\*                 |
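
The headline numbers in Table 10.2 can be cross-checked with a few lines of arithmetic. The sketch below uses only values quoted in the text and table; normalizing the energy per bit by the 5 mm link length is an assumption made here for comparison purposes, and the constant and function names are hypothetical.

```python
RF_BANDS = 2                 # 30 GHz and 50 GHz bands
RF_RATE_GBPS = 4.0           # per RF band
BB_RATE_GBPS = 2.0           # baseband channel
LINK_LENGTH_MM = 5.0         # on-chip differential TL length
LATENCY_PS_PER_MM = 6.0
ENERGY_PJ_PER_BIT = {"RF": 0.45, "BB": 0.625}  # over the 5 mm link


def aggregate_rate_gbps():
    """Sum the data rates of all RF bands and the baseband channel."""
    return RF_BANDS * RF_RATE_GBPS + BB_RATE_GBPS


def link_latency_ps():
    """End-to-end propagation latency over the full link length."""
    return LATENCY_PS_PER_MM * LINK_LENGTH_MM


def energy_per_bit_per_mm():
    """Energy per bit normalized by link length (assumed normalization)."""
    return {ch: e / LINK_LENGTH_MM for ch, e in ENERGY_PJ_PER_BIT.items()}
```

Under these assumptions the aggregate rate is 10 Gb/s, the latency over the 5 mm link is 30 ps, and the length-normalized energy is roughly 0.09 pJ/bit/mm for RF and 0.125 pJ/bit/mm for BB.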

# 10.5.3 3D IC RF-Interconnect

One of the current technological trends in CMOS processes is three-dimensional stacking [9, 26], in which several thin tiers of circuitry are stacked vertically to achieve a higher level of integration. With vertical integration, the same functionality can be implemented in a smaller chip area, reducing both cost and the distance that signals must travel across the chip. The reduced distance decreases both transmission latency and energy consumption. However, 3D stacking requires vertical connections between transistor and metal tiers, usually implemented using metal studs that cut through layers of silicon and insulators. Aligning such direct connections is difficult on a large scale and therefore requires a relatively large connection area.

The use of RF signaling has an advantage over standard voltage signaling for inter-layer communication. Because the signal is modulated on a high-frequency carrier, it does not require a direct connection, and capacitive or inductive coupling is enough for transmission. Figure 10.21 shows a schematic view of a fabricated 3D








integrated circuit demonstrating an RF-interconnect using capacitive coupling, with the photograph of the actual die shown in Fig. 10.22. In this circuit, implemented in a 180 nm 3D SOI process provided by MIT Lincoln Lab [17], amplitude-shift keying (ASK) modulation of a 25 GHz carrier is used, so that recovery of the data requires only an envelope detector. Metal layers in each of the tiers form capacitors with values of tens of femtofarads, which are sufficient for effective coupling. The realized RF-interconnect achieves a maximum data rate of 11 Gb/s per wire and a very low bit error rate (BER) of $10^{-14}$ measured at about 8 Gb/s, as shown in Fig. 10.23. Based on estimation, the separation between adjacent channels can be as small as 6.5 times the separation between layers for a BER of $10^{-12}$. For example, in the 180 nm 3D SOI process, the separation between two layers is $3 \,\mu m$, so the separation between adjacent channels in the capacitive coupling interconnect is about 20 µm, which is only about two-thirds of that of the inductive coupling interconnect [21]. Therefore, the use of small capacitors for coupling has an advantage over on-chip inductors or antennas because the better field confinement reduces cross-talk and interference between differential links.
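The channel-spacing estimate above is a simple scaling rule, sketched below. The 6.5 ratio and the 3 µm layer separation come from the text; the helper `channels_per_mm`, which counts how many channels fit side by side along 1 mm, is a hypothetical figure of merit added for illustration and is not a result from the chapter.

```python
PITCH_TO_GAP_RATIO = 6.5   # from the text, estimated for a BER of 1e-12
LAYER_GAP_UM = 3.0         # tier separation in the 180 nm 3D SOI process


def min_channel_pitch_um(layer_gap_um, ratio=PITCH_TO_GAP_RATIO):
    """Smallest adjacent-channel separation for a given tier separation."""
    return ratio * layer_gap_um


def channels_per_mm(pitch_um):
    """Whole number of channels that fit along 1 mm at the given pitch."""
    return int(1000 // pitch_um)
```

Here `min_channel_pitch_um(LAYER_GAP_UM)` gives 19.5 µm, consistent with the "about 20 µm" quoted above. If the capacitive pitch is two-thirds of the inductive one, the implied inductive pitch is about 30 µm, so the capacitive link fits roughly 50 channels per millimetre against about 33.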





Fig. 10.23 Measurement result of the 3D RF-Interconnect at 11 Gbps with 2<sup>15</sup>-1 PRBS

# 10.6 Impact of RF-I in Future SoC/NoC Architecture

While RF-I has dramatic potential in terms of low-latency, low-power, high-bandwidth operation, the key enabling component of RF-I for future microprocessor architectural design is reconfigurability. As an example of this reconfigurability, we recently proposed MORFIC (mesh overlaid with RF interconnect) [19, 27], a hybrid NoC design shown in Fig. 10.24. It is composed of a traditional mesh of routers augmented with a shared pool of RF-I that can be configured as shortcuts within the mesh. In this design, we have 64 computing cores, 32 cache memory modules, and four memory output ports; the RF-I is a bundle of transmission lines that spans the mesh and features 16 carrier frequencies. We examined four architectures:

1. Mesh baseline: a baseline mesh architecture without any RF-I;
2. Mesh wire baseline: the baseline mesh architecture with express shortcuts between routers (conventional wire, not RF-I) that are chosen at chip design time (i.e., no adaptability to application variation);
3. Mesh static shortcuts: the same express shortcuts as the mesh wire baseline but using RF-I instead of conventional repeated wire;
4. Mesh adaptive shortcuts: the overlaid RF-I with shortcuts tailored to the particular application in execution.

From the simulation results of our in-house cycle-accurate simulator [28], we demonstrated a significant performance improvement of the mesh adaptive shortcuts over the mesh baseline, an average packet latency reduction of 20-25% [19], through the reconfigurable RF-I, as shown in Fig. 10.25. We further demonstrated a
                                                 |            |   |  |
| 65% power reduction [27] by reducing the bandwidth of the baseline mesh by 7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                 | 5% - 398   | 8 |  |
| reducing the 16 Byte wide to 4 Byte wide baseline mesh, as illustrated in Fig. 10.26.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
                                                 |            |   |  |
| Our continued exploration of the MORFIC architecture will be instrumental in 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                 |            |   |  |
| gauging future CMP interconnect design tradeoffs, and in better quantifying what                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
                                                 |            |   |  |
| benefits CMPs can expect from MORFIC in future generations of CMOS technolo-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                 |            |   |  |
| gies down the road.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                 | 403        | 3 |  |


BookID 187644\_ChapID 010\_Proof# 1 - 20/07/10



#### 10 RF-Interconnect for Future Network-On-Chip

Fig. 10.25 Power-performance trade-off curve of the mesh with adaptive shortcuts over the mesh baseline

# 10.7 Future RF-I Research Direction


Before addressing possible future research directions for RF-I, we should compare the performance and the proper communication range of three types of interconnects: the traditional parallel repeated wire bus, the RF-I, and the optical interconnect. We first compare their latency, energy consumption per bit, and data rate density in Fig. 10.27 for the same 2-cm on-chip communication distance. The performance of the parallel repeater bus is projected according to the ITRS digital technology roadmap [14] with optimized repeater design practice [29]; the RF-I performance is estimated based on the RF-technology roadmap, employing our proposed RF-I design methodology described above. The optical interconnect performance is calculated based on [30] and extrapolated to




Fig. 10.26 Power-performance trade-off curves for baseline mesh bandwidths from 16 bytes wide down to 4 bytes wide



Fig. 10.27 Comparison of interconnect technologies for a global 2-cm on-chip distance: latency of a traditional repeated parallel bus, RF-I, and optical interconnect

further scaled technology nodes. In contrast to the latency increase of the traditional repeater bus with scaling shown in Fig. 10.28, RF and optical interconnects maintain similarly low latency across technology nodes and keep the 2-cm data transmission within a clock cycle. The RF and optical interconnects also show a significant benefit in energy consumption over the traditional bus, as shown in Fig. 10.29. The RF-I even scales slightly better than the optical interconnect in terms of absolute energy per bit. Data rate density is expected to improve in all








Fig. 10.29 Comparison of interconnect technologies for a global 2-cm on-chip distance: data rate density of a traditional repeated parallel bus, RF-I, and optical interconnect

three interconnects: the bus would benefit from the scaled wire pitch; RF-I benefits from the number of carrier bands and the achievable transmission speed; and the optical data rate density should improve under the assumption that more wavelengths are used [31], although the optical transceiver typically requires non-CMOS devices, which are less scalable due to fundamental physical constraints and often more sensitive to temperature variations. RF-I, on the other hand, has the major advantage of using standard digital CMOS technology.
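The statement that RF and optical interconnects keep a 2-cm transmission within a clock cycle can be sanity-checked with a rough time-of-flight estimate. The sketch below is only illustrative: the effective relative permittivity (4.0) and the 4 GHz clock are assumed values, not figures from this chapter, and transceiver delay is ignored.

```python
import math

C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def time_of_flight_ps(distance_m: float, eps_eff: float) -> float:
    """Delay of an EM wave over distance_m in a medium with
    effective relative permittivity eps_eff, in picoseconds."""
    velocity = C_VACUUM_M_PER_S / math.sqrt(eps_eff)
    return distance_m / velocity * 1e12


def clock_period_ps(freq_hz: float) -> float:
    """Clock period in picoseconds."""
    return 1e12 / freq_hz


# Assumed values (not from the chapter): eps_eff = 4.0, 4 GHz clock.
delay_ps = time_of_flight_ps(0.02, 4.0)
period_ps = clock_period_ps(4e9)
print(f"2-cm time of flight: {delay_ps:.1f} ps; clock period: {period_ps:.1f} ps")
print("fits in one cycle:", delay_ps < period_ps)
```

Under these assumptions the wave propagation alone takes roughly half of the clock period, which leaves the remainder as a budget for the transmitter and receiver circuits.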




Fig. 10.30 RF-I crosses over the energy-efficiency curve of the RC repeater and becomes more energy efficient beyond a 1-mm interconnect distance in a 16 nm CMOS process

Besides performance, we may also assess the optimal communication range for each interconnect technology. As CMOS continues to scale toward 16 nm, traditional on-chip RC repeated wires are more suitable for local interconnects with short communication distances, owing to the further increased physical density achieved through minimum-feature-width metal wires [10]. Figure 10.30 illustrates the projected power/performance of both RC wires with optimal delay [14, 29] and RF-I in a 16 nm CMOS process. Below approximately 1 mm, the RC repeater provides more energy-efficient communication, but beyond 1 mm, the repeater buffers become less efficient than RF-I. The RF-I is expected to maintain its performance advantages for global on-chip interconnect due to its total compatibility with CMOS technology, but can it maintain the same superiority over an extended distance off-chip? In particular, up to what range can it compete with the optical interconnect, which is clearly superior for longer-distance communications? We answer these questions by comparing the energy efficiency of the off-chip RF-I and the optical interconnect in Fig. 10.31, where the off-chip RF-I energy per bit is estimated with the physical transceiver/transmission line designs based on [18], and the optical interconnect results are obtained from the data in [32, 33]. Accordingly, the RF-I exhibits better energy efficiency at midrange distances of 30 cm or below. As the communication distance increases, RF-I energy efficiency decreases rapidly due to the excessive power required to compensate for the severe loss of the off-chip printed-circuit-board transmission lines, while the power consumption of the optical interconnect remains almost constant. Therefore, despite substantial disadvantages in integration and cost, the optical interconnect becomes more beneficial at interconnect distances beyond 30 cm.
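The on-chip crossover near 1 mm can be illustrated with a deliberately simple first-order model. This is a sketch under stated assumptions, not the model behind Fig. 10.30: repeated-wire energy per bit is assumed to grow linearly with length, RF-I energy per bit is assumed to be independent of on-chip distance, and both numeric parameters are hypothetical placeholders chosen only so that the crossover lands at the 1 mm quoted in the text.

```python
def repeated_wire_energy_pj(distance_mm: float, wire_pj_per_bit_mm: float) -> float:
    """Energy per bit of a repeated RC wire (assumed linear in length)."""
    return wire_pj_per_bit_mm * distance_mm


def rf_i_energy_pj(transceiver_pj_per_bit: float) -> float:
    """Energy per bit of RF-I (assumed independent of on-chip distance)."""
    return transceiver_pj_per_bit


def crossover_distance_mm(transceiver_pj_per_bit: float,
                          wire_pj_per_bit_mm: float) -> float:
    """Distance at which both links cost the same energy per bit."""
    return transceiver_pj_per_bit / wire_pj_per_bit_mm


def more_efficient_link(distance_mm: float,
                        transceiver_pj_per_bit: float,
                        wire_pj_per_bit_mm: float) -> str:
    """Return the link with the lower energy per bit at distance_mm."""
    wire = repeated_wire_energy_pj(distance_mm, wire_pj_per_bit_mm)
    rf = rf_i_energy_pj(transceiver_pj_per_bit)
    return "rc_repeater" if wire < rf else "rf_i"


# Hypothetical placeholder parameters (not measured or projected data).
WIRE_PJ_PER_BIT_MM = 0.1
RF_I_PJ_PER_BIT = 0.1

print(crossover_distance_mm(RF_I_PJ_PER_BIT, WIRE_PJ_PER_BIT_MM))
print(more_efficient_link(0.5, RF_I_PJ_PER_BIT, WIRE_PJ_PER_BIT_MM))
print(more_efficient_link(5.0, RF_I_PJ_PER_BIT, WIRE_PJ_PER_BIT_MM))
```

The same structure would apply to the off-chip comparison with the optical interconnect if the RF-I term were replaced by one that grows with transmission-line loss, but that loss model is not specified here and is left out.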









That is to say, between the traditional RC repeater buffer and the optical interconnect, there is an obvious technology gap for achieving cost/performance-effective communications in the mid-distance range from a few millimeters to several tens of centimeters. The CMOS-compatible RF-I may be the right technology to fill this gap, as shown in Fig. 10.32, with the lowest latency, the least energy consumption, and the highest data rate density.

However, in order to take full advantage of its potential and to be adopted by the mainstream industry for CMP implementations, we must further advance RF-I circuitry and low-power many-core architecture designs in the following areas:

- Effective channel allocation scheme to support co-existing RF band and baseband on a shared transmission line. The particular challenges include designing a multiband coupler that is small in area, high in coupling efficiency, and yet minimizes inter-channel interference.


- Reliable signaling techniques to provide an interference- and noise-resilient RF-I. In future RF-I for NoC, highly reliable interconnects are required such that the bit-error rate (BER) is sufficiently low to maintain reliable computing. The BER of the current tri-band RF-I design [25] may not meet future requirements due to interference from noisy digital circuits and thermal noise from the active devices.
- Transceiver architecture that can support self-arbitrated and collision-free multi-cast communications. One of the potential advantages of RF-I is its ability to provide effective broadcasting over the NoC. However, protocols and infrastructure supporting effective self-arbitration and collision-free multi-casting are not yet well developed.
- Adaptive load balancing of the NoC through RF-I. Our current designs reconfigure the NoC at a coarse granularity, leveraging phase locality within the application to amortize the cost of reconfiguration over many cycles. However, further gain may be possible with more dynamic adaptation; such a design will require a mechanism to rapidly arbitrate RF-I frequencies among multiple communicating components and rapidly notify these components of their communicating frequencies.
- Reduction of the NoC's memory bandwidth, latency, and power limitations by fully using RF-I in the memory hierarchy. While we have considered overlaid RF-I for express channels in a mesh topology, there are further potential gains that can be realized by using RF-I as the main communication channel in the NoC. We are exploring RF-I-enabled crossbars and RF-I-based cache partitioning, for example.
- Leveraging multi-cast to improve cache coherence, transactional memory, thread-level synchronization, or composable cores. Our initial efforts have dealt with coarse-grain arbitration for multi-cast masters among a small set of potential senders, but we are also considering a larger-scale implementation that can enable more nodes to cooperate as multi-cast senders. Such an implementation can dramatically improve the performance of more sophisticated cache coherence protocols that require collective communication, transactional memory schemes that require commits to be broadcast to all participating cores, synchronization techniques such as barriers in multithreaded applications, and composable cores where a number of simple cores cooperate to handle a single sequential thread. In this last case, RF-I has dramatic potential to accelerate communication between cooperating cores.
- Transmission-line-based RF-I is difficult to scale beyond a 1,000-core NoC. In an NoC with more than 1,000 cores, the transmission line needs to span the entire chip area and requires excessive branching points to connect to local cores. One practical solution is on-chip wireless interconnect, which uses frequency bands up to the sub-terahertz range (100 GHz to 500 GHz). Lee [34] proposed a micro wireless interconnect architecture for NoCs with hundreds to thousands of cores; it uses a two-tiered hybrid structure, a wireless backbone with wired edges, to interconnect the cores. This micro on-chip wireless interconnect eliminates long wires and reduces latency for long-haul, many-hop, inter-core communication. Moreover, based on simulation results, the two-tiered hybrid structure reduces latency by about 20–45%.
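To make the BER requirement in the reliable-signaling item above more concrete, the following minimal sketch relates BER to the ratio between signal-level separation and noise. It assumes a generic binary link with equally likely symbols, a mid-point decision threshold, and additive Gaussian noise only; this is a textbook model, not the detection scheme of the tri-band design [25], and interference is not modeled. All function names are illustrative.

```python
import math


def q_function(x: float) -> float:
    """Tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bit_error_rate(level_separation: float, noise_rms: float) -> float:
    """BER of binary threshold detection in Gaussian noise.

    level_separation: distance between the two received signal levels.
    noise_rms: standard deviation of the noise, in the same units.
    """
    return q_function(level_separation / (2.0 * noise_rms))


def required_q(target_ber: float, lo: float = 0.0, hi: float = 40.0) -> float:
    """Smallest x with Q(x) <= target_ber, found by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if q_function(mid) > target_ber:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example: separation of 14 noise standard deviations (x = 7).
print(f"BER at x = 7: {bit_error_rate(14.0, 1.0):.2e}")
# Ratio needed for an assumed target BER of 1e-12.
print(f"x required for 1e-12: {required_q(1e-12):.2f}")
```

In this model, any interference that shrinks the effective level separation or adds to the noise raises the BER steeply, which is why noisy digital circuits near the receiver are a concern.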

Acknowledgements The authors would like to thank the US DARPA and GSRC for their contract support and TAPO/IBM for their foundry service.

# References

| 1. S. Borkar, "Thousand Core Chips - A Technology Perspective," Proceedings of the 44th Annual Conference on Design Automation, pp. 746–749, 2007 | |
|---|---|
| 2. W.J. Dally, B. Towles, "Route Packets, Not Wires: On-Chip Interconnection Networks," Proceedings of the 38th Design Automation Conference (DAC), pp. 684–689, 2001 | |
| 3. L. Benini, G. De Micheli, "Networks on Chips: a new SoC paradigm," IEEE Computer Magazine, pp. 70–78, Jan. 2002 | |
| 4. S. Vangal et al., "An 80-Tile 1.28 TFLOPS Network-on-Chip in 65nm CMOS," IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 98–99, 2007, San Francisco, California, USA | |
| 5. S. Bell et al., "TILE64™ Processor: A 64-Core SoC with Mesh Interconnect," IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 88–89, 2008, San Francisco, California, USA | |
| 6. R. Kumar, V. Zyuban, D. Tullsen, "Interconnections in multi-core architectures: Understanding Mechanisms, Overheads and Scaling," Proceedings of the 32nd International Symposium on Computer Architecture, pp. 408–419, June 2005 | |
| 7. J.D. Owens, W.J. Dally, R. Ho, D.N. Jayasimha, S.W. Keckler, L.-S. Peh, "Research Challenges for On-Chip Interconnection Networks," IEEE MICRO, pp. 96–108, Sept. 2007 | |
| 8. T. Karnik, S. Borkar, "Sub-90nm Technologies-Challenges and Opportunities for CAD," Proceedings of International Conference on Computer Aided Design, pp. 203–206, November 2002 | |
| 9. J. Cong, "An Interconnect-Centric Design Flow for Nanometer Technologies," Proceedings of the IEEE, vol. 89, no. 4, pp. 505–528, April 2001 | |
| 10. R. Ho, K.W. Mai, M. Horowitz, "The future of wires," Proceedings of the IEEE, vol. 89, no. 4, pp. 490–504, April 2001 | |
| 11. A.P. Jose, K.L. Shepard, "Distributed Loss-Compensation Technique for Energy-Efficient Low-Latency On-Chip Communication," IEEE Journal of Solid State Circuits, vol. 42, no. 6, pp. 1415–1424, 2007 | |
| 12. R. Ho et al., "High Speed and Low Energy Capacitively Driven On-Chip Wires," IEEE Journal of Solid State Circuits, vol. 43, no. 1, pp. 52–60, Jan. 2008 | |
| 13. H. Ito et al., "A 8-Gbps Low Latency Multi-Drop On-Chip Transmission Line Interconnect with 1.2mW Two-Way Transceivers," Proceedings of the VLSI Symposium, pp. 136–137, 2007 | |
| 14. "International Technology Roadmap for Semiconductors," Semiconductor Industry Association, 2006 | |
| 15. D. Huang et al., "Terahertz CMOS Frequency Generator Using Linear Superposition Technique," IEEE Journal of Solid State Circuits, vol. 43, no. 12, pp. 2730–2738, Dec. 2008 | |
| 16. M.-C.F. Chang et al., "Advanced RF/Baseband Interconnect Schemes for Inter- and Intra-ULSI communications," IEEE Transactions on Electron Devices, vol. 52, no. 7, pp. 1271–1285, July 2005 | |
| 17. Q. Gu, Z. Xu, J. Ko, M.-C.F. Chang, "Two 10Gb/s/pin Low-Power Interconnect Methods for 3D ICs," IEEE International Solid-State Circuits Conference (ISSCC) 2007 Digest of Technical Papers, pp. 448–614, 11–15 Feb. 2007 | |
| 18. J. Ko, J. Kim, Z. Xu, Q. Gu, C. Chien, M.F. Chang, "An RF/baseband FDMA-interconnect transceiver for reconfigurable multiple access chip-to-chip communication," IEEE International Solid-State Circuits Conference (ISSCC) 2005 Digest of Technical Papers, pp. 338–602, vol. 1, Feb. 2005 | |
| 19. M.F. Chang, J. Cong, A. Kaplan, M. Naik, G. Reinman, E. Socher, S.-W. Tam, "CMP Network-on-Chip Overlaid with Multi-Band RF-Interconnect," IEEE International Symposium on High Performance Computer Architecture, pp. 191–202, Feb. 2008 | |



20. M.F. Chang et al., "RF/Wireless Interconnect for Inter- and Intra-chip Communications," Proceedings of the IEEE, vol. 89, no. 4, pp. 456–466, April 2001
21. N. Miura, D. Mizoguchi, M. Inoue, K. Niitsu, Y. Nakagawa, M. Tago, M. Fukaishi, T. Sakurai, T. Kuroda, "A 1 Tb/s 3 W Inductive-Coupling Transceiver for 3D-Stacked Inter-Chip Clock and Data Link," IEEE Journal of Solid-State Circuits, vol. 42, no. 1, pp. 111–122, Jan. 2007
22. R.T. Chang, N. Talwalkar, C.P. Yue, S.S. Wong, "Near speed-of-light signaling over on-chip electrical interconnects," IEEE Journal of Solid-State Circuits, vol. 38, no. 5, pp. 834–838, May 2003
23. B.A. Floyd, C.-M. Hung, K.K. O, "Intra-chip wireless interconnect for clock distribution implemented with integrated antennas, receivers, and transmitters," IEEE Journal of Solid-State Circuits, vol. 37, no. 5, pp. 543–552, May 2002
24. S.-W. Tam et al., "Simultaneous Sub-harmonic Injection-Locked mm-Wave Frequency Generators for Multi-band Communications in CMOS," IEEE Radio Frequency Integrated Circuits Symposium, pp. 131–134, 2008
25. S.-W. Tam et al., "A Simultaneous Tri-band On-Chip RF-Interconnect for Future Network-on-Chip," Accepted to be published in VLSI Symposium, 2009
26. J.A. Burns et al., "A Wafer-Scale 3-D Circuit Integration Technology," IEEE Transactions on Electron Devices, vol. 53, no. 10, pp. 2507–2516, October 2006
27. M.F. Chang et al., "Power Reduction of CMP Communication Networks via RF-Interconnects," Proceedings of the 41st Annual International Symposium on Microarchitecture (MICRO), Lake Como, Italy, pp. 376–387, November 2008
28. J. Cong et al., "MC-Sim: An Efficient Simulation Tool for MPSoC Designs," IEEE/ACM International Conference on Computer-Aided Design, pp. 364–371, 2008
29. J. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2/e, Prentice Hall, 2003
30. N. Kirman et al., "Leveraging Optical Technology in Future Bus-based Chip Multiprocessors," 39th International Symposium on Microarchitecture, pp. 492–503, December 2006
31. M. Haurylau, G. Chen, H. Chen, J. Zhang, N.A. Nelson, D.H. Albonesi, E.G. Friedman, P.M. Fauchet, "On-Chip Optical Interconnect Roadmap: Challenges and Critical Directions," IEEE Journal of Selected Topics in Quantum Electronics, vol. 12, no. 6, pp. 1699–1705, Nov.–Dec. 2006
32. H. Cho, P. Kapur, K. Saraswat, "Power comparison between high-speed electrical and optical interconnects for interchip communication," Journal of Lightwave Technology, vol. 22, no. 9, pp. 2021–2033, Sep. 2004
33. L. Schares et al., "Terabus: Terabit/Second-Class Card-Level Optical Interconnect Technologies," IEEE Journal of Selected Topics in Quantum Electronics, vol. 12, no. 5, pp. 1032–1044, Sept.–Oct. 2006
34. S.-B. Lee et al., "A Scalable Micro Wireless Interconnect Structure for CMPs," ACM MOBICOM 2009, pp. 217–228, 20–25 September 2009

Author's Proof

BookID 187644\_ChapID 010\_Proof# 1 - 20/07/10

# AUTHOR QUERIES

- AQ1. Please check the identity of the corresponding author.
- AQ2. Please check whether the edits made to the sentence "According to the above analysis..." convey the intended meaning.
- AQ3. Please check whether the edits made to the sentence "In future CMP, there..." are appropriate.
- AQ4. Please check the placement of Figure 10.26.
