3-4

# A 3.6Gb/s 340mW 16:1 Pipe-Lined Multiplexer using SOI-CMOS Technology

Toru Nakura, Kimio Ueda, Kazuo Kubo\*, Warren Fernandez\*, Yoshio Matsuda and Koichiro Mashiko

System LSI Development Center, \*Information Technology R&D Center, \*ULSI Development Center

Mitsubishi Electric Corp.

4-1, Mizuhara, Itami, Hyogo, Japan

nakura@lsi.melco.co.jp

## Abstract

This paper describes a 16:1 multiplexer (MUX) using a $0.18\mu$m partially depleted SOI-CMOS technology. Owing to a selector type architecture with a pipeline structure, as well as the small junction capacitances of SOI-CMOS devices, the MUX achieves 3.6Gbps operation while dissipating 340mW at a power supply of 2.0V.

## Introduction

High-speed multiplexers (MUXs) are key components of optical communication systems such as the Synchronous Optical Network (SONET). Although GaAs and bipolar devices have been the major players in this field, CMOS devices are becoming promising candidates for gigahertz-operation ICs as CMOS technology advances into the deep-submicron region [1][2]. This enables the integration of gigahertz-operation ICs with other large-scale CMOS logic on a single chip, and would satisfy the strong demand for low power dissipation in communication systems.

This paper describes the architecture of a 16:1 MUX using our $0.18\mu$m SOI-CMOS technology. The MUX achieves 3.6Gbps operation with 340mW at 2.0V by adopting a selector type architecture with a pipeline structure. This is the fastest CMOS-based MUX reported to date.

## Circuit Design

### A. 16:1 MUX Architecture for SOI

The speed and power advantages of SOI-CMOS devices over bulk devices are largest when the load capacitances of the circuits are dominated by source/drain capacitances [3], because the SOI structure dramatically reduces these capacitances. Selector type MUX architectures are therefore especially suitable for SOI-CMOS devices, because the main loads of the selector circuits are the source/drain capacitances of the pass-transistors.

Fig. 1 shows the 4:1 selector circuit that we used in our 16:1 MUX. A SPICE simulation indicated that, when the SOI-CMOS transistor is assumed to have 1/10 of the junction capacitance of the bulk-CMOS transistor, the SOI-CMOS selector circuit operates 49% faster than its bulk counterpart.



Fig. 1. 4:1 selector circuit.



Fig. 2. Block diagram of conventional 4:1 selector architecture.

Fig. 2 shows the conventional 4:1 selector type architecture, and Fig. 3 shows the configurations of the 1/4 divider and the timing generator circuits. In the 1/4 divider, the divided clocks CLK/4 and CLKB/4 and the outputs to the timing generator are delayed by $2T_{dff}$ from CLK because they pass through two D-FFs. The outputs S1 to S4 are delayed by a further $T_{nor}$ in the timing generator, where $T_{dff}$ and $T_{nor}$ are the delays of the D-FF and the NOR gate, respectively. S1 to S4 then select the data in the 4:1 selector circuit, and the selected signal SOUT is latched by CLK. Hence, the total delay from the clock input to the 4:1 selector output, plus the setup time of the latching D-FF, must fit within one clock cycle. This delay limits the maximum operating frequency of the conventional MUX to

$$f_{max1} \le 1/(2T_{dff} + T_{nor} + T_{sel} + T_{setup}) \tag{1}$$

where $T_{sel}$ is the delay of the 4:1 selector and $T_{setup}$ is the setup time of the D-FF.



Fig. 3. Configurations of divider and timing generator circuits.
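As a rough behavioral illustration of how a 1/4 divider built from two D-FFs and a NOR-based timing generator can produce the four select signals S1 to S4, the sketch below models one possible realization. The exact wiring of Fig. 3 is not reproduced here: the Johnson-counter feedback and the NOR decoding of each state are assumptions made for illustration, and the names `nor` and `divider_and_timing` are hypothetical.

```python
def nor(a, b):
    """Two-input NOR gate on 0/1 values."""
    return int(not (a or b))


def divider_and_timing(n_cycles):
    """Return a list of (q1, q2, (s1, s2, s3, s4)), one entry per CLK cycle.

    Assumed structure (not taken from Fig. 3): two D-FFs wired as a
    Johnson counter divide CLK by 4, and each select line is the NOR
    of one true/complement pair of the D-FF outputs.
    """
    q1, q2 = 0, 0
    trace = []
    for _ in range(n_cycles):
        q1b, q2b = 1 - q1, 1 - q2
        selects = (nor(q1, q2), nor(q1b, q2), nor(q1b, q2b), nor(q1, q2b))
        trace.append((q1, q2, selects))
        # Rising edge of CLK: Johnson-counter update.
        q1, q2 = 1 - q2, q1
    return trace


trace = divider_and_timing(8)
for q1, q2, selects in trace:
    print(q1, q2, selects)
```

In this model exactly one select line is high in every clock cycle and the pattern repeats every four cycles, which is the property the 4:1 selector relies on.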

In our proposed configuration, shown in Fig. 4, D-FFs are inserted (1) between the 1/4 divider and the timing generator, and (2) between the timing generator and the 4:1 selector. This pipeline structure shortens the critical path in the MUX compared with the conventional configuration. The critical path in our MUX design is the timing generator between the D-FFs. Hence, the maximum operating frequency is boosted to

$$f_{max2} \le 1/(T_{dff} + T_{nor} + T_{setup}) \tag{2}$$
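To make the difference between the bounds (1) and (2) concrete, the following minimal sketch evaluates both expressions. The delay values are hypothetical placeholders chosen only for illustration; they are not measured or simulated values from this work, and the function names are likewise assumptions.

```python
def f_max_conventional(t_dff, t_nor, t_sel, t_setup):
    """Upper bound of Eq. (1): conventional 4:1 selector architecture."""
    return 1.0 / (2 * t_dff + t_nor + t_sel + t_setup)


def f_max_pipelined(t_dff, t_nor, t_setup):
    """Upper bound of Eq. (2): pipelined architecture."""
    return 1.0 / (t_dff + t_nor + t_setup)


# Hypothetical delays in seconds (assumed, not from the paper).
T_DFF, T_NOR, T_SEL, T_SETUP = 100e-12, 60e-12, 80e-12, 40e-12

f1 = f_max_conventional(T_DFF, T_NOR, T_SEL, T_SETUP)
f2 = f_max_pipelined(T_DFF, T_NOR, T_SETUP)
print(f"conventional bound: {f1 / 1e9:.2f} GHz")
print(f"pipelined bound:    {f2 / 1e9:.2f} GHz")
print(f"ratio:              {f2 / f1:.2f}")
```

Whatever delays are used, the pipelined bound is higher, because one $T_{dff}$ and the selector delay $T_{sel}$ are removed from the critical path.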

In addition to the pipeline structure, the phase of D1'' and D2'' is shifted to increase the phase margin between the input D1'' and the select signal S1 in the selector circuit. Several methods have been proposed to shift the phase, for example by using additional half latches [4] or additional D-FFs with a delayed trigger clock [5]. In our MUX, the phase is shifted simply by exchanging CLK/4 and CLKB/4 for the upper two D-FFs that latch D1 and D2. This method therefore does not need any additional circuits. Fig. 5 shows the timing chart of this MUX.



Fig. 4. Block diagram of proposed 4:1 selector architecture.



Fig. 5. Timing charts of (a) conventional and (b) proposed circuits.

By employing these architectures in two stages, a 16:1 MUX was realized as shown in Fig. 6. The low-speed MUXs (left side of Fig. 6) are synchronized to CLK/4, while the high-speed MUX (right side of Fig. 6) is synchronized to CLK. Since the low-speed MUXs operate at a quarter of the speed of the high-speed MUX, they differ from it in two respects. First, the D-FFs before the timing generator were removed to reduce the power dissipation. Second, all 16 external input data are latched into the D-FFs at the same timing in order to increase the phase margin between the external input data and CLK/16''. This wider phase margin is an important specification of MUX chips for practical use.



Fig. 6. Block diagram of two step 4:1 selector architecture.
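The two-step composition can be sketched behaviorally as four low-speed 4:1 selectors feeding one high-speed 4:1 selector. This is only a functional model: the retiming D-FFs and phase shifts are omitted, and the bit interleaving (low-speed MUX k handling bits k, k+4, k+8, k+12) is an assumption, since the input assignment is not spelled out in the text. The names `select4`, `mux16`, and `ONE_HOT` are hypothetical.

```python
ONE_HOT = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]


def select4(inputs, sel):
    """4:1 selector: pass the input whose one-hot select line is high."""
    assert len(inputs) == 4 and sum(sel) == 1
    return inputs[sel.index(1)]


def mux16(word):
    """Serialize 16 parallel bits with two steps of 4:1 selection."""
    assert len(word) == 16
    serial = []
    for t in range(16):  # one iteration per CLK cycle
        slow_sel = ONE_HOT[t // 4]  # changes every 4 CLK cycles (CLK/4 rate)
        fast_sel = ONE_HOT[t % 4]   # changes every CLK cycle
        # Assumed interleaving: low-speed MUX k sees bits k, k+4, k+8, k+12.
        stage1 = [
            select4([word[k + 4 * j] for j in range(4)], slow_sel)
            for k in range(4)
        ]
        serial.append(select4(stage1, fast_sel))
    return serial


word = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
serial_out = mux16(word)
print(serial_out)
```

With this interleaving, the serial output reproduces the parallel word in index order, and each low-speed selector only has to switch once per four cycles of CLK.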

### B. I/O Circuits

Ultra-high-speed signals require pseudo-ECL (PECL) I/O buffers matched to 50$\Omega$ transmission lines. The PECL levels for a supply voltage of 2.0V are $V_{OL} \leq 0.3$V and $V_{OH} \geq 1.1$V.

Fig. 7 shows the circuit configurations of the input and the output buffers. The input buffer consists of two stages of NMOS current mirror circuits. The output buffer is simply an inverter gate, yet it satisfies the PECL levels, because $V_{OL} = 0$V and $V_{OH}$ is set by the ratio of the PMOS on-resistance to the 50$\Omega$ termination resistance. When the PMOS on-resistance is adjusted to $40\Omega$, the output high level is about 1.1V. The termination resistors for the high-speed input buffer (clock input) are on the chip to minimize signal reflections. For the 16 low-speed input buffers (data input), the termination resistors are attached outside the chip to prevent thermal problems in the chip. The high-speed input buffer needs differential signals, while the low-speed input buffers are single-ended with an external reference voltage $V_{BB}$.
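The output high level can be checked with a simple resistive-divider estimate. The sketch below assumes that the 50$\Omega$ termination is tied to ground and that the conducting PMOS behaves as a linear resistor; both are simplifying assumptions, and the function names `v_oh` and `max_r_on` are hypothetical.

```python
def v_oh(vdd, r_on, r_term=50.0):
    """Output high level of an inverter driving a grounded termination.

    Assumes the PMOS acts as a linear resistor r_on (ohms) in series
    with the termination resistor r_term (ohms).
    """
    return vdd * r_term / (r_on + r_term)


def max_r_on(vdd, v_oh_min, r_term=50.0):
    """Largest PMOS on-resistance that still meets v_oh_min."""
    return r_term * (vdd / v_oh_min - 1.0)


level = v_oh(2.0, 40.0)
limit = max_r_on(2.0, 1.1)
print(f"V_OH at 40 ohm on-resistance: {level:.3f} V")
print(f"on-resistance limit for V_OH >= 1.1 V: {limit:.1f} ohm")
```

Under these assumptions, a 40$\Omega$ on-resistance at a 2.0V supply gives an output high level slightly above the 1.1V PECL limit, in line with the value quoted above.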

## Measurement Results

The MUX was designed and fabricated using our $0.18\mu$m SOI-CMOS technology. The transistors operate in the partially depleted (PD) mode, and are isolated using shallow trench isolation (STI) technology. The thicknesses of the SOI layer and the buried oxide are 100nm and 400nm, respectively.

Fig. 7. Input and output buffer configurations for PECL interface.

A chip micrograph is shown in Fig. 8. The MUX contains about 1500 transistors integrated on a chip size of $1.75\text{mm} \times 1.75\text{mm}$. The high-speed circuits were grouped and placed near the pads to shorten the connecting wires. Several large capacitors ($\geq 100\text{pF}$) were inserted between the Vdd and GND lines to suppress switching noise, especially near the output buffers.



Fig. 8. Chip micrograph of 16:1 MUX (1.75mm×1.75mm).

Measurements were performed on-wafer using an RF coaxial probe card connected to $50\Omega$ transmission lines. The 16:1 MUX operated up to 3.6Gbps, consuming only 30mW without the I/O buffers and 340mW including the I/O buffers, at a 2.0V supply voltage. Fig. 9 shows the 3.6Gbps operating waveforms of the multiplexed output data and the corresponding clock. Each input was fixed to "1" or "0", and the data output repeated "10111010110010" for this input set.

Fig. 10 shows the supply voltage dependence of the maximum operating frequency and the corresponding power dissipation. Note that the power consumed by the 50$\Omega$ termination resistors in the input buffer (*CLK*, *CLKB* inputs) is not included, whereas the power consumed by the termination resistors in the oscilloscope (*DOUT*, *CLK*/16 and *CLK* outputs) is included. Even at a low supply voltage of 1.0V, the MUX operated up to 1.2Gbps while dissipating only 2.2mW without the I/O buffers.

1999 Symposium on VLSI Circuits Digest of Technical Papers

Fig. 11 shows the operating frequency dependence of the power dissipation. The power dissipation slopes in the core and the whole circuits are 8.4mW/Gbps and 50mW/Gbps, respectively.
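The reported slopes and the 3.6Gbps operating point can be related through a simple linear model, P = slope × rate + offset. The sketch below computes the offset implied by the figures quoted in the text. The linear model and the resulting offsets are inferences made here for illustration, not values reported in this work, and the function name `implied_offset` is hypothetical.

```python
def implied_offset(p_total_mw, slope_mw_per_gbps, rate_gbps):
    """Rate-independent power (mW) implied by a linear P = slope*rate + offset model."""
    return p_total_mw - slope_mw_per_gbps * rate_gbps


RATE_GBPS = 3.6

# Figures quoted in the text at Vdd = 2.0V.
core_offset = implied_offset(30.0, 8.4, RATE_GBPS)    # core only, no I/O buffers
whole_offset = implied_offset(340.0, 50.0, RATE_GBPS)  # including I/O buffers

# 1 mW per Gbps equals 1 pJ per bit.
energy_per_bit_pj = 340.0 / RATE_GBPS

print(f"core offset:  {core_offset:.2f} mW")
print(f"whole offset: {whole_offset:.2f} mW")
print(f"energy per bit incl. I/O: {energy_per_bit_pj:.1f} pJ/bit")
```

Under this model the core power is almost entirely proportional to the data rate, while the whole chip carries a sizeable rate-independent component. That component would be consistent with static current in the I/O buffers and terminations, although the text does not state this breakdown.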



Fig. 9. Output waveforms of multiplexed output data and corresponding clock.



Fig. 10. Supply voltage dependence of maximum operating frequency and corresponding power dissipation.



Fig. 11. Operation frequency dependence of power dissipation at Vdd = 2.0V.

## Conclusions

A high-speed and low-power 16:1 MUX has been demonstrated. To take advantage of our $0.18\mu$m SOI-CMOS technology, a two-step 4:1 selector architecture and a multiple pipeline architecture were adopted. The MUX achieved power dissipation slopes of only 8.4mW/Gbps and 50mW/Gbps without and with the I/O buffers, respectively. Furthermore, 3.6Gbps operation at a 2.0V supply voltage was achieved.

These results indicate that ultra-high-speed circuits and large logic blocks can be integrated on a single SOI chip.

## References

- [1] S. Yasuda, Y. Ohtomo, M. Ino, Y. Kado, and T. Tsuchiya, "3-Gb/s CMOS 1:4 MUX and DEMUX ICs," *IEICE Trans. Electron.*, vol. E78-C, pp. 1746-1753, Dec. 1995.
- [2] M. Kurisu, M. Kaneko, T. Suzaki, A. Tanabe, M. Togo, A. Furukawa, T. Tamura, K. Nakajima, and K. Yoshida, "2.8-Gb/s 176mW Byte-Interleaved and 3.0-Gb/s 118-mW Bit-Interleaved 8:1 Multiplexers with a 0.15μm CMOS Technology," *IEEE J. Solid-State Circuits*, vol. 31, pp. 2024-2029, Dec. 1996.
- [3] K. Ueda, Y. Wada, T. Hirota, S. Maeda, K. Mashiko, and H. Hamano, "SOI/CMOS Circuit Design for High-Speed Communication LSIs," *IEICE Trans. Electron.*, vol. E80-C, pp. 886-892, July 1997.
- [4] C. L. Stout and J. Doernberg, "10-Gb/s Silicon Bipolar 8:1 Multiplexer and 1:8 Demultiplexer," *IEEE J. Solid-State Circuits*, vol. 28, pp. 339-343, March 1993.
- [5] M. Ouchi, T. Okamura, A. Sawairi, F. Kuniba, K. Matsumoto, T. Tashiro, S. Hatakeyama, and K. Okuyama, "A Si Bipolar 5-Gb/s 8:1 Multiplexer and 4.2-Gb/s 1:8 Demultiplexer," *IEICE Trans. Electron.*, vol. E75-C, pp. 562-565, April 1992.