Abstract
This paper explores the use of the transformer-coupled (TC) technique in a 2:1 MUX and a 1:2 DEMUX to serialize and deserialize (SerDes) high-speed data sequences. The widely used current-mode logic (CML) designs of the latch and the multiplexer/demultiplexer (MUX/DEMUX) are replaced by the proposed TC approach to allow more headroom and to lower the power consumption. Through the stacked transformer, the input clock pulls down the differential source voltage of the TC latch and the TC multiplexer core while alternating between the two-phase operations. With the enhanced drain-source voltage, the TC design draws more drain current with a smaller NMOS width-to-length ratio than its CML counterpart.

The source-offset voltage is decreased so that the supply voltage can be reduced. The lower supply voltage reduces the power consumption and facilitates integration with a low-supply-voltage SerDes interface. The MUX and DEMUX chips are fabricated in a 65-nm standard CMOS process and operate at a 0.7-V supply voltage. The chips are measured at up to 40 Gb/s with sub-hundred-milliwatt power consumption.

Index Terms—CMOS, Current-mode logic, DEMUX, inductive peaking, latch, MUX, SerDes, transformer-coupled technique.

I. INTRODUCTION
As development of the next-generation wireless and wire-line communications infrastructure moves forward, the demand for data traffic across various applications keeps increasing. Widespread social networking applications flourish in current mobile communication and home networks, and the demand for internet bandwidth is increasing dramatically. The fourth-generation Long Term Evolution-Advanced (LTE-A) standard was developed to increase uplink and downlink mobile communication speeds. The backbone optical network must likewise increase data transmission rates to meet public demand. An energy-efficient high-speed interface that supports these system requirements remains one of the critical issues.

In 2008, the IEEE 802.3ba standard defined the wire-line transmission of 40 Gigabit Ethernet (40GbE) and 100 Gigabit Ethernet (100GbE). Fig. 1 shows the serializer/deserializer (SerDes) interface that supports data transmission rates of 40 Gb/s and 100 Gb/s with I/O data rates of 10 Gb/s and 25 Gb/s, respectively. At the transmitter side, the 2:1 MUX doubles the input data rate of \( M_A \) and \( M_B \) with the full-rate clock, whose frequency equals the input data rate. At the receiver side, the 1:2 DEMUX uses the half-rate clock to halve the input data rate of \( D_{IN} \). In addition, a multi-phase clock further slows down the received data with a 1:m DEMUX, where \( m = 4, 8, 16, \ldots \). In this paper, we focus on data conversion at the maximum data transmission rate and realize a 2:1 MUX in the transmitter and a 1:2 DEMUX in the receiver.
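As a purely behavioral illustration of the rate conversion described above (not a model of the circuit), the sketch below interleaves two half-rate bit streams into one serial stream and splits it back. The function names and the bit ordering (the \( M_A \) bit sent before the \( M_B \) bit) are assumptions made for the example.

```python
def mux_2to1(m_a, m_b):
    """Interleave two equal-length bit streams into one stream at twice the rate.

    Assumed ordering: each M_A bit is sent before the matching M_B bit.
    """
    if len(m_a) != len(m_b):
        raise ValueError("input streams must have equal length")
    out = []
    for a, b in zip(m_a, m_b):
        out.extend([a, b])
    return out


def demux_1to2(d_in):
    """Split a serial stream into two half-rate streams (even and odd bits)."""
    return d_in[0::2], d_in[1::2]


m_a = [1, 0, 1, 1]
m_b = [0, 0, 1, 0]
serial = mux_2to1(m_a, m_b)   # twice as many bits in the same time window
d0, d1 = demux_1to2(serial)   # each output has half as many bits as its input
print(serial)                 # [1, 0, 0, 0, 1, 1, 1, 0]
print(d0 == m_a, d1 == m_b)   # True True
```

Under this ordering, a 1:2 DEMUX placed after a 2:1 MUX returns the original two streams.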

The 2:1 MUX and the 1:2 DEMUX use master-slave-master and master-slave latch chains to retime the input data. For high-speed data processing, the current-mode logic (CML) technique is used to realize the time-decision latch and the data-arrangement multiplexer. The two-level series-gated CML schematic selects the two-phase operation and cooperates with the series inductor-resistor load to extend the CML bandwidth.

Modern SerDes chips have decreased the supply voltage to 1.4 V, 1.2 V, and 1.0 V as the semiconductor process has moved to 90 nm, 65 nm, and 40 nm [9]–[11]. With the CML approach, the SerDes design faces a voltage offset stacked across two levels of transistors, which makes it difficult to gain headroom and decrease the supply voltage. Therefore, a new approach is needed to lower the supply voltage of the SerDes interface and to integrate it with low-voltage digital devices. The transformer-coupled (TC) technique has been used in low-noise amplifiers (LNAs) and power amplifiers (PAs) to deliver signals, and in a 2:1 multiplexer core to achieve a 60-Gb/s data rate. Note that, in this paper, the term multiplexer (MUX) refers to the multiplexer core plus retimers; for a SerDes interface, the retimer circuit is needed in addition to the 2:1 multiplexer core. In this work, the TC technique is applied to both the latch and the multiplexer core of the 2:1 five-latch MUX and the 1:2 five-latch DEMUX. Without the switching transistor, the TC design uses only a one-level series-gated CML circuit with the clock-coupling transformer.

In the TC technique, the stacked transformer couples the secondary inductor current into the differential source voltage instead of passing it through the isolation transistor.

The contributions of this paper are summarized as follows.
1) The TC technique is applied for the first time to the latch design for high-speed data processing and is used to implement the 2:1 MUX and the 1:2 DEMUX in SerDes.
2) A modified TC circuit removes one level of switching transistors, which allows more headroom and a lower supply voltage for the latch and the multiplexer core. The reduced supply voltage lowers the power consumption to below one hundred milliwatts and facilitates integration with a low-supply-voltage SerDes interface.
3) The reduced supply voltage does not lower the output swing. On the contrary, the output swing is enhanced with our modified TC circuit design.

II. TRANSFORMER-COUPLED TECHNIQUE

The SerDes interface demands high-speed data transmission. Fig. 2(a) shows the conventional current-mode logic (CML) latch schematic.

The upper-level transistors produce the sensing drain current $I_{D,sense}$ and the holding drain current $I_{D,hold}$, while the lower-level switching transistors select the two-phase operation. The low-threshold-voltage (LVT) transistors M1-M4 are used to increase the drain current $I_D$ for low propagation delay, and the high-threshold-voltage (HVT) transistors M5-M6 help to switch between the two phases rapidly. To retime the data sequence, the input clock alternately switches on the switching transistors M5 and M6 to steer the differential-pair drain current $I_{D,M12}$ and the cross-coupled-pair drain current $I_{D,M34}$, respectively.

Thus, the input data can be restored in one clock cycle. Because data transitions cause drain-current glitches, the transconductance transistors M5-M6 separate the data path from the clock path. The tail transistor M7 also provides a stable tail current that tolerates data-transition glitches.
Fig. 2. (a) Conventional CML latch. (b) Proposed TC latch with clock buffer.

Fig. 2(b) shows the transformer-coupled (TC) latch, which integrates the TC technique into the CML latch. It consists of the upper-level LVT transistors for two-phase operation and the lower-level secondary winding for two-phase determination. Through the balance transformer, the secondary inductor current $I_s$ is coupled into the TC latch and selects the drain current of either the differential transistors $M_1$-$M_2$ or the cross-coupled transistors $M_3$-$M_4$. Therefore, $I_{D,M12}$ and $I_{D,M34}$ are alternately produced to pull down the differential source voltage, $V_s = V_s^+ - V_s^-$. When $CK$ is logic-high, $I_s$ flows into the source terminal of $M_1$-$M_2$, and $V_s^+$ drops to draw current for the sensing phase. When $CK$ is logic-low, $I_s$ flows into the source terminal of $M_3$-$M_4$ instead, and $V_s^-$ draws current for the holding phase. In the 2:1 MUX, the TC latch cooperates with the clock buffer to deliver the input clock power and to isolate interference from the other circuits. Thus, the proposed TC latch eliminates the switching transistors $M_5$-$M_6$ and the tail transistor $M_7$, which reduces the flicker-noise contribution and allows more headroom for saving supply power.

To further describe the two-phase operation, the data input $D$ and the clock together determine the drain current in both the CML latch and the TC latch. In the sensing phase, $M_1$-$M_2$ senses the logic state change and drives the output $Q$ at the rising edge of the clock. In the holding phase, $M_3$-$M_4$ holds the sensed logic state from the falling edge of the clock until the next rising edge arrives.
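The two-phase behavior can be summarized by a simple level-sensitive latch model. The sketch below is a logic-level abstraction only, with no currents or voltages; the function name `latch` and the sampled-waveform representation are assumptions made for illustration.

```python
def latch(d_samples, ck_samples, q_init=0):
    """Level-sensitive latch: Q follows D while CK is high, holds while CK is low."""
    q = q_init
    out = []
    for d, ck in zip(d_samples, ck_samples):
        if ck:
            q = d          # sensing phase: the output tracks the data input
        # holding phase (ck low): q keeps the last sensed value
        out.append(q)
    return out


d = [0, 1, 1, 0, 0, 1, 0, 0]
ck = [1, 1, 0, 0, 1, 1, 0, 0]
print(latch(d, ck))  # [0, 1, 1, 1, 0, 1, 1, 1]
```

In this model, changes in `d` while `ck` is low do not reach the output until the next sensing phase, which corresponds to the holding behavior described above.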

III. ARCHITECTURE AND SCHEMATICS DESIGN

The SerDes interface consists of the transmitter-end MUX and the receiver-end DEMUX. Fig. 4(a) shows the five-latch architecture of the 1:2 DEMUX, which includes two parallel chains of cascaded TC latches with the clock buffer. While the prestage latch $L_1$ delays the input data by half a clock cycle, $T_{CK}/2$, the falling-edge-triggered retimer recovers the input data $D_{IN}$ for the output $D_0$. Thus, $D_1$ is recovered by the rising-edge-triggered retimer and aligns with $D_0$.
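A logic-level sketch of a five-latch 1:2 DEMUX is given below, assuming one three-latch (master-slave-master) path and one two-latch (master-slave) path driven by complementary phases of a half-rate clock. The clock polarity, the assignment of latches to paths, and the one-bit-per-step time grid are assumptions for illustration and are not taken from Fig. 4.

```python
def latch(d_samples, ck_samples, q_init=0):
    """Level-sensitive latch: transparent while CK is high, holds otherwise."""
    q = q_init
    out = []
    for d, ck in zip(d_samples, ck_samples):
        if ck:
            q = d
        out.append(q)
    return out


def demux_1to2_five_latch(d_in):
    """Return the two aligned half-rate waveforms, one sample per input bit slot."""
    n = len(d_in)
    ck = [1 if t % 2 == 0 else 0 for t in range(n)]  # assumed half-rate clock
    ckb = [1 - c for c in ck]                        # complementary phase
    # Three-latch path: the first latch adds the extra half-cycle delay.
    l1 = latch(d_in, ck)
    l2 = latch(l1, ckb)
    l3 = latch(l2, ck)
    # Two-latch path: samples the other bit of each pair.
    l4 = latch(d_in, ckb)
    l5 = latch(l4, ck)
    return l3, l5


d_in = [1, 0, 0, 1, 1, 1, 0, 0]
w0, w1 = demux_1to2_five_latch(d_in)
d0 = w0[2::2]  # after a two-slot latency, read once per clock cycle
d1 = w1[2::2]
print(d0, d1)  # [1, 0, 1] [0, 1, 1]
```

Both outputs change on the same clock phase in this model, which is the alignment role the extra latch plays in the three-latch path.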

Fig. 4. Architecture of (a) 1:2 five-latch DEMUX and (b) 2:1 five-latch MUX.

With the mirrored architecture, Fig. 4(b) shows the 2:1 MUX, which combines the data-retiming TC latches with the data-combining TC multiplexer core. The prestage latch $L_6$ ensures that the even-bit $M_A$ delays the odd-bit $M_B$ for $D$. Thus, the multiplexer core can choose the interleaved data of $M_A$ and $M_B$ while avoiding the data transitions. Moreover, the clock buffer is chosen to isolate the data path interference between the latch and the multiplexer core via the clock path. Finally, the distributed data buffer increases the output data bandwidth for driving the following stage or the measurement instrument.

A. TC Latch

Fig. 5 shows the proposed TC latch schematic. A balance-to-unbalance (balun) transformer converts the single-ended clock $CK$ into the differential source voltage. In the secondary winding, the equivalent circuit shows the input impedance \( Z_S = R_S + sL_S \), where \( L_S \) is the secondary self-inductance and \( R_S \) is the inner series resistance. The direction of the secondary inductor current alternates with \( V_S \), whose voltage swing is \( \Delta V_S = I_S|Z_S| \). With the grounded center tap, the secondary winding connects to the differential transistors M1-M2 and the cross-coupled transistors M3-M4 via the source terminal \( V_S \). All NMOS transistors have the same width-to-length ratio \( W/L \), where \( W \) is the total channel width and \( L \) is the channel length.

However, the differential transistors take advantage of the low threshold voltage \( V_{TH} = 0.14 \) V and the enhanced drain-source voltage \( V_{DS} \) to increase the drain current \( I_D \), so that the turned-on transistor draws \( I_{D,dyne} \) while the turned-off transistor carries only \( I_{D,leak} \). We assume \( I_{D,leak} = 0 \) to simplify the drain current calculation. We carefully calibrate the stacked transformer so that \( R_S < 0.05R \) to avoid over-compression. While \( R \) is chosen as 50 \( \Omega \) for output swing, the inner resistance \( R_S = 0.89 \Omega \) is included in the secondary winding. With the inductive peaking technique, the load inductor and the connection inductor increase the 3-dB bandwidth for high-speed data processing.
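As a quick numerical check of the relations quoted in this subsection, the sketch below evaluates the condition \( R_S < 0.05R \) with the stated values and computes \( |Z_S| \) and \( \Delta V_S = I_S|Z_S| \) at a single frequency. The secondary inductance, clock frequency, and secondary current used here are hypothetical placeholders and are not values reported in the paper.

```python
import math

R = 50.0     # ohms, value of R stated in the text
R_S = 0.89   # ohms, inner resistance of the secondary winding stated in the text


def winding_resistance_ok(r_s, r, limit=0.05):
    """Check the calibration condition R_S < limit * R."""
    return r_s < limit * r


def secondary_impedance_mag(r_s, l_s, freq_hz):
    """|Z_S| for Z_S = R_S + s*L_S evaluated at s = j*2*pi*f."""
    omega = 2.0 * math.pi * freq_hz
    return math.hypot(r_s, omega * l_s)


def source_swing(i_s, r_s, l_s, freq_hz):
    """Delta V_S = I_S * |Z_S|."""
    return i_s * secondary_impedance_mag(r_s, l_s, freq_hz)


# Hypothetical values, chosen only to exercise the formulas.
L_S_ASSUMED = 100e-12   # henries
F_CK_ASSUMED = 20e9     # hertz
I_S_ASSUMED = 5e-3      # amperes

ratio_ok = winding_resistance_ok(R_S, R)
z_mag = secondary_impedance_mag(R_S, L_S_ASSUMED, F_CK_ASSUMED)
dv_s = source_swing(I_S_ASSUMED, R_S, L_S_ASSUMED, F_CK_ASSUMED)
print(f"R_S / R = {R_S / R:.4f}, condition met: {ratio_ok}")
print(f"|Z_S| = {z_mag:.2f} ohm, delta V_S = {dv_s * 1e3:.1f} mV")
```

With the stated resistances, \( R_S/R \) is about 0.018, so the condition holds with margin. The impedance and swing figures depend entirely on the assumed inductance, frequency, and current.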

B. TC Latch with Clock Buffer
The 2:1 five-latch MUX adopts the TC multiplexer core and the TC latch to handle the high data rate. Compared with the 1:2 DEMUX, isolation between the data path and the clock path is essential. Fig. 6 shows the TC latch cascaded with an active clock buffer that isolates the data path interference. To split the input clock, the 5-port balun transformer converts the single-ended input for the clock buffer. Through the 6-port balance transformer, the clock buffer uses the primary winding as the inductive load \( L_P \) with the powered center tap. Similar to the TC latch, the secondary winding also couples \( I_S \) into \( V_S \) and conducts to the grounded center tap. With the high input impedance \( R_{IN} \), the amplifying transistors M5-M6 play an isolation role, preventing the output SEL of the high-speed multiplexer core from interfering with the output Q of the time-decision TC latches L6-L10.

C. TC Multiplexer Core
The multiplexer core has the same schematic as the latch except for the differential transistors M3-M4. Fig. 7 shows the 2:1 transformer-coupled multiplexer core schematic. It consists of the upper-level differential transistors for two-phase data operation and the lower-level secondary inductor for the secondary inductor current. Between them, the isolation transistors M5-M6 provide a tunable active resistance \( 1/g_m \) set by the bias voltage and protect the input clock path from data-transition glitches.
We remove the isolation transistors to modify the TC multiplexer core. Fig. 8 shows the modified TC multiplexer core with the single-to-differential balun transformer. Without the isolation transistors, the source offset voltage is lower than that of the original TC multiplexer core. Therefore, the output swing can be increased even though the reduced supply voltage saves power.

VI. CONCLUSION

In this work, we explore the use of the transformer-coupled (TC) technique to implement the latch and the multiplexer core in a SerDes interface for lower power consumption. The stacked transformer couples the input clock into the TC circuit, replacing the voltage-switching transistors and the tail transistor of the CML counterparts. In the two-phase operation, the transformer-coupled current alternately pulls down the differential source voltage of the TC designs. With the enhanced drain-to-source voltage, the TC approach produces a larger drain current per unit of width-to-length ratio than the CML counterparts at high data rates. Thus, a large output swing is maintained despite the reduced supply voltage that saves power. The 2:1 MUX and the 1:2 DEMUX chips are fabricated in a TSMC 65-nm CMOS process to verify the proposed TC-based designs.

REFERENCES


AUTHOR’S PROFILE

B. Pardhasaradhi
He received the B.TECH degree in ECE. He is currently pursuing an M.TECH (VLSI Design) at SHREE Institute of Technical Education, TIRUPATI, AP, INDIA. His general areas of interest are digital design and testing.

Mr. Parasad Valluru, M.TECH., M.I.M.E.S., received his Master of Technology degree from JNTUA. He is currently working as HOD and Associate Professor in the ECE department of SHREE Institute of Technical Education, affiliated to JNTUA, TIRUPATI, A.P., India. He has 11.5 years of teaching experience in engineering education. He has 16 publications in international journals and national conferences, and 1 in an international conference. He has attended many workshops, FDPs, and seminars. He was awarded "Academic Excellence" in 2011 at SVPCE. His research areas are low-power VLSI, digital IC design, signal processing, image processing, and communication systems.