

# High-speed, low-power conditional push-pull pulsed latches with split paths



M.Tech in VLSI & System Design, Department of ECE, Anurag Engineering College, Kodad, Telangana, INDIA.

Mr. N. Ravikumar, B.Tech, M.E., (PhD), MIETE, Associate Professor, Department of ECE, Anurag Engineering College, Kodad, Telangana, INDIA.

Abstract: This paper introduces a novel class of pulsed latches in 65-nm CMOS technology. The proposed conditional push-pull pulsed latch topology is based on two split paths driven by a conditional pulse generator. The two versions differ mainly in the pulse generator, which can be either shared (CSP<sup>3</sup>L) or not (CP<sup>3</sup>L). The proposed topology outperforms TGPL: it is very fast, and its energy efficiency is very high compared with other pulsed latches. Indeed, a $2.3\times$ improvement in the $ED^3$ product (energy $\times$ delay$^3$) over TGPL was found for designs targeting minimum $ED^3$, placing the proposed pulsed latches beyond the conventional latches proposed so far. The main idea is to adopt a push-pull output stage, driven by two split paths for the rise and fall output transitions, with the explicit aim of reducing both the path effort and the parasitic delay.

*Index terms:* low power, high speed, push-pull latches, TGPL, energy-delay.

# **INTRODUCTION**

The energy efficiency of FFs and latches is nowadays even more critical than in the past: since VLSI systems are power limited, speed can be increased only through improvements in energy efficiency. In particular, from moderate to very high performance targets, only very few topologies belong to the Pareto-optimal curve of designs having minimum energy for a given performance [1], [2]. The transmission gate pulsed latch (TGPL) used in various Intel microprocessors is the most energy-efficient FF in a rather wide portion of the Pareto-optimal curve, ranging from high-speed designs (i.e., points with minimum $ED^{j}$ product with $j > 1$) to energy-efficient designs (i.e., points with minimum $ED$). Only the skew-tolerant FF (STFF) is able to outperform TGPL for extremely high-speed design targets. Although STFF is slightly better than TGPL in terms of pure performance, its significantly worse energy efficiency does not make it as competitive as TGPL in applications where energy efficiency is a concern. Hence, in the following, TGPL will be adopted as a reference for high-speed energy-efficient designs. The traditional transmission gate flip-flop (TGFF) [3] and the recently proposed Toshiba ACFF [4] are, respectively, the most efficient designs with balanced energy and delay (i.e., minimum $ED$) and the most efficient ultralow-energy designs (i.e., minimum $E^{j}D$ with $j > 1$).
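To make the $ED^{j}$ figure of merit concrete, the Python sketch below picks, from a set of candidate sizings of one latch, the point that minimizes $E \cdot D^{j}$. The four (energy, delay) pairs are hypothetical numbers chosen only for illustration, not data from this paper. The point is that raising $j$ moves the optimum toward faster, more energy-hungry sizings, which is why minimum-$ED^{j}$ points with $j > 1$ are regarded as high-speed designs.

```python
from typing import List, Tuple

# (energy, delay) of one sizing point; arbitrary units (e.g., fJ and ps).
SizingPoint = Tuple[float, float]


def ed_product(point: SizingPoint, j: int) -> float:
    """Figure of merit E * D**j for one sizing point."""
    energy, delay = point
    return energy * delay ** j


def best_sizing(points: List[SizingPoint], j: int) -> int:
    """Index of the sizing point that minimizes E * D**j."""
    return min(range(len(points)), key=lambda i: ed_product(points[i], j))


if __name__ == "__main__":
    # Hypothetical sizings, ordered from small/slow to large/fast (NOT measured data).
    points = [(1.0, 30.0), (1.3, 22.0), (2.0, 18.0), (3.5, 16.0)]
    for j in (0, 1, 3):
        i = best_sizing(points, j)
        print(f"min ED^{j}: sizing {i}, E = {points[i][0]}, D = {points[i][1]}")
```

With these numbers, $j = 0$ (pure energy) selects the slowest sizing, $j = 1$ a balanced one, and $j = 3$ a faster one.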

In addition, the capacitance at the output of the first stage is further reduced by adopting half-latches in the split paths and moving the cross-coupled inverters to the output node. Two versions are presented, respectively without (CP<sup>3</sup>L) and with (CSP<sup>3</sup>L) a shareable conditional pulse generator. Measurements on a 65-nm test chip demonstrate $1.3\times$-$2.3\times$ better energy efficiency compared with TGPL, as well as a $1.5\times$-$2\times$ D–Q delay improvement even in the presence of process variations. The proposed pulsed latches have a $1.15\times$-$1.35\times$ larger area than TGPL, with a resulting increase in the area of practical VLSI systems that is well below 1%.

Volume No: 2 (2015), Issue No: 7 (July) www.iimetmr.com

July 2015

# **OPERATION OF THE PROPOSED PULSED LATCH**

The push–pull output stage in Fig. 1 is driven by two split paths that generate the active-high reset pulse $R$ and the active-low set pulse $\overline{S}$, which reset and set the output, respectively, when active. Pulses $R$ and $\overline{S}$ are generated alternately, enabling a falling or a rising output transition, respectively.



Fig 1: Structure of the proposed class of pulsed latches.

These pulses are generated at the falling clock edge by the conditional pulse generator in Fig. 3, and are transferred to the output stage by either the half latch M1–M3 or the half latch M4–M6, depending on whether input D is low or high, respectively (the pulse waveforms are described in detail below). These half latches, which lie in the first stage of the D–Q critical path, have fewer parasitics than typical clocked inverters or inverters cascaded with a transmission gate. The input D drives the two paths through an nMOS (M5) and a pMOS (M2) transistor, respectively, which is equivalent to the load of a traditional input inverter stage.

The operation can be followed through the main waveforms of the internal signals. After the falling clock edge, the pulse generator checks whether the previous output $Q_D$ is high or low. If the previous output is $Q_D = 1$, the next output Q can either keep the same value or make a falling transition; hence, a pulse is generated in the fall path through the active-low signal $CP_f$, whereas nothing changes in the rise path. Symmetrically, if $Q_D = 0$, Q can only keep its value or rise, so a pulse is generated only in the rise path through $CP_r$.
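The conditional behavior described above can be summarized as a simplified Boolean model. The Python sketch below assumes one evaluation per falling clock edge and ignores all timing; the function and field names are hypothetical. It shows that only the path able to change Q is armed, and that a pulse reaches the output stage only when D differs from $Q_D$, so no output-stage switching occurs when the data do not change.

```python
from dataclasses import dataclass


@dataclass
class CycleResult:
    cp_f_pulsed: bool   # fall-path pulse (active-low CP_f) generated
    cp_r_pulsed: bool   # rise-path pulse (CP_r) generated
    r_pulsed: bool      # reset pulse R reaches the output stage
    s_bar_pulsed: bool  # set pulse S-bar reaches the output stage
    q_next: int


def conditional_cycle(d: int, q_d: int) -> CycleResult:
    """Simplified, timing-free model of one clock cycle of the conditional latch."""
    # Pulse generator: arm only the path that can change the output.
    cp_f = q_d == 1          # Q may fall -> pulse in the fall path
    cp_r = q_d == 0          # Q may rise -> pulse in the rise path
    # Half latches: D low passes the fall-path pulse as R,
    # D high passes the rise-path pulse as S-bar.
    r = cp_f and d == 0
    s_bar = cp_r and d == 1
    if r:
        q_next = 0
    elif s_bar:
        q_next = 1
    else:
        q_next = q_d         # no pulse: the keeper holds the previous value
    return CycleResult(cp_f, cp_r, r, s_bar, q_next)


if __name__ == "__main__":
    q = 0
    for d in [1, 1, 0, 0, 1]:
        res = conditional_cycle(d, q)
        print(f"D={d} Q_D={q} R={res.r_pulsed} S_bar={res.s_bar_pulsed} Q={res.q_next}")
        q = res.q_next
```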

At steady state, $R$ ($\overline{S}$) is set to 0 (1), turning OFF the output transistors M7–M8, while a keeper maintains the output at the desired value. In other words, the memory element of the proposed topology is placed at the output node, whereas in most existing topologies it is placed before the output stage (e.g., as a gated cross-coupled inverter pair connected to the input of the output stage). This moves the parasitics associated with the memory element to the output node, leaving the input node of the output stage lightly loaded and hence faster and more energy efficient.

# **CP<sup>3</sup>L and CSP<sup>3</sup>L Topologies**

The proposed class of pulsed latch in Fig. 1 tends to have a lightly loaded D–Q critical path, thereby making it potentially fast and energy-efficient. Such features can be implemented in different ways.

### **Conditional Push–Pull Pulsed Latch:**

The CP3L schematic is depicted in Fig. 2. The keeper (M9–M12 in Fig. 2) drives the output Q and comprises a cross-coupled inverter pair whose forward inverter is gated to avoid current contention with the output stage M7–M8. Indeed, if $R = 1$, the pull-down M7 of the output stage is ON and the pull-up network of the keeper is turned OFF through M11. Analogously, if $\overline{S} = 0$, the pull-up M8 of the output stage is ON and the pull-down network of the keeper is turned OFF through M10.
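The contention argument can be checked with a small Boolean model of the output node. This is a hypothetical Python sketch, not a circuit simulation: the output stage pulls Q down when $R = 1$ (M7) and up when $\overline{S} = 0$ (M8), while the keeper reinforces the held value unless its pull-up (pull-down) is cut off by M11 (M10). The `gated` flag is an illustrative addition that shows what would happen with an ungated forward inverter.

```python
from typing import Tuple


def resolve_output(r: int, s_bar: int, q_held: int, gated: bool = True) -> Tuple[int, bool]:
    """Return (new Q, contention flag) for one evaluation of the output node.

    R and S-bar are never active together, so (r, s_bar) = (1, 0) is not modeled.
    """
    stage_pull_down = r == 1        # M7 ON when R is high
    stage_pull_up = s_bar == 0      # M8 ON when S-bar is low
    # The keeper drives Q toward the held value; when gated, M11 cuts its
    # pull-up while R = 1 and M10 cuts its pull-down while S-bar = 0.
    keeper_pull_up = q_held == 1 and (r == 0 or not gated)
    keeper_pull_down = q_held == 0 and (s_bar == 1 or not gated)
    pull_up = stage_pull_up or keeper_pull_up
    pull_down = stage_pull_down or keeper_pull_down
    contention = pull_up and pull_down
    if pull_up and not pull_down:
        return 1, contention
    if pull_down and not pull_up:
        return 0, contention
    # Contention: this Boolean model cannot resolve Q, so it reports
    # the held value together with the contention flag.
    return q_held, contention


if __name__ == "__main__":
    for r, s_bar in ((0, 1), (1, 1), (0, 0)):       # the legal pulse states
        for q_held in (0, 1):
            print(r, s_bar, q_held, resolve_output(r, s_bar, q_held))
```

Over all legal states the gated keeper never fights the output stage, while the ungated variant does as soon as the output stage tries to flip the held value.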



Fig 2: CP<sup>3</sup>L topology

ISSN No: 2348-4845 International Journal & Magazine of Engineering, Technology, Management and Research

A Peer Reviewed Open Access International Journal

Since the two pulses $R$ and $\overline{S}$ are generated alternately, only one of M10 and M11 in the keeper actually experiences a gate transition in a given cycle. In contrast, the first stage of traditional topologies must drive two transistors associated with the keeper, and both of them switch. This reduces the parasitic load of the first stage of CP3L as well as the switching activity at the keeper capacitances, making the first stage faster and potentially more energy efficient.

The pulse generator comprises a clock phase generator, a pseudo-NAND gate for the fall path, and a pseudo-NOR gate for the rise path.

Note that the width of the $CP_f$ and $CP_r$ pulses determines the width of the transparency window of the CP3L latch, during which the input can affect the output. From a design point of view, the width of the transparency window can be modified by changing the delay of the inverters within the clock phase generator in Fig. 2. The effect of process variations on timing can be compensated through post-silicon tuning of the pulse width, possibly sharing the tuning circuitry among multiple latches [5], [6]. In this paper, no tunability is added to the considered pulsed latches, since such a feature would impact the area and energy of any pulsed latch equally. Indeed, almost all existing pulsed latches adopt the same pulse generator topology.
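As a first-order illustration of how the transparency window is set by the inverter delays, the hypothetical Python helpers below assume that the window width equals the delay of an inverter chain, and that the chain uses an odd number of inverters, as in the common pulse generator that combines CLK with a delayed, inverted copy of itself. Both assumptions and the example inverter delays are illustrative and are not taken from Fig. 3. The second helper mimics post-silicon tuning by choosing the inverter count that restores a target window at a given process corner.

```python
import math


def window_width_ps(n_inverters: int, t_inv_ps: float) -> float:
    """First-order window width: total delay of the inverter chain (assumption)."""
    return n_inverters * t_inv_ps


def inverters_for_window(target_ps: float, t_inv_ps: float) -> int:
    """Smallest odd inverter count whose chain delay reaches target_ps."""
    n = max(1, math.ceil(target_ps / t_inv_ps))
    # An odd count keeps the delayed clock inverted with respect to CLK.
    return n if n % 2 == 1 else n + 1


if __name__ == "__main__":
    target = 30.0  # hypothetical target window, ps
    # Hypothetical per-inverter delays at three process corners, ps.
    for corner, t_inv in (("fast", 6.0), ("typical", 8.0), ("slow", 11.0)):
        n = inverters_for_window(target, t_inv)
        print(f"{corner}: {n} inverters -> {window_width_ps(n, t_inv):.1f} ps window")
```

Because the inverter count is discrete and restricted to odd values, the tuned window only approximates the target, which is one reason practical tuning schemes usually add finer delay adjustment.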



Fig. 3: Clock phase generator and waveforms defining  $CP_r$  and  $CP_f$  pulses.

Without the delay stage, the output Q would be connected directly to the pseudo-NAND/NOR in Fig. 2, and any output transition within the transparency window would immediately trigger the generation of an additional (undesired) pulse. For example, with Q directly connected to the pseudo-NAND/NOR, a falling transition of Q caused by an input transition would immediately trigger a high pulse in $CP_r$, since the pseudo-NOR in Fig. 2 temporarily has all pMOS transistors M22–M24 ON during the transparency window (i.e., the $CP_r$ time slot in Fig. 3). For this reason, the pseudo-NAND/NOR are driven by the delayed output $Q_D$ rather than by Q.
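The need for the delay stage can be illustrated with a toy discrete-time model, sketched in Python below. It assumes, purely for illustration, that the transparency window lasts `window` time steps, that Q starts high with D low so that the fall path fires and Q falls at step `q_fall_step`, and that the pseudo-NAND/NOR see Q through a delay of `feedback_delay` steps (0 models the direct connection). Whenever the fed-back value is already low inside the window, the rise path is armed and an undesired $CP_r$ pulse results. All step counts are hypothetical.

```python
from typing import List


def armed_paths(window: int, q_fall_step: int, feedback_delay: int) -> List[str]:
    """Path armed by the pulse generator at each step of the transparency window."""
    paths = []
    for t in range(window):
        # Fed-back value is Q sampled feedback_delay steps earlier;
        # Q is 1 before q_fall_step (including the previous cycle) and 0 after.
        fed_back_q = 1 if t - feedback_delay < q_fall_step else 0
        paths.append("CPf" if fed_back_q == 1 else "CPr")
    return paths


def has_undesired_pulse(window: int, q_fall_step: int, feedback_delay: int) -> bool:
    """True if the rise path is armed after Q has already fallen."""
    return "CPr" in armed_paths(window, q_fall_step, feedback_delay)


def min_safe_delay(window: int, q_fall_step: int) -> int:
    """Smallest feedback delay that keeps the fed-back Q high until the window closes."""
    return max(0, window - q_fall_step)


if __name__ == "__main__":
    w, fall = 6, 2
    print("direct:", armed_paths(w, fall, 0))
    print("delayed:", armed_paths(w, fall, min_safe_delay(w, fall)))
```

In this toy model the feedback delay must cover the part of the window that remains after the output transition, which is the role played by the delay stage in the circuit.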

### **Conditional Shareable Push–Pull Pulsed Latch:**

In CP<sup>3</sup>L, the pulse generator cannot be shared among multiple latches, since the pseudo-NOR/NAND gates are driven by $Q_D$, which is different for each latch. In this subsection, we present a different implementation of the same concept that integrates the conditional logic into the latch, so that the whole pulse generator can be shared. The resulting conditional shareable push–pull pulsed latch (CSP3L) topology is depicted in Fig. 4.



In CSP<sup>3</sup>L, static NAND/NOR gates are introduced in the shareable pulse generator to generate the pulses $CP_{f,ext}$ and $CP_{r,ext}$, which are distributed to multiple latches and play the same role as $CP_f$ and $CP_r$ in CP3L. In each latch, these external pulses are enabled through switches (M16–M22 in Fig. 4) that implement the conditional pulse selection logic.




Fig 5: Clock phase generator and waveforms for CSP<sup>3</sup>L

The conditional pulse selection logic comprises two transmission gates and two small keepers to maintain the same operation as before. As discussed above, a delay stage (M23–M26) is introduced in the feedback path; it has two more transistors than in CP<sup>3</sup>L, since the transmission gates need complementary control signals. The resulting transistor count is the same as in CP<sup>3</sup>L, hence the CSP<sup>3</sup>L area is expected to be roughly the same as that of CP<sup>3</sup>L.
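The difference between the two versions lies in where the condition is evaluated. The hypothetical Python sketch below assumes that the shared generator issues both $CP_{f,ext}$ and $CP_{r,ext}$ every cycle and that the transmission gates of each latch forward only the pulse selected by that latch's own $Q_D$; names are illustrative and timing is ignored. Because the condition lives inside each latch, a single generator can serve any number of latches.

```python
from typing import Dict, List


def shared_generator() -> Dict[str, bool]:
    """Shared pulse generator (assumption: both external pulses issued every cycle)."""
    return {"CPf_ext": True, "CPr_ext": True}


def select_pulse(q_d: int) -> str:
    """Per-latch selection by the transmission gates controlled by Q_D and its complement."""
    return "CPf_ext" if q_d == 1 else "CPr_ext"


def clock_all_latches(d_bits: List[int], q_bits: List[int]) -> List[int]:
    """Next outputs of several CSP3L latches driven by one shared generator."""
    pulses = shared_generator()
    next_q = []
    for d, q_d in zip(d_bits, q_bits):
        path = select_pulse(q_d)
        # The fall path can only reset (D = 0); the rise path can only set (D = 1).
        fires = pulses[path] and d == (0 if path == "CPf_ext" else 1)
        next_q.append(d if fires else q_d)
    return next_q


if __name__ == "__main__":
    print(clock_all_latches([1, 0, 1, 0], [0, 0, 1, 1]))
```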

# **SPEED POTENTIAL**

CP3L and CSP3L are compared with TGPL in terms of maximum achievable performance through logical effort analysis [7]. CP3L and CSP3L are always faster than TGPL; their theoretical maximum speed advantage is about $2.3\times$ and is obtained at light loads.

For typical electrical efforts ranging from 10 to 30, the potential speed advantage is $1.4\times$-$1.5\times$, and it decreases to $1.3\times$ for electrical efforts of 60 or more. Although this analysis does not account for wire parasitics, which are included in the next section, it suggests that the potential advantage of CP3L and CSP3L over TGPL typically ranges from $1.4\times$ to $2\times$.


The logical effort analysis in the Appendix makes it possible to quantify the advantages of CP3L and CSP3L in each critical-path stage. A comparison of (A1)–(A5) and (A2)–(A7) shows that CP3L and CSP3L have a speed advantage over TGPL in both the first and the second stage.
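Under the standard logical effort model [7], a path of $N$ stages with path logical effort $G$, branching effort $B$, electrical effort $H$ and total parasitic delay $P$ has a minimum normalized delay $D = N (GBH)^{1/N} + P$. The Python sketch below applies this formula to two hypothetical two-stage paths. The values of $G$ and $P$ are invented for illustration, since the Appendix expressions are not reproduced here, and were chosen only so that the delay ratio tends to $P_{ref}/P_{prop} = 2.3$ at light load and to $\sqrt{G_{ref}/G_{prop}} = 1.3$ at heavy load. The sketch reproduces the qualitative trend described above, an advantage that shrinks as the electrical effort grows, not the paper's numbers.

```python
def path_delay(n_stages: int, logical_effort: float, electrical_effort: float,
               parasitic: float, branching: float = 1.0) -> float:
    """Minimum normalized path delay D = N * F**(1/N) + P, with F = G * B * H."""
    f = logical_effort * branching * electrical_effort
    return n_stages * f ** (1.0 / n_stages) + parasitic


# Hypothetical two-stage paths (NOT the Appendix values).
PROPOSED = {"n_stages": 2, "logical_effort": 1.0, "parasitic": 2.0}
REFERENCE = {"n_stages": 2, "logical_effort": 1.69, "parasitic": 4.6}


def speed_ratio(h: float) -> float:
    """Delay of the reference path divided by the delay of the proposed path."""
    return (path_delay(electrical_effort=h, **REFERENCE)
            / path_delay(electrical_effort=h, **PROPOSED))


if __name__ == "__main__":
    for h in (1, 10, 30, 60, 1000):
        print(f"H = {h:>4}: reference/proposed delay = {speed_ratio(h):.2f}")
```

At light load the parasitic terms dominate, while at heavy load the ratio is set by the square root of the logical-effort ratio, which is why reducing both path effort and parasitic delay matters.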



Fig 6: Waveforms of ACFF and TGFF

# **SIMULATION RESULTS**

CP3L and CSP3L have very similar minimum D–Q delays, as expected. The D–Q delay of CP3L (CSP3L) is 17.3 ps (17.9 ps) for minimum-$ED$ sizing and 15.6 ps (16.1 ps) for minimum-$ED^3$ sizing. From the same figures, the TGPL latch under the same conditions achieves 34.6 ps and 24 ps, respectively. Accordingly, TGPL is slower than CP3L (CSP3L) by $2.03\times$ ($1.92\times$) for the minimum-$ED$ design and by $1.54\times$ ($1.47\times$) for the minimum-$ED^3$ design.
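As a quick consistency check, the Python snippet below recomputes the TGPL slowdown factors directly from the D–Q delays quoted above. It uses only the rounded delays given in the text, so the recomputed ratios (2.00, 1.93, 1.54 and 1.49) are close to, but not identical with, the quoted factors.

```python
# D-Q delays in ps, exactly as quoted in the text.
DELAYS_PS = {
    "min-ED": {"CP3L": 17.3, "CSP3L": 17.9, "TGPL": 34.6},
    "min-ED3": {"CP3L": 15.6, "CSP3L": 16.1, "TGPL": 24.0},
}


def tgpl_slowdown(target: str, latch: str) -> float:
    """Ratio of the TGPL delay to the delay of the given latch for one sizing target."""
    delays = DELAYS_PS[target]
    return delays["TGPL"] / delays[latch]


if __name__ == "__main__":
    for target in DELAYS_PS:
        for latch in ("CP3L", "CSP3L"):
            print(f"{target}: TGPL/{latch} = {tgpl_slowdown(target, latch):.2f}x")
```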

This is particularly interesting, considering that TGPL is well known for being the fastest existing topology among those with reasonably high energy efficiency.

The raw simulation output is listed below in the order in which it was logged.

Max power 8.816050e-005 at time 1.0002e-008; Min power 2.625395e-008 at time 2.20122e-008

Measure information of ACFF. Measurement result summary: DELAY = -11.9998n, RISE TIME = -40.0008n, FALL TIME = 6.7553p

Max power 7.300117e-005 at time 5.00021e-008; Max power 7.717696e-006 at time 5.000786e-008

Measure information of ACFF. Measurement result summary: DELAY = 45.7966p, RISE TIME = 8.6591p, FALL TIME = 6.6029p

Max power 3.668946e-004 at time 2.00353e-008; Min power 4.358335e-007 at time 4.00453e-008

Measure information will be written to file "E:\project\COMPLETED\cp31 mod1\cp31 mod1.measure"

Measurement result summary: DELAY = 46.8245p, RISETIME = 8.1814p

Max power 1.620779e-003 at time 6.48504e-009; Min power 1.543050e-006 at time 5.00433e-008

Measure information will be written to file "E:\project\COMPLETED\CSP3L\CSP3L.measure"

Measurement result summary: DELAY = 50.0420n, RISETIME = 14.2140p, FALLTIME = 9.3318p

Max power 8.895480e-003 at time 2.0001e-008; Min power 4.977092e-005 at time 1.51333e-008

Measure information will be written to file "E:\project\COMPLETED\csp3lmod1\csp3lmod1.measure"

Measurement result summary: DELAY = 15.0255n, RISETIME = -25.0045n, FALLTIME = 9.9705n

Max power 1.377463e-004 at time 2.3538e-008; Min power 4.087961e-007 at time 4.21568e-008

Measure information will be written to file "E:\project\COMPLETED\TGFF\TGFF.measure"

Measurement result summary: DELAY = -10.3907n, RISETIME = 31.8702p, FALLTIME = 225.9804p

For completeness, the proposed class of pulsed latches was also compared with other existing topologies that cover a much wider range of applications, from very high performance to very low energy. In addition to TGPL, we thus considered STFF for its very high performance [8], TGFF for its high energy efficiency at moderate performance [9], and ACFF for its high energy efficiency at low performance targets [10].

In summary, the proposed class of pulsed latches outperforms the state of the art in terms of pure performance, with D–Q delay improvements on the order of $1.5\times$ or more.

In current power-limited VLSI systems, the most exploitable advantage of CP3L and CSP3L is their high energy efficiency: they outperform the state of the art by more than $2\times$ when compared with topologies targeting high speed. In addition, the proposed pulsed latches exhibit better energy efficiency ($1.4\times$-$1.9\times$) even when compared with topologies targeting very low energy.

# **CONCLUSION**

A novel class of pulsed latches has been introduced in this paper. Its push–pull final stage and split paths in the first stage enable a significant reduction in path effort and parasitic delay. The energy efficiency of the proposed pulsed latches is a significant improvement beyond the state of the art. Finally, CP3L and CSP3L were shown to be equivalent in terms of energy and performance, hence both topologies are equally worth considering when designing highly energy-efficient systems. The choice between CP3L and CSP3L is driven by preliminary design decisions on the clocking scheme. Indeed, CP3L does not allow for sharing a pulse generator, but has lower area than CSP3L if the pulse generator is included.

# **REFERENCES**

[1] M. Alioto, E. Consoli, and G. Palumbo, "Analysis and comparison in the energy-delay-area domain of nanometer CMOS flip-flops: Part I—Methodology and design strategies," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 5, pp. 725–736, May 2011.

[2] M. Alioto, E. Consoli, and G. Palumbo, "Analysis and comparison in the energy-delay-area domain of nanometer CMOS flip-flops: Part II— Results and figures of merit," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 5, pp. 737–750, May 2011.

[3] D. Markovic, B. Nikolic, and R. Brodersen, "Analysis and design of low-energy flip-flops," in Proc. Int. Symp. Low Power Electron. Design, Aug. 2001, pp. 52–55.

[4] B. Santosh Kumar, P. Sowmithri, and T. Venkateswara Rao, "Low Power and High Speed Conditional Push-Pull Pulsed Latches," IJMETMR, vol. 2, no. 6, Jun. 2015. [Online]. Available: http://www.ijmetmr.com/oljune2015/BSantoshKumar-PSowmithri-TVenkateswaraRao-106.pdf

[5] H. Ando, Y. Yoshida, A. Inoue, I. Sugiyama, T. Asakawa, K. Morita, T. Muta, T. Motokurumada, S. Okada, H. Yamashita, Y. Satsukawa, A. Konmoto, R. Yamashita, and H. Sugiyama, "A 1.3GHz fifth generation SPARC64 microprocessor," in Proc. DAC, Jun. 2003, pp. 702–705.

[6] M. Wieckowski, Y. M. Park, C. Tokunaga, D. W. Kim, Z. Foo, D. Sylvester, and D. Blaauw, "Timing yield enhancement through soft edge flip-flop based design," in Proc. CICC, Sep. 2008, pp. 543–546.

[7] I. Sutherland, B. Sproull, and D. Harris, Logical Effort. Designing Fast CMOS Circuits. San Mateo, CA, USA: Morgan Kaufmann Publishers, 1999.

[8] N. Nedovic, V. Oklobdzija, and W. Walker, "A clock skew absorbing flip-flop," in IEEE ISSCC Dig. Tech. Papers, Feb. 2003, pp. 342–497.

[9] D. Markovic, B. Nikolic, and R. Brodersen, "Analysis and design of low-energy flip-flops," in Proc. Int. Symp. Low Power Electron. Design, Aug. 2001, pp. 52–55.

[10] C. Teh, T. Fujita, H. Hara, and M. Hamada, "A 77% energy-saving 22-transistor single-phase-clocking D-flip-flop with adaptive-coupling configuration in 40nm CMOS," in IEEE ISSCC Dig. Tech. Papers, Feb. 2011, pp. 338–340.
