



A Peer Reviewed Open Access International Journal

# Design & Implementation of Efficient Multipliers using Adaptive Hold Logic Technique

## **M Upendar**

PG Scholar (VLSI & ES)
Dept. of ECE
Vidya Bharathi Institute Of
Technology
Janagaon, Warangal, Telangana

## **Md Moin Pasha**

Associate Professor
Dept. of ECE
Vidya Bharathi Institute Of
Technology
Janagaon, Warangal, Telangana

## **B Kranthi Kumar**

Associate Professor
Dept. of ECE
Vidya Bharathi Institute Of
Technology
Janagaon, Warangal, Telangana

#### **ABSTRACT:**

Digital multipliers are among the most critical arithmetic functional units. The overall performance of these systems depends on the throughput of the multiplier. Meanwhile, the negative bias temperature instability effect occurs when a pMOS transistor is under negative bias (Vgs = -Vdd), increasing the threshold voltage of the pMOS transistor and reducing multiplier speed. A similar phenomenon, positive bias temperature instability, occurs when an nMOS transistor is under positive bias. Both effects degrade transistor speed, and in the long term, the system may fail due to timing violations. Therefore, it is important to design reliable high-performance multipliers. In this paper, we propose an aging-aware multiplier design with a novel adaptive hold logic (AHL) circuit. The multiplier provides higher throughput through variable latency and can adjust the AHL circuit to mitigate performance degradation due to the aging effect. Moreover, the proposed architecture can be applied to a column- or row-bypassing multiplier.

#### I. INTRODUCTION:

Digital multipliers are among the most critical arithmetic functional units in many applications, such as the Fourier transform, discrete cosine transforms, and digital filtering. The throughput of these applications depends on multipliers, and if the multipliers are too slow, the performance of entire circuits will be reduced. Furthermore, negative bias temperature instability (NBTI) occurs when a pMOS transistor is under negative bias (Vgs = -Vdd). In this situation, the interaction between inversion layer holes and hydrogen-passivated Si atoms breaks the Si-H bonds generated during the oxidation process, generating H or H2 molecules. When these molecules diffuse away, interface traps are left [1]. The accumulated interface traps between the silicon and the gate oxide result in increased threshold voltage (Vth), reducing the circuit switching speed. When the bias voltage is removed, the reverse reaction occurs, reducing the NBTI effect. However, the reverse reaction does not eliminate all the interface traps generated during the stress phase, and Vth is increased in the long term. Hence, it is important to design a reliable high-performance multiplier. The corresponding effect on an nMOS transistor is positive bias temperature instability (PBTI), which occurs when an nMOS transistor is under positive bias. Compared with the NBTI effect, the PBTI effect is much smaller on oxide/poly-gate transistors and therefore is usually ignored.

However, for high-k/metal-gate nMOS transistors with significant charge trapping, the PBTI effect can no longer be ignored. In fact, it has been shown that the PBTI effect is more significant than the NBTI effect on 32-nm high-k/metal-gate processes [2]–[4]. A traditional method to mitigate the aging effect is overdesign, including such things as guard-banding and gate oversizing; however, this approach can be very pessimistic and inefficient in area and power. To avoid this problem, many NBTI-aware methodologies have been proposed. An NBTI-aware technology mapping technique was proposed in [7] to guarantee the performance of the circuit during its lifetime. In [8], an NBTI-aware sleep transistor was designed to reduce the aging effects on pMOS sleep transistors, and the lifetime stability of the power-gated circuits under consideration was improved. Wu and Marculescu [9] proposed a joint logic restructuring and pin reordering method, which is based on detecting functional symmetries and stacking effects. They also proposed an NBTI optimization method that considered path sensitization [12]. In [10] and [11], dynamic voltage scaling and body-biasing techniques were proposed to reduce power or extend circuit life. These techniques, however, require circuit modification or do not provide optimization of specific circuits.

Traditional circuits use the critical path delay as the overall circuit clock cycle in order to perform correctly. However, the probability that the critical paths are activated is low. In most cases, the path delay is shorter than the critical path delay. For these noncritical paths, using the critical path delay as the overall cycle period results in significant timing waste. Hence, the variable-latency design was proposed to reduce the timing waste of traditional circuits. A variable-latency design divides the circuit into two parts: 1) shorter paths and 2) longer paths. Shorter paths can execute correctly in one cycle, whereas longer paths need two cycles to execute. When shorter paths are activated frequently, the average latency of variable-latency designs is better than that of traditional designs. For example, several variable-latency adders were proposed using the speculation technique with error detection and recovery [13]–[15]. A short path activation function algorithm was proposed in [16] to improve the accuracy of the hold logic and to optimize the performance of the variable-latency circuit. An instruction scheduling algorithm has also been proposed to schedule the operations on nonuniform-latency functional units and improve the performance of Very Long Instruction Word processors. A variable-latency pipelined multiplier architecture with a Booth algorithm has been proposed as well. A process-variation-tolerant architecture for arithmetic units was also proposed, where the effect of process variation is considered to increase the circuit yield. In addition, the critical paths are divided into two shorter paths that could be unequal, and the clock cycle is set to the delay of the longer one.

These designs were able to reduce the timing waste of traditional circuits to improve performance, but they did not consider the aging effect and could not adjust themselves during runtime. A variable-latency adder design that considers the aging effect was proposed. However, no variable-latency multiplier design that considers the aging effect and can adjust dynamically has been reported.
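The average-latency argument above can be made concrete with a small sketch. The function name, the 10 ns and 6 ns cycle periods, and the one-cycle fractions below are hypothetical values chosen only for illustration; they are not measurements from this paper.

```python
def average_latency(short_cycle_ns, one_cycle_fraction):
    """Average latency of a variable-latency design.

    Shorter paths finish in one cycle; longer paths need two cycles.
    """
    two_cycle_fraction = 1.0 - one_cycle_fraction
    return short_cycle_ns * (one_cycle_fraction + 2.0 * two_cycle_fraction)


# Hypothetical numbers: a fixed-latency design clocked at a 10 ns critical
# path versus a variable-latency design clocked at 6 ns.
fixed_latency_ns = 10.0
frequent_short = average_latency(6.0, 0.9)  # 90% of patterns take one cycle
rare_short = average_latency(6.0, 0.2)      # only 20% take one cycle

print(f"fixed latency:           {fixed_latency_ns:.2f} ns")
print(f"variable, 90% one-cycle: {frequent_short:.2f} ns")
print(f"variable, 20% one-cycle: {rare_short:.2f} ns")
```

Under these assumed numbers the variable-latency design wins only when short paths are activated often; when most patterns need two cycles, its average latency exceeds the fixed cycle period.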

## **II. PAPER CONTRIBUTION:**

In this paper, we propose an aging-aware reliable multiplier design with a novel adaptive hold logic (AHL) circuit. The multiplier is based on the variable-latency technique and can adjust the AHL circuit to achieve reliable operation under the influence of NBTI and PBTI effects. Specifically, the contributions of this paper are as follows: 1) a novel variable-latency architecture with an AHL circuit, which can decide whether the input patterns require one or two cycles and can adjust the judging criteria to ensure minimum performance degradation after considerable aging occurs; 2) a comprehensive analysis and comparison of the multiplier's performance under different cycle periods to show the effectiveness of our proposed architecture; 3) an aging-aware reliable multiplier design method that is suitable for large multipliers. Although the experiment is performed in 32-bit multipliers, our proposed architecture can be easily extended to large designs.

## **III. PRELIMINARIES:**

#### **Row-Bypassing Multiplier:**

A low-power row-bypassing multiplier is also proposed to reduce the activity power of the array multiplier (AM). The operation of the low-power row-bypassing multiplier is similar to that of the low-power column-bypassing multiplier, but the selectors of the multiplexers and the tristate gates use the multiplicator.








Figure 1:  $4 \times 4$  Row-Bypassing Multipliers

Fig. 1 is a 4 × 4 row-bypassing multiplier. Each input is connected to an FA through a tristate gate. When the inputs are $1111_2 \times 1001_2$, the two inputs in the first and second rows are 0 for FAs. Because $b_1$ is 0, the multiplexers in the first row select $a_i b_0$ as the sum bit and select 0 as the carry bit. The inputs are bypassed to FAs in the second row, and the tristate gates turn off the input paths to the FAs. Therefore, no switching activities occur in the first-row FAs; in return, power consumption is reduced. Similarly, because $b_2$ is 0, no switching activities will occur in the second-row FAs. However, the FAs must be active in the third row because $b_3$ is not zero.
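The row-bypassing behavior described above can be sketched at the behavioral level. This is a simplified model, not the gate-level circuit of Fig. 1: the function name `row_bypass_multiply` is hypothetical, and each adder row is modeled as a single addition that is skipped when the corresponding multiplicator bit is 0.

```python
def row_bypass_multiply(a, b, n=4):
    """Behavioral model of an n x n row-bypassing array multiplier.

    Returns (product, active_rows), where active_rows lists the adder rows
    (indexed by multiplicator bit) that are not bypassed.
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    # Row 0 holds only the AND-gate partial products a_i * b_0; it has no FAs.
    acc = a if (b & 1) else 0
    active_rows = []
    for row in range(1, n):
        if (b >> row) & 1:
            acc += a << row  # the FAs in this row add the partial product
            active_rows.append(row)
        # Otherwise the multiplexers pass the previous sums through and the
        # tristate gates keep the FA inputs from switching; acc is unchanged.
    return acc, active_rows


product, rows = row_bypass_multiply(0b1111, 0b1001)
print(product, rows)  # 135 [3]
```

For the example inputs, only the third row is active, matching the description that the first-row and second-row FAs see no switching activity.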

## **Column Bypassing Multiplier:**

A column-bypassing multiplier is an improvement on the normal array multiplier (AM). The multiplier array consists of (n-1) rows of carry save adders (CSAs), in which each row contains (n-1) full adder (FA) cells.



Figure 2:  $4 \times 4$  column-bypassing multipliers.

Each FA in the CSA array has two outputs: 1) the sum bit goes down and 2) the carry bit goes to the lower left FA. The last row is a ripple adder for carry propagation. The FAs in the AM are always active regardless of input states. A low-power column-bypassing multiplier design is proposed in which the FA operations are disabled if the corresponding bit in the multiplicand is 0. Fig. 2 shows a 4 × 4 column-bypassing multiplier. Supposing the inputs are $1010_2 \times 1111_2$, it can be seen that for the FAs in the first and third diagonals, two of the three input bits are 0: the carry bit from its upper right FA and the partial product $a_i b_i$. Therefore, the carry output of the adders in both diagonals is 0, and the output sum bit is simply equal to the third bit, which is the sum output of its upper FA.
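The reason a column can be bypassed follows directly from the full-adder equations, as the short check below illustrates. The helper names `full_adder` and `bypassed_columns` are hypothetical and serve only this illustration.

```python
def full_adder(x, y, cin):
    """One-bit full adder returning (sum, carry)."""
    s = x ^ y ^ cin
    carry = (x & y) | (x & cin) | (y & cin)
    return s, carry


def bypassed_columns(multiplicand, n=4):
    """Zero-based columns (diagonals) whose multiplicand bit is 0."""
    return [i for i in range(n) if not (multiplicand >> i) & 1]


# With the partial product and the incoming carry both 0, the FA simply
# forwards the sum bit from the FA above it and never produces a carry.
for upper_sum in (0, 1):
    s, carry = full_adder(0, 0, upper_sum)
    print(f"upper_sum={upper_sum} -> sum={s}, carry={carry}")

print(bypassed_columns(0b1010))  # [0, 2]: the first and third diagonals
```

Because the FA output is fully determined by the bit from above in this case, replacing the FA with a bypass path changes nothing functionally while avoiding switching activity.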

## IV. PROPOSED AGING-AWARE MULTIPLIER:

This section presents the proposed aging-aware reliable multiplier design. It introduces the overall architecture and the functions of each component and also describes how to design an AHL circuit that adjusts the circuit when significant aging occurs.

## **Proposed Architecture:**

Fig. 3 shows the proposed 8-bit aging-aware multiplier architecture, which includes two m-bit inputs (m is a positive number), one 2m-bit output, one column- or row-bypassing multiplier, 2m 1-bit Razor flip-flops, and an AHL circuit.



Figure 3: Proposed architecture (md means multiplicand; mr means multiplicator).

Razor flip-flops can be used to detect whether timing violations occur before the next input pattern arrives. A 1-bit Razor flip-flop contains a main flip-flop, a shadow latch, an XOR gate, and a mux. The main flip-flop catches the execution result of the combinational circuit using a normal clock signal, and the shadow latch catches the execution result using a delayed clock signal, which is slower than the normal clock signal. If the latched bit of the shadow latch is different from that of the main flip-flop, the path delay of the current operation exceeds the cycle period, and the main flip-flop has caught an incorrect result. If errors occur, the Razor flip-flop sets the error signal to 1 to notify the system to re-execute the operation and to notify the AHL circuit that an error has occurred. We use Razor flip-flops to detect whether an operation that is considered to be a one-cycle pattern can really finish in a cycle. If not, the operation is re-executed with two cycles. Although the re-execution may seem costly, the overall cost is low because the re-execution frequency is low.
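The error-detection idea can be sketched with a simple behavioral model of one Razor bit. The function name `razor_capture`, its parameters, and the timing values are hypothetical; the model also assumes that a sample taken before the path has settled still sees the previous value.

```python
def razor_capture(old_value, new_value, path_delay, cycle_period, shadow_delay):
    """Behavioral model of a 1-bit Razor flip-flop for one operation.

    The main flip-flop samples at cycle_period; the shadow latch samples at
    cycle_period + shadow_delay. A sample taken before the path has settled
    still sees old_value (an assumption of this sketch).
    Returns (main_bit, shadow_bit, error).
    """
    main_bit = new_value if path_delay <= cycle_period else old_value
    shadow_time = cycle_period + shadow_delay
    shadow_bit = new_value if path_delay <= shadow_time else old_value
    error = main_bit ^ shadow_bit  # the XOR gate compares the two captures
    return main_bit, shadow_bit, error


# Hypothetical timings in ns: 5 ns cycle, shadow clock delayed by 2 ns.
print(razor_capture(0, 1, path_delay=4.0, cycle_period=5.0, shadow_delay=2.0))
print(razor_capture(0, 1, path_delay=6.0, cycle_period=5.0, shadow_delay=2.0))
```

In the second call the path settles after the main clock edge but before the delayed edge, so the two captured bits differ and the error signal is 1.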

The Razor flip-flop is shown in Fig. 4. The AHL circuit is the key component in the aging-aware variable-latency multiplier. Fig. 5 shows the details of the AHL circuit. The AHL circuit contains an aging indicator, two judging blocks, one mux, and one D flip-flop. The aging indicator indicates whether the circuit has suffered significant performance degradation due to the aging effect. The aging indicator is implemented as a simple counter that counts the number of errors over a certain number of operations and is reset to zero at the end of those operations. If the cycle period is too short, the column- or row-bypassing multiplier is not able to complete these operations successfully, causing timing violations.



Figure 4: Razor flip flops.



Figure 5: Diagram of AHL (md means multiplicand; mr means multiplicator).

These timing violations will be caught by the Razor flip-flops, which generate error signals. If errors happen frequently and exceed a predefined threshold, the circuit has suffered significant timing degradation due to the aging effect, and the aging indicator will output 1; otherwise, it will output 0 to indicate that the aging effect is still not significant and no actions are needed. The first judging block in the AHL circuit will output 1 if the number of zeros in the multiplicand (multiplicator for the row-bypassing multiplier) is larger than n, and the second judging block in the AHL circuit will output 1 if the number of zeros in the multiplicand (multiplicator) is larger than n + 1. They are both employed to decide whether an input pattern requires one or two cycles, but only one of them will be chosen at a time. In the beginning, the aging effect is not significant, and the aging indicator produces 0, so the first judging block is used.
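The aging indicator and the two judging blocks can be sketched as follows. The class and function names, the window size, the error threshold, and the value of n are hypothetical; the sketch also assumes the indicator output is updated once at the end of each window of operations, which the text does not specify.

```python
class AgingIndicator:
    """Counts Razor errors over a window of operations."""

    def __init__(self, window, threshold):
        self.window = window        # operations per counting window
        self.threshold = threshold  # errors tolerated per window
        self.errors = 0
        self.operations = 0
        self.aged = 0               # output signal: 1 means significant aging

    def record(self, error):
        """Record one operation's error bit and return the current output."""
        self.errors += error
        self.operations += 1
        if self.operations == self.window:
            self.aged = 1 if self.errors > self.threshold else 0
            self.errors = 0         # the counter is reset after each window
            self.operations = 0
        return self.aged


def count_zeros(value, width):
    """Number of 0 bits in the width-bit operand."""
    return width - bin(value & ((1 << width) - 1)).count("1")


def ahl_decision(operand, width, n, aged):
    """Return 1 for a one-cycle pattern and 0 for a two-cycle pattern."""
    zeros = count_zeros(operand, width)
    first_block = 1 if zeros > n else 0        # used before aging
    second_block = 1 if zeros > n + 1 else 0   # stricter, used after aging
    return second_block if aged else first_block


print(ahl_decision(0b00001111, width=8, n=3, aged=0))  # 1: four zeros > 3
print(ahl_decision(0b00001111, width=8, n=3, aged=1))  # 0: four zeros is not > 4
```

The same operand is treated as a one-cycle pattern before aging and as a two-cycle pattern afterwards, which is how the stricter block reduces the number of one-cycle patterns.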

After a period of time, when the aging effect becomes significant, the second judging block is chosen. Compared with the first judging block, the second judging block allows a smaller number of patterns to become one-cycle patterns because it requires more zeros in the multiplicand (multiplicator). The details of the operation of the AHL circuit are as follows: when an input pattern arrives, both judging blocks decide whether the pattern requires one cycle or two cycles to complete and pass both results to the multiplexer. The multiplexer selects one of the two results based on the output of the aging indicator. Then an OR operation is performed between the result of the multiplexer and the $\overline{Q}$ signal to determine the input of the D flip-flop. When the pattern requires one cycle, the output of the multiplexer is 1. The !(gating) signal will become 1, and the input flip-flops will latch new data in the next cycle. On the other hand, when the output of the multiplexer is 0, which means the input pattern requires two cycles to complete, the OR gate will output 0 to the D flip-flop. Therefore, the !(gating) signal will be 0 to disable the clock signal of the input flip-flops in the next cycle. Note that only one cycle of the input flip-flops will be disabled because the D flip-flop will latch 1 in the next cycle.

The overall flow of our proposed architecture is as follows: when input patterns arrive, the column- or row-bypassing multiplier and the AHL circuit execute simultaneously. According to the number of zeros in the multiplicand (multiplicator), the AHL circuit decides whether the input patterns require one or two cycles. If the input pattern requires two cycles to complete, the AHL will output 0 to disable the clock signal of the flip-flops. Otherwise, the AHL will output 1 for normal operations. When the column- or row-bypassing multiplier finishes the operation, the result will be passed to the Razor flip-flops. The Razor flip-flops check whether a path-delay timing violation has occurred. If timing violations occur, the cycle period is not long enough for the current operation to complete, and the execution result of the multiplier is incorrect. Thus, the Razor flip-flops will output an error to inform the system that the current operation needs to be re-executed using two cycles to ensure the operation is correct. In this situation, the extra re-execution cycles caused by timing violations incur a penalty to the overall average latency.
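The clock-gating behavior of the D flip-flop described above can be traced with a cycle-level sketch. The function name `run_patterns` is hypothetical, and the sketch reflects one reading of the timing: the D flip-flop output is treated as the !(gating) signal, the input flip-flops advance to the next pattern only when that signal is 1, and Razor re-execution is not modeled.

```python
def run_patterns(one_cycle_flags):
    """Count clock cycles for a stream of input patterns.

    one_cycle_flags[i] is the multiplexer output for pattern i:
    1 means one cycle is enough, 0 means two cycles are needed.
    """
    q = 1       # D flip-flop output, read here as the !(gating) signal
    cycles = 0
    index = 0
    while index < len(one_cycle_flags):
        cycles += 1
        mux_out = one_cycle_flags[index]
        d = mux_out | (q ^ 1)  # OR of the multiplexer result and Q-bar
        q = d                  # the D flip-flop latches at the clock edge
        if q == 1:
            index += 1         # input flip-flops latch the next pattern
        # When q == 0 the input clock is gated for one cycle, so the same
        # pattern stays in place; Q-bar is then 1 and forces d back to 1.
    return cycles


print(run_patterns([1, 0, 1, 1, 0]))  # 7
```

Each two-cycle pattern holds the inputs for exactly one extra cycle, so the stream of three one-cycle and two two-cycle patterns takes seven cycles in this model.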

However, our proposed AHL circuit can accurately predict whether the input patterns require one or two cycles in most cases. Only a few input patterns may cause a timing violation when the AHL circuit judges incorrectly. In this case, the extra re-execution cycles do not produce significant timing degradation. In summary, our proposed multiplier design has three key features. First, it is a variable-latency design that minimizes the timing waste of the noncritical paths. Second, it can provide reliable operations even after the aging effect occurs. The Razor flip-flops detect the timing violations and re-execute the operations using two cycles. Finally, our architecture can adjust the percentage of one-cycle patterns to minimize performance degradation due to the aging effect. When the circuit is aged and many errors occur, the AHL circuit uses the second judging block to decide whether an input is a one-cycle or two-cycle pattern.

#### **V. SIMULATION RESULTS:**

# RTL FOR PROPOSED 8X8 ROW-BYPASSING & COLUMN BYPASSING MULTIPLIER



Figure 6: a) 8X8 Row bypassing multiplier, b) 8X8 Column bypassing multiplier






Figure 7: Simulation result of a) Existing, b) proposed

Area & Delay Reports

Device Utilization Summary (estimated values)

| Logic Utilization                 | Used | Available | Utilization |
|-----------------------------------|------|-----------|-------------|
| Number of Slice LUTs              | 22   | 9112      | 0%          |
| Number of fully used LUT-FF pairs | 0    | 22        | 0%          |
| Number of bonded IOBs             | 16   | 232       | 6%          |

a)

Timing Summary (Speed Grade: -2):

- Minimum period: No path found
- Minimum input arrival time before clock: No path found
- Maximum output required time after clock: No path found
- Maximum combinational path delay: 10.725ns

b)

Figure 8: a) Area & b) Delay for Existing

Device Utilization Summary (estimated values)

| Logic Utilization                 | Used | Available | Utilization |
|-----------------------------------|------|-----------|-------------|
| Number of Slice Registers         | 44   | 18224     | 0%          |
| Number of Slice LUTs              | 67   | 9112      | 0%          |
| Number of fully used LUT-FF pairs | 18   | 93        | 19%         |
| Number of bonded IOBs             | 21   | 232       | 9%          |
| Number of BUFG/BUFGCTRLs          | 3    | 16        | 18%         |

a)

ISSN No: 2348-4845

b)

Figure 9: a) Area & b) Delay for Proposed

#### **VI. CONCLUSION:**

This paper proposed an aging-aware variable-latency multiplier design with the AHL. The multiplier is able to adjust the AHL to mitigate performance degradation due to increased delay. The experimental results show that the proposed architecture for 8x8 multiplication, which uses bypassing multipliers as the last stage instead of a normal ripple-carry adder (RCA), decreases the delay and improves the performance compared with previous models.

## **REFERENCES:**

- [1] Y. Cao. (2013). Predictive Technology Model (PTM) and NBTI Model [Online]. Available: http://www.eas.asu.edu/ptm
- [2] S. Zafar et al., "A comparative study of NBTI and PBTI (charge trapping) in SiO2/HfO2 stacks with FUSI, TiN, Re gates," in Proc. IEEE Symp. VLSI Technol. Dig. Tech. Papers, 2006, pp. 23–25.
- [3] S. Zafar, A. Kumar, E. Gusev, and E. Cartier, "Threshold voltage instabilities in high-k gate dielectric stacks," IEEE Trans. Device Mater. Rel., vol. 5, no. 1, pp. 45–64, Mar. 2005.
- [4] H.-I. Yang, S.-C. Yang, W. Hwang, and C.-T. Chuang, "Impacts of NBTI/PBTI on timing control circuits and degradation tolerant design in nanoscale CMOS SRAM," IEEE Trans. Circuit Syst., vol. 58, no. 6, pp. 1239–1251, Jun. 2011.
- [5] R. Vattikonda, W. Wang, and Y. Cao, "Modeling and minimization of pMOS NBTI effect for robust nanometer design," in Proc. ACM/IEEE DAC, Jun. 2004, pp. 1047–1052.
- [6] S. V. Kumar, C. H. Kim, and S. S. Sapatnekar, "NBTI-aware synthesis of digital circuits," in Proc. ACM/IEEE DAC, Jun. 2007, pp. 370–375.






- [7] A. Calimera, E. Macii, and M. Poncino, "Design techniques for NBTI-tolerant power-gating architecture," IEEE Trans. Circuits Syst., Exp. Briefs, vol. 59, no. 4, pp. 249–253, Apr. 2012.
- [8] K.-C. Wu and D. Marculescu, "Joint logic restructuring and pin reordering against NBTI-induced performance degradation," in Proc. DATE, 2009, pp. 75–80.
- [9] Y. Lee and T. Kim, "A fine-grained technique of NBTI-aware voltage scaling and body biasing for standard cell based designs," in Proc. ASPDAC, 2011, pp. 603–608.
- [10] Y. Lee and T. Kim, "A fine-grained technique of NBTI-aware voltage scaling and body biasing for standard cell based designs," in Proc. ASPDAC, 2011, pp. 603–608.
- [11] M. Basoglu, M. Orshansky, and M. Erez, "NBTI-aware DVFS: A new approach to saving energy and increasing processor lifetime," in Proc. ACM/IEEE ISLPED, Aug. 2010, pp. 253–258.
- [12] K.-C. Wu and D. Marculescu, "Aging-aware timing analysis and optimization considering path sensitization," in Proc. DATE, 2011, pp. 1–6.
- [13] K. Du, P. Varman, and K. Mohanram, "High performance reliable variable latency carry select addition," in Proc. DATE, 2012, pp. 1257–1262.
- [14] A. K. Verma, P. Brisk, and P. Ienne, "Variable latency speculative addition: A new paradigm for arithmetic circuit design," in Proc. DATE, 2008, pp. 1250–1255.
- [15] D. Baneres, J. Cortadella, and M. Kishinevsky, "Variable-latency design by function speculation," in Proc. DATE, 2009, pp. 1704–1709.
- [16] Y.-S. Su, D.-C. Wang, S.-C. Chang, and M. Marek-Sadowska, "Performance optimization using variable-latency design style," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 10, pp. 1874–1883.