

A Peer Reviewed Open Access International Journal

# Ultra Low Energy Variation Aware Design Adder Architecture Study



P.Sravani PG Scholar, Department of ECE (VLSI), Madhira Institute of Technology and Science, Kodad, TS, India.



Mr.Sk Subhan, M.Tech Assistant Professor Department of ECE Madhira Institute of Technology and Science, Kodad, TS, India.



Mr.Devireddy Venkatarami Reddy Assistant Professor & HoD Department of ECE Madhira Institute of Technology and Science, Kodad, TS, India.

### Abstract

Power consumption of digital systems is an important issue in nanoscale technologies, and growing process variation makes the problem more challenging. In this brief, we analyze the latency, energy consumption, and effects of process variation on different structures with respect to design structure and logic depth, in order to propose architectures with higher throughput, lower energy consumption, and smaller performance loss caused by process variation in application-specific integrated circuit design. We use adders as different implementations of a processing unit and propose architectural guidelines for finer technologies in the subthreshold region that are applicable to any other architecture. The results show that smaller computing building blocks have better energy efficiency and less performance degradation from variation effects. In contrast, their computation throughput is moderate or lower unless proper solutions, such as pipelined or parallel structures, are used.

Therefore, our proposed solution for recovering the lost throughput while reducing sensitivity to process variations is to use simpler elements in deeply pipelined designs or massively parallel structures.

Index Terms: Adder structures, architecture, deep pipeline, massive parallel, statistical static timing analysis (SSTA), ultra low energy, variation-aware.

#### **INTRODUCTION**

As technology advances, the density of integrated circuits grows and power consumption becomes an increasingly serious concern [1]. This problem affects design performance and causes heating and power-supply shortage problems. One major solution is near/subthreshold computing, which reduces the power consumption of complex systems-on-chip [2]. Near- and subthreshold computing is attractive in energy-constrained applications, such as sensor networks, because it extends lifetime and makes energy harvesting feasible for some emerging applications. In the subthreshold region, both the static and dynamic components of power consumption are greatly reduced because of the lower supply voltage.

However, circuit delay grows exponentially as the supply voltage is lowered, so leakage current flows for longer during each operation and the static energy consumption increases.
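The trade-off between falling dynamic energy and rising static energy can be illustrated with a first-order subthreshold model. The sketch below is not taken from this brief: it assumes an on-current that is exponential in the supply voltage, and the constants `N_VT`, `C_DYN`, and `LEAK_WEIGHT` are hypothetical normalized values chosen only to make the trend visible.

```python
import math

# All constants are hypothetical and normalized; they are not measured data.
N_VT = 1.5 * 0.026    # slope factor n times thermal voltage, in volts
C_DYN = 1.0           # normalized switched capacitance per operation
LEAK_WEIGHT = 1000.0  # normalized weight of leaking gates times logic depth


def relative_delay(vdd):
    """Delay ~ C*Vdd / I_on, with I_on ~ exp(Vdd / (n*vT)) in subthreshold."""
    return vdd * math.exp(-vdd / N_VT)


def dynamic_energy(vdd):
    """Switching energy per operation, proportional to Vdd squared."""
    return C_DYN * vdd ** 2


def static_energy(vdd):
    """Leakage power (~Vdd) integrated over one operation (~relative_delay)."""
    return LEAK_WEIGHT * vdd * relative_delay(vdd)


def total_energy(vdd):
    return dynamic_energy(vdd) + static_energy(vdd)


def minimum_energy_point(v_lo=0.15, v_hi=0.60, step=0.01):
    """Sweep the supply voltage on a grid and return the lowest-energy point."""
    steps = int(round((v_hi - v_lo) / step))
    grid = [round(v_lo + k * step, 4) for k in range(steps + 1)]
    return min(grid, key=total_energy)


if __name__ == "__main__":
    v_min = minimum_energy_point()
    print("minimum energy point (V):", v_min)
    print("static/dynamic energy ratio at 0.20 V:",
          static_energy(0.20) / dynamic_energy(0.20))
```

Under these assumed constants, static energy exceeds dynamic energy at low supply voltages, so lowering the voltage below the minimum energy point raises the total energy per operation.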

In this brief, we use the SSTA method to analyze adder structures under process variations and extract effective architecture-level design guidelines to improve speed performance and energy efficiency. The rest of this brief is organized as follows. Section II introduces existing adder designs. Section III discusses the advantages and disadvantages of some popular adder structures as basic blocks of arithmetic units. Section IV describes our method, analyzes the results, and introduces some key guidelines. Finally, the conclusion is drawn in Section V.

### **PREVIOUS DESIGNS**

The ripple-carry adder (RCA) has a simple architecture, and its area grows linearly as it is extended to wider operands. However, its performance is limited by the long carry propagation path from the LSB to the MSB. Because of the long critical-path delay of the RCA, designers have tried to look ahead the carry for each higher bit, independently of the neighboring lower carry bits, using a logarithmic-delay tree structure; each tree optimization strategy yields a new prefix adder.

#### PRELIMINARIES

At the minimum energy point of the energy-voltage curve, the increase in static energy caused by longer delay starts to dominate the dynamic energy consumption, so scaling the supply voltage to lower levels means more delay and more total energy consumption [2], [3]. Because of feature size scaling, the impact of process variations becomes significant, and near/subthreshold design intensifies the effects of variations and severely degrades the performance parameters [4]-[6]. To control process variation effects, we need careful timing analysis that employs statistical approaches rather than the classic worst-case analysis. Static timing analysis (STA) was previously implemented in commercial tools [7], with worst-case conditions considered for each cell timing. Cell parameters were then used to calculate the delays of paths in a complex design by adding up the delays of gates in series (n = number of gates)

where $\mu_i$ and $\sigma_i$ represent the mean and standard deviation of the delay of each gate, respectively. In new technologies, variation has grown, and using STA unnecessarily sacrifices much of the speed performance. Statistical STA (SSTA), in contrast, analyzes the timing of the critical paths of a design to give more realistic results. The variation of each cell is modeled as a normal (Gaussian) random variable [5], [8], which gives (2) and (3) [9]

$$\mu_{\text{Critical-path}} = \sum_{i=1}^{n} \mu_i, \quad \sigma_{\text{Critical-path}}^2 = \sum_{i=1}^{n} \sigma_i^2 \quad (2)$$

$$\text{Delay}_{\text{Critical-path}} = \mu_{\text{Critical-path}} + 3 \times \sigma_{\text{Critical-path}} \quad (3)$$
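The difference between the two analyses can be seen on a small numerical example. The sketch below implements (2) and (3) directly. For the worst-case STA figure it assumes that every gate is taken at its own $\mu_i + 3\sigma_i$ corner simultaneously, which is an assumption made here for illustration and is not quoted from the text. The gate delays are hypothetical normalized values, and the function names are illustrative.

```python
import math


def worst_case_delay(means, sigmas, k=3.0):
    """Corner-style estimate (assumed form): every gate at mu_i + k*sigma_i."""
    return sum(m + k * s for m, s in zip(means, sigmas))


def ssta_path_stats(means, sigmas):
    """Eq. (2): means add and variances add for independent Gaussian delays."""
    mu_path = sum(means)
    sigma_path = math.sqrt(sum(s * s for s in sigmas))
    return mu_path, sigma_path


def ssta_delay(means, sigmas, k=3.0):
    """Eq. (3): path delay taken at mu + k*sigma of the whole path."""
    mu_path, sigma_path = ssta_path_stats(means, sigmas)
    return mu_path + k * sigma_path


if __name__ == "__main__":
    # Hypothetical path: 16 identical gates, mean delay 1.0, sigma 0.2.
    means = [1.0] * 16
    sigmas = [0.2] * 16
    mu_path, sigma_path = ssta_path_stats(means, sigmas)
    print("worst-case STA delay:", worst_case_delay(means, sigmas))
    print("SSTA delay:          ", ssta_delay(means, sigmas))
    print("path sigma/mu:       ", sigma_path / mu_path)
```

Because the standard deviations add in quadrature in (2) rather than linearly, the SSTA estimate is lower than the corner-based sum whenever the path contains more than one gate with nonzero variation.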

SSTA is an accepted method based on the statistical behavior of variations and is supported by recent commercial tools [7], [10]. In this method, the ratio $\sigma/\mu$ [3], [5], [9] is an important measure for comparing the severity of variations across cells, which guides better standard cell design in the deep subthreshold region. Verma et al. [11] extracted logic chains for the Kogge-Stone adder (KSA) to measure delay variability at both 0.3 and 1.2 V. They drew $\sigma/\mu$ contours based on the delay variability histogram, logic depth, and gate width, and mitigated variability by gate upsizing. Newer technologies such as dual-gate silicon on insulator [12] have lower variability than bulk CMOS and can be used to design robust subthreshold logic cells in 32-nm CMOS. Thakur et al. [13] analyzed the effects of variations in gate oxide thickness, supply voltage, and temperature in four adders and tried to rank the effect of each parameter's variation on delay. As a new design method, [14] uses SSTA to sieve a standard cell library under different variation constraints during the synthesis of arithmetic circuits, with the results verified by Monte Carlo simulations. Islam et al. [15] designed a robust (lower $\sigma/\mu$ ratio) subthreshold full adder considering the power-delay product. Arthurs and Di [16] evaluated the variations of both Schmitt-trigger and NULL convention logic 1-bit adders using four-gate libraries characterized at different supply voltages for better static noise margin.




### SERIAL FULL ADDER (SFA) STRUCTURE

We choose the adder as the key building block of arithmetic units in every processor, from general purpose to application specific, because it can be used to implement more complex operations such as multiplication and division, or even more complex units such as fast Fourier transform and finite-impulse response filters. We have selected six different 16-bit adder structures [17], [18] to study in the subthreshold region. The ripple-carry adder (RCA) has a simple architecture, and its area grows linearly as it is extended to wider operands.

However, this adder has limited performance because of the long carry propagation path from the LSB to the MSB. Because of the long critical-path delay of the RCA, designers have tried to look ahead the carry for each higher bit, independently of the neighboring lower carry bits, using a logarithmic-delay tree structure; each tree optimization strategy yields a new prefix adder. The first candidate prefix adder discussed is the Brent-Kung adder (BKA). This structure balances area and timing overheads by shortening the long carry chains to $(2 \log_2 N) - 2$ logic stages, which is a suitable technique for co-optimizing the area and performance of a design. In the KSA, addition is performed faster because of parallel computations in shorter paths with only $\log_2 N$ logic stages, at the cost of higher area overhead. The Han-Carlson adder (HCA) combines the BKA and KSA to reduce complexity and trade off area against delay with $\log_2 N + 1$ logic stages. Another prefix adder, which has the minimum logic depth ($\log_2 N$), is known as the Ladner-Fischer adder (LFA). In this architecture, some nodes have very high fanouts (up to $N/2$) to reduce the area, and this may degrade the performance.

The serial full adder (SFA) is a basic full adder combined with a flip-flop so that the adder unit is reused in successive clock cycles in a time-serialized ripple-carry manner (Fig. 1); the number of clock cycles it takes is equal to the number of bits.
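The logic-stage counts quoted above for the prefix adders can be tabulated for a given word length. The short sketch below does this for power-of-two widths. The prefix-adder expressions follow the text; the RCA entry of $N$ stages is the usual linear estimate for a carry that ripples through every bit and is an assumption added here, as is the function name.

```python
def logic_stages(n_bits):
    """Return the carry-path logic depth of each adder for an n_bits word.

    Prefix-adder depths follow the expressions in the text; the RCA depth
    of n_bits is an assumed linear estimate, not a figure from the text.
    """
    if n_bits < 2 or n_bits & (n_bits - 1):
        raise ValueError("n_bits must be a power of two and at least 2")
    log2_n = n_bits.bit_length() - 1  # exact log2 for a power of two
    return {
        "RCA": n_bits,          # assumed: one stage per bit
        "BKA": 2 * log2_n - 2,  # Brent-Kung
        "KSA": log2_n,          # Kogge-Stone
        "HCA": log2_n + 1,      # Han-Carlson
        "LFA": log2_n,          # Ladner-Fischer, fanout up to n_bits / 2
    }


if __name__ == "__main__":
    for name, depth in logic_stages(16).items():
        print(f"{name}: {depth} logic stages")
```

For the 16-bit adders studied here, this gives 6 stages for the BKA, 4 for the KSA and LFA, and 5 for the HCA, compared with 16 for the RCA under the linear assumption.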



Fig. 1. Single-bit full adder in combination with a flip-flop to do n-bit addition sequentially at different clock cycles.
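The operation of the SFA in Fig. 1 can be modeled in software as a loop with one iteration per clock cycle. The following is a minimal behavioral sketch, not the Verilog used in this work: one full adder processes one bit position per cycle, and a single variable plays the role of the carry flip-flop. The function name and the LSB-first bit ordering are assumptions made for illustration.

```python
def serial_full_adder(a, b, n_bits=16):
    """Add two n_bits-wide unsigned integers one bit per clock cycle.

    Returns (sum, carry_out, cycles). The carry variable models the
    flip-flop that feeds the carry back into the single full adder.
    """
    carry = 0   # flip-flop state, cleared before the first cycle
    result = 0
    for cycle in range(n_bits):       # one loop iteration per clock cycle
        a_bit = (a >> cycle) & 1      # operands are fed in LSB first
        b_bit = (b >> cycle) & 1
        sum_bit = a_bit ^ b_bit ^ carry
        # Next flip-flop state: carry generated or propagated by this bit.
        carry = (a_bit & b_bit) | (carry & (a_bit ^ b_bit))
        result |= sum_bit << cycle
    return result, carry, n_bits


if __name__ == "__main__":
    total, carry_out, cycles = serial_full_adder(1234, 4321)
    print(total, carry_out, cycles)
```

The loop makes the cost of the structure explicit: the hardware is a single full adder and one flip-flop, and a 16-bit addition takes 16 clock cycles.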

#### SIMULATION RESULTS

We have coded the SFA designs in Verilog HDL. All the designs are synthesized with the Xilinx synthesis tool and simulated using the Xilinx ISE 14.4 simulator. The synthesis and simulation results are shown in the figures below.







Fig. 3. RTL schematic of the 16-bit SFA adder.






Fig. 4. Technology schematic of the 16-bit SFA adder.

| Field | Value | Field | Value |
|---|---|---|---|
| Project File | SFA ADDER.xise | Parser Errors | No Errors |
| Module Name | SFA_16BIT | Implementation State | Synthesized |
| Target Device | xc3s100e-5tq144 | Errors | No Errors |
| Product Version | ISE 14.4 | Warnings | 23 Warnings (0 new) |
| Design Goal | Balanced | Routing Results | |
| Design Strategy | Xilinx Default (unlocked) | Timing Constraints | |
| Environment | System Settings | Final Timing Score | |

Device Utilization Summary (estimated values)

| Logic Utilization | Used | Available | Utilization |
|---|---|---|---|
| Number of Slices | 17 | 960 | 1% |
| Number of Slice Flip Flops | 16 | 1920 | 0% |
| Number of 4 input LUTs | 33 | 1920 | 1% |
| Number of bonded IOBs | 51 | | 47% |
| Number of GCLKs | 1 | | 4% |

Fig. 5. Design summary of the 16-bit SFA adder.



Fig. 6. Test bench of the 16-bit SFA adder.



Fig. 7. Simulation output waveform of the 16-bit SFA adder.

### CONCLUSION

In this brief, we have analyzed the latency, energy consumption, and effects of process variation on different adder structures, as different implementations of a popular processing unit, with respect to design structure and logic depth in order to propose architectural guidelines. These guidelines are applicable to any other architecture, independent of the functionality of the design, to achieve higher throughput, lower energy consumption, and smaller performance loss caused by process variation in application-specific integrated circuit design. Simulation results and analysis confirm that the SFA has smaller area, smaller timing fluctuations, and the highest working frequency, and that its throughput is similar to that of the RCA. Using the SFA in a parallel architecture, or using a pipelined version of the RCA, improves throughput in addition to energy efficiency and variation resistance.

Finally, we conclude that using such blocks in a massively parallel architecture is another way to compensate for process variation effects and to reduce both frequency uncertainty and timing fluctuations due to process variations.
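The relation between clock frequency and throughput stated in the conclusion can be reasoned about with a first-order timing model. The sketch below is an illustrative assumption, not a result of this brief: each full adder contributes a delay `t_fa`, each register adds an overhead `t_ff`, and wiring and variation are ignored. All names and the example values are hypothetical.

```python
def rca_throughput(n_bits, t_fa, t_ff):
    """Additions per unit time for a registered n-bit ripple-carry adder."""
    return 1.0 / (n_bits * t_fa + t_ff)


def sfa_frequency(t_fa, t_ff):
    """Clock frequency of one serial full adder (one full adder plus flip-flop)."""
    return 1.0 / (t_fa + t_ff)


def sfa_throughput(n_bits, t_fa, t_ff, lanes=1):
    """Each lane needs n_bits cycles per addition; lanes work in parallel."""
    return lanes * sfa_frequency(t_fa, t_ff) / n_bits


def pipelined_rca_throughput(t_fa, t_ff):
    """One full adder per pipeline stage gives one addition per short cycle."""
    return 1.0 / (t_fa + t_ff)


if __name__ == "__main__":
    n, t_fa, t_ff = 16, 1.0, 0.5  # hypothetical normalized delays
    print("RCA throughput:          ", rca_throughput(n, t_fa, t_ff))
    print("SFA frequency:           ", sfa_frequency(t_fa, t_ff))
    print("SFA throughput (1 lane): ", sfa_throughput(n, t_fa, t_ff))
    print("SFA throughput (16 lanes):", sfa_throughput(n, t_fa, t_ff, lanes=16))
    print("Pipelined RCA throughput:", pipelined_rca_throughput(t_fa, t_ff))
```

In this model a single SFA runs at a much higher clock frequency than the RCA but needs one cycle per bit, so its throughput stays in the same range as that of the RCA; throughput then scales with the number of parallel lanes or with pipelining.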

### REFERENCES

[1] M. B. Taylor, "A landscape of the new dark silicon design regime," IEEE Micro, vol. 33, no. 5, pp. 8–19, Sep./Oct. 2013.




[2] A. Wang, B. H. Calhoun, and A. P. Chandrakasan, Sub-Threshold Design for Ultra Low-Power Systems. New York, NY, USA: Springer-Verlag, 2006.

[3] Z. Bo et al., "Energy-efficient subthreshold processor design," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 8, pp. 1127–1137, Aug. 2009.

[4] H. Iwai, "Roadmap for 22 nm and beyond (Invited Paper)," Microelectron. Eng., vol. 86, nos. 7–9, pp. 1520–1528, 2009.

[5] International Solid State Circuits Conference 2013 Trends. [Online]. Available: http://isscc.org/doc/2013, accessed 2014.

[6] X. Chen, L. Yang, R. P. Dick, L. Shang, and H. Lekatsas, "C-pack: A high-performance microprocessor cache compression algorithm," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 8, pp. 1196–1208, Aug. 2010.

[7] Synopsys On-Line Documents. [Online]. Available: http://www.synopsys.com/support/pages/dow.aspx, accessed 2014.

[8] A. Srivastava, D. Sylvester, and D. Blaauw, Statistical Analysis and Optimization for VLSI: Timing and Power. New York, NY, USA: Springer-Verlag, 2006.

[9] S. R. Sarangi, B. Greskamp, R. Teodorescu, J. Nakano, A. Tiwari, and J. Torrellas, "VARIUS: A model of process variation and resulting timing errors for microarchitects," IEEE Trans. Semicond. Manuf., vol. 21, no. 1, pp. 3–13, Feb. 2008.

[10] M. Tehranipoor, K. Peng, and K. Chakrabarty, Test and Diagnosis for Small-Delay Defects. New York, NY, USA: Springer-Verlag, 2011.

[11] N. Verma, J. Kwong, and A. P. Chandrakasan, "Nanometer MOSFET variation in minimum energy subthreshold circuits," IEEE Trans. Electron Devices, vol. 55, no. 1, pp. 163–174, Jan. 2008.

[12] R. Vaddi, S. Dasgupta, and R. P. Agarwal, "Device and circuit co-design robustness studies in the subthreshold logic for ultralow-power applications for 32 nm CMOS," IEEE Trans. Electron Devices, vol. 57, no. 3, pp. 654–664, Mar. 2010.

[13] A. Thakur, D. Chilamakuri, and D. Velenis, "Effects of process and environmental variations on adder architectures," in Proc. 49th IEEE Int. Midwest Symp. Circuits Syst. (MWSCAS), Aug. 2006, pp. 36–40.

[14] J. Crop, R. Pawlowski, N. Moezzi-Madani, J. Jackson, and P. Chaing, "Design automation methodology for improving the variability of synthesized digital circuits operating in the sub/near-threshold regime," in Proc. Int. Green Comput. Conf. Workshops (IGCC), Jul. 2011, pp. 1–6.

[15] A. Islam, A. Imran, and M. Hasan, "Robust subthreshold full adder design technique," in Proc. Int. Conf. Multimedia, Signal Process. Commun. Technol. (IMPACT), Dec. 2011, pp. 99–102.

[16] A. Arthurs and J. Di, "Analysis of ultra-low voltage digital circuits over process variations," in Proc. IEEE Subthreshold Microelectron. Conf. (SubVT), Oct. 2012, pp. 1–3.

[17] M. Talsania and E. John, "A comparative analysis of parallel prefix adders," in Proc. Int. Conf. Comput. Design, Las Vegas, NV, USA, Jul. 2013, pp. 29–36.

[18] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs. London, U.K.: Oxford Univ. Press, 2009.




[19] K. T. Johnson, A. R. Hurson, and B. Shirazi, "General-purpose systolic arrays," IEEE Comput., vol. 26, no. 11, pp. 20–31, Nov. 1993.

[20] M. Bekakos, I. Ž. Milovanović, T. I. Tokić, Ć. B. Dolićanin, and E. I. Milovanović, "Selecting mathematical method for systolic processing," Sci. Pub. State Univ. Novi Pazar A, Appl. Math., Inf. Mech., vol. 3, no. 1, pp. 53–58, 2011.

### **Author Details**

**P.Sravani**, PG Scholar, Dept. of ECE (VLSI), Madhira Institute of Technology and Science, TS, India. E-mail: sravani.pagadala82@gmail.com

**Mr.Sk Subhan** received the Master of Technology degree in embedded systems from VIDYA VIKAS INSTITUTE OF TECHNOLOGY-JNTUH and the Bachelor of Technology degree from VNR-VJIE, JNTUH. He is currently working as an Assistant Professor in the Department of ECE at Madhira Institute of Technology and Science, Kodad. His subjects of interest are embedded systems, microprocessors, and measurement and instrumentation systems.

**Mr.Devireddy Venkatarami Reddy** received the Master of Technology degree in embedded systems from DR.PAULRAJ ENGINEERING COLLEGE-JNTUH and the Bachelor of Engineering degree from S.A. ENGINEERING COLLEGE-ANNA UNIVERSITY. He is currently working as an Associate Professor and Head of the Department of ECE at Madhira Institute of Technology and Science, Kodad. His subjects of interest are embedded systems, microprocessors, communication systems, and digital electronics.

Volume No: 4 (2017), Issue No: 2 (February) www.ijmetmr.com