Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder

Syeda Mohtashima Siddiqui  
M.Tech (VLSI & Embedded Systems)  
Department of ECE  
G Pulla Reddy Engineering College (Autonomous)  
Kurnool, A.P, India.

G Ramesh, M.Tech, (Ph.D)  
Assistant Professor,  
Department of ECE  
G Pulla Reddy Engineering College (Autonomous)  
Kurnool, A.P, India.

Abstract:  
In most DSP applications, truncated multiplication is used to reduce hardware complexity. Conventional truncation reduces power, area, complexity and delay, but it may lead to a high error rate in the output. In this paper, a programmable truncated multiplier (PTM) is presented in which the truncation level is programmable. In the resulting programmable truncated multiplier and accumulator (PTMAC), a full-width multiplier is implemented, but the number of bits to be truncated can be selected at run time based on the application. This is achieved by selecting the active sections of the partial product matrix. The PTMAC provides a further reduction in power consumption by using a reversible adder built from reversible gates. An efficient design and power calculation of the PTMAC in a custom DSP processor is also presented. The power results are obtained from the Xilinx XPower Analyzer.

Key words:- ALU, Reversible logic gates, PTMAC, DSP

I. INTRODUCTION

In every DSP system, multipliers are among the most important and fundamental building blocks in terms of power consumption, and multiplication is frequently required in signal processing. In signal processing, truncation is generally performed to reduce the word size. The increasing demand for portable communication and computing devices, together with advances in mobile multimedia systems, has made efficient multipliers a necessity. A direct implementation [1] of an n x n bit multiplication yields a 2n-bit product. Hence, the DSP architecture would need an ever-growing bit width to keep the full accuracy of the system, which would be complicated to implement. To overcome this problem, the results are usually rounded down or truncated [2]. In this paper, the PTM architecture implements a full-precision multiplier whose truncation is programmable. The advantages of the PTM include:

- Dynamic power reduction
- Flexibility in accuracy selection

The paper is organized as follows. Section II reviews truncated multiplication. The proposed architecture of the PTM is presented in Section III. Section IV describes the custom DSP. Section V presents the design of the reversible adder. Section VI covers the simulation and synthesis results. Section VII describes the power comparison of the existing and proposed systems. Finally, Section VIII concludes by summarizing the effectiveness of the presented architecture.

II. TRUNCATED MULTIPLICATION

In systems where it is not essential to compute the exact least significant part of the product, truncated multipliers achieve area, power and timing improvements by skipping the computation of the least significant part of the partial product matrix. Direct truncation is the simplest scheme: the truncated multiplier is obtained by removing the lower columns of the partial product matrix, which form the least significant part of the product. Truncation thus reduces complexity by eliminating the lower part of the partial product matrix in the multiplier unit [3], but it introduces truncation errors. With this kind of truncation, significant savings in power and complexity are obtained at the expense of signal degradation. With direct truncation, the multiplier requirements are reduced to almost half in both power and area, at the cost of a larger error.
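The effect of direct truncation can be illustrated with a small behavioural model. The sketch below is an assumption-laden illustration rather than the design used in this paper: it treats the operands as unsigned, and the names `direct_truncated_product`, `n` and `k` are hypothetical. It simply drops every partial product term that falls in one of the `k` least significant columns.

```python
def direct_truncated_product(x: int, y: int, n: int, k: int) -> int:
    """Unsigned n x n multiply with the k lowest columns of the
    partial product matrix removed (illustrative model only)."""
    total = 0
    for i in range(n):              # bit index of x
        for j in range(n):          # bit index of y
            if i + j < k:
                continue            # column i+j is removed by direct truncation
            term = ((x >> i) & 1) & ((y >> j) & 1)   # 2-input AND gate
            total += term << (i + j)
    return total


def truncation_error(x: int, y: int, n: int, k: int) -> int:
    """Difference between the exact product and the truncated one."""
    return x * y - direct_truncated_product(x, y, n, k)
```

With `k = 0` the model returns the exact product. As `k` grows, the error grows with it, and small operands whose product lies entirely in the removed columns are lost altogether.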

III. PROGRAMMABLE TRUNCATED MULTIPLIER

In general, DSP systems need the flexibility to generate large output results whose magnitude is bigger than the multiplier inputs, and they also need to deal correctly with small operands. Small operands cannot be handled with direct truncation, because their product lies mainly in the removed low-order columns. The proposed multiplier architecture implements a full-width 16x16 multiplier [5] with programmable truncation, so the truncation can be controlled with an enable signal at run time. The architecture thus provides a method to adjust the active width of the multiplier column by column, giving a flexible truncation scheme that lets the system trade accuracy for reduced power consumption as required. The following equation represents the programmable truncated multiplication.

\[
P_{\text{PTM}} = X \times Y = 2^1 + 2^{-N} \sum_{i=0}^{N-2} t(i+j) \times \text{ppt}[i,j] \times 2^{2N+2}
\]

Equation 1 represents the truncated multiplication for signed numbers, where \(t(i+j)\) is a bit of the control input \(t\), which is (2N-1) bits wide, and ppt(i,j) are the terms of the partial product matrix. Column-based control of the partial product terms is achieved by replacing the 2-input AND gates with 3-input AND gates. Every 3-input AND gate has the inputs \(a\), \(b\) and the corresponding \(t(i+j)\) bit. Making the truncation programmable in this way may result in a 20% hardware increase in the partial product matrix implementation [6]. The vector \(t\) determines the number of bits to be truncated. If \(t=0x7fff\) in the case of an 8x8 multiplier, all columns are enabled and truncation has no effect on the output. If \(t=0x7f00\), the eight least significant columns of the partial product matrix are disabled, as in a direct truncation multiplier. Hence, by selecting a suitable value of \(t\), any level of truncation can be achieved.
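The column gating described above can be sketched as a behavioural model. This is a simplified illustration under stated assumptions, not the circuit of this paper: it handles unsigned operands only (Equation 1 is for signed numbers), and the function name `ptm_multiply` and its parameters are hypothetical. Each partial product term is the output of a 3-input AND of a bit of `x`, a bit of `y`, and the control bit for column `i + j`.

```python
def ptm_multiply(x: int, y: int, t: int, n: int = 8) -> int:
    """Unsigned n x n multiply with per-column enable bits.

    t is the (2n-1)-bit truncation control vector: bit c of t
    enables (1) or disables (0) column c of the partial product matrix.
    """
    total = 0
    for i in range(n):
        for j in range(n):
            a = (x >> i) & 1
            b = (y >> j) & 1
            t_bit = (t >> (i + j)) & 1      # control bit of column i+j
            total += (a & b & t_bit) << (i + j)   # 3-input AND gate
    return total


full = ptm_multiply(200, 100, 0x7FFF)       # all 15 columns enabled
truncated = ptm_multiply(200, 100, 0x7F00)  # columns 0..7 disabled
```

With `t = 0x7FFF` the model reproduces the exact 8x8 product, and with `t = 0x7F00` only the terms in columns 8 to 14 contribute, which matches the direct truncation case described in the text.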

IV. ARCHITECTURE OF THE CUSTOM DSP

In this paper, a custom DSP with a programmable truncated multiplier is presented to analyze the advantages and disadvantages of the DSP in terms of truncation error and reduction in power consumption. The DSP is designed with minimum control logic [5]. Within the DSP, the arithmetic and logic unit, which includes the PTM, is the most important part. The PTM provides dynamic power reduction and also makes the DSP capable of adjusting its accuracy to the requirements of the application. Fig. 1 shows the architecture of the custom DSP [5]. It consists of a control unit that operates in a 5-stage pipelined mode, two data memory blocks, a program memory, an ALU and the input and output ports. The main components of the DSP architecture are described below.

Control unit:

The control unit is a simple 5-stage pipeline in which instructions are fetched and decoded to control the data flow and the ALU operations. The main aim of the control unit design is to reduce the power consumption of the internal blocks other than the arithmetic block. It allows access to the two data memory blocks and the program memory block during the instruction read operation.

Custom Instruction Set:

A custom instruction set is implemented for the DSP so as to maximize the utilization of the ALU. This helps to exploit the power optimization offered by the truncated multiplier. All instructions are 32 bits wide.

Memory blocks:

The memory blocks comprise two data memory blocks and a program memory block. Each data memory is 512 x 16 bits and the program memory is 1024 x 32 bits. The data memories are used to store and load data, and the program memory stores the instructions. All three memory blocks can be accessed in a single clock cycle.

**Arithmetic and logic unit:**
The ALU contains the 16-bit programmable truncated multiplier, a 40-bit carry select adder, a 40-bit barrel shifter/rotator and a 40-bit accumulator. The ALU has a multiply-and-accumulate structure. The arithmetic unit consists of the following blocks.

**PTM:**
The PTM is designed to operate as a standard 16x16 bit multiplier that enables a programmable truncation. For that it includes an extra control input for enabling and disabling the columns in the partial product matrix. Thus the extra control input “truncation control” is used to control the truncation level of the multiplier.

**Barrel shifter:**
A 40-bit barrel shifter/rotator is used for shifting as well as rotating the accumulated output. The shifter performs left shift, left rotation, right shift and right rotation on the 40-bit accumulator output.
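The four shifter operations can be modelled behaviourally as follows. This is only a sketch: the function names are hypothetical, and the right shift is assumed to be logical (zero fill), since the text does not say whether it is logical or arithmetic.

```python
WIDTH = 40
MASK = (1 << WIDTH) - 1   # keeps values within 40 bits


def shift_left(value: int, amount: int) -> int:
    return (value << amount) & MASK


def shift_right(value: int, amount: int) -> int:
    # Assumption: logical right shift (zeros shifted in from the left).
    return (value & MASK) >> amount


def rotate_left(value: int, amount: int) -> int:
    amount %= WIDTH
    value &= MASK
    return ((value << amount) | (value >> (WIDTH - amount))) & MASK


def rotate_right(value: int, amount: int) -> int:
    amount %= WIDTH
    value &= MASK
    return ((value >> amount) | (value << (WIDTH - amount))) & MASK
```

In hardware a barrel shifter performs any of these in a single cycle with layers of multiplexers; the model only captures the input/output behaviour.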

**Accumulator:**
A 40-bit accumulator stores the final result of the arithmetic operations. It is constructed from D flip-flops.

**Carry Select Adder:**
A 40-bit carry select adder is used for addition as well as subtraction operations. The carry select adder is a simple but high-speed adder.
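The carry select principle can be sketched in a few lines. The model below is illustrative only: the block size of 8 bits and the name `carry_select_add` are assumptions, not details taken from this design. Each block computes its sum for both possible carry-in values, and the actual carry from the previous block selects one of the two results.

```python
WIDTH = 40
MASK40 = (1 << WIDTH) - 1


def carry_select_add(a: int, b: int, carry_in: int = 0, block: int = 8):
    """Add two 40-bit values block by block (block size is an assumption).

    Returns (sum, carry_out)."""
    blk_mask = (1 << block) - 1
    result = 0
    carry = carry_in
    for lo in range(0, WIDTH, block):
        a_blk = (a >> lo) & blk_mask
        b_blk = (b >> lo) & blk_mask
        sum_if_0 = a_blk + b_blk        # precomputed for carry-in = 0
        sum_if_1 = a_blk + b_blk + 1    # precomputed for carry-in = 1
        chosen = sum_if_1 if carry else sum_if_0   # multiplexer
        result |= (chosen & blk_mask) << lo
        carry = chosen >> block         # carry out of this block
    return result, carry


def subtract(a: int, b: int) -> int:
    # Two's complement subtraction: a + (~b) + 1 on the same adder.
    return carry_select_add(a, ~b & MASK40, carry_in=1)[0]
```

Because both candidate sums of every block are available in parallel, only the selection step depends on the incoming carry, which is why the structure is faster than a plain ripple carry adder.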

**V. FULL ADDER USING TWO PERES GATES**

**Peres Gate**
Fig 2 shows a 3x3 Peres gate. The input vector is I (A, B, C) and the output vector is O (P, Q, R). The outputs are defined by P = A, Q = A ⊕ B and R = AB ⊕ C.

![Fig 2: PERES Gate](image)

<table>
<thead>
<tr>
<th>A</th>
<th>B</th>
<th>C</th>
<th>P</th>
<th>Q</th>
<th>R</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

**Table 1: Truth table of Peres gate**
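The truth table can be reproduced, and the reversibility of the gate checked, with a short script. It uses the standard Peres gate definition P = A, Q = A XOR B, R = (A AND B) XOR C, which agrees with every row of Table 1; the names `peres` and `is_reversible` are chosen here for illustration.

```python
from itertools import product


def peres(a: int, b: int, c: int):
    """Peres gate: P = A, Q = A xor B, R = (A and B) xor C."""
    return a, a ^ b, (a & b) ^ c


# Enumerate all 8 input vectors in the same order as Table 1.
inputs = list(product((0, 1), repeat=3))
outputs = [peres(*bits) for bits in inputs]

# A gate is reversible when the input-to-output mapping is one to one,
# i.e. the 8 output vectors are all distinct.
is_reversible = len(set(outputs)) == len(inputs)

for bits, out in zip(inputs, outputs):
    print(bits, "->", out)
```

Since every output vector occurs exactly once, the inputs can always be recovered from the outputs, which is the defining property of a reversible gate.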

A full adder using two Peres gates is shown in Fig 3. Its quantum realization shows that the quantum cost is eight, since two Peres gates are used. A single 4x4 reversible gate, called the PFAG gate, with a quantum cost of eight is used to realize the multiplier.

![Fig 3: Full adder using Peres Gates](image)
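A behavioural check of the two-gate full adder is sketched below. The wiring is an assumption based on the commonly used construction (the first gate takes A, B and a constant 0; the second takes the first gate's Q output, the carry-in, and the first gate's R output) and may differ in detail from Fig 3.

```python
from itertools import product


def peres(a: int, b: int, c: int):
    """Peres gate: P = A, Q = A xor B, R = (A and B) xor C."""
    return a, a ^ b, (a & b) ^ c


def full_adder(a: int, b: int, cin: int):
    """Full adder from two Peres gates (assumed wiring, see lead-in)."""
    # Gate 1: constant 0 on C gives Q1 = A xor B and R1 = A and B.
    _garbage1, a_xor_b, a_and_b = peres(a, b, 0)
    # Gate 2: Q2 = A xor B xor Cin (sum),
    #         R2 = ((A xor B) and Cin) xor (A and B) (carry out).
    _garbage2, s, cout = peres(a_xor_b, cin, a_and_b)
    return s, cout


# Exhaustive check against ordinary integer addition.
all_correct = all(
    s + 2 * cout == a + b + cin
    for a, b, cin in product((0, 1), repeat=3)
    for s, cout in [full_adder(a, b, cin)]
)
```

The two P outputs are garbage outputs in this arrangement: they are not used by the adder but are needed to keep the circuit reversible.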

**VI. SIMULATION AND SYNTHESIS RESULTS**

The Xilinx ISE Design Suite 9.2 platform is used for the simulation and synthesis of the proposed architecture. The proposed architecture is described in the Verilog hardware description language.

VII. IMPLEMENTATION AND POWER MEASUREMENTS

One of the important aims of the proposed PTMAC architecture is to reduce the power dissipated by the 16x16 multiplier. The programmable truncation, together with the adder built from reversible logic gates, provides the power reduction. The Xilinx ISE Design Suite 9.2 platform provides the XPower Analyzer tool for estimating the power consumed by the whole system. The power consumed by the conventional 16x16 PTMAC is compared with the power consumed by the proposed 16x16 programmable truncated multiplier using reversible logic gates.

<table>
<thead>
<tr>
<th>Power summary</th>
<th>I (mA)</th>
<th>P (mW)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Total estimated power consumption</td>
<td></td>
<td>134</td>
</tr>
<tr>
<td>Vccint 1.20V</td>
<td>27</td>
<td>32</td>
</tr>
<tr>
<td>Vccint 2.50V</td>
<td>18</td>
<td>45</td>
</tr>
<tr>
<td>Vcc25 2.50V</td>
<td>23</td>
<td>56</td>
</tr>
<tr>
<td>Inputs:</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>Logic:</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>Outputs:</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>Signals:</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>Quiescent Vccint 1.20V</td>
<td>27</td>
<td>32</td>
</tr>
<tr>
<td>Quiescent Vccint 2.50V</td>
<td>18</td>
<td>45</td>
</tr>
<tr>
<td>Quiescent Vcc25 2.50V</td>
<td>2</td>
<td>5</td>
</tr>
</tbody>
</table>

VIII. CONCLUSION

Here a new method for truncation is presented and compared with previous methods. The use of programmable truncation provides power reduction benefits in custom DSP architectures. The reported results are obtained from Xilinx ISE 9.2. The multiplier with programmable truncation can be used in applications where the accuracy and power consumption of the system need to be adjusted according to the application requirements.

REFERENCES


[7] H. Thapliyal and M. B. Srinivas, "A New Reversible TSG Gate and Its Application for Designing Efficient Adder Circuits", Centre for VLSI and Embedded System Technologies, International Institute of Information Technology, Hyderabad, 500019, India.


[12] H. Thapliyal and M. B. Srinivas, "A New Reversible TSG Gate and Its Application for Designing Efficient Adder Circuits", Centre for VLSI and Embedded System Technologies, International Institute of Information Technology, Hyderabad, 500019, India.


