Implementation of RISC Processor Using Radix-4 & Radix-8 Booth Encoded Multi-Modulus Multipliers

V. Sandhya  
PG Scholar in VLSI Design,  
Department of ECE,  
Dhanekula Institute of Engineering and Technology,  
Ganguru, Krishna Dist., Andhra Pradesh, India.

K. Lakshmi Sowjanya  
Assistant Professor,  
Department of ECE,  
Dhanekula Institute of Engineering and Technology,  
Ganguru, Krishna Dist., Andhra Pradesh, India.

ABSTRACT:
Novel multi-modulus designs capable of performing the desired modulo operation for more than one modulus in a Residue Number System (RNS) are explored to lower the hardware overhead of residue multiplication. Two multi-modulus multipliers are proposed that reuse hardware resources among the modulo $2^n-1$, modulo $2^n$ and modulo $2^n+1$ multipliers by virtue of their analogous number-theoretic properties. The former employs the radix-4 Booth encoding algorithm and the latter the radix-8 Booth encoding algorithm. In the proposed radix-4 and radix-8 Booth encoded multi-modulus multipliers, the modulo-reduced products for the moduli $2^n-1$, $2^n$ and $2^n+1$ are computed successively. The RNS technique is used to reduce the number of partial products. This work applies the multipliers to one application, a basic RISC processor. The algorithm is modeled in Verilog HDL, and the RTL code for the modulo multiplier is synthesized using Cadence RTL Compiler, targeting 180 nm and 45 nm TSMC technology with appropriate area and power constraints. The layout is generated with Cadence SoC Encounter.

INDEX TERMS:  
Booth Multiplier, RISC Processor, RNS System.

I. INTRODUCTION:
Moduli sets of the forms $\{2^n-1, 2^n, 2^n+1\}$ and $\{2^n-1, 2^n+1\}$ have received significant attention because they offer very efficient circuits in terms of the $\text{area} \times \text{time}^2$ product, as well as efficient converters to and from the binary system. Therefore, designing efficient modulo $2^n-1$ multipliers is an interesting issue. RNS is an unconventional number representation that is widely employed in addition- and multiplication-intensive applications such as digital filters, FFT/DFT and cryptography. Multi-modulus architectures (MMA) are classified into fixed and variable multi-modulus architectures. A Fixed Multi-modulus Architecture (FMA) performs the modulo operation with respect to multiple moduli simultaneously, thereby maintaining parallelism among the moduli. In a Variable Multi-modulus Architecture (VMA), by contrast, the modulo operations are performed serially, resulting in greater hardware savings.
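As a small illustration of the RNS representation discussed above, the following Python sketch converts two integers to residues for the moduli set $\{2^n-1, 2^n, 2^n+1\}$, multiplies them channel by channel, and converts the result back with the Chinese Remainder Theorem. The function names and the choice $n = 4$ are assumptions made for this example only; it is a software model, not the hardware architecture of this paper.

```python
from math import prod

def rns_moduli(n):
    """Return the moduli set (2^n - 1, 2^n, 2^n + 1); the three are pairwise coprime."""
    return ((1 << n) - 1, 1 << n, (1 << n) + 1)

def to_rns(x, moduli):
    """Forward conversion: one residue per modulus."""
    return tuple(x % m for m in moduli)

def rns_mul(a, b, moduli):
    """Multiply channel by channel; no carries cross between channels."""
    return tuple((ra * rb) % m for ra, rb, m in zip(a, b, moduli))

def from_rns(residues, moduli):
    """Reverse conversion using the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        # pow(m_i, -1, m) is the modular inverse of m_i modulo m (Python 3.8+)
        total += r * m_i * pow(m_i, -1, m)
    return total % big_m

moduli = rns_moduli(4)   # (15, 16, 17), dynamic range 15 * 16 * 17 = 4080
x, y = 45, 67            # the product must stay below the dynamic range
product = from_rns(rns_mul(to_rns(x, moduli), to_rns(y, moduli), moduli), moduli)
```

Each channel works on short residues only, which is why the three modulo multipliers can operate either in parallel (as in an FMA) or one after another on shared hardware (as in a VMA).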

II. RADIX-4 BOOTH ENCODED MULTI-MODULUS MULTIPLIER:
The radix-4 Booth encoded digit is formatted using three bits: a sign bit and two one-hot encoded magnitude bits. The proposed multi-modulus radix-4 Booth encoder, built from radix-4 Booth encoder (BE2) slices and one MUX3 block, is shown in Fig. 1. The Booth encoding technique is applied with the modulo \(2^n-1\), \(2^n\) and \(2^n+1\) operations as the basis of the multi-modulus multiplier. Radix-4 multiplication recodes the multiplier as follows.

- The multiplier \(Y\) in two’s complement form can be written as
  \[Y = -Y_{n-1}2^{n-1} + \sum_{i=0}^{n-2} Y_i 2^i\]
  Grouping the bits into overlapping triplets, with \(Y_{-1} = 0\) and \(n\) even, it can be rewritten as
  \[Y = \sum_{i=0}^{n/2-1} \left(-2Y_{2i+1} + Y_{2i} + Y_{2i-1}\right)2^{2i}\]
  Radix-4 Booth recoding thus encodes the multiplier bits into digits in the range [-2, 2].
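The radix-4 recoding described above can be sketched in Python as follows. This is a minimal illustration that assumes the standard radix-4 Booth digit $-2Y_{2i+1} + Y_{2i} + Y_{2i-1}$ with $Y_{-1} = 0$ and an even word length $n$; the function names are hypothetical and the code models only the recoding, not the modulo-specific handling of the multi-modulus encoder.

```python
def booth_radix4_digits(v, n):
    """Recode a signed n-bit multiplier v (n even) into n/2 digits in {-2, ..., 2}.

    The list is ordered from least to most significant digit.
    """
    if n % 2 != 0 or not -(1 << (n - 1)) <= v < (1 << (n - 1)):
        raise ValueError("n must be even and v must fit in n bits")

    def bit(k):
        # Y_{-1} is defined as 0; Python's >> on negative ints sign-extends,
        # so (v >> k) & 1 yields the two's complement bit k of v.
        return 0 if k < 0 else (v >> k) & 1

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n // 2)]

def booth_value(digits, radix):
    """Rebuild the integer from its Booth digits (weights radix**i)."""
    return sum(d * radix ** i for i, d in enumerate(digits))

digits = booth_radix4_digits(6, 4)   # bits 0110 give digits [-2, 2]: -2 + 2*4 = 6
```

An $n$-bit multiplier therefore yields $n/2$ partial products instead of $n$, each a multiple of the multiplicand that needs at most a shift and a negation.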


Similarly, radix-8 Booth recoding encodes the multiplier bits into digits in the range [-4, 4].
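For comparison, a radix-8 recoding can be sketched in the same way. The sketch below assumes the standard radix-8 Booth digit $-4Y_{3i+2} + 2Y_{3i+1} + Y_{3i} + Y_{3i-1}$ with $Y_{-1} = 0$, and sign-extends the multiplier when $n$ is not a multiple of three; the function name is hypothetical.

```python
def booth_radix8_digits(v, n):
    """Recode a signed n-bit multiplier v into ceil(n/3) digits in {-4, ..., 4}.

    The list is ordered from least to most significant digit.
    """
    if not -(1 << (n - 1)) <= v < (1 << (n - 1)):
        raise ValueError("v must fit in n bits")

    def bit(k):
        # Y_{-1} = 0; bits above n-1 are sign-extension bits because
        # Python's >> on negative ints is an arithmetic shift.
        return 0 if k < 0 else (v >> k) & 1

    count = -(-n // 3)   # ceil(n / 3)
    return [
        -4 * bit(3 * i + 2) + 2 * bit(3 * i + 1) + bit(3 * i) + bit(3 * i - 1)
        for i in range(count)
    ]

digits = booth_radix8_digits(11, 6)   # bits 001011 give digits [3, 1]: 3 + 1*8 = 11
```

The digit values ±3 cannot be formed from the multiplicand by shifting and negation alone, so the radix-8 scheme trades fewer partial products for an extra addition that produces this multiple.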

The radix-4 Booth encoded digit is formatted using three bits: a sign bit and one-hot encoded magnitude bits, m1i and m2i. The proposed multi-modulus radix-4 Booth encoder uses N/2 radix-4 Booth Encoder (BE2) slices and one MUX3 block (here N = 4). The input corresponding to the modulus $2^n-1$, $2^n$ and $2^n+1$ is selected when the modulus-select input is “00,” “01,” and “10,” respectively.
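The three-bit digit format can be illustrated with a behavioural model of a single encoder slice. The Boolean equations below are the commonly used radix-4 Booth encoder logic and are an assumption of this sketch, as are the function names; the paper's BE2 slice may be implemented differently at gate level.

```python
def be2_slice(y_hi, y_mid, y_lo):
    """Model of one radix-4 Booth encoder slice.

    Inputs are the multiplier bits (Y_{2i+1}, Y_{2i}, Y_{2i-1}).
    Returns (s, m1, m2): sign bit and one-hot magnitude bits for 1x and 2x.
    """
    s = y_hi                          # negative digits have Y_{2i+1} = 1
    m1 = y_mid ^ y_lo                 # magnitude 1
    m2 = (y_hi ^ y_mid) & (1 - m1)    # magnitude 2, only when m1 is not set
    return s, m1, m2

def decode_digit(s, m1, m2):
    """Convert (s, m1, m2) back to the signed digit in {-2, ..., 2}."""
    magnitude = m1 + 2 * m2
    return -magnitude if s else magnitude

# Enumerate every input triplet together with its decoded digit.
table = {
    (hi, mid, lo): decode_digit(*be2_slice(hi, mid, lo))
    for hi in (0, 1) for mid in (0, 1) for lo in (0, 1)
}
```

In this encoding the triplet 111 yields a set sign bit with zero magnitude, which decodes to 0 in the model; at most one of m1 and m2 is ever set.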

III. MULTI-MODULUS PARTIAL PRODUCT ADDITION FOR RADIX-4 BOOTH ENCODING
The radix-4 multiplier has three stages. The first two are the multi-modulus partial product generation stage and the bias generation stage; partial product addition is the final stage. A multi-modulus addition can be implemented using a MUX3 in the carry feedback path that selects the value fed back, 0 being one of its inputs, according to the selected modulus. Multi-modulus addition of the partial products is carried out in a CSA tree followed by a parallel-prefix two-operand adder. The parallel-prefix adder is constructed from pre-processing (PP), prefix and post-processing blocks; the implementation of these blocks is shown in Fig. 2. The number of MUX3 blocks needed for multi-modulus partial product addition is n/2+2.
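The role of the MUX3 in the carry feedback path can be illustrated with a behavioural model. The sketch below assumes the widely used end-around-carry scheme: the carry-out is fed back unchanged for modulo $2^n-1$, a constant 0 is fed back for modulo $2^n$, and the inverted carry-out is fed back for modulo $2^n+1$ with operands in diminished-1 form (a value $A$ stored as $A-1$). These feedback choices, the select codes and the names are assumptions of the example, not details taken from the figures of this paper.

```python
MOD_MINUS, MOD_POW2, MOD_PLUS = "00", "01", "10"   # hypothetical select codes

def multi_modulus_add(a, b, n, mod_sel):
    """Add two n-bit operands with a modulus-dependent carry feedback."""
    mask = (1 << n) - 1
    total = a + b
    carry_out = total >> n
    if mod_sel == MOD_MINUS:      # modulo 2^n - 1: end-around carry
        carry_in = carry_out
    elif mod_sel == MOD_POW2:     # modulo 2^n: carry is discarded
        carry_in = 0
    elif mod_sel == MOD_PLUS:     # modulo 2^n + 1, diminished-1: inverted carry
        carry_in = 1 - carry_out
    else:
        raise ValueError("unknown modulus select code")
    return ((total & mask) + carry_in) & mask

n = 4
s_minus = multi_modulus_add(9, 12, n, MOD_MINUS)   # (9 + 12) mod 15
s_pow2 = multi_modulus_add(9, 12, n, MOD_POW2)     # (9 + 12) mod 16
s_plus = multi_modulus_add(8, 11, n, MOD_PLUS)     # diminished-1 forms of 9 and 12
```

Two caveats apply to this model: for modulo $2^n-1$ the all-ones word is a second representation of zero, and for modulo $2^n+1$ a true sum of zero has no $n$-bit diminished-1 code and needs separate handling.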

IV. MULTI-MODULUS PARTIAL PRODUCT ADDITION FOR RADIX-8 BOOTH ENCODING
The radix-8 multiplier has four stages. The first three are the multi-modulus partial product generation stage, the hard multiple generation stage and the bias generation stage; partial product addition is the final stage. The most straightforward implementation of a final-stage adder for two n-bit operands is a ripple carry adder, which requires n full adders (FAs). The carry-out of the ith FA is connected to the carry-in of the (i+1)th FA. The proposed multi-modulus partial product addition for radix-8 Booth encoding is shown in Fig. 3. A multi-modulus addition can be implemented using a MUX3 in the carry feedback path that selects the value fed back, 0 being one of its inputs, according to the selected modulus. Multi-modulus addition of the partial products is carried out in a CSA tree followed by a parallel-prefix two-operand adder. The parallel-prefix adder is constructed from pre-processing (PP), prefix and post-processing blocks. The number of MUX3 blocks needed for multi-modulus partial product addition is n/3+5.
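The ripple carry adder mentioned above can be modelled bit by bit. The following Python sketch chains n full adders so that each carry-out feeds the next carry-in; it is a plain behavioural model with hypothetical function names, intended only to make the carry chain explicit.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out

def ripple_carry_add(x, y, n, carry_in=0):
    """Add two n-bit operands with n chained full adders.

    Returns (n-bit sum, final carry-out).
    """
    carry = carry_in
    result = 0
    for i in range(n):
        # The carry-out of stage i becomes the carry-in of stage i + 1.
        sum_bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= sum_bit << i
    return result, carry

total, carry_out = ripple_carry_add(11, 7, 4)   # 11 + 7 = 18 = 16 + 2
```

Because the carry must pass through all n stages in the worst case, the delay grows linearly with n, which is the motivation for replacing this adder with a parallel-prefix structure.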

V. RISC PROCESSOR:
The architecture of the 64-bit RISC processor comprises a control unit, general-purpose registers, an ALU, a barrel shifter, a universal shift register and an accumulator. The control unit consists of two blocks, i.e., an instruction register and an instruction decoder. Instructions and data are fetched sequentially in order to reduce the latency in the machine cycle. A pipeline structure has been incorporated that uses three execution cycles: fetch, decode and execute. This pipeline structure helps enhance the speed of operation.
In the fetch cycle, the instruction and relevant data are fetched from memory. In the decode cycle, the instruction and data drawn from memory are separated to activate the components and data path needed for execution. In the execute cycle, the instruction is executed, the data is manipulated and the result is stored in the accumulator. The control unit accepts the opcode and generates the signals that trigger the components and data path to perform the desired function. The control unit has two instruction decoders, which decode the instruction bits and direct the signal to the ALU, the universal shift register or the barrel shift rotator. The operands are received from register A or register B. Upon receiving the operands from the registers and the decoded instruction bits, the arithmetic and logic unit performs arithmetic and logical functions. The universal shift register and barrel shift rotator receive their input from register A and, depending on the decoded information, perform the desired shift or rotation; the result is stored in the accumulator register. These modules are the building blocks of the processor.
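The fetch, decode and execute flow can be sketched with a toy software model. Everything specific in the sketch is hypothetical: the opcode names, the instruction format (an opcode with two operand values standing in for registers A and B), and the use of the second operand as a rotate amount are illustrative assumptions, and the model runs the three steps sequentially rather than as an overlapped pipeline.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1

def rotate_left(value, amount):
    """Rotate a WIDTH-bit value left by amount positions."""
    amount %= WIDTH
    return ((value << amount) | (value >> (WIDTH - amount))) & MASK

# Hypothetical instruction set: opcode -> (functional unit, operation on A and B)
DECODE_TABLE = {
    "ADD": ("alu", lambda a, b: (a + b) & MASK),
    "SUB": ("alu", lambda a, b: (a - b) & MASK),
    "AND": ("alu", lambda a, b: a & b),
    "SHL": ("shift_register", lambda a, b: (a << 1) & MASK),
    "ROL": ("barrel_shifter", lambda a, b: rotate_left(a, b)),
}

def run(program):
    """Execute a list of (opcode, reg_a, reg_b) tuples; return (accumulator, trace)."""
    accumulator = 0
    trace = []
    for pc in range(len(program)):
        opcode, reg_a, reg_b = program[pc]       # fetch
        unit, operation = DECODE_TABLE[opcode]   # decode: select the unit
        accumulator = operation(reg_a, reg_b)    # execute: result to accumulator
        trace.append((unit, accumulator))
    return accumulator, trace

acc, trace = run([("ADD", 5, 7), ("ROL", 1 << 63, 1)])
```

The decode table plays the role of the instruction decoders described above: it maps each opcode to the unit that should be activated (ALU, universal shift register or barrel shifter) before the result is written to the accumulator.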

![Fig. 4: Architecture of RISC Processor](image)

VI. SIMULATION RESULTS:

With inputs A = X and B = Y, the operation performed depends on the selection lines. If Modsel0=0 and Modsel1=0, the modulo $2^n-1$ operation is performed; if Modsel0=0 and Modsel1=1, the modulo $2^n$ operation; and if Modsel0=1 and Modsel1=0, the modulo $2^n+1$ operation. The multiplication procedure is otherwise the same for all three moduli.
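When checking simulation waveforms, a simple software reference model of the selection logic can be useful. The sketch below maps the Modsel0 and Modsel1 values given above to a modulus and computes the expected product directly; the function names and the example operands are assumptions, and the model returns ordinary residues, so any special number representation used in the hardware (for example for modulo $2^n+1$) would have to be converted before comparison.

```python
def modulus_for(modsel0, modsel1, n):
    """Map the selection lines to the modulus, following the text above."""
    table = {
        (0, 0): (1 << n) - 1,   # modulo 2^n - 1
        (0, 1): 1 << n,         # modulo 2^n
        (1, 0): (1 << n) + 1,   # modulo 2^n + 1
    }
    if (modsel0, modsel1) not in table:
        raise ValueError("unused selection code")
    return table[(modsel0, modsel1)]

def reference_product(x, y, modsel0, modsel1, n):
    """Expected result of the multi-modulus multiplier as a plain residue."""
    return (x * y) % modulus_for(modsel0, modsel1, n)

# Example operands for n = 8 (moduli 255, 256 and 257)
expected = [reference_product(200, 123, s0, s1, 8) for s0, s1 in ((0, 0), (0, 1), (1, 0))]
```

For the example operands the three expected residues are 120, 24 and 185, one per selection code.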

![Fig. 5: Simulation Result for Radix-4 32-bit](image)

![Fig. 6: Simulation Result for Radix-8 32-bit](image)

![Fig. 7: Simulation Result for RISC Radix-4 32-bit](image)

![Fig. 8: Simulation Result for RISC Radix-8 32-bit](image)

VII. PERFORMANCE COMPARISON:

<table>
<thead>
<tr>
<th>Design</th>
<th>Area</th>
<th>Power(µW)</th>
<th>Delay(ns)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>65</td>
<td>100</td>
<td>45</td>
<td>100</td>
</tr>
<tr>
<td>Radix(4-Bit)</td>
<td>50</td>
<td>120</td>
<td>120</td>
</tr>
<tr>
<td>Radix(4-Bit)</td>
<td>35</td>
<td>108</td>
<td>120</td>
</tr>
<tr>
<td>Radix(6-Bit)</td>
<td>40</td>
<td>120</td>
<td>120</td>
</tr>
<tr>
<td>Radix(6-Bit)</td>
<td>40</td>
<td>120</td>
<td>120</td>
</tr>
<tr>
<td>RISC-Radix(4-Bit)</td>
<td>45</td>
<td>120</td>
<td>120</td>
</tr>
<tr>
<td>RISC-Radix(6-Bit)</td>
<td>50</td>
<td>120</td>
<td>120</td>
</tr>
</tbody>
</table>

VIII. CONCLUSION & FUTURE SCOPE:
The equivalences in the operations central to modulo multiplication, i.e., modulo negation, modulo reduction of binary weight, modulo multiplication by powers of two, and two-operand modulo addition, were demonstrated for the three special moduli $2^n-1$, $2^n$ and $2^n+1$. New radix-4 and radix-8 Booth encoded modulo $2^n$ multipliers with architectures comparable to those of the corresponding modulo $2^n-1$ and modulo $2^n+1$ multipliers were introduced. With the correlation among the modulo $2^n-1$, modulo $2^n$ and modulo $2^n+1$ operations as the basis, radix-4 and radix-8 Booth encoded multi-modulus multipliers that perform modulo multiplication for the three special moduli successively were developed. In RISC processors and DSP applications, the modulo multiplier is a major concern in computation. During multiplication, the products need to be reduced using RNS. In future implementations, the area and delay may be reduced further by targeting other applications. Future work also includes increasing the number of instructions, building a pipelined design with fewer clock cycles per instruction, and integrating a divider block.

IX. REFERENCES