Power and Area Efficient High Speed Squaring Circuit using Vedic Mathematical Techniques

Gumpina Vakula Rani¹, Dr. Bhaskar Reddy²
Research Scholar, Rayalaseema University, Kurnool, Andhra Pradesh¹
Professor and Head, S K University, Ananthapur²

Abstract: Vedic Mathematics is an ancient practice which has a unique procedures based on the sixteen Sutras dealing with the different branches of mathematics viz., arithmetic, calculus, algebra, geometry etc. The interest for high speed processing has been growing as an effect of wide computer processing applications. Very efficient math operations are important to accomplish the desired performance in several real-time systems such as Cryptography and Image Processing. Squaring is one of the key arithmetic operations in many applications like finding the transforms or the inverse transforms in signal processing and the development of fast squaring circuit has been a subject of interest over decades. The key requirements for many applications are to reduce the time delay and power consumption for many applications. In this paper, we have proposed power and area efficient fast squaring circuit using Vedic Multiplier and the proposed design is compared with the Vedic Multiplier and squaring circuits. The functionality of this circuit is verified and performance evaluation is done using Xilinx ISE design Suite 14.4 on target device xc5vlx20t-2ff363.

Keywords: Vedic Mathematics, Vedic multiplier, Squaring Circuit, Carry Save Adder, Shifter, Combinational Path Delay, Slices, Digital signal processors.

1. INTRODUCTION

Digital signal processors (DSPs) are very important in different disciplines. Fast multiplication and squaring are extremely significant operations in convolution, Fourier transformations and so on. Integer multiplication and squaring play important roles in microprocessors, DSP and other modern electronic machines. Multiplication and Squaring are the fundamental mathematical operations and require considerably more hardware circuitry and execution time than addition and subtraction. A CPU spends a lot of its time for performing numerical computations including multiplications and thus fast arithmetic coprocessors are required for improving the speed of the PC.

The demand for high speed processing has been increasing as a result of expanding computer and signal processing applications. Highly efficient arithmetic operations are necessary to achieve the desired performance in many real-time systems, cryptography and digital image processing applications. Squaring is one of the key arithmetic operations in many applications like finding the transforms or the inverse transforms in signal processing and the development of fast squaring circuit has been a subject of interest over decades. Reducing the time delay and power consumption are very the essential requirements for many applications.

Squaring is a fairly large block and the amount of circuitry involved is directly proportional to the square of its resolution i.e. a square of size n bits has n² bits as an output. For squaring algorithms performed in DSP applications latency and throughput are the two major concerns from the delay perspective. Latency is the real delay of computing a function and measures how long the inputs to a device are stable. The squaring circuit has not only a high delay but also a major source of power dissipation. One of the aim is to minimize the power consumption, it is of incredible interest to reduce the delay by utilizing the different optimization techniques. In this paper, an efficient and high speed squaring circuit using Recursive Vedic Multiplier based on “Urdhva Tiryagbhayam” is proposed which is faster and elegant when compared to the conventional methods.

This paper is structured as follows. In section 2, related work and the brief introduction to Vedic sutras is presented. Section 3, describes the Methodology, which describes 2x2 Vedic multiplier, basic 2bit squaring unit and 4bit squaring Logic. In section 4, the proposed high speed squaring architecture is presented. In section 5, design analysis and simulation results are presented. Finally, the conclusions are presented in section 6.

2. RELATED WORK

The word “Vedic” is derived from the word “veda” which means the store-house of all knowledge. Vedic Mathematics / Ancient Indian Mathematics comprises several simple and unique techniques through which problems associated with different branches of mathematics could be solved very efficiently. This consists of algorithms that can bring down large arithmetic operations to simple mind calculations. Because of that fact the Vedic mathematics approach is totally...
different and considered very close to the way a human mind works. This system was actually rediscovered from our ancient scriptures during 1911 and 1918 by Sri H. H. Bharati Krishna Tirtha swamiji [1]. “Urdhva Tiryagbhayam” is one of the powerful sutras among 16 sutras of Vedic mathematics, which means “Vertically and Cross wise”. The idea of using Vedic mathematics for design of multipliers has been discussed in [6-9]. ‘Urdhva Tiryagbhayam’ Sutra is shown to be an efficient multiplication algorithm as compared to the conventional counterparts. Authors of [11] have also shown the effectiveness of this Sutra to reduce NxN multiplier structure into an efficient 4x4 multiplier structure. However, they have mentioned that 4x4 multiplier section can be implemented using any efficient multiplication algorithm. Recently, a squaring circuit has been reported in the literature [13]. This may be noted that designing Vedic multipliers using array multiplier structures as discussed in above references provide us less delay and, thus, they are treated as high speed multipliers as compared to Booth’s algorithm using recorded multipliers and Wallace trees. Later, a high speed squaring circuit for binary numbers is proposed [10] using carry save adders (CSA). CSA is used to perform the doubling. In this proposed architecture, CSA is replaced with Left shifter to perform the doubling. The circuitry required for CSA is complex when compared with the Shifter. The proposed Squaring circuit uses one Vedic multiplier instead of four as in recursive Vedic multiplier for evaluating square of a n-bit binary number.

3. METHODOLOGY

In this section, a complete explanation about the important modules are presented which comprises the circuits of i) 2X2 Vedic Multiplication ii) 2-Bit Squaring and iii) 4-Bit Squaring.

A. 2X2 Vedic Multiplication

The advantage of Vedic Mathematics is that the complex calculations can be solved with simple techniques. The Vedic multiplier using Urdhva Tiryagbhayam sutra of size NXN will produce the 2N-1 partial products of different sizes which when combined forms (log2N + 1) partial products. The cross products are obtained by vertical and crosswise operations using the Sutra. Hence the delay is equal to adder delay. Critical path would consist of adders adding the maximum number of bits in cross product. The 2X2 Vedic multiplier is implemented using a two input AND gates and two half adder as shown in Figure 1. The implemented equations are written as

\[
P_0 \text{ (1-bit)} = A_0 \cdot B_0 \\
P_1 \text{ (1-bit)} = A_1 \cdot B_0 + A_0 \cdot B_1 + C_1(P_1(1)) \\
P_3P_2 \text{ (2-bits)} = A_1B_1 + C_2(P_2(1)) \\
\text{Product} = P_3P_2P_1P_0
\]

Figure1: 2X2 Vedic Multiplier

B 2-Bit Squaring Operation

The basic squaring circuit can be realized from 2X2 Vedic multiplier as shown in figure 2 [10]. The squaring circuit can be implemented using one half adder and one AND gate instead of two half-adders as shown in Figure 2.

\[
P_0 \text{ (1-bit)} = A_0 \\
P_1 \text{ (1-bit)} = 0 \\
P_3P_2 \text{ (2-bits)} = A_1A_0 + A_1 \\
\text{Product} = P_3P_2P_1P_0
\]

Figure2 : 2 Bit Squaring Circuit
C. 4-Bit Squaring Circuit
A 4-bit squaring circuit[10] is implemented using two 2-bit squaring circuits (as shown in Fig.2) and one 2X2 Vedic Multiplier (as displayed in Fig.1) instead of four 2X2 Vedic Multiplier modules used Recursive Vedic Multiplier. The general block diagram of the 4-bit squaring circuit is shown in Figure 3.

4. PROPOSED HIGH SPEED SQUARING ARCHITECTURE
The proposed high speed 4 bit Squaring circuit is implemented using one 2x2 Vedic multiplier ,two 2 bit squaring circuits, one left shifter instead of Carry Saver Adder to perform the doubling . The circuitry required for CSA is complex when compared with the Shifter. The beauty of Vedic multiplier is that the partial product generation and additions are done concurrently. This in turn reduces the combinational path delay.

1. Algorithm for Proposed for 4 bit Squaring Architecture

<table>
<thead>
<tr>
<th>Input: 4 bit Binary Number ((A_3A_2A_1A_0))</th>
<th>Output: Square of 4 bit Binary Number ((P_7P_6P_5P_4P_3P_2P_1P_0)).</th>
</tr>
</thead>
<tbody>
<tr>
<td>Step 1: Divide the number ((A_3A_2A_1A_0)) into parts((A_3A_2)) &amp; ((A_1A_0)).</td>
<td>Step 2: Perform the following operations.</td>
</tr>
<tr>
<td>a. Square ((A_1A_0)) using 2 bit Squaring Logic(equations 5 to 8) output is (T1) (4 bit)</td>
<td>a. (P0=\text{T1}(0)) &amp; (P1=\text{T1}(1))</td>
</tr>
<tr>
<td>b. Multiply ((A_3A_2)) with ((A_1A_0)) using Vedic Multiplier (equations 1 to 4)output is (T2) (4 bit).</td>
<td>b. (a. \text{Multiply } (A_3A_2)) with ((A_1A_0)) using Vedic Multiplier (equations 1 to 4)output is (T2) (4 bit).</td>
</tr>
<tr>
<td>c. Square ((A_3A_2)) using 2 bit Squaring Logic (equations 5 to 8) output is (T3) (4 bit).</td>
<td>c. (\text{Square } (A_3A_2)) using 2 bit Squaring Logic (equations 5 to 8) output is (T3) (4 bit).</td>
</tr>
<tr>
<td>Step 3: Apply Left Shift operation to (T2) output is (T4) (5 bit).</td>
<td>Step 4: Add (T4) with (T1(3..2)) output is (T5) (6 bit). (P2 = T5(0) &amp; P3 = T5(1))</td>
</tr>
<tr>
<td>Step 5: Add (T3) with (T5) and the output is (P7P6P5P4) (4 bit).</td>
<td>Step 6: Print the product (P7P6P5P4P3P2P1P0).</td>
</tr>
</tbody>
</table>

![Figure 3: 4 Bit Squaring Circuit](image1)

![Figure 4: Proposed 4 Bit Squaring Circuit](image2)
Similarly, 8-bit, 16-bit and 32-bit squaring circuits are designed and implemented using recursive Vedic multiplier module and squaring circuit modules of 4-bit, 8-bit and 16-bit respectively. The n-bit squaring circuit can be implemented by taking one (n/2)-bit recursive Vedic multiplier module and two (n/2)-bit squaring circuits. The module level comparison between Vedic multiplier, squaring circuit and proposed squaring circuit is presented in table: 1.

<table>
<thead>
<tr>
<th>Bit size</th>
<th>Module Description</th>
<th>VM</th>
<th>SC</th>
<th>PSC</th>
</tr>
</thead>
<tbody>
<tr>
<td>2 bit</td>
<td>AND Gate</td>
<td>4</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td></td>
<td>Half Adder</td>
<td>2</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>4 bit</td>
<td>2X2 VM</td>
<td>4</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td></td>
<td>2 bit SC</td>
<td>-</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td></td>
<td>4/5 bit Adders</td>
<td>2</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td></td>
<td>4 bit CSA</td>
<td>1</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td></td>
<td>4 bit Shifter</td>
<td>-</td>
<td>-</td>
<td>1</td>
</tr>
</tbody>
</table>

5. DESIGN ANALYSIS AND SIMULATION RESULTS

In this work, 4-bit, 8-bit, 16-bit and 32-bit squaring circuits are implemented using Xilinx ISE design Suite 14.4 and ModelSim SE 6.5 in VHDL. The synthesis is done using Xilinx vertex5 FPGA board of device family xc5vlx20t-2ff363. VHDL Coding was done for the functionalities using the structural modelling and the design summary and timing reports are obtained. The results are compared and presented in Table: 2. Figure 5 shows the comparison graph of 4 Bit Squaring in terms of i) 4 Input LUT's ii) Number of Slices iii) Number of XOR gates iv) Logic Levels and v) Combinational Path Delay. The comparison graph of 8 Bit, 16 Bit and 32 Bit Squaring are shown in Figures 6, 7 and 8. From the above results, it is clear that the Proposed Squaring Circuit (PSC) is faster than the Recursive Vedic Multiplier (VM) and Squaring Circuit (SC). The proposed architecture has advantages like i) Increase in the speed ii) Decrease in the delay iii) Decrease in the area iv) Decrease in power consumption.

<table>
<thead>
<tr>
<th>4 Input LUT's</th>
<th>Slices</th>
<th>XOR gates</th>
<th>Logic Levels</th>
<th>Combinational Path Delay</th>
</tr>
</thead>
<tbody>
<tr>
<td>VM SC PSC</td>
<td>VM SC PSC</td>
<td>VM SC PSC</td>
<td>VM SC PSC</td>
<td>VM SC PSC</td>
</tr>
<tr>
<td>4bit</td>
<td>33 6 6</td>
<td>20 5 5</td>
<td>25 18 14</td>
<td>7 3 3</td>
</tr>
<tr>
<td>8Bit</td>
<td>134 55 54</td>
<td>53 23 23</td>
<td>125 86 70</td>
<td>15 11 11</td>
</tr>
<tr>
<td>16Bit</td>
<td>606 259 219</td>
<td>269 126 106</td>
<td>549 346 298</td>
<td>25 22 18</td>
</tr>
<tr>
<td>32Bit</td>
<td>2562 1256 1256</td>
<td>1183 473 454</td>
<td>2293 1338 1210</td>
<td>43 36 36</td>
</tr>
</tbody>
</table>

Figure 5 - 4 bit circuit comparison
Figure 6 - 8 bit circuit comparison
The following observations are made after comparing the results of the Proposed Squaring Circuit (PSC) with Recursive Vedic Multiplier (VM) of 32-bit number.

- Number of 4 Input LUT’s used is reduced by 50.97%
- Number of Slices is reduced by 60.01%
- The Combinational path Delay or critical path delay is reduced by 14.696%
- The number of XOR gates used is reduced by 47.23%.
- Number of logic levels is decreased by 16.27%.

Figures 9 and 10 shows the RTL schematic and Technology Schematic view of 4 Bit Proposed Squaring Circuit (PSC). The Functional Verification of 4 Bit, 8 Bit, 16 Bit and 32 Bit of the proposed Squaring Circuit is presented in the Figure 11.
6. CONCLUSION

The execution of the proposed Squaring system turned out to be efficient in terms of area, power and speed. Because of its systematic and parallel arrangement, it can be implemented at the chip level. For the performance evaluation 4-bit, 8-bit 16-bit and 32-bit squaring circuits were considered in order to validate and substantiate the theoretical conclusions with simulation results.

From the experimental outcomes, we find that there is an improvement in combinational path delay of the proposed strategy. Even with respect to the number of input-output buffers required for implementation of the functionality, the proposed method provides considerable savings. An optimal solution with respect to combinational path delay is achieved by the reduction in the logic levels and subsequently lesser resource utilization when mapped onto a technology target.

The present approach is highly innovative and guarantees improved performance which is apparent from results achieved from the different cases (4-bit, 8-bit, 16-bit and 32-bit) considered for the study. Finally, it is concluded that the current method is highly innovative and elegant and gives a brilliant platform for future work.

REFERENCES