

International Journal of Advanced Research in Computer and Communication Engineering Vol. 2, Issue 7, July 2013

# Design and Implantation of High Speed FFT Processor for OFDMA System Using FPGA

G.Purna Chandra Rao<sup>1</sup>, B.Ashok<sup>2</sup>, B.Saritha<sup>3</sup> Assoc. Prof, ECE, SVSIT, Warangal, India<sup>1</sup> Assoc. Prof ,ECE, SVSIT, Warangal, India<sup>2</sup> Asst.Prof,ECE,SREC,Warangal,India<sup>3</sup>

Abstract: OFDM (Orthogonal Frequency Division Multiplexing) is one kind of the techniques of MCM (Multi-Carrier Modulation), which belongs to the field of wireless communication. The basic idea of MCM is to modulate signals onto many carriers and then combine these signals to transmit out. The modulation system has an S/P converter, which converts the high rate serial data into lower rate paralleled data. Due to the lower rate, data has longer time duration which is usually far larger than the maximum delay of the signal transmitting channel, thus makes it possible to well-combat multi-path effect and easy to equalization at the receiver. In the downlink module of Orthogonal Frequency Division Multiple Access (OFDMA) system, there is a need of an alterable point FFT processor. Therefore, it is meaningful to design a FFT processor in which input data points could be alterable. This paper presents the design of a variable input FFT processor which meets the requirements of OFDMA system. To design OFDMA system, we select the 2D Fourier transform algorithm as the kernel algorithm, using VHDL language to present in detail a

design of two-stage pipeline structure, QuartusII and ModelSim SE for the simulation, and verify on the EP3C25Q240CSN chip FPGA.Simulation results show that the way of implementation which meets the IEEES02.16e standard, at the same time the data precision is 16 bits, limit the clock frequency of 100MHz, the overall timing design stability, and it could reach the scope of realtime processing

#### Keywords: OFDM, FFT, MCM, QAM.

#### I. INTRODUCTION

OFDMA system enables us to study the MCM (Multi next generation wireless communication systems such as Carrier Modulation) techniques by integrating an FFT WiMAX and 3G-LTE standard. processor with the existing OFDMA system

Multiplexing) has been used in the last years in applications that require a huge transmission rate like ADSL modem, wireless network (Wi-Fi 802.11), DVB (Digital Video Broadcasting) and DAB (Digital Audio Broadcasting).

In all these cases, one needs to implement an integrated circuit (IC) that performs the necessary chip functions. For prototyping circuits FPGA is better choice than using ASIC (application Specific Integrated Circuit), CUSTOM or SEMICUSTOM. FPGA implementations To make use of these modern technologies of data transmission, it is necessary to develop the efficient techniques of digital modulation. The OFDM is the biggest Utilized recently, principally, because it uses the efficient FFT algorithm to do the modulation. This paper describes implementation of an OFDM modem on FPGA using VHDL for a 31 subcarriers (channels). OFDM system using 64 points radix-4 FFT time decimation, a CORDIC implementation to perform the butterfly calculus, and each channel modulation will use a 4-QAM constellation. The system is divided in a transmission section and a reception section In modem communication systems Orthogonal Frequency Division Multiple (OFDM) plays a crucial role and it will be replaced by Orthogonal

The implementation of FFT processor for Frequency Division Multiple Access (OFDMA) in the

The OFDMA PRY is based on OFDMA The OFDM technique (Orthogonal Frequency Division modulation, which comprises of OFDM modulation as well as subcarrier allocation. Therefore, it is significant to focus more attention on wireless communication technology (IEEE STD 802.16) family This OFDMA system design uses a FFT processor which requires the implementation of a Fast Fourier transform (FFT) an efficient algorithm which computes (DFT) and its inverse. "The FFT has been called the most important numerical algorithm of our lifetime. The FFT is a fast algorithm for computing the DFT, If we take the 2-point DFT and 4-point DFT and generalize them to 8-point, 16 point,....., we get the FFT algorithm. There are many distinct FFT algorithms involving a wide range of mathematics, from simple complex-number arithmetic to group theory and number theory.

> A DFT decomposes a sequence of values into components of different frequencies. This operation is useful in many fields, but computing it directly from the definition is often too slow to be practical. An FFT is a way to compute the same result more quickly: computing a DFT of N points in the naive way, using the definition, takes O(N2) arithmetical operations, while an FFT can compute the same result in only O(N log N) operations. The difference in speed can be substantial, especially for long data sets where N may be in the thousands or



### International Journal of Advanced Research in Computer and Communication Engineering Vol. 2, Issue 7, July 2013

millions, in practice, the computation time can be reduced by several orders of magnitude in such cases, and the improvement is roughly proportional to N / log (N). This huge improvement made many DFT-based algorithms practical. FFTs are of great importance to a wide variety of applications, from digital signal processing and solving partial differential equations to algorithms for quick multiplication of large integers. The most well known FFT algorithms depend upon the factorization of N, but (contrary to popular misconception) there are FFTs with O(N log N) complexity for all N, even for prime N Many FFT algorithms only depend on the fact that  $e^{-\frac{2\pi i}{N}}$  is an

Nth primitive root of unity, and thus can be applied to analogous transforms over any finite field, such as number-theoretic transforms.

#### II. AN OVERVIEW OF AN OFDMA SYSTEM

**Orthogonal frequency-division multiplexing (OFDM)**, essentially identical to **coded OFDM (COFDM)** and **discrete multi-tone modulation (DMT)**, is a frequency-division multiplexing (FDM) scheme used as a digital multi-carrier modulation method. A large number of closely-spaced orthogonal sub-carriers are used to carry data. The data is divided into several parallel data streams or channels, one for each sub-carrier. Each sub-carrier is modulated with a conventional modulation scheme (such as quadrature amplitude modulation or phase-shift keying) at a low symbol rate, maintaining total data rates similar to conventional *single-carrier* modulation schemes in the same bandwidth.

OFDM(Orthogonal frequency –division multiplexing) has developed into a popular scheme for wideband digital communication, whether wireless or over copper wires, used in applications such as digital television and audio broadcasting, wireless networking and broadband internet access.

#### A. Block Description of an OFDMA system



Fig 1. OFDMA system block structure

The system maps the input bits into complex-valued symbols X(n) in the modulation block, which determines the constellation scheme of each subcarrier. The number of bits assigned to each subcarrier is based on the signal to noise ratio of each subcarrier on the frequency range.

The adaptive bit loading algorithm will be detailed below. The IFFT block modulates X(n) onto N orthogonal subcarriers. A cyclic prefix is then added to the multiplexed output of IFFT. The obtained signal is then converted to a time continuous analog signal before it is transmitted through the channel. At the receiver side, an inverse operation is carried out and the information data is detected.

#### B. Definition and speed

An FFT computes the DFT and produces exactly the same result as evaluating the DFT definition directly; the only difference is that an FFT is much faster. (In the presence of round-off error, many FFT algorithms are also much more accurate than evaluating the DFT definition directly, as discussed below.)

Let  $x_0$ , ....,  $x_{N-1}$  be complex numbers. the computational problem for the DFT is to compute the sequence  $\{X(k)\}$  of *N* complex-valued numbers given another sequence of data  $\{x(n)\}$  of length *N*, according to the formula

$$\begin{split} X(k) &= \sum_{n=0}^{N-1} x(n) W_N^{kn} , \qquad 0 \le k \le N-1 \\ & W_N = e^{-j2 \, \pi/N} \end{split}$$

In general, the data sequence x(n) is also assumed to be complex valued. Similarly, The IDFT becomes

$$x(n) = \frac{1}{N} \sum_{n=0}^{N-1} X(k) W_N^{-nk}, \qquad 0 \le n \le N-1$$

Evaluating this definition directly requires  $O(N^2)$  operations: there are N outputs  $X_k$ , and each output requires a sum of N terms. An FFT is any method to compute the same results in  $O(N \log N)$  operations. More precisely, all known FFT algorithms require  $\Theta(N \log N)$  operations (technically, O only denotes an upper bound), although there is no known proof that better complexity is impossible.

Since DFT and IDFT involve basically the same type of computations, our discussion of efficient computational algorithms for the DFT applies as well to the efficient computation of the IDFT.

We observe that for each value of k, direct computation of X(k) involves N complex multiplications (4*N* real multiplications) and *N*-1 complex additions (4*N*-2 real additions). Consequently, to compute all N values of the DFT requires  $N^2$  complex multiplications and  $N^2$ -N complex additions.

Direct computation of the DFT is basically inefficient primarily because it does not exploit the symmetry and periodicity properties of the phase factor  $W_N$ . In particular, these two properties are :

Symmetry property:  $W_N^{k+N/2} = -W_N^k$ Periodicit y property:  $W_N^{k+N} = W_N^k$ 

Copyright to IJARCCE

www.ijarcce.com



#### International Journal of Advanced Research in Computer and Communication Engineering Vol. 2, Issue 7, July 2013

The computationally efficient algorithms described in this the result do 2 points FFT. Similarly, Calculation of the section, known collectively as fast Fourier transform (FFT) algorithms, exploit these two basic properties of the phase factor.

III. DESIGN STRUCTURE OF AN OFDMA SYSTEM The block diagram of FFT processor is shown in below Fig 2.

A. Block diagram of FFT processor:



Fig. 2. A FFT processor block structure



Fig. 3. Pipeline structure of the 64 point FFT processor

#### 1) Architecture design

For the variable point FFT processing, if cascade method and pipeline structure was used "ping-pong" memory architecture adopted in the design, each level need a lot of memory, so the system becomes too large. Therefore, this paper used 2D FFT algorithm to design the FFT processor, not only improved the system level of the parallelism, but also reduced the demand for memory systems.

#### 2) The overall design of the FFT processor

The FFT module is the core part in the OFDMA system, IEEE Std 802.16-2005 defined clearly: the core module of OFDMA physical layer is the FFT module, which can be used in the FFT points are: 2048 points (back compatible with IEEE Std 802.16-2004), 1024 points, 512 points and 128 points.

The design about variable point FFT processor is just based on FFT module in OFDMA system application. According to the idea of two-dimensional Fourier algorithm, i.e. N = 128, NJ = 2, N2 = 64. From 128 = 2 \* 64, we find that when achieve 128-point FFT.

Firstly the data is arranged in 64 lines and 2 rows, Secondly the input data will transform the 64 points FFT, then the result multiplies twiddle factor, Thirdly, let

Copyright to IJARCCE

512-point FFT firstly do the 64 points FFT, then transform further 8 points FFT; The same way to calculate the 1024 and 2048 points FFT. Block diagram of the overall design of the FFT processor as shown in Fig.2. & In Fig.3. We see that the architecture of the overall design described by VHDL language.

#### 3) 64-point FFT module

This module is the most frequently used in the design. Four kinds of input data length all must first pass through the 64 points FFT module. The same idea in the module based on 2D Fourier transform algorithm is composed of two 8-point FFT modules. So 8-point FFT module is the kernel in this part, its performance affects the whole design.

The 8-point FFT processor architecture consists of a single radix-2 butterfly (which is referred as the butterfly processing element), a dual-port FIFO RAM, a coefficient ROM, a controller and an address generation unit.

#### 4) Select and control module

This part is the kernel to complete the alterable data length in the design. It is based on the input data points to select the results stored in the middle data memory, and then choose the way to next flow. A two bits signal 'mode' is chosen as the mode signal.

When mode = 00, means to choose 2 points FFT module, to complete the 128-point FFT; when mode = 01, means to choose 8 FFT module, to complete the 512-point FFT; Equally: When the mode = 10, means to completed 1024 point FFT; when mode = 11, means to complete the 2048 point FFT.

#### 5) Mapping the constellation

The encoder of the constellation maps the m bits of the channel in a point a + jb in the constellation of the modulator. Decoding receives that point and the remap as the m transmitted bits.

#### 6) Encoder of the constellation

It is important to notice that in that mapping it is just made a conversion of bits for the faster that acts, however it is not made any modulation, as in the case of QAM, because that as shown, it is done by IFFT.

It is necessary to specify how the constellation will be to be mapped, to implement that block. However, independently of the format of the constellation, the block encoder can be made through a consultation at a conversion table, implemented by LUT that exists in LCs of FPGAs. For instance, for a 4-QAM constellation in such a way that a and b are binary numbers of 3 bits, and are converted to complement two.

www.ijarcce.com



#### International Journal of Advanced Research in Computer and Communication Engineering Vol. 2, Issue 7, July 2013

#### 7) Decoder of the constellation

In the receiver, the point of the constellation transmitted it can have changed due to the noises of the transmission channel, mistake in the time of sampling of the receiver and several other causes. Therefore it is necessary to define a threshold so that it can be made the decision on which point in the constellation the received sign is acting. That is the function of the decoder. For the system exemplified above the bit 0 is converted for 010b and the bit 1 for 110b. In that case, the decoder is implemented in a simple way, sticks to the most significant (that indicates the sign) bit to do the decoding, and generating a binary number of m bits again. For systems in that the constellation diagram is larger than 4- x(4n+2), x(4n+3), n = 0, 1, ..., N/4-1.

$$\begin{split} X(p,q) &= \sum_{l=0}^{3} \left[ W_{N}^{lq} F(l,q) \right] W_{4}^{lp} \\ F(l,q) &= \sum_{m=0}^{(N/4)-1} x(l,m) W_{N/4}^{mq} \\ p &= 0,1,2,3; \quad l = 0,1,2,3; \quad q = 0,1,2,..., \frac{N}{4} - 1 \end{split}$$

and

$$x(l,m) = x(4m+1)$$
$$X(p,q) = X(\frac{N}{4}p+q)$$

Thus the four N/4-point DFTs F(l, q)obtained from the above equation are combined to yield the N-point DFT. The expression for combining the N/4-point DFTs defines a radix-4 decimation-in-time butterfly, which can be expressed in matrix form as

$$\begin{bmatrix} X(0,q) \\ X(1,q) \\ X(2,q) \\ X(3,q) \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{bmatrix} \begin{bmatrix} W_N^0 F(0,q) \\ W_N^q F(1,q) \\ W_N^{2q} F(2,q) \\ W_N^{3q} F(3,q) \end{bmatrix}$$

Note that each butterfly involves three complex multiplications, since WN0 = 1, and 12 complex additions.

#### **IV. SIMULATION RESULTS**

The simulation results are shown in Fig 4. & Fig 5.

PSK it will be necessary the implementation of more advanced methods, like a neural network.

#### B. Radix-4 FFT Algorithm

When the number of data points N in the DFT is a power of 4 (i.e., N = 4v), we can, of course, always use a radix-2 algorithm for the computation. However, for this case, it is more efficient computationally to employ a radix-r FFT algorithm.

Let us begin by describing a radix-4 decimation-in-time FFT algorithm briefly. We split or decimate the N-point input sequence into four subsequences, x(4n), x(4n+1),



Fig.4 A 1024 point output waveform



Fig. 5 memory architecture of 1024 point algorithm

#### V. SYNTHESIS RESULTS

In this section synthesis results corresponding to the processor are detailed. Beyond that point, the standard hardware synthesis flow is adopted in generating a net list, and estimating parameters such as gate count, area, power dissipation, etc. Synthesis results were generated using Xilinx, a powerful synthesis tool

Xilinx provides details about the area occupied by the different blocks. The technology setting was for 120nm CMOS. The value of Clock Speed (C.S.) used is 100 MHz, the area corresponding to logic blocks comes to 2.03mm<sup>2</sup>. There would be a fractional increase of approximately 20% after routing, giving an area of 2.44mm<sup>2</sup>



## International Journal of Advanced Research in Computer and Communication Engineering Vol. 2, Issue 7, July 2013

As can also be noted, the dominant area is by the memories, here chosen to allow the processor capability to scale up to an order of a 1024-point FFT.

An advantage of the twiddle factor driven approach is that, when not required, the twiddle factor memory can be turned off. This advantage becomes more prominent towards the later stages of the FFT, as the number of butterflies that make use of the same twiddle factor set increases.

#### A. FFT processor synthesized output



Fig. 6. A synthesized FFT Processor Chip

The above shown Fig. 6. is the a synthesized FFT processor chip which is obtained after simulation and synthesizing the High Speed FFT processor for OFDMA System Using FPGA.

B. Internal structure of an FFT processor



Fig. 7. A complete Architecture of FFT Processor

### VI. CONCLUSION

In keeping with the design choices enumerated in this section, the FFT processor description is carried out in Xilinx, and the corresponding application kernel is written in VHDL language.

The design uses a variable point FFT processor designed using FPGA and was applicable to OFDMA system

The processor design uses VHDL language to describe the circuit, used Xilinx software to build the model and to verify the Timing diagram.

A FFT algorithm is chosen in comparison to a DFT algorithm owing to its suitability towards the twiddle factor driven approach.

Twiddles are fetched and stored on processor memory as opposed to generating twiddles dynamically. Also incorporated is a scheme for twiddle ping ponging and inter-stage rearrangement for vectorial handling of twiddles.

Dual ported memories are used as they allow the application kernel to adopt either the constant geometry approach, or the in-place approach. The in-place approach is adopted in present work with the aim of using a twiddle factor driven approach.

The result shows that, the successful completion of the design, altered input points, FFT computation, design precision 16-bit, and limited to 100MHz clock

Frequency, the overall timing design stability and FFT processing was correct.

This design can be applied to Real time processing system, completes the main computing modules in the OFDMA system.

#### REFERENCES

- [1] IEEE Std 802.16eTM-200S and IEEE Std 802.16TM-2004/Corl-200S.
- [2] WiMAX Forum, Krishna Ramadas and Raj Jain, "WiMAX System Evaluation Methodology" version 2.1 July 2008.
- [3] 3GPP TR 25.814 v.I.S, Physical Layer Aspects for Evolved UTRA(release 7), May 2006.
- [4] Multiband OFDM Alliance.
- [5] Multiband OFDM Physical layer Proposal for IEEE 802.15 Task group 3a IEEE P802.15-03/141r1, March 2004.
- [6] Multiband OFDM Physical layer Proposal Update IEEE P802.15-04/0220r1, May 2004.
- [7] Multi-band OFDM: a new approach for UWB Batra, A.; Balakrishnan, J.; Dabak, A.; Circuits and Systems, 2004. ISCAS '04. Proceedings of the 2004 International Symposium on Volume 5, 23-26 May 2004 Page(s):V-365 - V-368 Vol.5.
- [8] Zhang Zhu-jun, " Optimum Design of FFT Processor with Cascade Structure Based on FPGA," J. Nanjing University of Science and Technology. 2009. in press.
- [9] Xie Yan-Iin. "Design and Implementation of a Scalable Pipeline FFT

Processor Based on FPGA" D. Xi Dian University.2007, in press.

- [10]An Algorithm for Machine computation of Complex Fourier Series J.W. Cooley, J. W. Tukey, Mathematics of Computation, vol. 19, pp
- 297-301, April 1965. [11]Aligning harmonics of signals to DFT grid Luo Lichun; Xiao
- Xianci; Antennas and Propagation Society International Symposium, 2003. IEEE Volume 3, 22-27 June 2003.



International Journal of Advanced Research in Computer and Communication Engineering Vol. 2, Issue 7, July 2013

#### BIOGRAPHY



**G. Purna Chandra Rao** received his B.Tech degree in Electronics & Communication Engineering from Vaagdevi College of Engineering, Warangal, JNT University, Hyderabad, received his Master's degree from Anurag College of Engineering, Kodad,

JNT University. He is working as Associate Professor in the department of Electronics & Communication Engineering, SVS Institute of Technology, Hanamkonda, Warangal, Andhra Pradesh. He is Life member of ISTE and member of IEEE. His research areas include Digital Signal Processing, Digital Systems and VLSI Systems. He is author for many papers on journal and conference proceedings.



**B.Ashok** received his B.Tech degree in Electronics & Communication Engineering from Sri Kavitha Engineering College, Karepally, JNT University, Hyderabad, received his Master's degree from Sri Kavitha Engineering College, Karepally, JNT

University,He is working as Associate Professor in the department of Electronics & Communication Engineering ,SVS Institute of Technology, Hanamkonda, Warangal, Andhra Pradesh . He is Life member of ISTE and member of IEEE. His research areas include Digital Signal Processing, Digital Systems and Signals& Systems.



**B. Saritha** received her B.Tech degree in Electronics & Communication Engineering from Sai Spurthi Institute of Technology Sattupally, JNT University, Hyderabad. She received her Master's degree from Anurag College of Engineering, Kodad. JNT

University. She is working as Assistant Professor in the department of Electronics & Communication Engineering, SR Engineering College, Ananthasagar, Warangal, Andhra Pradesh. She is Life member of ISTE. Her research area includes Signal Processing.