FAST Fourier Transform (FFT) is an effective and fast discrete Fourier transform (DFT) algorithm, which is the core of digital signal processing algorithms. FFT is widely used in radar, communications, image processing, signal detection and other fields, most of those fields call for the FFT processor with high speed and high precision real-time processing performance. In order to simplify the calculation and shorten operation time to one or two magnitudes, the ideology of FFT algorithm is sequentially dividing the N point DFT into short sequences of DFT for calculating. The introduction of Verilog Hardware Description Language (HDL) provided a modeling and simulation environment for fast prototyping digital circuits and systems on FPGA. After analyzing the radix-4 FFT algorithm, this project advanced a 64-point FFT processor implementation, which has many advantages.1,2 This paper studies the content as follows: first, describe the FFT algorithm. Then, introduce the principle of FPGA architecture, performance and characteristics, combined with the theory of FFT algorithms to determine the FPGA devices as a FFT algorithm basis. After that, using the design tools QuartusⅡ, Modelsim and hardware description language Verilog to achieve the above algorithm. Moreover, analysis of existing design ideas implementation, and compare the various implementations. I find the design structure of the FFT processor, using pipeline structure, and generating address and twiddle factor quickly. Finally, simulate each module, and debug it.3 For the FFT processor, the analysis method is that generate a random signal as the input signal. First, simulate the data signal using FFT function in Mat lab. Second, put the input signal to the FFT processor, and observed data calculated. Finally, it is compared with Mat lab simulation results. The FFT algorithm is applied in image processing.4,5
DFT calculation
For the one-dimensional DFT, x(n) is a N numbers sequence, the DFT is defined as follows:
Where this formula achieves transformation from the time domain to the frequency domain.
Cooley-Tukey algorithm
The Cooley-Tukey algorithm is used to calculate the multi-point DFT.6 For N-Point DFT, the number of complex multiplication is equal toN2. Obviously, if N-point DFT is decomposed into several short sequences DFT, the complex multiplication in DFT will be reduced greatly. For example, dividing the N-Point DFT into two N2⁄2-Point DFT, the number of complex multiplication in DFT will be reduced to N2⁄2. Assume thatin two-dimensional Cooley-Tukey fast algorithm,7 calculating the Cooley-Tukey fast algorithm is divided into five steps:
First, the x(n) is rewritten as , using the formula:
Second, dividing -point DFT into units, it can achieve.
Third, N numbersare multiplied by the corresponding twiddle factor, which compose.
Forth, dividing -point DFT into units, it can achieve.
Finally, collate sequence and obtain, where.
This project is used for a 64-point FFT processor.
, that is.
Then,
FFT transformation from to ,
.
First stage: , Radix-4 transform to , twiddle factor ;
Second stage: , Radix-4 transform to , twiddle factor ;
Third stage: , Radix-4 transform to .
Design system structure
Due to complete a 64-point FFT calculation function for the project, the schemes may be selected from the following table (Table 1).8 I use statistical results to select the most suitable circuit structure. If I select the radix-2 scheme, the BR2MDC is better relatively due to the 100 utilization rates and its hardware control is simple relatively. However, since the BR2MDC increases the number of the buffer units, the hardware area used is increased greatly. On the other hand, if I selected the radix-8 scheme, the hardware utilization rate of R8SDF is less than before, but its hardware control is very complicated.9 Thence, the radix-4 scheme is the best choice, and the entire circuit is divided into three levels. Moreover, the hardware utilization rate of R4SDF is few, and its hardware control is not complicated. Therefore, this project uses the radix-4 single-path structure (R4SDF).
Circuit Structure |
Complex Multiplier |
Complex Adder |
Delay Units |
Control Complexity |
R2SDF |
5 |
12 |
31 |
Simple |
R2MDC |
5 |
12 |
62 |
Simple |
BR2MDC |
5 |
12 |
190 |
Simple |
R4SDF |
2 |
24 |
63 |
Medium |
R4MDC |
6 |
24 |
60 |
Simple |
R8SDF |
1 |
48 |
|
Complex |
R8MDC |
7 |
48 |
|
simple |
Table 1 statistical results to select the most suitable circuit structure
This paper analyses the feature of the radix-4 FFT algorithm, advances a pipeline hardware implementation structure, which can reduce consumption of resource ad achieve easily larger points (such as 256 points, 512 points) FFT expansion. The implemented FFT processor meets the high-speed real-time image processing requirements.
The accuracy of addition and multiplication will loss in each level of 64-point FFT. When the values are quantized to 10-bits, the input and output values are 10-bits. Calculating adder and multiplier at each level need to adjust operating bits. In the 64-point processor, there are six levels addition calculation and two levels multiplication calculation. This loss of precision in the 10-bits operation is not negligible. In the future, a good improving method is that append the adder protecting measurement at each level. For example, increasing 1-bit accuracy, six levels adder will increased 6-bits.