Fuzzy optimization FFT circuit for image processing

doi:10.15406/iratj.2018.04.00136

eISSN: 2574-8092

International Robotics & Automation Journal

Mini Review Volume 4 Issue 4

Zehong Cao

EEG & BCI & Computational Intelligence, China

Correspondence: Zehong Cao, EEG & BCI & Computational Intelligence, China

Received: July 30, 2018 | Published: August 9, 2018

Citation: Cao Z. Fuzzy optimization FFT circuit for image processing. Int Rob Auto J. 2018;4(4):274-276. DOI: 10.15406/iratj.2018.04.00136

Abstract

This paper presents a fuzzy approach for the FPGA implementation of the FFT that reduces the number of twiddle-factor multiplications and the memory space required, which speeds up the butterfly computation. In addition, the address-mapping design obtains the position of each data item without calculation. Combined with a pipeline structure, these measures increase the speed of the FFT on the FPGA. The modules have been checked by timing simulation and verified against reference data to confirm the correctness of the design. The design could be applied in signal processing and image processing.

Introduction

The fast Fourier transform (FFT) is an efficient algorithm for computing the discrete Fourier transform (DFT) and lies at the core of digital signal processing. The FFT is widely used in radar, communications, image processing, signal detection and other fields, most of which call for an FFT processor with high-speed, high-precision, real-time performance. To simplify the calculation and shorten the operation time by one or two orders of magnitude, the FFT algorithm successively divides the N-point DFT into shorter DFTs. The Verilog hardware description language (HDL) provides a modeling and simulation environment for rapid prototyping of digital circuits and systems on FPGAs. After analyzing the radix-4 FFT algorithm, this project proposes a 64-point FFT processor implementation.¹,²

This paper is organized as follows. First, the FFT algorithm is described. Then the architecture, performance and characteristics of FPGAs are introduced and, together with the theory of FFT algorithms, used to justify choosing an FPGA device as the basis for the FFT implementation. After that, the algorithm is implemented in Verilog with the design tools Quartus II and ModelSim. Existing design approaches are then analyzed and the various implementations compared. The resulting FFT processor uses a pipeline structure and generates addresses and twiddle factors quickly. Finally, each module is simulated and debugged.³

The FFT processor is verified with a randomly generated input signal. First, the signal is transformed with the FFT function in MATLAB. Second, the same signal is fed to the FFT processor and the computed data are observed. Finally, the processor output is compared with the MATLAB simulation results. The FFT algorithm is then applied to image processing.⁴,⁵

The FFT algorithm and system structure

DFT calculation

For the one-dimensional case, let x(n) be a sequence of N samples. Its DFT is defined as follows:

$X (k) = \sum_{n = 0}^{N - 1} x (n) e^{- j 2 \pi n k / N} = \sum_{n = 0}^{N - 1} x (n) W_{N}^{k n}, \quad 0 \leq k \leq N - 1$

Here $W_{N} = e^{- j 2 \pi / N}$ is the twiddle factor. The formula transforms the sequence from the time domain to the frequency domain.
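
The definition can be evaluated directly. The following is a minimal Python sketch rather than the processor's implementation; the function name `dft` is illustrative and does not come from the paper.

```python
import cmath


def dft(x):
    """Direct N-point DFT: X(k) = sum over n of x(n) * W_N^(k*n)."""
    n_points = len(x)
    w = cmath.exp(-2j * cmath.pi / n_points)  # twiddle factor W_N
    return [sum(x[n] * w ** (k * n) for n in range(n_points))
            for k in range(n_points)]


# A unit impulse has a flat magnitude spectrum.
spectrum = dft([1, 0, 0, 0])
print([round(abs(v), 6) for v in spectrum])  # [1.0, 1.0, 1.0, 1.0]
```

Each of the N outputs needs N multiplications, which is the N² cost that the fast algorithm reduces.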

Cooley-Tukey algorithm

The Cooley-Tukey algorithm is used to calculate the multi-point DFT.⁶ A direct N-point DFT requires N² complex multiplications. If the N-point DFT is decomposed into several shorter DFTs, the number of complex multiplications is greatly reduced. For example, dividing the N-point DFT into two N/2-point DFTs reduces the number of complex multiplications to about N²/2, since 2(N/2)² = N²/2. Assuming that $N = r_{1} r_{2}$ in the two-dimensional Cooley-Tukey fast algorithm,⁷ the calculation is divided into five steps:

First, x(n) is rewritten as $x (n_{1}, n_{0})$ using the formula:

$x (n) = x (r_{2} n_{1} + n_{0}) = x (n_{1}, n_{0}), \quad \left\{ \begin{matrix} n_{1} = 0, 1, 2, \ldots, r_{1} - 1 \\ n_{0} = 0, 1, 2, \ldots, r_{2} - 1 \end{matrix} \right.$

Second, $r_{2}$ separate $r_{1}$-point DFTs are computed, which gives $X_{1} (k_{0}, n_{0})$:

$X_{1} (k_{0}, n_{0}) = \sum_{n_{1} = 0}^{r_{1} - 1} x (n_{1}, n_{0}) W_{r_{1}}^{n_{1} k_{0}}, \quad k_{0} = 0, 1, \ldots, r_{1} - 1$

Third, the N values $X_{1} (k_{0}, n_{0})$ are multiplied by the corresponding twiddle factors $W_{N}^{k_{0} n_{0}}$, which gives $X_{1}' (k_{0}, n_{0})$:

$X_{1}' (k_{0}, n_{0}) = X_{1} (k_{0}, n_{0}) W_{N}^{k_{0} n_{0}}$

Fourth, $r_{1}$ separate $r_{2}$-point DFTs are computed, which gives $X_{2} (k_{0}, k_{1})$:

$X_{2} (k_{0}, k_{1}) = \sum_{n_{0} = 0}^{r_{2} - 1} X_{1}' (k_{0}, n_{0}) W_{r_{2}}^{n_{0} k_{1}}, \quad k_{1} = 0, 1, \ldots, r_{2} - 1$

Finally, the sequence is reordered to obtain $X (k_{1}, k_{0}) = X (k)$, where $k = r_{1} k_{1} + k_{0}$:

$X (k_{1}, k_{0}) = X_{2} (k_{0}, k_{1})$
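
The five steps can be sketched in Python for any factorization N = r1 × r2 and checked against the direct DFT. This is an illustrative sketch under the index mapping n = r2·n1 + n0 and k = r1·k1 + k0; the names `w`, `dft` and `cooley_tukey_two_factor` are assumptions made for this example, not identifiers from the paper.

```python
import cmath


def w(base, exponent):
    """Twiddle factor W_base^exponent = exp(-2j*pi*exponent/base)."""
    return cmath.exp(-2j * cmath.pi * exponent / base)


def dft(x):
    """Direct DFT used as the reference result."""
    n = len(x)
    return [sum(x[i] * w(n, i * k) for i in range(n)) for k in range(n)]


def cooley_tukey_two_factor(x, r1, r2):
    n_points = r1 * r2
    assert len(x) == n_points
    # Step 1: view x(n) as x(n1, n0) with n = r2*n1 + n0.
    grid = [[x[r2 * n1 + n0] for n0 in range(r2)] for n1 in range(r1)]
    # Step 2: r2 separate r1-point DFTs over n1 give X1(k0, n0).
    x1 = [[sum(grid[n1][n0] * w(r1, n1 * k0) for n1 in range(r1))
           for n0 in range(r2)] for k0 in range(r1)]
    # Step 3: multiply by the twiddle factors W_N^(k0*n0).
    x1t = [[x1[k0][n0] * w(n_points, k0 * n0) for n0 in range(r2)]
           for k0 in range(r1)]
    # Step 4: r1 separate r2-point DFTs over n0 give X2(k0, k1).
    x2 = [[sum(x1t[k0][n0] * w(r2, n0 * k1) for n0 in range(r2))
           for k1 in range(r2)] for k0 in range(r1)]
    # Step 5: reorder the results with k = r1*k1 + k0.
    out = [0j] * n_points
    for k0 in range(r1):
        for k1 in range(r2):
            out[r1 * k1 + k0] = x2[k0][k1]
    return out


samples = [complex(n % 5, n % 3) for n in range(12)]
fast = cooley_tukey_two_factor(samples, 3, 4)
max_error = max(abs(a - b) for a, b in zip(fast, dft(samples)))
print(max_error < 1e-9)  # True
```

With r1 = 3 and r2 = 4 the sketch uses 4·3² + 3·4² = 84 multiplications in the short DFTs plus 12 twiddle multiplications, compared with 144 for the direct 12-point DFT.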

This project implements a 64-point FFT processor, so

$N = r_{1} \times r_{2} \times r_{3}$ , that is, $64 = 4 \times 4 \times 4$ .

$x (n) = x (r_{2} r_{3} n_{2} + r_{3} n_{1} + n_{0}) = x (n_{2}, n_{1}, n_{0}), \quad \left\{ \begin{matrix} n_{2} = 0, 1, 2, \ldots, r_{1} - 1 \\ n_{1} = 0, 1, 2, \ldots, r_{2} - 1 \\ n_{0} = 0, 1, 2, \ldots, r_{3} - 1 \end{matrix} \right.$

Then,

$x (n) = x (16 n_{2} + 4 n_{1} + n_{0}) = x (n_{2}, n_{1}, n_{0}), \quad \left\{ \begin{matrix} n_{2} = 0, 1, 2, 3 \\ n_{1} = 0, 1, 2, 3 \\ n_{0} = 0, 1, 2, 3 \end{matrix} \right.$

The FFT transforms $x (n_{2}, n_{1}, n_{0})$ , where $n = 16 n_{2} + 4 n_{1} + n_{0}$ , to $X (k_{0}, k_{1}, k_{2})$ , where $k = 16 k_{2} + 4 k_{1} + k_{0}$ .
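
Because 16 and 4 are powers of two, the digits n2, n1 and n0 are simply 2-bit fields of the 6-bit address, so they can be read off by shifting and masking rather than by arithmetic. The Python sketch below illustrates this idea only; whether the paper's address-mapping module works this way is an assumption, and the function names are hypothetical.

```python
def split_address(n):
    """Split a 6-bit address n = 16*n2 + 4*n1 + n0 into (n2, n1, n0)."""
    return (n >> 4) & 0b11, (n >> 2) & 0b11, n & 0b11


def join_address(d2, d1, d0):
    """Rebuild the address 16*d2 + 4*d1 + d0 from three base-4 digits."""
    return (d2 << 4) | (d1 << 2) | d0


def digit_reverse(n):
    """Swap the most and least significant base-4 digits of the address."""
    d2, d1, d0 = split_address(n)
    return join_address(d0, d1, d2)


print(split_address(27), digit_reverse(27))  # (1, 2, 3) 57
```

The digit reversal reflects the output ordering: a result indexed as $(k_{0}, k_{1}, k_{2})$ belongs at natural position $k = 16 k_{2} + 4 k_{1} + k_{0}$, which is the digit-reversed form of $16 k_{0} + 4 k_{1} + k_{2}$.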

First stage: a radix-4 transform over $n_{2}$ takes $x (n_{2}, n_{1}, n_{0})$ to $G (k_{0}, n_{1}, n_{0})$ , with twiddle factor $W_{64}^{k_{0} (4 n_{1} + n_{0})}$ ;

Second stage: a radix-4 transform over $n_{1}$ takes $G (k_{0}, n_{1}, n_{0})$ to $H (k_{0}, k_{1}, n_{0})$ , with twiddle factor $W_{16}^{k_{1} n_{0}} = W_{64}^{4 k_{1} n_{0}}$ ;

Third stage: a radix-4 transform over $n_{0}$ takes $H (k_{0}, k_{1}, n_{0})$ to $X (k_{0}, k_{1}, k_{2})$ , with no further twiddle factor.
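
The three stages can be checked numerically. Expanding the product nk modulo 64 with $n = 16 n_{2} + 4 n_{1} + n_{0}$ and $k = 16 k_{2} + 4 k_{1} + k_{0}$ leaves the inter-stage twiddle factors $W_{64}^{k_{0} (4 n_{1} + n_{0})}$ after the first stage and $W_{64}^{4 k_{1} n_{0}} = W_{16}^{k_{1} n_{0}}$ after the second. The Python sketch below is a floating-point software model under that mapping, not the Verilog design; all function names are illustrative.

```python
import cmath


def w(base, exponent):
    """Twiddle factor W_base^exponent = exp(-2j*pi*exponent/base)."""
    return cmath.exp(-2j * cmath.pi * exponent / base)


def dft(x):
    """Direct DFT used as the reference result."""
    n = len(x)
    return [sum(x[i] * w(n, i * k) for i in range(n)) for k in range(n)]


def dft4(v):
    """4-point DFT of four values, i.e. one radix-4 butterfly."""
    return [sum(v[n] * w(4, n * k) for n in range(4)) for k in range(4)]


def fft64_radix4(x):
    assert len(x) == 64
    # Stage 1: butterfly over n2, then twiddle W_64^(k0*(4*n1 + n0)).
    g = {}
    for n1 in range(4):
        for n0 in range(4):
            col = dft4([x[16 * n2 + 4 * n1 + n0] for n2 in range(4)])
            for k0 in range(4):
                g[k0, n1, n0] = col[k0] * w(64, k0 * (4 * n1 + n0))
    # Stage 2: butterfly over n1, then twiddle W_16^(k1*n0).
    h = {}
    for k0 in range(4):
        for n0 in range(4):
            col = dft4([g[k0, n1, n0] for n1 in range(4)])
            for k1 in range(4):
                h[k0, k1, n0] = col[k1] * w(16, k1 * n0)
    # Stage 3: butterfly over n0, stored at k = 16*k2 + 4*k1 + k0.
    out = [0j] * 64
    for k0 in range(4):
        for k1 in range(4):
            col = dft4([h[k0, k1, n0] for n0 in range(4)])
            for k2 in range(4):
                out[16 * k2 + 4 * k1 + k0] = col[k2]
    return out


samples = [complex((7 * n) % 13, (3 * n) % 5) for n in range(64)]
max_error = max(abs(a - b) for a, b in zip(fft64_radix4(samples), dft(samples)))
print(max_error < 1e-9)  # True
```

Only the first two stages are followed by twiddle multiplications, which matches the two levels of multiplication in the 64-point processor.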

Design system structure

Because the project must compute a 64-point FFT, the candidate schemes are those listed in Table 1.⁸ I use these statistics to select the most suitable circuit structure. Among the radix-2 schemes, the BR2MDC is relatively attractive because of its 100% utilization rate and its relatively simple hardware control. However, the BR2MDC needs many more buffer units, so the hardware area increases greatly. The radix-8 R8SDF, on the other hand, uses less hardware, but its hardware control is very complicated.⁹ The radix-4 scheme is therefore the best choice, and the entire circuit is divided into three stages. The R4SDF uses little hardware, and its hardware control is not complicated. Therefore, this project uses the radix-4 single-path delay feedback structure (R4SDF).

Circuit Structure	Complex Multiplier	Complex Adder	Delay Units	Control Complexity
R2SDF	5	12	31	Simple
R2MDC	5	12	62	Simple
BR2MDC	5	12	190	Simple
R4SDF	2	24	63	Medium
R4MDC	6	24	60	Simple
R8SDF	1	48		Complex
R8MDC	7	48		Simple

Table 1 Statistical comparison of candidate circuit structures

Experiment results

Figure 1 shows the RTL viewer result. The testbench file is then run, and the functional simulation result is shown in Figure 2. Finally, the output of the FFT processor is checked for correctness, and Figure 3 shows the processor applied to image processing.

Figure 1 RTL viewer.

Figure 2 Functional simulations.

Figure 3 Image processing.

Conclusion

This paper analyzes the features of the radix-4 FFT algorithm and proposes a pipelined hardware implementation structure that reduces resource consumption and can easily be extended to larger FFTs (such as 256 or 512 points). The implemented FFT processor meets the requirements of high-speed real-time image processing.

Accuracy is lost in the additions and multiplications at each level of the 64-point FFT. When the values are quantized to 10 bits, both the input and the output values are 10 bits wide, so the adders and multipliers at each level must adjust their operating word length. The 64-point processor contains six levels of addition and two levels of multiplication, and the resulting loss of precision in 10-bit operation is not negligible. A good improvement for future work is to add guard bits to the adders at each level. For example, adding 1 bit of precision per level increases the word length across the six adder levels by 6 bits.
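
The 6-bit figure can be illustrated with a worst-case calculation. The Python sketch below assumes signed two's-complement values and considers only the adders, ignoring any growth in the multipliers; the function names and the pairwise-addition model are simplifications introduced for this example.

```python
def signed_bits(value):
    """Smallest two's-complement width that can hold the integer value."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


def adder_level_widths(input_bits=10, levels=6):
    """Worst-case word length after each level of two-operand additions."""
    worst = 2 ** (input_bits - 1) - 1  # largest positive input, 511 for 10 bits
    widths = []
    for _ in range(levels):
        worst += worst  # two worst-case operands meet in one adder
        widths.append(signed_bits(worst))
    return widths


print(adder_level_widths())  # [11, 12, 13, 14, 15, 16]
```

After six adder levels the worst case is 64 × 511 = 32704, which needs 16 bits, that is, the original 10 bits plus one guard bit per level.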