
Silicon Realization of an OFDM Synchronization Algorithm


Stefan Johansson, Daniel Landström and Peter Nilsson
Department of Applied Electronics, Competence Center for Circuit Design, Box 118, 221 00 Lund, Sweden. E-mail: stefan.johansson@tde.lth.se, daniel.landstrom@tde.lth.se, peter.nilsson@tde.lth.se.
The purpose of this work is to implement this synchronization unit in an ASIC.

ABSTRACT
In this paper a hardware architecture for an OFDM synchronizer is presented. The proposed synchronization unit can be used in any OFDM system that uses a cyclic prefix. The algorithm is based on the correlation introduced by the cyclic prefix, which is exploited in the time domain where both time and frequency offset are estimated simultaneously. The synchronization unit also performs frequency correction, which means that no feedback to the analog parts is necessary. Although the algorithm is too complex to be implemented on today’s most powerful standard DSP, a hardware architecture that is optimized for the algorithm can be implemented with moderate complexity. The unit contains 32 kbit RAM and 5000 gates and the sample rate is 25 Msamples/s.

Figure 1: The mobile OFDM system. (Block diagram. Transmit path: DPSK modulator, IFFT, cyclic prefix, D/A, analog front end. Receive path: analog front end, A/D, synchronizer, FFT, DPSK demodulator.)

1. INTRODUCTION
Interest in OFDM [1] has increased over the past few years. OFDM modulation schemes are now used in both wired and wireless systems and have been standardized for systems such as DAB, DVB and ADSL. OFDM has also been suggested for future generations of wireless systems, such as WLAN [2] and third generation mobile systems [3,4].

This work is part of a Swedish research project in microelectronics, a cooperation between Swedish universities and industry. The purpose of the project is to implement a radio transceiver in CMOS, intended for fourth generation mobile systems. The project’s mobile radio system is a packet data system with a 20 Mbit/s data rate in 25 MHz bandwidth at a 1.8 GHz carrier frequency. It uses OFDM with 1024 subcarriers and a cyclic prefix length of 128 symbols. Each subcarrier is modulated with 4- or 8-point Differential Phase Shift Keying (DPSK), which is differential in frequency. The system does not use any preamble or pilot tones, except for a reference tone for the differential modulation. A block diagram of the system is shown in Figure 1.

The system is designed for bursty data, which means short packets, sometimes only one frame. If synchronization can be performed on the first frame, without adding pilots or a preamble, latency can be kept at a minimum while maintaining the data rate efficiency. Since the synchronization unit is to be placed in a mobile terminal, low power consumption is also very important.

2. SYNCHRONIZATION ALGORITHM
The synchronization algorithm that is implemented is presented in [6]. The time and frequency offsets are estimated simultaneously using the correlation introduced by the cyclic prefix. The estimates are based on the log-likelihood function

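$$\Lambda(\theta, \varepsilon) = |\gamma(\theta)| - \rho\,\Phi(\theta)$$

where

$$\gamma(m) \equiv \sum_{k=m}^{m+M-1} r(k)\, r^{*}(k+N)$$

is the correlation,

$$\Phi(m) \equiv \frac{1}{2} \sum_{k=m}^{m+M-1} \left( |r(k)|^{2} + |r(k+N)|^{2} \right)$$

is the energy, and

$$\rho \equiv \frac{\mathrm{SNR}}{\mathrm{SNR}+1}.$$

Here r(k) is the received sample sequence, N is the number of subcarriers (1024 in this system) and M is the length of the cyclic prefix (128).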
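To make the definitions concrete, the following is a minimal pure-Python sketch of the correlation, the energy and the log-likelihood function. It is an illustration under stated assumptions, not the implementation of this work. It also locates the maximum of Λ and takes the angle of γ at that position, the two quantities the text goes on to use as timing and frequency estimates. The function names, the toy signal (random unit-modulus samples with one embedded frame) and the small sizes N = 64, M = 16 are hypothetical choices made for the example; the sign and scaling that relate the angle to the offset ε depend on the convention used in [6].

```python
import cmath
import math
import random


def correlation(r, m, M, N):
    """gamma(m): sum of r(k) * conj(r(k + N)) for k = m .. m + M - 1."""
    return sum(r[k] * r[k + N].conjugate() for k in range(m, m + M))


def energy(r, m, M, N):
    """Phi(m): half the summed energy of both sample windows."""
    return 0.5 * sum(abs(r[k]) ** 2 + abs(r[k + N]) ** 2
                     for k in range(m, m + M))


def log_likelihood(r, m, M, N, snr):
    """Lambda at timing candidate m, with rho = SNR / (SNR + 1)."""
    rho = snr / (snr + 1.0)
    return abs(correlation(r, m, M, N)) - rho * energy(r, m, M, N)


def estimate(r, M, N, snr):
    """Return (timing estimate, angle of gamma at that timing)."""
    candidates = range(len(r) - N - M + 1)
    theta_hat = max(candidates, key=lambda m: log_likelihood(r, m, M, N, snr))
    return theta_hat, cmath.phase(correlation(r, theta_hat, M, N))


def toy_signal(N, M, theta0, eps, total, seed=1):
    """Hypothetical test signal: random unit-modulus samples, one frame whose
    cyclic prefix starts at theta0, and a frequency offset of eps carrier
    spacings applied as the rotation exp(j*2*pi*eps*k/N)."""
    rng = random.Random(seed)
    s = [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(total)]
    for k in range(theta0, theta0 + M):
        s[k] = s[k + N]  # prefix = copy of the last M samples of the frame body
    return [s[k] * cmath.exp(2j * math.pi * eps * k / N) for k in range(total)]


# Assumed toy parameters, much smaller than the 1024/128 of the real system.
N, M, THETA0, EPS = 64, 16, 40, 0.1
r = toy_signal(N, M, THETA0, EPS, total=3 * N)
theta_hat, angle_hat = estimate(r, M, N, snr=10.0)
print(theta_hat, angle_hat)
```

For this noise-free toy signal the maximum falls on the start of the prefix, and the angle of γ at that point equals −2πε under the rotation convention assumed in `toy_signal`.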

From the log-likelihood function the time and frequency estimates are calculated. The timing estimate is given by

$$\hat{\theta}_{ML} = \arg\max_{\theta} \{\Lambda(\theta, \varepsilon)\}$$

and the frequency estimate by

$$\hat{\varepsilon}_{ML} = \operatorname{angle}\{\gamma(\hat{\theta}_{ML})\}.$$

To allow an efficient implementation, the algorithm is simplified and a trade-off between complexity and performance is found. The simplified log-likelihood function is

$$\hat{\Lambda}(\theta, \varepsilon) = |\mathrm{re}\{\gamma\}| + |\mathrm{im}\{\gamma\}|$$

An example of the simplified log-likelihood function over three OFDM frames is shown in Figure 2.

Figure 2: The simplified log-likelihood function over 3 OFDM frames, at 10 dB SNR. (Plot. Horizontal axis: time [samples], 0 to 5000. Vertical axis: 0 to 600.)

The estimated time and frequency values are used for time and frequency synchronization. In the log-likelihood function we see peaks whose positions mark the start of an OFDM frame. The start of a peak is detected by a threshold mechanism, see Figure 3.

Figure 3: Method of finding the position of a peak. (Schematic amplitude curve with the threshold, the peak position X inside the 256-sample search window, and the frame start signal.)

When the amplitude is larger than the threshold, a search is started. The search is performed for the next 256 samples, and the position, X, of the maximal amplitude is found. After X more cycles, the frame start signal is generated. The frame start signal is thus generated exactly 256 samples after the actual frame start, so the input samples need to be delayed 1024 + 256 cycles.

The frequency estimate is used for frequency correction by back-rotating the symbols according to

$$\hat{r}(n) = r(n)\, e^{j 2\pi \varepsilon n / N}$$

By doing the frequency correction in the digital domain, no feedback to the oscillator is necessary.

3. HARDWARE

The estimated computational requirement of the algorithm in [6] is more than 2000 MIPS. No single standard DSP today can reach such performance, which makes an ASIC implementation the only reasonable solution. To obtain a good hardware implementation with respect to area and power consumption, some optimization is necessary. How to optimize depends on the application and the algorithm.

A low chip clock rate is preferable for several reasons: it is easier to supply an external clock, and the power consumption is lower. The clock rate is therefore chosen to be the same as the symbol rate, 25 MHz. This means that all computations for a sample must be performed in one cycle. The latency of the algorithm is at least one frame, so a few extra cycles are not critical. As a result, pipelining is used to facilitate this design.

The hardware size is dominated by memory, and the architecture should therefore be optimized with respect to memory usage. With this algorithm, a minimum of one frame must be buffered to be able to estimate the offsets, correct for them, and then demodulate. By performing the calculations on-line, additional buffering is avoided. An extra delay of 256 symbols is used to give the algorithm enough time to find the frame start position. The resulting memory size is 1024 + 256 symbols. The memory bandwidth is also rather high, a total of $12 \cdot 2 \cdot 3 \cdot 25 \cdot 10^{6} = 1.8$ Gbit/s. This consumes a lot of power, especially if external memories are used. Low-power on-chip memories are therefore preferred.

The algorithm simplification is followed by word length optimization. The idea is to minimize the number of bits in complex parts, where large savings in size can be made. According to the specification, the input and output data are 12 + 12 bits per symbol. This means that the main buffer must be 1280 * 24 bits. The frequency correction also requires 12 or more bits to avoid large rounding errors. However, the correlation can be calculated with fewer bits: four bits from every input symbol are enough to calculate the correlation with 12 bits accuracy.

After these optimizations, a hardware architecture is developed, which is shown in Figure 4. The FIFOs are implemented with RAMs as shown in Figure 5. A ring counter is used to generate the address. The FIFO must be able to perform one read and one write every cycle. To avoid using a dual-port RAM, the word length of the RAM is twice that of the FIFO, but the length is only half. Since each access is twice as large, the number of accesses can be halved, alternating between a read and a write every other cycle. Some of the FIFOs have an extra MUX for reset. The MUX ensures that the FIFO output is zero until the first data written has passed through to the output.
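The FIFO scheme described above can be modelled behaviourally as follows. This is a sketch under assumptions rather than the authors' VHDL: the class name `SinglePortFifo`, the pairing register and the cycle counter that stands in for the reset MUX are all hypothetical. It shows that a RAM with half the length and twice the word width needs only one access per cycle, alternating reads and writes, to give a delay of `depth` cycles.

```python
class SinglePortFifo:
    """Delay line of `depth` words built on a single-port RAM model.

    The RAM has depth/2 entries, each holding two FIFO words, and is
    accessed once per clock: a read on even cycles, a write on odd cycles.
    """

    def __init__(self, depth):
        if depth % 2:
            raise ValueError("depth must be even")
        self.depth = depth
        self.ram = [(None, None)] * (depth // 2)  # contents undefined at reset
        self.addr = 0             # ring counter generating the RAM address
        self.write_cycle = False  # False: read cycle, True: write cycle
        self.in_reg = None        # first word of a pair, held until the write
        self.out_pair = (None, None)
        self.filled = 0           # cycles since reset, saturates at depth

    def clock(self, word_in):
        """Accept one word and return the word written `depth` cycles ago."""
        if not self.write_cycle:
            self.out_pair = self.ram[self.addr]           # only access: read
            word_out = self.out_pair[0]
            self.in_reg = word_in
        else:
            word_out = self.out_pair[1]
            self.ram[self.addr] = (self.in_reg, word_in)  # only access: write
            self.addr = (self.addr + 1) % len(self.ram)
        self.write_cycle = not self.write_cycle
        if self.filled < self.depth:
            self.filled += 1
            return 0  # reset MUX: zeros until written data reaches the output
        return word_out


fifo = SinglePortFifo(depth=8)
outputs = [fifo.clock(word) for word in range(1, 21)]
print(outputs)
```

The pair read on an even cycle supplies the outputs for that cycle and the next one, and the address it frees is reused by the write on the following odd cycle, which is why a single port is enough.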
Figure 4: The top-level hardware architecture. (Block diagram. Blocks: FIFO, 1024*2*12 bits; FIFO, 256*2*12 bits; correlator γ; Abs; Arg Max; Calc Angle; controller; θ̂ counter; CORDIC derotator. Signals: update angle, frame start. Data paths are 12 bits wide, with 4 bits into the correlator.)

Figure 7: Controller state diagram. (States: 1, wait for peak; 2, count 256 cycles; 3, count x cycles. Transitions: peak detected, after 256 cycles, after x cycles.)

The angle calculation is done with a CORDIC [7] unit, see Figure 8. CORDIC is an iterative algorithm that needs one iteration per significant bit in the output. Since the angle is needed with 12 significant bits to correct for the frequency offset accurately, the calculation takes 12 cycles.
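As an illustration of the iterative angle calculation, here is a floating-point sketch of a CORDIC in vectoring mode with 12 iterations. It is a generic textbook formulation, not the fixed-point unit of this design: the function name, the quadrant pre-rotation and the use of floating-point multiplications by 2^-i (where hardware would use shifters and an arctangent lookup table) are assumptions.

```python
import math

ITERATIONS = 12
# Lookup table of the elementary rotation angles atan(2^-i).
ATAN_TABLE = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]


def cordic_angle(x, y, iterations=ITERATIONS):
    """Return the angle of the vector (x, y) in radians.

    Each iteration rotates the vector towards the positive x-axis by
    +/- atan(2^-i) and accumulates the rotation that was undone.
    """
    angle = 0.0
    if x < 0:
        # Pre-rotate by 90 degrees so the vector lies in the right half-plane.
        if y >= 0:
            x, y, angle = y, -x, math.pi / 2
        else:
            x, y, angle = -y, x, -math.pi / 2
    for i in range(iterations):
        shift = 2.0 ** -i
        if y >= 0:
            # Vector is above the axis: rotate clockwise.
            x, y = x + y * shift, y - x * shift
            angle += ATAN_TABLE[i]
        else:
            # Vector is below the axis: rotate counter-clockwise.
            x, y = x - y * shift, y + x * shift
            angle -= ATAN_TABLE[i]
    return angle


print(cordic_angle(1.0, 1.0), math.atan2(1.0, 1.0))
```

After 12 iterations the residual error is bounded by the last table entry, atan(2^-11), which is about 0.0005 rad.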

Figure 5: The FIFO with optional reset. (Diagram: RAM with registers (D) and MUXes on the data path, a ring counter for the address, and a reset input.)

The averaging for the correlation is calculated by adding the new data to an accumulator and subtracting the data that is 128 cycles old, as shown in Figure 6. To reset the correlator, both the FIFO and the accumulator must have a reset mechanism.
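The add-new, subtract-old update can be sketched in a few lines. The class below is a hypothetical behavioural model with exact arithmetic (it ignores the reduced word lengths of the real correlator), and the small window and sample spacing in the demonstration are assumed values chosen to keep the example short.

```python
from collections import deque


class SlidingCorrelator:
    """Running sum of r(k) * conj(r(k + N)) over the last `window` products."""

    def __init__(self, window=128):
        self.fifo = deque([0] * window)  # reset state: FIFO filled with zeros
        self.acc = 0                     # reset state: accumulator cleared

    def clock(self, a, b):
        """Feed one sample pair (a = r(k), b = r(k + N)); return the sum."""
        product = a * b.conjugate()
        oldest = self.fifo.popleft()     # product from `window` cycles ago
        self.fifo.append(product)
        self.acc += product - oldest     # one addition and one subtraction
        return self.acc


# Demonstration with integer-valued samples so the comparison is exact.
N, M = 6, 4
r = [complex(k % 5 - 2, (3 * k) % 7 - 3) for k in range(20)]
corr = SlidingCorrelator(window=M)
running = [corr.clock(r[k], r[k + N]) for k in range(len(r) - N)]
direct = [sum(r[k] * r[k + N].conjugate() for k in range(m, m + M))
          for m in range(len(r) - N - M + 1)]
print(running[M - 1:] == direct)
```

The cost per sample is independent of the window length, which is what makes a 128-sample average affordable at one sample per clock cycle.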
Figure 6: The correlator. (Diagram labels: shifters, multiplier A*B, adders and add/subtract units, FIFO of 128 words, register D; outputs Re(γ) and Im(γ).)

The “Arg Max” unit contains three registers that store the maximum amplitude and the corresponding correlation and counter values. If the incoming amplitude is larger than the stored maximum amplitude, the registers are updated with the new values. The unit is reset by the controller before a search for a new peak begins.

The procedure of finding the peak is handled by the controller. The state machine for the controller has three states that are cycled through in order, see Figure 7.
1. Search for peak start. Go to state two when amplitude > threshold.
2. Count samples, x, to position of maximum amplitude. Go to state three after 256 samples.
3. Delay for x samples. Generate frame start signal then go to state one.
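The three states above, together with the “Arg Max” registers, can be sketched as a small cycle-based model. This is one possible reading of the description, not the authors' controller: the class and attribute names are hypothetical, the sample that crosses the threshold is assumed to count as position 0 of the search, and a short window of 4 samples replaces the 256 of the real design in the demonstration.

```python
SEARCH, COUNT, DELAY = 1, 2, 3


class PeakController:
    """Three-state controller with the arg max registers folded in."""

    def __init__(self, threshold, window=256):
        self.threshold = threshold
        self.window = window
        self.state = SEARCH
        self.count = 0
        # Arg max registers: max amplitude, its correlation, its position x.
        self.max_amp, self.max_corr, self.x = 0, 0, 0

    def clock(self, amplitude, corr):
        """Process one sample; return True on the frame start cycle."""
        if self.state == SEARCH:
            if amplitude > self.threshold:
                # Reset the arg max registers; this sample is position 0.
                self.max_amp, self.max_corr, self.x = amplitude, corr, 0
                self.count = 1
                self.state = COUNT
            return False
        if self.state == COUNT:
            if amplitude > self.max_amp:
                self.max_amp, self.max_corr, self.x = amplitude, corr, self.count
            self.count += 1
            if self.count == self.window:   # search window exhausted
                self.count = 0
                self.state = DELAY
            return False
        # DELAY: wait x cycles, then signal frame start.
        if self.count == self.x:
            self.state = SEARCH
            return True
        self.count += 1
        return False


amps = [0, 0, 1, 5, 9, 7, 3, 1, 0, 0, 0, 0]
corrs = [complex(t, -t) for t in range(len(amps))]
ctrl = PeakController(threshold=4, window=4)
fires = [t for t, (a, c) in enumerate(zip(amps, corrs)) if ctrl.clock(a, c)]
print(fires, ctrl.max_corr)
```

In this run the threshold is crossed at sample 3 and the maximum sits at sample 4, so the frame start is signalled at sample 8, one window length after the peak. The stored correlation value is what the angle calculation would then use.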

Figure 8: The CORDIC unit. (Diagram labels: add/subtract unit, lookup table, angle.)

The back rotation of the output symbols is also done with a CORDIC unit. Twelve iterations are needed to reach the required 12-bit accuracy of the output symbols. To produce one output symbol every cycle, one rotation must be completed per cycle. The CORDIC unit is therefore pipelined in 12 stages, see Figure 9.
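The effect of pipelining can be illustrated with a behavioural model in which each of the 12 stages holds one partially rotated sample. This is a floating-point sketch under assumptions, not the fixed-point datapath of the design: the names are hypothetical, the rotation angle is assumed to lie within the CORDIC convergence range (roughly ±1.74 rad, so a real unit would need a quadrant pre-rotation), and the division by the CORDIC gain at the output is a modelling choice that the text does not describe.

```python
import math

STAGES = 12
ATAN = [math.atan(2.0 ** -i) for i in range(STAGES)]
GAIN = 1.0
for i in range(STAGES):
    GAIN *= math.sqrt(1.0 + 4.0 ** -i)  # each micro-rotation stretches the vector


def stage(i, x, y, z):
    """One rotation-mode micro-rotation: steer the residual angle z towards 0."""
    d = 1.0 if z >= 0 else -1.0
    shift = 2.0 ** -i
    return x - d * y * shift, y + d * x * shift, z - d * ATAN[i]


class PipelinedRotator:
    """12-stage pipeline: accepts one sample and emits one result per clock."""

    def __init__(self):
        self.regs = [None] * STAGES  # pipeline registers, empty after reset

    def clock(self, sample=None):
        """sample is (x, y, angle) or None; returns rotated (x, y) or None."""
        out = self.regs[-1]
        # Shift from the last register backwards so each value moves one stage.
        for i in range(STAGES - 1, 0, -1):
            prev = self.regs[i - 1]
            self.regs[i] = None if prev is None else stage(i, *prev)
        self.regs[0] = None if sample is None else stage(0, *sample)
        if out is None:
            return None
        return out[0] / GAIN, out[1] / GAIN


rot = PipelinedRotator()
inputs = [(1.0, 0.0, 0.5), (0.0, 1.0, -1.0), (0.6, -0.8, 1.2)]
outputs = [rot.clock(inputs[t] if t < len(inputs) else None) for t in range(16)]
print(outputs[STAGES])
```

The first result appears 12 clocks after its input, and from then on one rotated sample leaves the pipeline every clock, which matches the one-output-per-cycle requirement stated above.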

Figure 9: Two stages of the pipelined CORDIC. (Diagram labels: Re(r), Im(r) and Angle data paths, each with an add/subtract unit per stage.)

4. RESULTS
The synchronization unit is able to perform both timing and frequency synchronization on the data in one frame. The timing accuracy is only within +/- a few samples but, since DPSK modulation is used, this precision is sufficient. The unit is also able to correct a frequency error of up to 50% of the inter-carrier spacing. Figure 11 shows the uncoded symbol error rate (SER) for an AWGN channel with no frequency offset and 4-DPSK modulation. As seen in Figure 11, the implemented synchronization algorithm has a very small performance degradation compared with a perfectly synchronized system.

Figure 11: Plot of uncoded symbol error rate for perfect synchronization (solid) and hardware architecture (dotted). (Horizontal axis: SNR [dB], 4 to 14. Vertical axis: uncoded symbol error rate, 10^0 down to 10^-4.)

The synchronization unit will contain a total of 32 kbit RAM plus approximately 5000 gates and is clocked at 25 MHz. The design is verified using a bit- and cycle-true C model. The C model was used to simulate and test different performance and implementation tradeoffs. Post-synthesis simulations using Synopsys show that reaching a clock speed of 25 MHz is a simple task.

5. CONCLUSIONS
A hardware architecture for an OFDM synchronization unit is presented. The algorithm is computationally intensive, but by making proper simplifications, it is shown that a hardware implementation is feasible. Several methods to reduce the complexity with only a minor decrease in performance are presented. The synchronizer unit can be used in any OFDM system that uses a cyclic prefix. The hardware architecture is implemented in VHDL and is currently being synthesized to a 0.35 µm standard CMOS process.

6. ACKNOWLEDGMENTS
This work has been funded by the Integrated Electronic Systems Program in the Foundation for Strategic Research, and by NUTEK (the Swedish National Board for Industrial and Technical Development).

7. REFERENCES
[1] J. Bingham, “Multicarrier Modulation for Data Transmission: An Idea Whose Time Has Come”, IEEE Communications Magazine, pp. 5-14, May 1990.
[2] Broadband Radio Access Networks (BRAN); Inventory of Broadband Radio Technologies and Techniques, Technical Report, reference DTR/BRAN-030001, February 1998. Available from the ETSI Secretariat, F-06921 Sophia Antipolis Cedex, France.
[3] European Telecommunications Standards Institute (ETSI), OFDMA Evaluation Report - The Multiple Access Scheme Proposal for the UMTS Terrestrial Radio Air Interface (UTRA), Technical Document Tdoc 896/97, ETSI SMG meeting no. 24, Madrid, December 1997. Available from the ETSI Secretariat, F-06921 Sophia Antipolis Cedex, France.
[4] M. Wahlqvist, M. Ericsson, C. Östberg, L. Olsson, M. Johansson, W. Ye, P. Ödling, O. Edfors, D. Landström, J.J. van de Beek, and J. Martinez Arenas, “Description of Telia’s OFDM based proposal (working document in the OFDM concept group)”, Technical Document Tdoc 180/97, ETSI STC SMG2 meeting no. 22, Bad Aibling, Germany, May 12-16, 1997. Available from the ETSI Secretariat, F-06921 Sophia Antipolis Cedex, France.
[5] P. H. Moose, “A Technique for Orthogonal Frequency Division Multiplexing Frequency Offset Correction”, IEEE Transactions on Communications, COM-42 (No. 10), pp. 2908-2914, Oct. 1994.
[6] J. J. van de Beek, M. Sandell, and P. O. Börjesson, “ML Estimation of Time and Frequency Offset in OFDM Systems”, IEEE Transactions on Signal Processing, vol. 45, no. 7, pp. 1800-1805, July 1997.
[7] B. Parhami, Computer Arithmetic in 20 Lectures, Lecture Notes, Aug 27, 1997.

