## PRACTICAL PHYSICAL LAYER DESIGN OF 120 MBPS WIRELESS LINK WITH 16APSK MODULATION

Zhugang Wang<sup>1,2</sup>, Weiming Xiong<sup>2</sup>, Liguo Shi<sup>1</sup>, Hongjie Hou<sup>1</sup>, Jiaxin Liu<sup>2</sup>, Jiaqiang Zhu<sup>2</sup> and Wenjian Zhao<sup>1</sup>

<sup>1</sup>University of Chinese Academy of Sciences No. 19 A, Yuquan Rd., Shijingshan District, Beijing 100049, P. R. China wangzg@nssc.ac.cn

<sup>2</sup>National Space Science Center Chinese Academy of Sciences No. 1, Nanertiao, Zhongguancun, Haidian District, Beijing 100190, P. R. China xwm@nssc.ac.cn

Received January 2017; accepted April 2017

ABSTRACT. High-order linear modulation remains a common compromise between complexity, spectral efficiency and power amplifier compatibility. This paper presents the physical layer design of a communication system with 16 Amplitude Phase Shift Keying (16APSK) modulation that transmits 120 Mbps of data within a 45 MHz bandwidth in the C band. The main physical architecture is based on the DVB-S2 standard and the CCSDS Blue Book, but we optimize it in several aspects: the roll-off factor for RF power amplifier linearity, the frame structure for better delay performance, an FPGA multi-path pipeline design in the insert filter for higher throughput, the carrier phase recovery, and a lower-complexity architecture. These modifications perform robustly and cost-efficiently, and the final test shows that the symbol error rate is satisfactory at a moderate signal-to-noise ratio.

Keywords: 16APSK, PAR, TDD frame, Multi-path pipeline

1. Introduction. In many high-bandwidth point-to-point wireless communication applications, Amplitude Phase Shift Keying (APSK) is an attractive modulation scheme for digital transmission over nonlinear channels because it combines power and spectral efficiency with inherent robustness against nonlinear distortion [1]. The modulation and demodulation architectures are detailed in the second-generation digital video broadcasting satellite standard, DVB-S2 [2], and have been adopted in the CCSDS (Consultative Committee for Space Data Systems) Blue Book for high-rate space telemetry applications [3]. However, the standard makes many provisions for large carrier frequency offsets, VCM (Variable Coding and Modulation) applicability, etc., and such complexity is unnecessary in fixed-rate transmissions. Numerous results in the literature cover the optimal design of the Peak to Average Power Ratio (PAPR) and the linearization of the power amplifier [4,5]. The second topic is the frame structure: the DVB-S2 physical layer is dedicated to VCM [6], and the AOS (Advanced Orbiting Systems) Space Data Link Protocol [7] focuses on throughput and flexibility, whereas we need a new structure optimized for the shortest delay of the highest-priority information source. Few works address how to design the data stream pipeline in an FPGA implementation [8,9]; we developed a new multi-path pipeline method for this purpose.

Our team has modified many DVB-S2 16APSK standard parameters in the physical layer and data-link layer, and combined them with some techniques from the CCSDS Blue Book for high-rate space telemetry applications. These efforts have resulted in a large reduction in complexity. The hardware system is realized in a Xilinx 7 series FPGA (Field Programmable Gate Array) and an AD9364 zero-IF chip.

The rest of the paper is organized as follows. Section 2 studies the roll-off factor that minimizes the PAPR. A short frame structure is proposed in Section 3. Section 4 presents the architecture of the digital demodulator. The improvements in the receiver-side digital signal processing are described in Sections 5 and 6. End-to-end SER (Symbol Error Rate) performance is shown in Section 7, and conclusions are drawn in Section 8.

2. Roll-off Factor and Power Back-off. Under the Minimum Euclidean Distance (MED) criterion, the optimal radius ratio of 4+12APSK is approximately 2.73 [10]. DVB-S2 defines three Square Root Raised Cosine (SRRC) roll-off factors to determine the spectrum shape:  $\alpha = 0.35$ , as in DVB-S, and two others,  $\alpha = 0.25$  and  $\alpha = 0.20$  [1,2], for tighter bandwidth restriction. In our communication system, the end user has not specified this parameter because the signal bandwidth is not the most important constraint.

For our RF power amplifier, we chose back-off linearization, which is the least complicated strategy. We investigated the relationship between the roll-off factor and the PAR. Simulation shows that the PAR does not decrease monotonically as the roll-off factor increases. The PAR as a function of  $\alpha$  is presented in Figure 1.



FIGURE 1. PAR with different roll-off factors
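A sweep of this kind can be reproduced with a short simulation. The following Python sketch is not the authors' simulation; it assumes DVB-S2-style ring phases for 4+12APSK, the radius ratio of 2.73 quoted above, and hypothetical settings (4 samples per symbol, an 8-symbol SRRC span, 2000 random symbols). The function names are illustrative only.

```python
import cmath
import math
import random


def srrc_taps(alpha, sps, span):
    """SRRC impulse response sampled at sps samples per symbol over span symbols."""
    taps = []
    half = span * sps // 2
    for n in range(-half, half + 1):
        t = n / sps  # time in symbol periods
        if n == 0:
            h = 1 - alpha + 4 * alpha / math.pi
        elif abs(abs(4 * alpha * t) - 1) < 1e-9:
            # removable singularity at t = +/- 1/(4*alpha)
            h = alpha / math.sqrt(2) * (
                (1 + 2 / math.pi) * math.sin(math.pi / (4 * alpha))
                + (1 - 2 / math.pi) * math.cos(math.pi / (4 * alpha)))
        else:
            h = (math.sin(math.pi * t * (1 - alpha))
                 + 4 * alpha * t * math.cos(math.pi * t * (1 + alpha))) / (
                     math.pi * t * (1 - (4 * alpha * t) ** 2))
        taps.append(h)
    return taps


def apsk16(ratio=2.73):
    """4+12APSK points; the ring phases are an assumption (DVB-S2 style)."""
    inner = [cmath.rect(1.0, math.pi / 4 + k * math.pi / 2) for k in range(4)]
    outer = [cmath.rect(ratio, math.pi / 12 + k * math.pi / 6) for k in range(12)]
    return inner + outer


def papr_db(alpha, n_sym=2000, sps=4, span=8, seed=1):
    """Peak-to-average power ratio (dB) of SRRC-shaped random 16APSK symbols."""
    rng = random.Random(seed)
    points = apsk16()
    symbols = [rng.choice(points) for _ in range(n_sym)]
    taps = srrc_taps(alpha, sps, span)
    half = len(taps) // 2
    powers = []
    # Skip the first and last `span` symbols so every window is fully populated.
    for m in range(span * sps, (n_sym - span) * sps):
        k_lo = (m - half + sps - 1) // sps  # first symbol inside the filter window
        k_hi = (m + half) // sps            # last symbol inside the filter window
        acc = 0j
        for k in range(k_lo, k_hi + 1):
            acc += taps[m - k * sps + half] * symbols[k]
        powers.append(abs(acc) ** 2)
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


if __name__ == "__main__":
    for alpha in (0.20, 0.25, 0.35, 0.50):
        print(f"alpha = {alpha:.2f}: PAPR = {papr_db(alpha):.2f} dB")
```

The measured peak depends on the sequence length and oversampling factor, so results from a short run like this are only indicative.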

The RF amplifier chain consists of a GaAs InGaP HBT driver amplifier and a GaN power amplifier (31.5 dBm output). The Error Vector Magnitude (EVM) shown in Figure 2 is approximately 3.3% before the RF amplifier and 3.8% after it when the back-off is 4.5 dB below P<sub>-1 dB</sub>.

Figure 3 illustrates the spectral regrowth before and after the amplifier. To compare the sideband spectral density, the peaks of the spectra are aligned.

From Figure 3, we conclude that the spectral regrowth does not change the -30 dBc bandwidth of the signal and is therefore acceptable.



FIGURE 2. EVM of the modulated signal



FIGURE 3. Spectral regrowth by the amplifier

| Pilot | Frame counter | Valid data domain | RS parity symbol |
|-------|---------------|-------------------|------------------|
| 4B    | 1B            | 219B              | 32B              |

FIGURE 4. Frame structure design

| Block Length Counter | Data domain   |      |       |      |      |      |      |
|----------------------|---------------|------|-------|------|------|------|------|
|                      | Fiber channel | UART | Audio | BITE | Eth1 | Eth2 | Idle |
| 6B                   | P1            | P2   | P3    | P4   | P5   | P6   | P7   |

FIGURE 5. Valid data domain structure

3. Physical Layer Frame Structure Design. The frame structure is driven by three factors. First, the DVB-S2 physical layer is dedicated to VCM [4], which is not necessary for fixed-rate transmissions. Second, the available channel coding IP is a Reed-Solomon encoder and decoder [11], not LDPC. Third, multiple data sources must be multiplexed. We therefore design an inner frame multiplexing structure similar to Time Division Duplexing (TDD). Each frame occupies 512 APSK symbols (256 B), including 8 pilot symbols (4 B).
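The field sizes in Figure 4 can be checked against the 512-symbol frame with a few lines of Python. This is only an illustrative sketch: the pilot pattern and the RS parity bytes are zero-filled placeholders (the source specifies neither the pilot values nor which bytes the RS code covers), and `build_frame` is a hypothetical helper name.

```python
PILOT_LEN = 4        # bytes, equal to 8 pilot symbols
COUNTER_LEN = 1      # frame counter
DATA_LEN = 219       # valid data domain
PARITY_LEN = 32      # RS parity symbols
FRAME_LEN = PILOT_LEN + COUNTER_LEN + DATA_LEN + PARITY_LEN
BITS_PER_SYMBOL = 4  # 16APSK carries 4 bits per symbol


def build_frame(counter, data, pilot=bytes(PILOT_LEN), parity=bytes(PARITY_LEN)):
    """Concatenate the four fields of Figure 4 into one 256-byte frame.

    pilot and parity default to zero placeholders (assumption); a real
    transmitter would insert its pilot pattern and computed RS parity.
    """
    if len(data) != DATA_LEN:
        raise ValueError("valid data domain must be exactly 219 bytes")
    if len(pilot) != PILOT_LEN or len(parity) != PARITY_LEN:
        raise ValueError("pilot or parity field has the wrong length")
    return pilot + bytes([counter % 256]) + data + parity


frame = build_frame(counter=7, data=bytes(DATA_LEN))
symbols_per_frame = len(frame) * 8 // BITS_PER_SYMBOL
pilot_symbols = PILOT_LEN * 8 // BITS_PER_SYMBOL
```

With these sizes, `len(frame)` is 256 bytes, `symbols_per_frame` is 512 and `pilot_symbols` is 8, matching the figures stated in the text.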

The valid data domain consists of 6 data sources and 1 idle channel, preceded by a Block Length Counter. To meet the requirement of the shortest delay for the highest-priority information source, the 6 data sources are assigned priorities from highest to lowest as shown in Figure 5. The highest-priority data channel may occupy all the bandwidth that the frame can carry. As shown in Figure 6, inner frame multiplexing chiefly improves the delay of the highest-priority channel, which is only approximately 0.2 ms (excluding path delay).



FIGURE 6. Delay improvement with inner frame multiplexing
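The priority rule can be sketched as a greedy fill of the valid data domain. The Python below is a minimal illustration, not the authors' implementation: it assumes that the 6-byte Block Length Counter holds one length byte per source (the source does not state the counter layout) and that the idle channel is zero padding; `fill_data_domain` is a hypothetical name.

```python
SOURCES = ["Fiber channel", "UART", "Audio", "BITE", "Eth1", "Eth2"]  # P1..P6
DATA_DOMAIN_LEN = 219
COUNTER_LEN = 6                                # assumed: one length byte per source
PAYLOAD_LEN = DATA_DOMAIN_LEN - COUNTER_LEN    # 213 bytes shared by P1..P7


def fill_data_domain(pending):
    """Build one 219-byte valid data domain from pending per-source bytes.

    pending maps a source name to the bytes waiting to be sent. Sources are
    served strictly in priority order, so P1 can take the whole payload.
    """
    room = PAYLOAD_LEN
    lengths = []
    payload = bytearray()
    for name in SOURCES:
        waiting = pending.get(name, b"")
        take = min(len(waiting), room)
        payload += waiting[:take]
        lengths.append(take)
        room -= take
    idle = bytes(room)  # idle channel (P7) pads the remainder
    return bytes(lengths) + bytes(payload) + idle


busy = fill_data_domain({"Fiber channel": bytes(300), "UART": b"abc"})
light = fill_data_domain({"Fiber channel": b"\x01" * 10, "UART": b"\x02" * 5})
```

In the `busy` case the highest-priority source fills all 213 payload bytes and the UART data waits for a later frame; in the `light` case both sources fit and the rest is idle padding.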

4. Architecture of the Digital Demodulator. The DVB-S2 standard employs a state-of-the-art modem architecture [2], and its parameters are mainly driven by Doppler effects caused by satellite movement, terminal LNB oscillator instabilities, etc. After studying all of the contributions in our system, the worst-case carrier frequency error is only approximately 60 kHz (5 GHz carrier frequency and 3e-6 fractional reference frequency error over the working temperature range), which is 2 orders of magnitude lower than the 5 MHz described in the DVB-S2 standard, and the symbol rate error is merely 180 Hz at a 30 M symbol rate. Based on these results, a digital demodulator with greatly reduced complexity is introduced in Figure 7.



FIGURE 7. Block diagram of the digital demodulator

The block diagram above covers only the digital part; the complete demodulator also comprises an LNA and a zero-IF RF part (AD9364) that directly down-converts the C-band signal to baseband. The matched filter is included in the chip. Because of the limited transfer rate of the digital interface, the I and Q components are sent to the FPGA at a 60 M sample rate, which is only two times the symbol rate (30 M); therefore, an up-sampler and a half-band filter are inserted before the Gardner symbol recovery block.

Symbol clock recovery was performed using the well-known Gardner's algorithm. The DAGC algorithm is detailed in [12]. The carrier phase recovery differs from the traditional methods and will be explained later in this paper.
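For reference, the timing error detector at the core of Gardner's algorithm can be written in a few lines. The Python sketch below shows the textbook form of the detector only; the sign convention, loop filter and interpolator control of the actual design are not given in the source, and `gardner_ted` is an illustrative name.

```python
import math


def gardner_ted(prev_sym, mid, cur_sym):
    """Gardner timing error from two symbol-spaced samples and the midpoint.

    Computes Re{(y[k] - y[k-1]) * conj(y[k-1/2])}, which needs only two
    samples per symbol and no symbol decisions (non-data-aided).
    """
    return ((cur_sym - prev_sym) * mid.conjugate()).real


def error_for_offset(tau):
    """Detector output for an alternating +1/-1 pattern sampled tau symbols off."""
    def wave(t):
        return complex(math.cos(math.pi * t), 0.0)
    return gardner_ted(wave(tau), wave(tau + 0.5), wave(tau + 1.0))
```

For this test waveform the detector output equals sin(2*pi*tau): zero at the correct sampling instant and of opposite sign for early and late sampling, which is the property a timing loop locks onto.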

The entire algorithm before Frame SYNCH is non-data aided and thus can be run without any frame synchronization in place.

To debug the FPGA implementation of the design, we developed a Gigabit Ethernet link that transmits the intermediate signals to a PC for analysis. Isolating each of the components in this way greatly reduced the debug time.

5. Multi-path Pipeline Design in Insert Filter. After the up-sampling and half-band filter, the sample rate is increased to 120 M samples per second. The original form of the insert filter (parabolic interpolation) in the Gardner clock recovery, shown in Figure 8, did not meet the setup/hold timing requirements of the FPGA: according to the timing report, the maximum clock frequency was only approximately 38 MHz when implemented in a Xilinx XC7K325T.



FIGURE 8. Original form of the insert filter
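To give a sense of how much arithmetic sits between the input and output registers of such a filter, the sketch below evaluates a piecewise-parabolic interpolator in Farrow form. The coefficient set (with the design parameter `alpha = 0.5`) is the commonly used one from the interpolation literature and is an assumption here; the source does not list the coefficients of the filter in Figure 8.

```python
def parabolic_interp(x_next2, x_next1, x_cur, x_prev, mu, alpha=0.5):
    """Piecewise-parabolic Farrow interpolator (assumed coefficient set).

    Inputs are the samples x[m+2], x[m+1], x[m], x[m-1]; mu in [0, 1) is the
    fractional interval, and the result estimates the signal at m + mu.
    """
    # Three parallel FIR branches, each a sum over the same four inputs.
    v2 = alpha * (x_next2 - x_next1 - x_cur + x_prev)
    v1 = (-alpha * x_next2 + (1 + alpha) * x_next1
          + (alpha - 1) * x_cur - alpha * x_prev)
    v0 = x_cur
    # Horner evaluation: two multiplies by mu chained with two additions.
    return (v2 * mu + v1) * mu + v0


# On a straight line x[n] = n the interpolator returns m + mu.
m = 10
ramp_value = parabolic_interp(m + 2, m + 1, m, m - 1, mu=0.25)
```

Evaluated combinatorially, the branch adders and the two chained multiply-add stages form one long path, which is consistent with the low maximum clock frequency reported for the original form.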

The main idea for increasing the throughput is to divide the combinatorial logic into smaller parts and insert registers between them. However, because the insert filter has many "Inputs", "Paths", and "Nodes", we developed a general method for dealing with this problem.

The model can be introduced by an example in Figure 9.

The expression of the output is:

$$Y_k = aX_k + bX_{k-1} + cX_{k-2}$$

In many cases, such as an insert filter, a delayed version of the output is acceptable provided that the relative delays of all "paths" remain the same, as illustrated in Figure 10.

The delayed version of the output is:

$$Y_{k-m} = aX_{k-m} + bX_{k-1-m} + cX_{k-2-m}$$

The "m" in the formula is the maximum clock delay in the module according to the pipeline design guideline. Every other path is padded to this number of delays, so the output is the original output delayed by "m" clock cycles.



FIGURE 9. Example of multi-path pipeline



FIGURE 10. Delay version of the output



FIGURE 11. Multi-path pipeline model

In this diagram, the combinatorial "three-input adder" is divided into two adders, while the added delays keep the relative delays of all paths equal.
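The effect of splitting the adder can be checked with a small cycle-by-cycle model. The Python sketch below is an illustration under simple assumptions (one register between the two adders, so m = 1, and a matching register on the third path); the function names are hypothetical and it models behaviour only, not FPGA timing.

```python
def direct_filter(xs, a, b, c):
    """Y_k = a*X_k + b*X_{k-1} + c*X_{k-2}, all in one combinatorial stage."""
    out = []
    for k, x in enumerate(xs):
        x1 = xs[k - 1] if k >= 1 else 0
        x2 = xs[k - 2] if k >= 2 else 0
        out.append(a * x + b * x1 + c * x2)
    return out


def pipelined_filter(xs, a, b, c):
    """Same filter with the three-input adder split into two adders.

    Stage 1 registers p = a*X_k + b*X_{k-1}. The c*X_{k-2} path gets one
    extra register (q) so both paths reach the second adder aligned.
    """
    x1 = x2 = 0      # input delay line
    p = q = 0        # pipeline registers between the two adders
    out = []
    for x in xs:
        out.append(p + q)                 # second adder uses registered values
        p, q = a * x + b * x1, c * x2     # first adder and aligned third path
        x2, x1 = x1, x                    # shift the delay line
    return out


samples = [3, -1, 4, 1, -5, 9, 2, -6]
reference = direct_filter(samples, a=2, b=-3, c=5)
delayed = pipelined_filter(samples, a=2, b=-3, c=5)
```

Here `delayed[k]` equals `reference[k - 1]` for every k >= 1, that is, the pipelined output is the original output delayed by m = 1 clock. Leaving out the `q` register would add `c*X_{k-1}` instead of `c*X_{k-2}` and change the filter, which is why every path must receive the same added delay.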

The multi-path pipeline model is shown in Figure 11. Each letter is a node representing combinatorial logic, and each line (path) denotes a register or delay.

The concept of an "Influence Domain", introduced here, greatly simplifies decomposing the combinatorial logic and appending delays. "Influence Domain": when a node is divided into smaller parts with a register inserted between them, its influence domain is the set of all its neighbor nodes.

"Path Align Operating Procedure": add a delay in each "Influence Domain".

**Example 5.1.** When splitting the node "e" into "e1", "delay", and "e2", the "Influence Domain" consists of the neighbor nodes "i" and "k", and we should add a register after these two nodes. There are many alternative choices, such as adding a register after "i", "h", or "j" instead.

The modified insert filter, as a Xilinx System Generator model, is displayed in Figure 12.



FIGURE 12. Multi-path pipeline model of the insert filter

After the pipeline redesign, the timing report shows that the maximum frequency of the insert filter improves to approximately 182 MHz.

6. Carrier Phase Recovery (CPR) Algorithm. As described before, the carrier frequency error is much smaller than that in the DVB-S2 specification. According to our calculations, a feedback phase recovery unit [13] alone is sufficient, and feed-forward CPR or data-aided estimation is not required.

As mentioned before, frame synchronization follows carrier phase recovery, so CPR or frame synchronization must deal with all 12 possible phase ambiguities of the 16APSK modulation. We chose a scheme in which the CPR algorithm uses only the 4 inner-ring constellation points and frame synchronization resolves the 4 resulting phase ambiguities. The simulation and test results show that it works well.

After constellation selection, the four inner-ring constellation points can be defined as:

$$C_i = r_2 e^{j(\theta_i + \Delta\theta)} = x_i + jy_i$$

 $r_2$  is the constellation radius of the inner ring, and  $\Delta \theta$  is the phase error. The phase error can be extracted by multiplying the input with the conjugate of its ideal position:

$$sign\_conjugate(C_i) = \sqrt{2}e^{-j\theta_i} = a_i + jb_i \quad (a_i \in \{+1, -1\}, b_i \in \{+1, -1\})$$
$$C_i \cdot sign\_conjugate(C_i) = \sqrt{2}r_2e^{j\Delta\theta}$$

The imaginary component of this product, expressed in Cartesian coordinates, is:

$$\operatorname{Im}\left(\sqrt{2}r_2e^{j\Delta\theta}\right) = \sqrt{2}r_2\sin\Delta\theta = x_ib_i + a_iy_i$$

The phase detector gain will be:

$$k_{\theta} = \left. \frac{d(PD(\theta))}{d\theta} \right|_{\theta \to 0} = \sqrt{2}r_2$$

The  $k_{\theta}$  should be multiplied by 1/3 for the inner ring proportion.
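The detector above needs only sign decisions, two multiplications by +1 or -1 and one addition. The Python sketch below checks it numerically; the ring-selection threshold (midway between the two ring radii) and the function names are assumptions made for illustration, not details from the source.

```python
import cmath
import math


def sign_conjugate(c):
    """Return a + jb with a = sign(x) and b = -sign(y), i.e. sqrt(2)*exp(-j*theta_i)."""
    a = 1.0 if c.real >= 0 else -1.0
    b = -1.0 if c.imag >= 0 else 1.0
    return complex(a, b)


def phase_error(c):
    """Imaginary part of c * sign_conjugate(c): x*b + a*y."""
    s = sign_conjugate(c)
    return c.real * s.imag + s.real * c.imag


def is_inner_ring(c, r2=1.0, ratio=2.73):
    """Hypothetical ring selector: amplitude below the midpoint of the two radii."""
    return abs(c) < r2 * (1 + ratio) / 2


r2, delta = 1.0, 0.05
errors = [phase_error(cmath.rect(r2, math.pi / 4 + k * math.pi / 2 + delta))
          for k in range(4)]
expected = math.sqrt(2) * r2 * math.sin(delta)
```

All four entries of `errors` agree with `expected`, so the detector output is independent of which inner-ring point was sent, as long as the phase error stays within the same quadrant.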

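The 5 kHz loop bandwidth is derived from the phase lock time with a 60 kHz frequency offset. The loop SNR is verified for this bandwidth using the inner-ring symbol energy $E_{s2}$, where $R$ in the following expression is the ring radius ratio from Section 2: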

$$\overline{E_s} = \left(12 * R^2 E_{s2} + 4E_{s2}\right) / 16, \quad E_{s2} \approx 0.1712 \overline{E_s}$$
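The factor 0.1712 follows directly from the radius ratio of 2.73 given in Section 2, as this short Python check shows (the function name is illustrative):

```python
def inner_ring_energy_fraction(ratio=2.73):
    """E_s2 / mean(E_s) for 4+12APSK with outer/inner radius ratio `ratio`."""
    # Mean symbol energy in units of the inner-ring energy E_s2:
    # 12 outer points at ratio**2 and 4 inner points at 1, averaged over 16.
    mean_energy = (12 * ratio ** 2 + 4) / 16
    return 1.0 / mean_energy


fraction = inner_ring_energy_fraction()
```

With a ratio of 2.73 the mean symbol energy is about 5.84 times the inner-ring energy, so `fraction` is approximately 0.1712.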

When  $E_s/N_0 = 18$  dB, the loop SNR, with $R$ here denoting the symbol rate and $B_L$ the loop bandwidth, is:

$$SNR = \frac{E_{s2}R}{2N_0B_L} = 0.1712 \frac{R}{2B_L} \frac{\overline{E_s}}{N_0} = 37.6 \,\mathrm{dB}$$

The loop is sufficiently stable.

7. End-to-End SER Performance. The demodulation loss in floating-point and fixed-point simulation is approximately  $0.1 \sim 0.3$  dB for  $E_s/N_0$  from 18 dB to 21 dB.

The hardware end-to-end test of the modulation and demodulation loss is shown in Figure 13 (without channel coding and decoding). The result shows that the loss is slightly less than 2 dB at SER = 1e-3. The subsequent frame synchronization and RS channel decoding work well when the SER is lower than 1e-3.



FIGURE 13. Hardware end-to-end SER test result

The  $E_s/N_0$  is calculated from the formula  $\frac{E_s}{N_0} = \frac{C/R}{N_n NF}$ , where C is the signal power,  $N_n$  is the thermal noise density of -174 dBm/Hz, R is the symbol rate and NF is the noise figure of the AD9364.

Because the noise figure is difficult to measure in mixed analog and digital chips, the NF of the AD9364 is taken as 4 dB at 4.6 GHz from its datasheet.
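In decibel form the formula becomes a simple sum, which is convenient when setting the received power during a test. The Python sketch below uses the values stated above (-174 dBm/Hz, NF = 4 dB, 30 M symbols per second); the received power of -70 dBm is a hypothetical example value, not a measurement from this work.

```python
import math


def es_n0_db(c_dbm, symbol_rate_hz, nf_db=4.0, n_n_dbm_hz=-174.0):
    """Es/N0 in dB from signal power C (dBm), symbol rate R (Hz) and NF (dB).

    Es/N0 = (C / R) / (N_n * NF) becomes C - 10*log10(R) - N_n - NF in dB.
    """
    return c_dbm - 10 * math.log10(symbol_rate_hz) - n_n_dbm_hz - nf_db


example = es_n0_db(c_dbm=-70.0, symbol_rate_hz=30e6)
```

For the hypothetical -70 dBm input, `example` is about 25.2 dB; each 1 dB change in received power shifts the result by 1 dB.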

8. Conclusions. The practical design of a real communication system involves many trade-offs, including roll-off factor optimization, physical layer frame structure design, the multi-path pipeline path-align operating procedure and the phase recovery algorithm. The final test result shows that the symbol error rate is satisfactory at a moderate signal-to-noise ratio. Our efforts focused mainly on reducing complexity and increasing the robustness of the entire system.

According to our investigation, the main source of the modulation and demodulation loss (about 2 dB) is the phase noise of the transmitter and receiver local oscillators. Modeling, analyzing and testing all the contributing factors of the phase noise are topics for further research.

## REFERENCES

- [1] E. Casini, R. De Gaudenzi and A. Ginesi, DVB-S2 modem algorithms design and performance over typical satellite channels, *International Journal of Satellite Communications and Networking*, vol.22, no.3, pp.281-318, 2004.
- [2] User Guidelines for the Second Generation System for Broadcasting, Interactive Services, News Gathering and Other Broadband Satellite Applications (DVB-S2), Version 1.1.1, ETSI Technical Report 102 376, 2005.
- [3] Flexible Advanced Coding and Modulation Scheme for High Rate Telemetry Applications, CCSDS 131.2-B-1, Blue Book, Issue 1, 2012.
- [4] M. Baldi, F. Chiaraluce et al., A comparison between APSK and QAM in wireless tactical scenarios for land mobile systems, *EURASIP Journal on Wireless Communications and Networking*, vol.2012, no.1, p.317, 2012.
- [5] M. Yang, D. Guo, K. Zhao and L. Lu, A nonlinear distortion compensation algorithm for 32APSK modulation over satellite channels, *Unifying Electrical Engineering and Electronics Engineering*, pp.1475-1482, 2014.
- [6] F.-W. Sun, Y. Jiang and L.-N. Lee, Frame synchronization and pilot structure for DVB-S2, *International Journal of Satellite Communications and Networking*, vol.22, no.3, pp.319-339, 2004.
- [7] Recommendation for Space Data System Standards, AOS Space Data Link Protocol, CCSDS 732.0-B-2, Blue Book, Washington, DC, USA, 2006.
- [8] W. Sun, M. J. Wirthlin and S. Neuendorffer, FPGA pipeline synthesis design exploration using module selection and resource sharing, *IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems*, vol.26, no.2, pp.254-265, 2007.
- [9] M. Pecot, Enabling High-Speed Radio Designs with Xilinx All Programmable FPGAs and SoCs, Xilinx White Paper, WP445 (v1.0), January 20, 2014.
- [10] Y. Deng, L. Ma and Z. Wang, Optimal design of 16APSK constellation for simplified demapping algorithm, *ICIC Express Letters*, vol.9, no.6, pp.1643-1650, 2010.
- [11] TM Synchronization and Channel Coding, CCSDS 131.0-B-2, Blue Book, 2011.
- [12] R. De Gaudenzi and M. Luise, Design and analysis of an all-digital demodulator for trellis coded 16-QAM transmission over a non-linear satellite channel, *IEEE Trans. Communications*, vol.43, no.2/3/4, pp.659-668, 1995.
- [13] R. De Gaudenzi, A. G. Fàbregas and A. Martinez, Turbo-coded APSK modulations design for satellite broadband communications, *International Journal of Satellite Communications and Networking*, vol.24, no.4, pp.261-281, 2006.