Review of AVS Audio Coding Standard

Audio Video Coding Standard (AVS) is a second-generation source coding standard and the first standard for audio and video coding in China with independent intellectual property rights. Its performance has reached the level of international standards, and its coding efficiency is 2 to 3 times that of MPEG-2. Its technical solution is also simpler and can greatly save channel resources. After more than ten years of development, AVS has achieved great success. The latest version of the AVS audio coding standard is still under development and mainly aims to meet the increasing demand
for low-bitrate, high-quality audio services. This paper reviews the history and recent development of the AVS audio coding standards in terms of basic features, key techniques and performance. Finally, the future development of the AVS audio coding standards is discussed.

Keywords: Audio Video Coding Standard (AVS); audio coding; AVS1 audio; AVS2 audio

1 Introduction

The Audio Video Coding Standard (AVS) workgroup of China was approved by the Science and Technology Department under the former Ministry of Industry and Information Technology of the People's Republic of China in June 2002 [1]. The goal of the AVS workgroup is to establish generic technical standards for the high-quality compression, decompression, processing and representation of digital audio and video. The workgroup also aims to provide digital audio-video equipment and systems with highly efficient and economical coding/decoding technologies [2]. The formal name of AVS is “Information Technology - Advanced Audio and Video Coding”; it comprises four main technical standards (system, video, audio and digital rights management) together with supporting standards such as conformance testing. The members of the AVS workgroup are domestic and international
institutions and enterprises that focus on research into digital audio and video coding technology and on the development of related products.

Since 2002, the AVS audio subgroup has drafted a series of audio coding standards, including AVS1-P3, AVS1-P10 and AVS-Lossless. In 2009, the AVS audio subgroup began drafting the next-generation audio coding standards. To distinguish the two series, the former is called the AVS1 audio coding standards and the latter the AVS2 audio coding standards. The AVS1 audio coding standards have been completed and are widely used in various applications, while the AVS2 audio coding standards are still under development and will be released soon.

This paper reviews the AVS audio coding standards and is organized as follows. Section 2 introduces the AVS1 audio series, including AVS1-P3, AVS1-P10 and AVS-Lossless. Section 3 presents the AVS2 audio coding scheme. Finally, Section 4 concludes the paper.

2 The Development of AVS1 Audio Coding Standards

The AVS audio subgroup started drafting the first-generation AVS audio standard in 2003. Its primary goal is to establish an advanced audio codec standard with overall performance equivalent or superior to MPEG AAC, while developing its own intellectual property [2]. The first generation of the AVS audio codec standard includes three parts: Information Technology - Advanced Audio Video Coding Part 3: Audio (AVS1-P3); Information Technology - Advanced Audio Video Coding Part 10: Mobile Voice and Audio (AVS1-P10, or AVS1-P10 audio); and the AVS Lossless Audio Coding Standard (AVS-LS). Tables 1 and 2 show the development of the AVS1 audio coding standards [3].

2.1 AVS1-P3

After three years of effort, the AVS working group finished the first AVS audio standard in 2005. The audio codec supports scalable audio coding and is applied in mass information industries such as high-resolution digital broadcasting, high-density laser digital storage media, wireless broadband multimedia communication, broadband streaming media on the
Internet and other related applications [1].

The AVS1-P3 encoder supports mono, dual-channel and multichannel PCM audio signals. Each audio frame contains 1024 samples and is divided into 16 blocks; each 128-point block, with 50% overlap, is Hanning-windowed. The transform length is determined by the long/short window switching module: 2048 for long windows and 256 for short windows, so that both stationary and transient signals can be accommodated. The sampling rate of the input signal ranges from 8 kHz to 96 kHz, and the output bitrate ranges from 16 kbps to 96 kbps per channel. Transparent audio quality can be provided at 64 kbps per channel, and the compression ratio is 10 to 16 [2].

2.1.1 Basic Encoding Process

Fig. 1 shows the framework of the AVS1-P3 audio codec [2]. The input PCM audio signal is first analyzed by a psychoacoustic model. The long/short window switching module then determines the length of the analysis block depending on the presence of transients, and the signal is transformed to the frequency domain by the integer Modified Discrete Cosine Transform (IntMDCT). For stereo signals, the encoder may apply Square Polar Stereo Coding (SPSC) if the channel pair is strongly correlated. The frequency-domain signals then undergo nonlinear quantization, and Context-dependent Bit-plane Coding (CBC) is used to entropy-code the quantized spectral data. Finally, the coded bits are written to the output bitstream in the format defined by the AVS1-P3 standard.

2.1.2 Key Technologies in AVS1-P3

The structure of the
encoder in the AVS1-P3 standard is similar to that of Advanced Audio Coding - Low Complexity (AAC-LC). It achieves higher coding efficiency and better sound quality through new coding technologies such as multi-resolution analysis, frequency-domain linear prediction, vector quantization and fine granularity scalable coding (FGSC).

Window switching is applied to reduce pre-echoes. The AVS audio coding standard recommends a two-stage window switching decision called the Energy and Unpredictability Measure Based Window Switching Decision (ENUPM-WSD) [2]. In the first stage, one frame of the audio signal is divided into 16 blocks, and the energy variation of every subblock is analyzed. If the maximum energy variation of a subblock meets a given condition, the second stage, based on an unpredictability measure in the frequency domain, is applied. Otherwise, the window type is judged by analyzing signal characteristics in the time domain and frequency domain. ENUPM-WSD has the merits of low complexity and high accuracy.

Considering future extension to lossless compression, the AVS audio subgroup adopted the integer MDCT as the time-frequency mapping module instead of the traditional DCT [2]. IntMDCT can be used in lossless audio coding or in combined perceptual and lossless audio coding, while retaining the advantages of the MDCT, such as a good spectral representation of the audio signal, critical sampling and overlapping blocks.

SPSC is applied as an efficient stereo coding scheme in AVS and achieves higher coding efficiency than the Mid/Side (M/S) stereo coding used in AAC. When SPSC is applied, one channel transmits the larger value of the channel pair, and the other transmits the difference. In terms of quantization noise, the final decoded audio noise is smaller
in SPSC than in M/S, because in SPSC the noise is superimposed on only one channel at the decoder, whereas in M/S it spreads to both channels.

For entropy coding, AVS adopts Context-dependent Bit-plane Coding (CBC), which is more efficient than Huffman coding: at 64 kbps per channel, CBC achieves a 6% bitrate improvement over Huffman coding. The most striking characteristic of CBC is its Fine Grain Scalability (FGS). The CBC-coded bitstream is evenly layered (16 to 96 layers, each of 1 kbps), and as it changes from a higher layer to a lower
layer, the audio quality degrades from high to low but remains audible [1].

2.1.3 Performance

Figs. 2, 3 and 4 show the results of informal subjective listening tests comparing AVS1-P3 with three other popular audio compression formats: MP3 (LAME 3.96), AAC (FAAC 1.24) and WMA (WMA 10) [1]. The tests use the basic test model of ITU-T P.800/P.830. Four sequences, es02, sc02, si02 and sm02, were used; they are speech, complex mixed audio, a single instrument and simple mixed audio, respectively. Bitstreams were coded at 128 kbps. The following conclusion can be drawn from Figs. 2-4: at 128 kbps, AVS1-P3 is superior to MP3 (LAME 3.93), comparable to AAC (FAAC 1.24), but slightly worse than WMA (WMA 10.0) [1].

2.2 AVS1-P10

The development of third-generation mobile communication has brought many challenges, including a growing demand for low-bitrate, high-fidelity audio codecs. Many international audio standards already exist for mobile applications, such as the ITU-T G.XXX series and the 3GPP AMR series. In order to provide a mobile audio standard with independent intellectual property rights for the rapidly developing mobile communication
system, the AVS audio subgroup started drafting AVS1-P10 in August 2005. The AVS1-P10 Final Committee Draft (FCD) was completed in December 2009, and it was approved as a national standard in December 2013. It is widely applied in 3G communication, wireless broadband multimedia communications, broadband Internet streaming services and more. The advantages of AVS1-P10 include high efficiency, flexible compression quality, low complexity and a strong error-resilience mechanism [4]. The encoder supports mono and stereo Pulse Code Modulation (PCM) signals with sampling rates of 8 kHz, 11 kHz, 16 kHz, 22 kHz, 24 kHz, 32 kHz, 44.1 kHz and 48 kHz. The output bitrate ranges from 10.4 kbit/s to 24.0 kbit/s for mono and from 12.4 kbit/s to 32.0 kbit/s for stereo.

2.2.1 Basic Encoding Process

AVS1-P10 adopts the basic framework of AMR-WB+ (Fig. 5). It first converts the sampling frequency of the input signal into an internal sampling frequency FS. In mono mode, the low-frequency (LF) signal is coded with the Algebraic Code Excited Linear Prediction/Transform Vector Coding (ACELP/TVC) codec mode, while the high-frequency (HF) signal is encoded using a bandwidth extension (BWE) approach. For stereo mode,
the same band decomposition as in the mono case is used. The HF parts of the left and right channels are encoded using parametric BWE on the two stereo channels. The LF parts of the left and right channels are downmixed into a main channel and a side channel (M/S). The main channel is encoded by the ACELP/TVC module, and the stereo encoding module processes the M/S channels and produces the stereo parameters [5].

2.2.2 Key Technologies in AVS1-P10

1) ACELP/TVC Mixed Encoding Module

The core codec module in AVS1-P10 is the ACELP/TVC mixed encoding module. The AVS1-P10 codec integrates ACELP coding and Transform Vector Coding (TVC) into a mixed orthogonal encoder, which chooses the better of the two coding modes according to the signal type. ACELP mode is based on time-domain linear prediction, so it is suitable for encoding speech signals and transient signals [6]. On the other hand,
TVC mode is based on transform-domain coding, so it is suitable for encoding music signals. By combining the two modes, the codec can encode a wide variety of complex audio signals. Several coding methods, such as ACELP256, TVC256, TVC512 and TVC1024, can be applied within one superframe, and there are 26 ACELP/TVC mode combinations for
each superframe [7]. The mode can be selected by either a closed-loop or an open-loop search algorithm; the latter is simpler, but the selected mode may not be optimal.

2) High-Band Encoding

AVS1-P10 adopts the BWE approach to code the HF signal, i.e., the frequency components of the input signal above FS/4. In BWE, energy information is sent to the decoder in the form of a spectral envelope and gain [5], whereas the fine structure of the signal is extrapolated at the decoder from the decoded excitation signal of the LF band. In addition, to keep the signal spectrum continuous at FS/4, the HF gain needs to be adjusted according to the correlation between the