Max's Blog

Jan 03 2008

Audio Coding Overview (Audio Formats Overview)

Category: Technology - ssmax @ 12:49:54

Format Description

AAC AAC stands for “Advanced Audio Coding”. In the beginning it was also called MPEG-2 NBC (“Non-Backwards Compatible”), as opposed to the MPEG-1 and MPEG-2 BC (with 5.1 channels) standards. It is now considered the current state of the art in general audio coding and the natural successor to MPEG-1/2 Layer III (MP3) in the new multimedia standard MPEG-4, which uses MP4 as the container format for all kinds of content.

AAC can carry up to 48 full-bandwidth audio channels (sampling rates up to 96 kHz) in one stream, plus 16 low frequency enhancement (LFE, limited to 120 Hz) channels and up to 16 data streams. It also has further multi-language capabilities.

MPEG formal listening tests demonstrated that AAC at 96 kbps provides audio quality slightly better than MP3 at 128 kbps and MP2 at 192 kbps.

Dolby Digital AC3 (Multichannel) Dolby Digital (AC-3) is Dolby’s third-generation audio coding algorithm. It is a perceptual coding algorithm developed to allow lower data rates with minimal perceived degradation of sound quality.
Dolby Digital can be stereo or surround, with allowable stereo bitrates from 128 to 384 kbps.
It is commonly used on DVDs.

ADPCM (MS, IMA) Compressed WAV format. ADPCM (Adaptive Differential Pulse Code Modulation) is a lossy audio compression scheme that compresses 16-bit samples to 4 bits, for a 4:1 compression ratio. There are various flavors of ADPCM. This particular algorithm was suggested by Microsoft; its quality is similar to that of IMA (Interactive Multimedia Association) ADPCM. MS ADPCM compresses data recorded at various sampling rates. Sound is encoded as a succession of 4-bit nibbles, each representing the difference between the current sample value and the previous one. The resulting compression is relatively modest: 16-bit samples encoded as 4-bit differences give a 4:1 ratio.

Microsoft ADPCM is directly supported as a native format on most Windows implementations. Although the quality of IMA ADPCM voice files is not great, the files are portable, and there is a real advantage in having compact files that can be played on most Windows PCs.
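The nibble-per-sample idea can be sketched in a few lines. This is a deliberately simplified adaptive delta coder, not the actual Microsoft or IMA algorithm: the step adaptation rule in `next_step`, the starting step of 16, and all function names are assumptions made for illustration only.

```python
def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def next_step(step, code):
    """Hypothetical rule: widen the step after large codes, narrow it after small ones."""
    if abs(code) >= 6:
        return min(step * 2, 4096)
    if abs(code) <= 1:
        return max(step // 2, 1)
    return step


def encode(samples, step=16):
    """Turn 16-bit samples into 4-bit codes (0..15), one code per sample."""
    predicted, nibbles = 0, []
    for sample in samples:
        code = clamp(round((sample - predicted) / step), -8, 7)
        nibbles.append(code & 0x0F)  # store as a two's-complement nibble
        predicted = clamp(predicted + code * step, -32768, 32767)
        step = next_step(step, code)
    return nibbles


def decode(nibbles, step=16):
    """Rebuild samples by mirroring the encoder's predictor and step updates."""
    predicted, samples = 0, []
    for nibble in nibbles:
        code = nibble - 16 if nibble >= 8 else nibble  # sign-extend the nibble
        predicted = clamp(predicted + code * step, -32768, 32767)
        samples.append(predicted)
        step = next_step(step, code)
    return samples


def pack(nibbles):
    """Pack two 4-bit codes into each byte, padding with a zero nibble if needed."""
    padded = nibbles + [0] * (len(nibbles) % 2)
    return bytes((padded[i] << 4) | padded[i + 1] for i in range(0, len(padded), 2))


pcm = [0, 16, 32, 48]
codes = encode(pcm)
packed = pack(codes)
print(len(pcm) * 2, "bytes of 16-bit PCM ->", len(packed), "bytes of 4-bit codes")
print(decode(codes))
```

Four 16-bit samples occupy 8 bytes, while the packed codes occupy 2, which is the 4:1 ratio described above. Real signals would generally not be reconstructed exactly, since the scheme is lossy.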

CCITT A-Law Compressed WAV format. A-Law (CCITT standard G.711) is an audio compression scheme common in telephony applications. It is a slight variation of the u-Law compression format and is found in European systems. This encoding compresses original 16-bit audio down to 8 bits (a 2:1 compression ratio) with a dynamic range of about 13 bits. Thus, A-Law encoded waveforms have a higher S/N ratio than 8-bit PCM, at the price of a bit more distortion than the original 16-bit audio. The quality is higher than you would get with 4-bit ADPCM formats. Encoding and decoding are fast, and the format is widely supported.
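As a rough illustration of how A-Law trades linear resolution for dynamic range, here is the continuous companding curve on samples normalized to the range -1..1. The constant A = 87.6 is the value commonly quoted for A-Law; G.711 itself uses a segmented 8-bit approximation of this curve, which this sketch does not reproduce, and the function names are made up for the example.

```python
import math

A = 87.6  # commonly quoted A-Law parameter (assumption for this sketch)


def alaw_compress(x):
    """Map a normalized sample in [-1, 1] onto the A-Law curve."""
    ax = abs(x)
    if ax < 1 / A:
        y = A * ax / (1 + math.log(A))
    else:
        y = (1 + math.log(A * ax)) / (1 + math.log(A))
    return math.copysign(y, x)


def alaw_expand(y):
    """Inverse of alaw_compress."""
    ay = abs(y)
    if ay < 1 / (1 + math.log(A)):
        x = ay * (1 + math.log(A)) / A
    else:
        x = math.exp(ay * (1 + math.log(A)) - 1) / A
    return math.copysign(x, y)


for level in (0.01, 0.1, 0.5, 1.0):
    print(level, "->", round(alaw_compress(level), 3))
```

Quiet samples are pushed up the scale before being stored in 8 bits, so they keep more resolution than they would in plain 8-bit PCM.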

AIFC AIFF (Audio Interchange File Format) is a format for storing digital audio samples in a file. This standard sound file format was defined by Apple. AIFC is short for AIFF-C (also written AIFF(C)), i.e. the Audio Interchange File Format with optional compression. It is a newer version of the format that adds the ability to compress the audio data.

AMR

Adaptive Multi-Rate codec. The AMR format is currently used by many mobile phones for sound recordings and for MMS (messages combining sound, pictures, and text for viewing on cell phones), and will be used in future GSM systems (3G).

Features	AMR Narrowband
Bandwidth	200-3400 Hz
Sampling Rate	8000 Hz
Bit rates (kb/s)	4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2, 12.2
Type	ACELP
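To get a feel for what these bit rates mean per speech frame, the sketch below converts each mode to bits per frame. The 20 ms frame length is not stated in the table above; it is an assumption based on the usual AMR narrowband framing, and the names are illustrative.

```python
AMR_NB_RATES_BPS = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20  # assumed frame duration, not taken from the table


def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Number of coded bits produced for one frame at the given bit rate."""
    return rate_bps * frame_ms // 1000


samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
frame_bits = {rate: bits_per_frame(rate) for rate in AMR_NB_RATES_BPS}

print("samples per frame:", samples_per_frame)
for rate, bits in frame_bits.items():
    print(f"{rate / 1000:5.2f} kb/s -> {bits} bits per frame")
```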

DSP

Compressed WAV format. DSP Group TrueSpeech (TM) format. DSP Group’s TrueSpeech is a family of high quality, low bit rate speech compression algorithms which compress speech down to as little as 1/40th of its original size. Several versions of TrueSpeech at different compression rates are available for licensing, from 8.5 kbps through 3.9 kbps. All offer excellent communications over a 14.4 kbps or better modem.

TrueSpeech 8.5 is the 8.5 kbps member of DSP Group’s TrueSpeech family of software products. It is a low complexity speech coder which is an integral component of Microsoft Windows and has also been endorsed by Dialogic for computer telephony products. TrueSpeech 8.5 should be used when compatibility with Microsoft is a prerequisite.

Speech information can now be exchanged compatibly between different applications. For example, using TrueSpeech 8.5 for digital simultaneous voice and data (DSVD) applications, it may be feasible to use the same DSP chip for both speech compression and high speed modem data pump tasks. At a sampling rate of 8 kHz, continuous digital speech is compressed from 128 kbps to 8.5 kbps, a 15:1 compression ratio, while maintaining good speech quality. With slightly lower voice quality and lower levels of compression, TrueSpeech 8.5 requires only about half the MIPS and program memory space of TrueSpeech 6.3 and 5.3.
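The 15:1 figure follows directly from the bit rates. A minimal check, assuming the 128 kbps source is 16-bit mono PCM at 8 kHz (the sample width and channel count are not stated explicitly in the text, and the helper names are illustrative):

```python
def pcm_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed PCM in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(source_kbps, coded_kbps):
    """How many times smaller the coded stream is than the source."""
    return source_kbps / coded_kbps


source = pcm_kbps(8000, 16)  # assumed 16-bit mono at 8 kHz
ratios = {rate: compression_ratio(source, rate) for rate in (8.5, 6.3, 5.3)}

for rate, ratio in ratios.items():
    print(f"{rate} kbps -> {ratio:.1f}:1")
```

The same arithmetic applied to the 6.3 and 5.3 kbps members of the family gives roughly 20:1 and 24:1.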

GSM Compressed WAV format. Well suited to storing human speech. It is a lossy speech compression scheme that delivers telephone-quality speech at 13 kbit/s. It is a standard used for telephone sound compression in European countries, and it is gaining popularity because of its quality.

GSM stands for Global System for Mobile Communications. GSM 06.10 is a variant of LPC called RPE-LPC (Regular Pulse Excited – Linear Predictive Coder) and is a European standard originally intended for encoding speech for satellite distribution to mobile phones. It can be found in various telephony products such as voice mail applications.

It compresses 160 13-bit samples into 260 bits (or 33 bytes), i.e. 1650 bytes/sec (at 8000 samples/sec). This gives very good compression with good output quality, but it is computationally expensive.
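The frame arithmetic above can be reproduced as follows. The only assumption is that each 260-bit frame is padded up to a whole number of bytes, which is how 260 bits become 33 bytes; the variable names are illustrative.

```python
import math

SAMPLE_RATE_HZ = 8000
SAMPLES_PER_FRAME = 160
BITS_PER_FRAME = 260

frames_per_second = SAMPLE_RATE_HZ // SAMPLES_PER_FRAME
bytes_per_frame = math.ceil(BITS_PER_FRAME / 8)  # 32.5 rounded up to whole bytes
bytes_per_second = frames_per_second * bytes_per_frame
payload_kbps = BITS_PER_FRAME * frames_per_second / 1000

print(frames_per_second, "frames/s,", bytes_per_frame, "bytes/frame")
print(bytes_per_second, "bytes/s stored,", payload_kbps, "kbit/s of coded speech")
```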

CCITT G.721 Used for computer telephony. 32 kbit/s adaptive differential pulse code modulation (ADPCM).
Well suited to storing human speech.

CCITT G.723 Used for computer telephony. Extensions of Recommendation G.721 adaptive differential pulse code modulation to 24 and 40 kbit/s for digital circuit multiplication equipment applications. Well suited to storing human speech.

CCITT G.723.1

Microsoft CCITT G.723.1 format (read-only). G.723.1 (originally called TrueSpeech 6.3/5.3) is a member of the TrueSpeech family of high quality, low bit rate speech compression algorithms from DSP Group, Inc. It produces digital voice compression levels of 20:1 and 24:1 (6.3 kbps and 5.3 kbps). After an extensive series of quality tests and evaluations of various coders, the International Telecommunication Union (ITU) selected TrueSpeech 6.3/5.3 kbps (G.723.1) as the voice compression standard for the H.324 videoconferencing standard. H.324 standardizes videoconferencing and telephony over public telephone networks. G.723.1 is also recommended as the low bit rate speech technology for the ITU H.323 audio and video standard, which is supported by Microsoft, Intel and hundreds of other companies as the standard for communications on the Internet.

This algorithm is applicable to real-time video and teleconferencing applications where reduced bandwidth and very high quality voice are required. Thus, this technology is ideal for Internet video, VoIP (Voice over Internet Protocol), audio and videoconferencing, VOD (Video On Demand), and Internet telephony applications, enabling interoperability between telephony applications both on and off the Net.

DSP Group offers Integrated Digital Telephony Processors based on TrueSpeech. DSP Group also licenses a G.723.1 algorithm for videoconferencing, computer telephony, Internet, and numerous other multimedia applications.

CCITT G.726 Used for computer telephony. 40, 32, 24, 16 kbit/s adaptive differential pulse code modulation (ADPCM). Well suited to storing human speech.

CCITT G.729 ITU-T Recommendation G.729 Annex A (referred to as G.729A) is the reduced-complexity version of the G.729 recommendation and operates at 8 kbps. The performance of this codec may not be as good as G.729 in certain cases. Because of its low processing delay (frame size of 10 ms), G.729A is well suited to delivering telephone-quality voice. G.729A provides near toll quality service.

MP2 (MPEG 1 Layer 2) MPEG Layer-2 format. A compression ratio of 1:6 to 1:8 corresponds to 256 to 192 kbps for a stereo signal. The file extensions are *.mp2 or *.mpa.

MP3 (MPEG 1/ 2/ 2.5 Layer 3)

MPEG Layer-3 format. A very popular format for storing music.

Development of the MP3 algorithm started in 1987 as a joint effort between Fraunhofer IIS-A and the University of Erlangen. It is standardized as ISO-MPEG Audio Layer 3. It soon became the de facto standard for lossy audio encoding, thanks to its high compression rates (1/12 of the original size while retaining considerable quality), the wide availability of decoders, and the low CPU requirements for playback (a 486 DX2-66 is enough for real-time decoding). It supports multichannel files (although there is no implementation yet), and sampling frequencies from 16 kHz to 24 kHz (MPEG-2 Layer 3) and from 32 kHz to 48 kHz (MPEG-1 Layer 3). Formal and informal listening tests have shown that MP3 in the 192-256 kbps range gives encoded results indistinguishable from the original material in most cases.

MP3 uses the following for compression:

– Huffman coding;
– quantization;
– M/S matrixing;
– intensity stereo;
– channel coupling;
– modified discrete cosine transform (MDCT);
– polyphase filter bank.

A compression ratio of 1:10 to 1:12 corresponds to 128 to 112 kbps for a stereo signal.

MPEG Version 2.5 was a later addition to the MPEG 2 standard. It is an extension used for very low bitrate files, allowing the use of lower sampling frequencies. If your decoder does not support this extension, it is recommended that you use 12 bits for synchronization instead of 11 bits.
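The 11-bit versus 12-bit synchronization choice can be illustrated with a small header check. This sketch assumes the usual MPEG audio frame header layout (sync bits first, then a 2-bit version field and a 2-bit layer field); the function name and the `allow_mpeg25` parameter are made up for the example, and it looks only at those three fields.

```python
VERSIONS = {0b00: "MPEG 2.5", 0b10: "MPEG 2", 0b11: "MPEG 1"}  # 0b01 is reserved
LAYERS = {0b01: "Layer III", 0b10: "Layer II", 0b11: "Layer I"}  # 0b00 is reserved


def parse_header(header, allow_mpeg25=True):
    """Return (version, layer) for a 4-byte frame header, or None if it does not sync."""
    if len(header) < 4:
        return None
    word = int.from_bytes(header[:4], "big")
    # With 12 sync bits the top bit of the version field must also be set,
    # so MPEG 2.5 headers (version bits 00) no longer match.
    sync_bits = 11 if allow_mpeg25 else 12
    if word >> (32 - sync_bits) != (1 << sync_bits) - 1:
        return None
    version = VERSIONS.get((word >> 19) & 0b11)
    layer = LAYERS.get((word >> 17) & 0b11)
    if version is None or layer is None:
        return None
    return version, layer


print(parse_header(bytes([0xFF, 0xFB, 0x90, 0x00])))
print(parse_header(bytes([0xFF, 0xE3, 0x90, 0x00])))
print(parse_header(bytes([0xFF, 0xE3, 0x90, 0x00]), allow_mpeg25=False))
```

Requiring 12 set bits is a simple way for a decoder without MPEG 2.5 support to skip such frames instead of misreading them.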

MPC (MPEG plus/MusePack)

MusePack (.mpc) is a lossy compressed format that is considered to be the best of all the codecs at moderate to high bitrates. At lower bitrates, around 128 kbps, any benefits over OGG or WMA are less clear. The most significant downside to MPC is that, as of today, no hardware devices or portable audio players support the format.

MPC is a new space-saving audio format, formerly known as MPEG Plus (.mp+). It is very similar to MPEG Layer 2, but uses subband-based selectable channel coupling, Huffman coding, and differential Huffman coding. Typical data rates are between 160 and 200 kbps. The MPEGplus encoder covers a frequency range reaching up to 22 kHz, because many people can hear sounds above 16 kHz; even when those frequencies are hard to hear on their own, they make a lot of difference to the sound dynamics.

The MPEGplus format’s most important technique for reducing the bitrate is the exploitation of psychoacoustic effects. These effects are determined through listening tests that check which sounds the human ear can and cannot hear. For example, when you hear a very loud explosion, it is almost impossible to hear a drop of water falling at the same time. Since the drop cannot be heard, its sound can be faded out without anyone noticing, which saves bitrate. The M/S Stereo technique compares both channels of the stereo field; when the sound on both channels is the same or almost the same, it has to be encoded just once, which also saves bitrate.
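The M/S Stereo idea described above can be sketched as a mid/side transform. This is a generic illustration of the technique, not MusePack's actual implementation, and the function names are assumptions.

```python
def ms_encode(left, right):
    """Convert left/right channels into mid (average) and side (half-difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right channels from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [100, 200, -50, 3]
right = [100, 200, -50, 1]
mid, side = ms_encode(left, right)
print("side channel:", side)
print("round trip:", ms_decode(mid, side))
```

When both channels carry nearly the same signal, the side channel is close to zero and costs very few bits to encode, which is where the saving comes from.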

It has simple stereo support and is limited to a sampling frequency of 44,100 Hz, although stream version 8 (SV8) will be able to encode 32/48 kHz streams, as well as multichannel ones.

Informal listening tests have demonstrated that MPC is the best publicly available lossy audio encoder at bitrates higher than 160 kbps. Because it is a subband encoder, and subband encoders are inherently less efficient than transform coders, it is worse than AAC and Ogg Vorbis at bitrates lower than 160 kbps.

It uses the following for compression:

– MP2 compression technologies, plus:
– subband-based selectable channel coupling;
– Huffman coding;
– differential Huffman coding;
– vastly improved psymodel;
– non-linear spreading function;
– ANS (adaptive noise shaping);
– CVD (clear voice detection);
– temporal masking with variable time constants.

PCM Standard Windows WAV format for non-compressed audio files. Pulse Code Modulation (PCM) is the standard method of digitally encoding audio. It is the basic uncompressed data format used in file types such as Windows .wav.
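As a small illustration of what uncompressed PCM data looks like, the sketch below quantizes a sine wave to signed 16-bit little-endian samples, the layout typically found in .wav files. The function name, the 8000 Hz rate, and the 440 Hz tone are arbitrary choices for the example.

```python
import math
import struct


def sine_pcm16(freq_hz, duration_s, sample_rate=8000, amplitude=0.5):
    """Return mono 16-bit little-endian PCM bytes for a sine tone."""
    count = int(sample_rate * duration_s)
    samples = [
        round(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n in range(count)
    ]
    return struct.pack("<%dh" % count, *samples)


data = sine_pcm16(440, 1.0)
print(len(data), "bytes for one second of 8 kHz, 16-bit mono PCM")
```

Every sample takes a fixed two bytes regardless of content, which is why PCM serves as the baseline that the compressed formats in this list are measured against.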

QuickTime Apple format for the Macintosh, read only. Although QuickTime was developed by Apple for the Macintosh, QuickTime files are the closest thing the Web has to a standard cross-platform movie format (with MPEG a close second). QuickTime movies have the extension .qt or .mov. QuickTime supports many different codecs, particularly CinePak and Indeo, both of which can be used cross-platform.

CCITT u-Law Compressed WAV format. u-Law (CCITT standard G.711) is an audio compression scheme and international standard in telephony applications. u-Law is very similar to A-Law, a variation of u-Law found in European systems. This encoding compresses original 16-bit audio down to 8 bits (a 2:1 compression ratio) with a dynamic range of about 13 bits. Thus, u-Law encoded waveforms have a higher S/N ratio than 8-bit PCM, at the price of a bit more distortion than the original 16-bit audio. The quality is higher than you would get with 4-bit ADPCM formats. Encoding and decoding are fast, and the format is widely supported.
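The trade-off between u-Law and plain 8-bit PCM can be seen by quantizing the same sample both ways. This sketch uses the continuous u-Law curve with mu = 255 and a simple uniform quantizer of about 8 bits; G.711 itself uses a segmented encoding, so the numbers are indicative only, and all names are illustrative.

```python
import math

MU = 255.0


def mulaw_compress(x):
    """Map a normalized sample in [-1, 1] onto the u-Law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize(value, levels=127):
    """Round a value in [-1, 1] to one of 2 * levels + 1 uniform steps (about 8 bits)."""
    return round(value * levels) / levels


def linear_8bit(x):
    return quantize(x)


def mulaw_8bit(x):
    return mulaw_expand(quantize(mulaw_compress(x)))


for sample in (0.001, 0.9):
    linear_error = abs(linear_8bit(sample) - sample)
    mulaw_error = abs(mulaw_8bit(sample) - sample)
    print(f"x={sample}: linear error {linear_error:.6f}, u-Law error {mulaw_error:.6f}")
```

In this sketch the quiet sample survives u-Law quantization far better than linear 8-bit quantization, while the loud sample comes out slightly worse, which matches the description of better signal-to-noise behavior at the cost of some added distortion.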

VOX Dialogic ADPCM format. The Dialogic ADPCM format is commonly found in telephony applications and has been optimized for low sample rate voice. It will only save mono 16-bit audio and, like other ADPCM formats, it compresses to 4 bits/sample (a 4:1 ratio). This format has no header, so any file with the extension .VOX will be assumed to be in this format.
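Because a .VOX file has no header, its duration can only be derived from the file size and a sample rate known in advance. A minimal sketch, assuming 8000 Hz (the sample rate is not stored in the file, so this default is a guess that the caller should override; the function names are illustrative):

```python
def vox_duration_seconds(file_size_bytes, sample_rate_hz=8000):
    """Estimate playback time of a headerless 4-bit ADPCM file."""
    samples = file_size_bytes * 2  # two 4-bit samples per byte
    return samples / sample_rate_hz


def equivalent_pcm16_bytes(file_size_bytes):
    """Size of the same audio as 16-bit mono PCM (4:1 ratio)."""
    return file_size_bytes * 4


print(vox_duration_seconds(40000), "seconds")
print(equivalent_pcm16_bytes(40000), "bytes as 16-bit PCM")
```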

RAW Raw audio data. The file does not contain an audio file header.

Ogg Vorbis Ogg Vorbis format. Ogg Vorbis is an audio compression format. It is roughly comparable to other formats used to store and play digital music, such as MP3, VQF, AAC, and other digital audio formats. Ogg Vorbis is a fully open, non-proprietary, patent-and-royalty-free, general-purpose compressed audio format for mid to high quality (8kHz-48.0kHz, 16+ bit, polyphonic) audio and music at fixed and variable bitrates from 16 to 128 kbps/channel.

WAV It is not an audio codec; it is a file format. The format was created by Microsoft and IBM, and it has unfortunately become a popular standard. It allows an arbitrary sampling rate, number of channels, and sample size. It also specifies a number of application-specific blocks within the file, and it supports a plethora of different compression formats.

WAV files have the .wav extension, and their audio can be encoded with different codecs:

Microsoft PCM

Microsoft ADPCM

DSP

GSM

VOX

A-law

U-law

CCITT G.723.1

CCITT G.721

CCITT G.723

CCITT G.726

CCITT G.729 (A)
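Which codec a given .wav file uses is recorded as a format tag in its `fmt ` chunk. The sketch below builds a tiny PCM file in memory with Python's standard `wave` module and reads the tag back. The tag table lists only a few commonly documented values and is not exhaustive; the helper names are made up for the example.

```python
import io
import struct
import wave

# A few commonly documented WAVE format tags (partial list).
FORMAT_TAGS = {
    0x0001: "Microsoft PCM",
    0x0002: "Microsoft ADPCM",
    0x0006: "A-law",
    0x0007: "u-law",
    0x0011: "IMA ADPCM",
    0x0022: "DSP Group TrueSpeech",
    0x0031: "GSM 6.10",
}


def read_format_tag(data):
    """Return the format tag from the 'fmt ' chunk of RIFF/WAVE bytes."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            return struct.unpack_from("<H", data, pos + 8)[0]
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("no fmt chunk found")


def make_pcm_wav():
    """Create a short silent 8 kHz, 16-bit mono WAV file in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(8000)
        wav.writeframes(b"\x00\x00" * 80)
    return buffer.getvalue()


tag = read_format_tag(make_pcm_wav())
print(hex(tag), FORMAT_TAGS.get(tag, "unknown"))
```

The same function could be pointed at the bytes of any .wav file to see which of the codecs listed above it was written with, provided its tag is in the table.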

WMA Windows Media Audio format. A special type of advanced streaming format file for use with audio content encoded with the Windows Media Audio codec. The .wma extension indicates a file format and how the content is encoded.
