Compressed File Formats

The idea behind compression is to keep the highest possible audio quality at the lowest bit rate, in order to save storage space or transmission bandwidth. Some codecs rely on an algorithm (procedure) that analyzes the signal mathematically and discards some information according to psychoacoustic, perceptual audio coding. The data that is retained is only that which falls within the range of human hearing and is not masked by another frequency. The compression of digital audio information is therefore characterized by the amount of audio data actually lost in the process. The codec analyzes a signal that is already in digital form and essentially assigns a new bit value to each sample interval. As in analog-to-digital conversion (ADC), the value is determined by the amplitude (height) of the wave at the point of the sample. The quantization level is again the number of bits used to record the sample at its given interval. This captures the dynamic range of the sound wave, from its highest to its lowest amplitude. Thus, the more bits the codec assigns to each amplitude sample point, the better the quality of the reproduced sound. A further goal of compression is to make sure that the information that is filtered out does not leave background noise behind.
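As a rough illustration of why more bits per sample give better quality, the sketch below quantizes a sine wave at several bit depths and measures the resulting signal-to-noise ratio. The function names, the 997 Hz test tone and the uniform quantizer are assumptions made for this example; no particular codec works exactly this way.

```python
import math

def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0] to the nearest of 2**bits uniform levels."""
    levels = 2 ** (bits - 1)                  # steps on each side of zero
    q = round(sample * levels)
    q = max(-levels, min(levels - 1, q))      # clamp to the signed integer range
    return q / levels

def snr_db(bits, n=44100, freq=997.0, rate=44100.0):
    """Signal-to-noise ratio in dB of a near full-scale sine after quantization."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = 0.99 * math.sin(2 * math.pi * freq * i / rate)
        err = x - quantize(x, bits)           # quantization error for this sample
        signal_power += x * x
        noise_power += err * err
    return 10 * math.log10(signal_power / noise_power)

for bits in (8, 16, 24):
    print(bits, "bits:", round(snr_db(bits), 1), "dB")
```

Each extra bit should add roughly 6 dB of signal-to-noise ratio, which is the sense in which a higher bit value per sample point improves the reproduced sound.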

Compressed files are often described by a bit rate in kbps (kilobits per second) rather than by total file size. For instance, it takes 1,411 kbps (kilobits of data transferred each second by a media player application to the sound card when the file is played) to represent one second of stereo digital audio sampled as PCM at 44.1 kHz and 16-bit resolution. When the file is compressed with the Fraunhofer .mp3 codec, the bit rate needed to represent one second of digital audio can drop to 128 kbps (or even 64 kbps or 32 kbps).
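The 1,411 kbps figure can be checked with a little arithmetic. The sketch below assumes two channels (stereo), which is what the figure implies, and treats 1 kilobit as 1,000 bits; the function names are made up for this example.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in kilobits per second (1 kbit = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000

def megabytes_per_minute(kbps):
    """Storage for one minute of audio at the given bit rate (1 MB = 10**6 bytes)."""
    return kbps * 1000 * 60 / 8 / 1_000_000

cd_rate = pcm_bit_rate_kbps(44_100, 16, 2)    # stereo PCM at 44.1 kHz / 16-bit

for rate in (cd_rate, 128, 64, 32):
    ratio = cd_rate / rate
    print(f"{rate:7.1f} kbps  {megabytes_per_minute(rate):6.2f} MB/min  ratio {ratio:4.1f}:1")
```

At 128 kbps the compressed stream carries about one eleventh of the data of the original PCM stream.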

Encoding can also be done at a variable bit rate (VBR). Some passages of a signal are more complex than others. Rather than encode the whole signal at a fixed bit rate, a VBR encoder allocates fewer bits to simple passages and more bits to complex ones as its algorithm analyzes the signal. This is more efficient: simple passages are not over-encoded, and there is capacity left for very complex passages that a constant bit rate (CBR) might not encode accurately. CBR holds the bit rate constant during encoding regardless of the complexity of the underlying signal. The result is an encoding process in which there may be excess bits left over when the signal is simple but insufficient bits when it is complex. Overall, CBR sacrifices quality for a predictable file size.
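The difference between the two strategies can be sketched with a toy bit budget. The per-frame "bits needed" figures below are invented for illustration, and the proportional allocation rule is an assumption; real VBR encoders use their own quality targets and psychoacoustic analysis.

```python
def cbr_allocation(needed, bits_per_frame):
    """CBR: every frame gets the same budget, whatever its content."""
    return [bits_per_frame] * len(needed)

def vbr_allocation(needed, average_bits_per_frame):
    """VBR: share the same total budget in proportion to each frame's need."""
    total_bits = average_bits_per_frame * len(needed)
    total_needed = sum(needed)
    return [total_bits * n / total_needed for n in needed]

# Hypothetical bits needed per frame: a simple passage, then a dense one.
needed = [1000, 1200, 900, 5200, 4800, 1300]
budget = 2400   # average bits per frame available to either encoder

cbr = cbr_allocation(needed, budget)
vbr = vbr_allocation(needed, budget)

for n, c, v in zip(needed, cbr, vbr):
    print(f"needed {n:5d}  CBR {c:5d} ({c - n:+6d})  VBR {v:7.1f} ({v - n:+8.1f})")
```

With the same total number of bits, the CBR column shows a surplus on the simple frames and a shortfall on the dense ones, while the VBR column moves the surplus to where it is needed.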

Encoding / compression formats that rely on perceptual coding include MPEG, ACE/MACE, ADPCM, Dolby (AC-3), ATRAC, MPEG-2 AAC, EPAC, WMA, DTS and VQF.

MPEG (Moving Picture Experts Group, established 1988) audio/video codecs were originally developed to carry audio with video. However, the audio codec algorithm can be separated out and applied on its own. The audio compression gives a satisfactory ratio of sound quality to file size, but quality deteriorates at higher compression levels. The initial development of the format was done by the MPEG sub-committee under the auspices of the International Organization for Standardization.

There are MPEG-1 (ISO/IEC 11172-3, approved 1992; Layer 1, Layer 2 and Layer 3) and MPEG-2 (ISO/IEC 13818-3, approved 1994; Layer 1, Layer 2 and Layer 3) formats, both of which are more closely associated with the encoding and compression of film and video. MPEG-2 Audio, designed for encoding multi-channel video/audio data, is backward compatible with stereo MPEG-1.

The term Layer corresponds to an increasing level of compression (in practice, the removal of data based on a psychoacoustic model built into the mathematical algorithm) applied to the stereo digital signal: Layer 3 applies a higher level of compression, and loses more aural information, than Layer 1. A digital circuit employing the MPEG-1 Audio encoding algorithm at a specific sampling rate will reduce the bit rate of a PCM digital audio file (1.4 Mbps). The MPEG-1, Layer 1 algorithm results in a stereo digital audio file that requires a 384 kbps bit rate to represent one second of the stereo audio signal. MPEG-1, Layer 2 results in a 256 to 192 kbps bit rate, while Layer 3 results in a 192 kbps bit rate. MPEG-2, Layer 1 results in a multi-channel digital audio file with a bit rate of 640 kbps to represent one second of the multi-channel audio data.

The famous MP3 format, developed by the Fraunhofer Institute and other MPEG members, uses the MPEG-1 and MPEG-2 codecs with Layer 3 compression (a proprietary adjustment by Fraunhofer to the psychoacoustic model of the Layer 3 algorithm originally developed by Fraunhofer and other ISO members), as well as an MPEG-2.5, Layer 3 codec (a non-ISO extension). An MP3 is PCM audio that has been compressed (encoded a second time) by removing redundant information and eliminating audio information outside the range that humans can normally hear. The retained sound quality is still fairly good. MPEG-2 supports sampling rates from 16 kHz to 48 kHz, and Layer 3 encoding can produce a file that requires only 128 to 112 kbps (at just over 15 kHz channel bandwidth) to represent one second of digital stereo audio, although most publicly available encoders can also apply higher or lower compression levels. If you own a Mac, you can still download an MP3 file and then convert it to AIFF format, as long as you have installed the appropriate software or downloaded a player/encoder. The encoding bit rate ranges from 384 kbps (higher quality) down to 32 kbps. Most of the software encoders/decoders that one can download are based on codecs from Fraunhofer, LAME or Xing.

Coding Technologies Sweden and the Fraunhofer Institute for Integrated Circuits (IIS), in cooperation with Thomson Electronics as licensing agent, are promoting MP3Pro, an updated version of the existing MP3 format that remains backward compatible with it. At 64 kbps the format produces quality similar to a 128 kbps MP3, with superior quality at a still efficient 96 kbps. The encoded file gains this efficiency from the company’s proprietary algorithm, Spectral Band Replication (SBR). On decoding, the algorithm reconstructs the more complex higher frequencies of the sound file by analyzing the lower frequencies. Thus, the data can be encoded at lower bit rates in the first place. The format is compatible with Windows 9x, Windows 2000, Windows ME, XP, Linux and Mac OS.

MPEG-4 (ISO/IEC 14496) also allows the audio codec algorithm to be separated out and applied on its own for audio compression (alongside video, graphics and images). MPEG-4 is based partly on the MPEG-2 AAC codec and has the support of Sony, Apple Computer, Cisco Systems, Sun Microsystems, Texas Instruments, IBM and Warner Music. It is designed particularly for low-bandwidth transmission because of its high compression factor. For instance, it can encode and stream data (audio and video) at rates as low as 2 kbps (CBR) for speech coding and 4 kbps for general audio coding. The key is that MPEG-4 content does not have to be encoded for a specific streaming speed. Rather, the digital audio file is encoded once but can be streamed at any rate, depending on the connection and the level of traffic on the given network. This is accomplished by designing the MPEG-4 standard to support Java (MPEG-J) applications that manipulate the bit stream. The standard is also particularly suited to streaming media to portable, wireless handheld devices (3G). However, it is also capable of very high quality audio encoding (in excess of 128 kbps), so its strength is its versatility. The format also supports digital rights management. The MPEG-2 AAC codec may also be incorporated into the base layer to further improve low bit rate encoding with a low delay.

Apple compression standards are known as ACE (Audio Compression / Expansion) and MACE (Macintosh Audio Compression / Expansion, MACE-3, MACE-6). This is a lossy, predictive method of encoding.

ADPCM (Adaptive Differential Pulse Code Modulation), developed by Microsoft and IBM, produces high-quality sound at a compression ratio of 4 to 1 compared with the PCM data found in the WAV format. It is a conversion of a PCM bit format that, similar to DPCM, attempts to predict the value of each successive sample during the encoding process. It also uses a variable bit rate procedure to reduce the difference between sample amplitude levels. Thus, it is very efficient and reduces file size by encoding the difference between successive samples rather than expending all encoding bit resources on reproducing each sample. There is both a Microsoft ADPCM and an IMA (Interactive Multimedia Association) ADPCM (used in Sony MiniDisc in tandem with ATRAC). One can use IMA ADPCM in the WAV, AIFF and SND formats.
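The core idea of encoding differences rather than samples can be sketched as follows. This is plain DPCM with the previous sample as the prediction and no adaptive step size, so it is a simplification and not the Microsoft or IMA ADPCM algorithm; the function names and sample values are invented for the example.

```python
def dpcm_encode(samples):
    """Store the change from each sample to the next (first delta is from zero)."""
    deltas = []
    previous = 0
    for s in samples:
        deltas.append(s - previous)   # prediction is simply the previous sample
        previous = s
    return deltas

def dpcm_decode(deltas):
    """Rebuild the samples by accumulating the stored differences."""
    samples = []
    previous = 0
    for d in deltas:
        previous += d
        samples.append(previous)
    return samples

def bits_needed(values):
    """Width of a signed integer large enough to hold every value."""
    peak = max(abs(v) for v in values)
    return peak.bit_length() + 1      # one extra bit for the sign

pcm = [12000, 12110, 12205, 12290, 12350, 12380, 12385, 12360]
deltas = dpcm_encode(pcm)

print(deltas[1:])
print(bits_needed(pcm), "bits per raw sample vs",
      bits_needed(deltas[1:]), "bits per difference")
```

Because neighboring samples of a smooth waveform are close in value, the differences fit into far fewer bits than the samples themselves, which is where the file size reduction comes from.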

AC-3 (Audio Code Number 3), also known as Dolby Digital Stereo (Surround / DSD AC-3), is a proprietary format originally developed by Dolby Laboratories, Inc., for HDTV. The format delivers six (5.1) completely discrete (separate) channels of sound. Five of the channels carry a full range of frequencies, while the sixth carries low frequencies only (LFE, Low Frequency Effects). The AC-3 format is not Dolby Surround itself, which can be in an uncompressed format as well as an analog format and is first processed by 4-2-4 matrixing, which separates the channels. Rather, AC-3 is a multi-channel digital audio compression algorithm that can compress all six channels (5.1 Surround) of digitized data into less space than a single channel of a CD requires. The bit stream itself is in floating-point (mantissa and exponent) form. The format is a further development of the AC-1 and AC-2 adaptive delta modulation algorithms (in the case of AC-3, the sums of and differences between the matrixed channels are encoded rather than the actual channels), and is based on psychoacoustic, hybrid forward/backward adaptive encoding. Although designed for movie theaters, Dolby Digital can be played on a DVD player. The format supports bit rates from 32 kbps to 640 kbps.

What is very interesting about AC-3 is that the encoding allows dynamic range control values to be inserted in the bit stream. Thus, on decoding, the AC-3 decoder has a range of compression values that can be set in response to the audience or venue requirements. The second capability of AC-3 is downmixing: if there are not enough speakers available to play back all of the encoded channels, the bit stream can be downmixed to match the number of speakers available and still retain a standardized audible quality.
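A downmix from 5.1 to stereo can be sketched as a weighted sum of channels. The gains below (about -3 dB for the center and surround channels), the decision to drop the LFE channel and the normalization step are all assumptions chosen for illustration; they are not taken from the AC-3 specification, and the function name is hypothetical.

```python
import math

def downmix_51_to_stereo(frame, center_gain=math.sqrt(0.5), surround_gain=math.sqrt(0.5)):
    """Fold one 5.1 sample frame into a (left, right) pair.

    frame is a dict with keys L, R, C, LFE, Ls, Rs holding floats in [-1, 1].
    The gains are illustrative assumptions; the LFE channel is simply dropped.
    """
    left = frame["L"] + center_gain * frame["C"] + surround_gain * frame["Ls"]
    right = frame["R"] + center_gain * frame["C"] + surround_gain * frame["Rs"]
    # Scale so that even full-scale input on every channel cannot clip.
    norm = 1.0 + center_gain + surround_gain
    return left / norm, right / norm

frame = {"L": 0.5, "R": -0.25, "C": 0.4, "LFE": 0.9, "Ls": 0.1, "Rs": 0.0}
print(downmix_51_to_stereo(frame))
```

The center channel is shared equally between left and right so that dialogue stays in the middle of the stereo image.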

Several Dolby formats tend to get lumped together. However, they are distinct: Dolby Stereo Digital (the original movie theater format), Dolby Pro Logic (a digital 3.1 channel, or parsed six channel surround format with improved channel separation), and Dolby Surround Digital (the AC-3 5.1 channel codec in movie theaters and consumer products). There is also an earlier Dolby Surround Analog.

ATRAC (Adaptive Transform Acoustic Coding) and ATRAC3 are proprietary formats used by Sony on its MiniDisc and Memory Stick equipment. ATRAC applies psychoacoustic principles in three steps. It first uses algorithmic filtering (Quadrature Mirror Filters) to split the digital audio signal, which begins as a 16-bit / 44.1 kHz stereo signal, into frequency sub-bands (based on time-frequency analysis), giving a decomposed signal. The sub-bands are then transformed into the frequency domain by the Modified Discrete Cosine Transform. Finally, bits are allocated according to the complexity of the various blocks of the split signal. Encoding 16-bit, 44.1 kHz PCM this way still gives very good sound quality at approximately 1/5 the size of the original file.

ATRAC3 is related to MDLP, MiniDisc Long Play, which is a dual mode ATRAC encoding process that allows for either LP2 mode / 160 minutes stereo audio or LP4 mode / 320 minutes stereo audio. ATRAC3 is also the audio format stored on the Sony Memory Stick.

MPEG-2 AAC (Advanced Audio Coding, approved 1997). The AAC format compresses more efficiently than MP3 encoding (128 kbps for near CD-quality stereo audio). This format incorporates more advanced coding techniques and filtering systems (interframe backward prediction). You will also sometimes see references to the Main, Low Complexity and Scalable Sampling Rate variants of the format. The format also supports 5.1 channel surround sound encoding. The two best-known applications of AAC are Liquid Audio and A2b (the two formats are not interchangeable, and A2b is no longer in use). Liquid Audio, as A2b was, is encrypted and not compatible with other AAC players. AAC was developed by AT&T, Dolby, the Fraunhofer Institute, Lucent Technologies, Sony and several other companies. The format is supported by RioPort, Matsushita and Texas Instruments, and is used in Apple Computer's iTunes Music Store.

Dolby Laboratories (with co-licensors and patent-holders AT&T, Fraunhofer and Sony) promotes and licenses an MPEG-4 AAC format worldwide. Encoder and decoder applications are being promoted for use in streaming audio over 3G wireless networks. The MPEG-4 AAC codec is capable of delivering stereo audio at bit rates as low as 48 to 40 kbps.

EPAC (Enhanced Perceptual Audio Coder) was developed by Lucent Technologies. The algorithm encodes at variable bit rates, up to 128 kilobits per second.

VQF, also known as TwinVQ (Transform-domain Weighted Interleave Vector Quantization), has a higher compression rate than MP3 but still fairly good sound quality after compressing PCM, 16-bit, 44.1 kHz .wav audio. The format was developed by the NTT Human Interface Laboratories (Japan), with Yamaha being the major licensee. In this format, audio data is combined into a multiple-frame pattern and then transmitted in a predetermined compression code depending on the profile of the original data. Encoding is a little slower than encoding an MP3 at 128 kbps.

WMA (Windows Media Audio) is Microsoft's proprietary codec and compresses to a smaller file size than MP3 with good audio quality. It can be used either for storing files on a computer running MS Windows (.wma) or for streaming over the internet (.wma in an .asf wrapper file). The most recent version is Windows Media Audio 9.0. Please see below for the profile of the codec. Windows Media Audio 9.0 now also includes a lossless codec.

DTS (Digital Theatre Systems), also based on perceptual coding techniques (the Coherent Acoustics algorithm), provides good quality at just a slightly larger file size than MP3. It encodes PCM, including multi-channel 5.1 Surround at 20-bit resolution and a 48 kHz sampling rate, into a single bit stream. Although designed for movie theatres, DTS can be played on a DVD player (the format is not compatible with standard CD players). The DVD player decodes the DTS back to 5.1 Surround Sound.

On DVD-Audio the actual digital samples are in PCM format, but with a wider frequency range and higher resolution than CD Audio. The disc also supports Dolby Digital and DTS (Digital Theatre Systems) audio encoding. Secondly, the storage capacity is much larger, from a minimum of 4.7 GB (one-sided DVD) up to 17 GB. Nor is it limited to two-channel stereo: the format supports up to six channels. PCM sampling rates range from the standard CD rate of 44.1 kHz up to 192 kHz (two channel) and up to 96 kHz for multi-channel, with a peak audio frequency of 48 kHz. Resolution is up to a maximum of 24 bits. The data rate is also superior, up to a maximum of 9.6 Mbps compared to 1.4 Mbps for CD Audio. The DVD-A format also supports encryption, digital watermarking and copyright management.

When an encoding format of 24-bit resolution and 96 kHz is used, it can only be achieved as a lossless format by incorporating Meridian Lossless Packing (MLP), an algorithmic compression that does not discard any of the digital data but packs the file more compactly for delivery. MLP is capable of handling the compression of all of the channels in a 5.1 channel project. It accomplishes this in three ways: by using a variable rate to encode parts of the music that do not require the entire 24 bits per sample, by borrowing bandwidth from one or more channels that are not using their entire bandwidth, and by using a matrix to eliminate inter-channel correlation (as some channels tend to share common data).
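A quick calculation suggests why lossless packing is needed at these settings. The sketch assumes six full-rate channels for a 5.1 project and uses the 9.6 Mbps maximum data rate quoted for DVD-Audio; the function and variable names are made up for this example.

```python
MAX_DVD_AUDIO_MBPS = 9.6   # maximum data rate quoted for DVD-Audio

def pcm_mbps(sample_rate_hz, bits, channels):
    """Raw (unpacked) PCM data rate in megabits per second."""
    return sample_rate_hz * bits * channels / 1_000_000

configs = {
    "stereo at 192 kHz / 24-bit": (192_000, 24, 2),
    "5.1 at 96 kHz / 24-bit": (96_000, 24, 6),
}

for name, (rate, bits, channels) in configs.items():
    mbps = pcm_mbps(rate, bits, channels)
    verdict = "fits" if mbps <= MAX_DVD_AUDIO_MBPS else "exceeds the limit"
    print(f"{name}: {mbps:.3f} Mbps ({verdict})")
```

Under these assumptions, raw six-channel PCM at 96 kHz / 24-bit needs more than the available data rate, so the stream has to be packed without loss before it can be delivered.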

DVD-A also incorporates System Managed Audio Resource Technique (SMART). When the disc is inserted into a player whose playback system is capable only of stereo, the SMART application automatically remixes the Surround Sound 5.1 multi-channel mix down to a two-channel stereo mix.

Super Audio Compact Disc (SACD), developed by Sony and Philips, uses Direct Stream Digital (DSD) encoding to reach a sampling rate of up to 2.8 MHz (2.82 million samples per second), with a peak audio frequency of 100 kHz. It does this with a very simple sample format, using one bit per sample (non-linear, sigma-delta modulation) at a very high sampling rate; it is, however, inefficient in the volume of data it stores and the bandwidth required for data transmission. This is a non-PCM format, but the resulting file can still be converted down to linear PCM at any bit rate. DSD also introduces some filtering to reduce the quantization error distortion that results from the sigma-delta modulation sampling. The one-bit sample does not itself represent the amplitude at the sample point. Rather, each bit indicates whether the modulator's running estimate of the signal has to move up or down, so that the density of ones in the stream follows the amplitude. The format not only supports 2.82 MHz (64 x 44.1 kHz, or 64 times the CD rate) at 1-bit resolution; it also supports 192 kHz or 96 kHz at 24-bit resolution in discrete multi-channel audio (5.1 surround), with a dynamic range of 120 dB. The SACD format also supports encryption, digital watermarking and copyright management. An SACD is released with one layer of the disc carrying a DSD multi-channel surround sound version and another layer carrying a regular CD-Audio version (to be backward compatible, the 1-bit DSD can be converted to 16-bit / 44.1 kHz for CD distribution). This conversion is done by Sony’s filter system, Super Bit Mapping (SBM) Direct down-conversion, in order to retain good quality in a 16-bit format. SBM applies dither (signal shaping) during the conversion in order to reduce any noise that results from the change in word length / bit resolution. Please also see hardware specifications for SACD disc.
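A minimal sketch of first-order sigma-delta modulation, the general technique named above, may help make the one-bit idea concrete. It is not Sony's DSD modulator (production modulators are generally of higher order); the function names, the 1 kHz test tone and the moving-average filter are assumptions made for illustration.

```python
import math

def sigma_delta_1bit(samples):
    """First-order sigma-delta modulator: floats in [-1, 1] to a list of +1/-1 bits."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback            # accumulate the error so far
        feedback = 1.0 if integrator >= 0 else -1.0
        bits.append(feedback)
    return bits

def moving_average(bits, window):
    """Crude low-pass filter: the local average of the bits follows the input."""
    return [sum(bits[i - window:i]) / window for i in range(window, len(bits) + 1)]

DSD_RATE = 64 * 44_100                        # one-bit samples per second, per channel

# 10 ms of a 1 kHz tone at half of full scale, sampled at the DSD rate.
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / DSD_RATE) for n in range(DSD_RATE // 100)]
bits = sigma_delta_1bit(tone)
recovered = moving_average(bits, 64)

print(DSD_RATE, "bits per second per channel;", len(bits), "bits generated")
```

In this simple model every output value is either +1 or -1, yet averaging a short run of bits gives back a value close to the original waveform, which is why the stream can later be filtered down to ordinary PCM.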
