Advanced Video Coding
H.264 is a high-compression digital video codec standard developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) in a partnership known as the Joint Video Team (JVT). The standard is identical to ISO/IEC MPEG-4 part 10 and is also known as AVC, for Advanced Video Coding. Final drafting of the first version of the standard was completed in May 2003.
H.264 is a name related to the ITU-T line of H.26x video standards, while AVC relates to its ISO/IEC MPEG roots. The standard is often called H.264/AVC or AVC/H.264 to emphasize this common heritage. The name H.26L, also from its ITU-T history, is far less common but still used. Occasionally it has also been called "the JVT codec", after the JVT organization that developed it. (Such partnerships and multiple names are not unprecedented: MPEG-2 video also arose from a partnership between MPEG and the ITU-T, and is known in the ITU-T community as H.262.)
The intent of the H.264/AVC project was to create a standard capable of providing good video quality at substantially lower bit rates (e.g., half or less) than previous standards would need (e.g., relative to MPEG-2, H.263, or MPEG-4 part 2), without increasing complexity so much that the design would be impractically expensive to implement. An additional goal was flexibility: the standard should apply to a very wide variety of applications (e.g., both low and high bit rates, and both low- and high-resolution video) and work well on a very wide variety of networks and systems (e.g., broadcast, DVD storage, RTP/IP packet networks, and ITU-T multimedia telephony systems).
Since completing the original version of the standard in May 2003, the JVT has finished one round of corrigendum (errata) corrections and has developed a set of enhanced-functionality extensions called the Fidelity Range Extensions (FRExt). As of late 2004, an additional round of corrigendum work is nearing completion and is expected to be finished in early 2005.
H.264/AVC contains a number of new features that allow it to compress video much more effectively than older codecs and give it more flexibility across a wide variety of network environments. Key features include:
- Multi-picture motion compensation, which uses previously encoded pictures as references much more flexibly than past standards, allowing up to 32 reference pictures in some cases (prior standards typically allowed one or, in the case of "B pictures", two). In most scenes this yields modest improvements in bit rate and quality. But in certain types of scenes, such as those with rapid repetitive flashing, back-and-forth scene cuts, or uncovered background areas, it allows a very significant reduction in bit rate.
- Variable block-size motion compensation (VBSMC) with block sizes as large as 16x16 and as small as 4x4, enabling very precise segmentation of moving regions.
- Quarter-pel precision for motion compensation, enabling very precise description of the displacements of moving areas. For chroma, motion compensation is even more precise, down to one-eighth pel.
- Weighted prediction, which allows an encoder to specify a scaling and offset to apply during motion compensation, significantly improving coding efficiency in special cases such as fade-to-black, fade-in, and cross-fade transitions.
- An in-loop deblocking filter which helps prevent the ringing and blocking artifacts common to other DCT-based image compression techniques.
- An exact-match integer 4x4 spatial block transform (similar to the well-known DCT design), and in the case of the new FRExt "High" profiles, the ability for the encoder to adaptively select between a 4x4 and 8x8 transform block size for the integer transform operation.
- A secondary Hadamard transform performed on DC coefficients of the primary spatial transform (for chroma DC coefficients and also luma in one special case) to obtain even more compression in smooth regions.
- Spatial prediction from the edges of neighboring blocks for "intra" coding (rather than the DC-only prediction found in MPEG-2 and the transform coefficient prediction found in H.263+ and MPEG-4 part 2).
- Context-adaptive binary arithmetic coding (CABAC), a technique for losslessly compressing syntax elements in the video stream using arithmetic coding with probability models that adapt to the context of previously coded data.
- Context-adaptive variable-length coding (CAVLC), which is a lower-complexity alternative to CABAC for the coding of quantized transform coefficient values. Although lower complexity than CABAC, CAVLC is more elaborate and more efficient than the methods typically used to code coefficients in other prior designs.
- A simple, highly structured variable-length coding (VLC) technique known as Exponential-Golomb (Exp-Golomb) coding, used for many of the syntax elements not coded by CABAC or CAVLC.
- A network abstraction layer (NAL) definition allowing the same video syntax to be used in many network environments, including features such as sequence parameter sets (SPSs) and picture parameter sets (PPSs) that provide more robustness and flexibility than provided in prior designs.
- Switching slices (called SP and SI slices), features that allow an encoder to direct a decoder to jump into an ongoing video stream for such purposes as video streaming bit rate switching and "trick mode" operation. When a decoder jumps into the middle of a video stream using the SP/SI feature, it can get an exact match to the decoded pictures at that location in the video stream despite using different pictures (or no pictures at all) as references prior to the switch.
- Flexible macroblock ordering (FMO, also known as slice groups) and arbitrary slice ordering (ASO), techniques for restructuring the order in which the fundamental regions of a picture (called macroblocks) are represented. Although typically considered error/loss robustness features, FMO and ASO can also be used for other purposes.
- Data partitioning (DP), a feature providing the ability to separate more important and less important syntax elements into different packets of data, enabling the application of unequal error protection (UEP) and other types of improvement of error/loss robustness.
- Redundant slices (RS), an error/loss robustness feature allowing an encoder to send an extra representation of a picture region (typically at lower fidelity) that can be used if the primary representation is corrupted or lost.
- A simple automatic process for preventing the accidental emulation of start codes, which are special sequences of bits in the coded data that allow random access into the bitstream and recovery of byte alignment in systems that can lose byte synchronization.
- Supplemental enhancement information (SEI) and video usability information (VUI), which carry extra information that can be inserted into the bitstream to enhance the use of the video for a wide variety of purposes.
- Auxiliary pictures, which can be used for such purposes as alpha blend compositing.
- Frame numbering, a feature that allows the creation of "sub-sequences" (enabling temporal scalability through optional inclusion of extra pictures between other pictures) as well as the detection and concealment of losses of entire pictures (which can occur due to network packet losses or channel errors).
- Picture order count, a feature that serves to keep the ordering of the pictures and the values of samples in the decoded pictures isolated from timing information (allowing timing information to be carried and controlled/changed separately by a system without affecting decoded picture content).
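The quarter-pel motion compensation listed above depends on interpolating sample values between integer positions. The Python sketch below is a simplified illustration, not a decoder implementation: it computes a horizontal luma half-sample value with the six-tap filter (1, -5, 20, 20, -5, 1) that H.264 uses, then forms a quarter-sample value as the rounded average of the neighboring integer sample and that half sample. It assumes 8-bit samples and a single row, and ignores vertical and center positions and picture-edge padding; the function names are hypothetical.

```python
# Six-tap FIR filter used by H.264 for luma half-sample positions.
TAPS = (1, -5, 20, 20, -5, 1)


def clip8(x: int) -> int:
    """Clip to the 8-bit sample range assumed in this sketch."""
    return max(0, min(255, x))


def half_pel(row: list[int], i: int) -> int:
    """Half-sample value between row[i] and row[i + 1] (horizontal only).

    Reads row[i - 2] .. row[i + 3]; the caller must keep these in range
    (a real decoder pads the reference picture at its edges).
    """
    acc = sum(t * row[i - 2 + k] for k, t in enumerate(TAPS))
    return clip8((acc + 16) >> 5)  # round, divide by 32, clip


def quarter_pel(row: list[int], i: int) -> int:
    """Quarter-sample value between row[i] and the half sample to its right,
    formed as the rounded average of the two."""
    return (row[i] + half_pel(row, i) + 1) >> 1
```

For the linear ramp 10, 20, ..., 80, `half_pel(row, 3)` returns 45 (midway between 40 and 50) and `quarter_pel(row, 3)` returns 43. The clipping step matters near sharp edges, where the filter's negative taps can push the result outside the 8-bit range.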
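Weighted prediction, also listed above, scales and offsets motion-compensated samples. The following is a minimal sketch of the general form of explicit single-list weighted prediction for 8-bit luma: the predicted sample is multiplied by a weight, rounded and shifted right by a log2 denominator (`log_wd`), offset, and clipped. Bi-predictive weighting, the implicit mode, bit depths above 8, and how the weights are signalled in the slice header are all omitted, and the function names are hypothetical.

```python
def clip1(x: int) -> int:
    """Clip to the 8-bit sample range (bit depths above 8 not handled here)."""
    return max(0, min(255, x))


def weighted_sample(pred: int, weight: int, offset: int, log_wd: int) -> int:
    """Explicit weighted prediction of one sample from a single reference list.

    pred   -- motion-compensated reference sample
    weight -- multiplicative weight; weight == 1 << log_wd means unity gain
    offset -- additive offset in 8-bit sample units
    log_wd -- log2 of the weight denominator
    """
    if log_wd >= 1:
        # Rounded division by 2**log_wd, then add the offset.
        value = ((pred * weight + (1 << (log_wd - 1))) >> log_wd) + offset
    else:
        value = pred * weight + offset
    return clip1(value)
```

With `log_wd = 5`, a weight of 32 is unity gain and a weight of 16 halves the sample, which is how an encoder might approximate a point halfway through a fade to black: `weighted_sample(100, 16, 0, 5)` gives 50.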
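The exact-match integer transform and the secondary Hadamard transform listed above can be illustrated with plain integer matrix products. This sketch shows only the forward 4x4 core transform (Y = Cf X Cf^T, with the integer matrix Cf) and the 2x2 Hadamard transform applied to the four chroma DC coefficients of a 4:2:0 macroblock. The scaling that the standard folds into quantization, the inverse transforms, the FRExt 8x8 transform, and the 4x4 luma DC Hadamard are left out, and the helper names are hypothetical.

```python
# Forward core matrix of the H.264 4x4 integer transform.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

# 2x2 Hadamard matrix (symmetric, so it equals its own transpose).
H2 = [
    [1, 1],
    [1, -1],
]


def matmul(a, b):
    """Integer matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_core_4x4(block):
    """Y = CF . X . CF^T for a 4x4 residual block X (post-scaling omitted)."""
    return matmul(matmul(CF, block), transpose(CF))


def chroma_dc_hadamard(dc):
    """2x2 Hadamard transform of the four chroma DC coefficients."""
    return matmul(matmul(H2, dc), H2)
```

A flat block concentrates all of its energy in the DC coefficient: a 4x4 block of ones transforms to 16 at position (0, 0) and zeros elsewhere. Because the standard specifies its transforms in integer arithmetic (the factors of 2 can be implemented as shifts), every decoder reconstructs exactly the same values, avoiding the encoder/decoder mismatch drift of floating-point DCT designs.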
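The Exp-Golomb code listed above has a very regular structure: to code a non-negative value v, write v + 1 in binary and precede it with one fewer zero bit than that binary string has digits. Signed values are first mapped to non-negative code numbers in the order 0, 1, -1, 2, -2, and so on. The sketch below works on strings of '0'/'1' characters for readability rather than on a real bit reader; its function names only echo the standard's ue(v)/se(v) descriptors and are otherwise hypothetical.

```python
def ue(v: int) -> str:
    """Unsigned Exp-Golomb code word for v, as a '0'/'1' string."""
    if v < 0:
        raise ValueError("ue(v) requires a non-negative value")
    bits = bin(v + 1)[2:]                # binary of v + 1, no leading zeros
    return "0" * (len(bits) - 1) + bits  # prefix of len(bits) - 1 zeros


def se(v: int) -> str:
    """Signed Exp-Golomb: 0, 1, -1, 2, -2, ... map to code numbers 0, 1, 2, 3, 4, ..."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return ue(code_num)


def read_ue(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one ue(v) value starting at bits[pos]; return (value, new_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    # Skip the leading zeros and the '1' marker, then read `zeros` info bits.
    start = pos + zeros + 1
    info = bits[start:start + zeros]
    value = (1 << zeros) - 1 + (int(info, 2) if info else 0)
    return value, start + zeros
```

The first five code words are 1, 010, 011, 00100, and 00101, so small values, which are the most frequent for many syntax elements, get the shortest codes.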
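Each NAL unit described above begins with a one-byte header: a forbidden zero bit, a 2-bit `nal_ref_idc`, and a 5-bit `nal_unit_type` (for example, 7 for a sequence parameter set and 8 for a picture parameter set). In the byte-stream format of Annex B of the standard, NAL units are separated by start codes (0x000001, optionally preceded by an extra zero byte). The sketch below splits such a byte stream and decodes the header fields. It assumes well-formed input, does not remove emulation prevention bytes, and lists only a subset of NAL unit types; the function and table names are hypothetical.

```python
# Subset of nal_unit_type values from the H.264/AVC standard.
NAL_UNIT_TYPES = {
    1: "coded slice, non-IDR picture",
    2: "slice data partition A",
    3: "slice data partition B",
    4: "slice data partition C",
    5: "coded slice, IDR picture",
    6: "supplemental enhancement information (SEI)",
    7: "sequence parameter set (SPS)",
    8: "picture parameter set (PPS)",
    9: "access unit delimiter",
}

START_CODE = b"\x00\x00\x01"


def split_byte_stream(stream: bytes) -> list[bytes]:
    """Split a byte stream on 0x000001 start codes.

    A 4-byte start code (0x00000001) leaves its extra zero byte at the end
    of the previous unit, so trailing zero bytes are stripped (a NAL unit
    itself never ends in 0x00).
    """
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        units.append(stream[start:end].rstrip(b"\x00"))
        pos = nxt
    return units


def parse_nal_header(first_byte: int) -> tuple[int, int, int]:
    """Return (forbidden_zero_bit, nal_ref_idc, nal_unit_type)."""
    forbidden_zero_bit = first_byte >> 7
    nal_ref_idc = (first_byte >> 5) & 0x03
    nal_unit_type = first_byte & 0x1F
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type
```

For example, a header byte of 0x67 decodes to `nal_ref_idc` 3 and `nal_unit_type` 7, a sequence parameter set, and 0x68 decodes to type 8, a picture parameter set.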
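The start-code emulation prevention process listed above works at the byte level. Whenever two consecutive zero bytes in a NAL unit's payload would be followed by a byte in the range 0x00 to 0x03, the encoder inserts an extra 0x03 byte, so the start-code pattern 0x000001 can never appear by accident; a decoder removes each 0x03 that follows two zero bytes. The sketch below shows both directions. It omits the special rule for a payload whose final byte is 0x00, and the function names are hypothetical.

```python
def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 after any two zero bytes that precede a byte <= 0x03."""
    out = bytearray()
    zeros = 0  # number of consecutive 0x00 bytes just written
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


def remove_emulation_prevention(payload: bytes) -> bytes:
    """Drop each 0x03 byte that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # skip the inserted byte and reset the zero run
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)
```

For example, `add_emulation_prevention(bytes([0, 0, 1]))` yields the bytes 00 00 03 01, and `remove_emulation_prevention` restores the original three bytes. A payload that genuinely contains 00 00 03 is escaped to 00 00 03 03, so the round trip stays unambiguous.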
These techniques, along with several others, help H.264 perform significantly better than any prior standard under a wide variety of circumstances and application environments. H.264 often performs dramatically better than MPEG-2, typically achieving the same quality at half the bit rate or less.
The JVT recently completed the development of extensions to the original standard known as the Fidelity Range Extensions (FRExt). These extensions support higher-fidelity video coding through increased sample bit depth (including 10-bit and 12-bit coding) and higher-resolution color information (including the sampling structures known as YUV 4:2:2 and YUV 4:4:4). FRExt also includes several other features, such as adaptive switching between 4x4 and 8x8 integer transforms, encoder-specified perceptual quantization weighting matrices, efficient inter-picture lossless coding, support for additional color spaces, and a residual color transform. Design work on FRExt was completed in July 2004, and drafting was finished in September 2004.
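To make the chroma sampling structures concrete: the original H.264/AVC profiles code only 4:2:0 video, in which each chroma plane has half the luma width and half the luma height; 4:2:2 keeps full vertical chroma resolution, and 4:4:4 keeps full chroma resolution in both directions. The sketch below computes plane sizes and the size of one uncompressed frame. The function names are hypothetical, and storing samples above 8 bits in 2-byte containers is an assumed (though common) memory layout, not something the standard specifies.

```python
# (horizontal, vertical) chroma subsampling factors per sampling structure.
CHROMA_SUBSAMPLING = {
    "4:2:0": (2, 2),  # original H.264/AVC profiles
    "4:2:2": (2, 1),  # FRExt
    "4:4:4": (1, 1),  # FRExt
}


def chroma_plane_size(width: int, height: int, fmt: str) -> tuple[int, int]:
    """Width and height of each of the two chroma planes."""
    sx, sy = CHROMA_SUBSAMPLING[fmt]
    return width // sx, height // sy


def frame_bytes(width: int, height: int, fmt: str, bit_depth: int) -> int:
    """Bytes for one uncompressed frame: one luma plane plus two chroma planes.

    Assumes 1 byte per sample up to 8 bits and 2 bytes above that.
    """
    cw, ch = chroma_plane_size(width, height, fmt)
    samples = width * height + 2 * cw * ch
    return samples * (1 if bit_depth <= 8 else 2)
```

Under this layout a 1920x1080 frame takes 3,110,400 bytes in 8-bit 4:2:0 but 8,294,400 bytes in 10-bit 4:2:2, which gives a sense of the extra data the FRExt formats carry before compression.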
H.264/AVC is already widely used for videoconferencing, including in products of the two main companies in that market (Polycom and Tandberg). It has also been preliminarily adopted as a mandatory part of both major rival formats for future enhanced DVD uses, the HD-DVD and Blu-Ray disc formats. The Digital Video Broadcasting (DVB) organization in Europe has recently approved the use of H.264/AVC for European broadcast television. A number of broadcasters in Japan and Korea have announced future support for the codec, and it is under consideration for other broadcast uses, for example by the United States' Advanced Television Systems Committee (ATSC) standards body. In the wireless world, it has been adopted as part of Release 6 of the 3rd Generation Partnership Project (3GPP).
As of the time of this writing (late 2004), four companies are producing sample custom chips capable of decoding H.264/AVC video (specifically, Broadcom, Conexant, Sigma Designs, and ST Micro). Such chips will allow widespread deployment of low-cost devices capable of playing H.264/AVC video at standard-definition and high-definition television resolutions.
Like other ISO/IEC MPEG video standards, H.264/AVC has a reference implementation that can be freely downloaded. Its main purpose is to demonstrate H.264/AVC features rather than to serve as a practical application in its own right.
An FFmpeg developer working on reverse-engineering the Sorenson codec found that it implements a tweaked variant of this codec. (The reliability of this information is unknown.)
As with MPEG-2 and MPEG-4 part 2, vendors of H.264/AVC products and services are expected to pay patent licensing royalties for the patents that their products use. The primary source of licenses for patents applying to this standard is a private organization known as MPEG LA, LLC (which is not affiliated in any way with the MPEG standardization organization, but which also administers patent pools for MPEG-2 and MPEG-4 part 2 video).
Applications
- The HD-DVD format, planned by the DVD Forum for product deployment in late 2005, includes H.264/AVC as a mandatory player feature.
- The Blu-Ray Disc format, planned by the Blu-Ray Disc Association (BDA) for product deployment in late 2005, includes H.264/AVC as a mandatory player feature.
- The Digital Video Broadcasting (DVB) standards body in Europe approved the use of H.264/AVC for broadcast television in Europe in late 2004.
- The Advanced Television Systems Committee (ATSC) standards body in the United States is in the final stages of considering H.264/AVC for U.S. broadcast television.
- The Digital Multimedia Broadcasting (DMB) service in the Republic of Korea will use H.264/AVC.
- Major broadcasters in Japan have announced support for H.264/AVC in mobile-segment terrestrial broadcast services.
- The 3rd Generation Partnership Project (3GPP) has approved the inclusion of H.264/AVC as an optional feature in Release 6 of its mobile multimedia telephony services specifications.
- The ITU-T has adopted H.264/AVC in its H.32x suite of multimedia telephony system specifications. Essentially all new videoconferencing products now support H.264/AVC, including in particular the products of the two market leaders, Polycom and Tandberg.
- MPEG has fully integrated support for H.264/AVC into its systems standards (e.g., MPEG-2 and MPEG-4 systems) and its ISO media file format specification.
- Apple Computer is integrating H.264 into Mac OS X version 10.4 ("Tiger"), the next version of its operating system. Apple has also incorporated H.264/AVC directly into QuickTime.
- The PlayStation Portable console will feature hardware decoding of video files in the H.264 format.
- Ahead Software has integrated an H.264 encoder into its Nero Recode package.
External links
- H.264/AVC overview paper including new FRExt enhancements (Sullivan, Topiwala and Luthra)
- Various papers on H.264/AVC and related topics (Wiegand)
- More papers on H.264/AVC and related topics (Marpe)
- H.264/AVC Software Coordination (Suehring)
- H.264/MPEG-4 Part 10 Tutorials (Richardson)
- Book: H.264 and MPEG-4 Video Compression (Richardson)
- JVT Experts Group document archive
- MPEG LA Terms of H.264/MPEG-4 AVC Patent License
- A fast GPL H.264 encoder library with support for most H.264 features