This document provides an overview of HEVC (High Efficiency Video Coding) including:
- HEVC aims to provide roughly half the bitrate of H.264/AVC at the same quality.
- It uses block-based hybrid video coding with improved intra-prediction, transform, quantization and entropy coding techniques.
- HEVC supports a wide range of resolutions, color spaces and bit depths for 4K and beyond.
VIDEO CODER ARCHITECTURE
• Image / Video Coding Based on Block-Matching
– Assume frame f-1 has been encoded and reconstructed, and frame f is the
current frame to be encoded
• Exploiting the redundancies
– Temporal
MC-Prediction (P and B frames)
– Spatial
Block DCT
– Color
Color Space Conversion
• Scalar quantization of DCT coefficients
• Zigzag scanning, run-length and Huffman coding of the nonzero
quantized DCT coefficients
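The scan-and-run-length step above can be sketched in Python as follows. This is a minimal illustration assuming the classic JPEG/MPEG-style zigzag order; the function names are hypothetical, and the final Huffman (VLC) table lookup is left out.

```python
def zigzag_order(n):
    """(row, col) positions of an n x n block in classic zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals are walked bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def run_level_pairs(block):
    """Convert quantized coefficients to (zero-run, level) pairs in zigzag order.

    Trailing zeros are not emitted; a real codec signals them with an
    end-of-block symbol before Huffman coding the pairs.
    """
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs


quantized = [
    [12, -3, 0, 0],
    [2, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(run_level_pairs(quantized))  # [(0, 12), (0, -3), (0, 2), (5, 1)]
```

Only four pairs remain out of 16 coefficients, which is what makes the subsequent Huffman stage effective.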
VIDEO CODER ARCHITECTURE…
• Video Encoder
– Divide frame f into equal-size blocks
– For each source block,
Find its motion vector using the block-matching algorithm based on the
reconstructed frame f-1
Compute the displaced frame difference (DFD) of the block: the source block minus its best-matching block in frame f-1
– Transmit the motion vector of each block to the decoder
– Compress the DFD of each block
– Transmit the encoded DFDs to the decoder
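A minimal sketch of the encoder steps for a single block, assuming an exhaustive search over a small window and the sum of absolute differences (SAD) as the matching criterion. SAD, the window size, and all function names are illustrative assumptions; practical encoders use much faster search strategies.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n source block at (bx, by) in frame f and the
    candidate block displaced by (dx, dy) in the reconstructed frame f-1."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def block_match(cur, ref, bx, by, n, search):
    """Exhaustive (full-search) block matching within +/-search samples."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would reach outside the reference frame.
            if not (0 <= by + dy and by + dy + n <= height
                    and 0 <= bx + dx and bx + dx + n <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv


def dfd(cur, ref, bx, by, n, mv):
    """Displaced frame difference: source block minus its motion-compensated match."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(n)]
            for y in range(n)]


# Toy 8x8 frames: the content of frame f-1 moved one row down and one column left.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[10 * (y - 1) + (x + 1) for x in range(8)] for y in range(8)]
mv = block_match(cur, ref, bx=2, by=2, n=2, search=2)
print(mv, dfd(cur, ref, 2, 2, 2, mv))  # (1, -1) [[0, 0], [0, 0]]
```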
VIDEO CODER ARCHITECTURE…
• Video Decoder
– Receive the motion vector of each block from the encoder
– Based on the motion vector, find the best-matching block in the
reference frame
i.e., form the predicted current frame from the reference frame
– Receive the encoded DFD of each block from the encoder
– Decode the DFD
– Each reconstructed block in the current frame = its decompressed DFD +
the best-matching block
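The decoder-side reconstruction can be sketched as below, using a toy convention (frames as lists of rows, motion vectors as (dx, dy)); the names are hypothetical. Because the encoder searched in the reconstructed frame f-1 rather than the original one, this prediction matches the encoder's exactly and the two stay in sync.

```python
def reconstruct_block(ref, bx, by, n, mv, decoded_dfd):
    """Reconstructed block = best-matching block in the reference frame
    (located via the received motion vector) + decompressed DFD."""
    dx, dy = mv
    return [[ref[by + dy + y][bx + dx + x] + decoded_dfd[y][x] for x in range(n)]
            for y in range(n)]


ref = [[10 * y + x for x in range(4)] for y in range(4)]
block = reconstruct_block(ref, bx=1, by=1, n=2, mv=(1, 0), decoded_dfd=[[1, -1], [0, 2]])
print(block)  # [[13, 12], [22, 25]]
```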
VIDEO CODEC STANDARDS…
• Based on the same fundamental building blocks
– Motion-compensated prediction (I, P, and B frames)
– 2-D Discrete Cosine Transform (DCT)
– Color space conversion
– Scalar quantization, run-lengths, Huffman coding
• Additional tools added for different applications:
– Progressive or interlaced video
– Improved compression, error resilience, scalability, etc.
• MPEG-1/2/4, H.261/3/4
– Frame-based coding
• MPEG-4
– Object-based coding and Synthetic video
HEVC…
• MPEG-H
– High Efficiency Coding and Media Delivery in
Heterogeneous Environments: a new suite of
standards providing technical solutions for
emerging challenges in the multimedia industry
– Part 1: System, MPEG Media Transport (MMT)
Integrated services with multiple components in a hybrid
delivery environment, providing support for seamless and
efficient use of heterogeneous network environments,
including broadcast, multicast, storage media and mobile
networks
– Part 2: Video, High Efficiency Video Coding
(HEVC)
Highly immersive visual experiences, with ultra high definition
displays that give no perceptible pixel structure even if
viewed from such a short distance that they subtend a large
viewing angle (up to 55 degrees horizontally for 4Kx2K
resolution displays, up to 100 degrees for 8Kx4K)
– Part 3: Audio, 3D-Audio
Highly immersive audio experiences in which the decoding
device renders a 3D audio scene. This may use 10.2 or
22.2 channel configurations, much more limited speaker
configurations, or headphones, such as those used with a personal
tablet or smartphone.
HEVC…
• Transport/System Layer Integration
– Ongoing definitions (MPEG, IETF, DVB, …): benefit from H.264/AVC
– MPEG Media Transport (MMT) ?
HEVC…
• HEVC = High Efficiency Video Coding
• Joint project between ISO/IEC/MPEG and ITU-T/VCEG
– ISO/IEC: MPEG-H Part 2 (23008-2)
– ITU-T: H.265
• JCT-VC committee
– Joint Collaborative Team on Video Coding
– Co-chairs: Dr. Gary Sullivan (Microsoft, USA) and Dr. Jens-Reiner Ohm (RWTH
Aachen, Germany)
• Target
– Roughly half the bit rate at the same subjective quality compared to H.264/AVC (a 50%
bit-rate reduction)
– Complexity at most 10× that of H.264/AVC for the encoder and 2 to 3× for the decoder
• Requirements
– Progressive required for all profiles and levels
Interlaced support using field SEI message
– Video resolution: sub QVGA to 8Kx4K, with more focus on higher resolution video
content (1080p and up)
– Color space and chroma sampling: YUV420, YUV422, YUV444, RGB444
– Bit-depth: 8-14 bits
– Parallel Processing Architecture
HEVC…
• Potential applications
– Existing applications and usage scenarios
IPTV over DSL: large shift in IPTV eligibility
Facilitated deployment of OTT and multi-screen services
More customers on the same infrastructure: most IP traffic is video
More archiving facilities
– New applications and usage scenarios
1080p60/50 with bitrates comparable to 1080i
Immersive viewing experience: Ultra-HD (4K, 8K)
Premium services (sports, live music, live events, …): home theater, bars and
venues, mobile
HD 3DTV: full frame per view at today’s HD delivery rates
What becomes possible with 50% video rate reduction?
HEVC…
• Video Coding Techniques: Block-based hybrid video coding
– Interpicture prediction
Temporal statistical dependences
– Intrapicture prediction
Spatial statistical dependences
– Transform coding
Spatial statistical dependences
• Uses YCbCr color space with 4:2:0 subsampling
– Y component
Luminance (luma)
Represents brightness (gray level)
– Cb and Cr components
Chrominance (chroma).
Color difference from gray toward blue and red
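As an illustration of the color conversion and 4:2:0 subsampling, here is a small Python sketch. It assumes the BT.601 luma weights and a simple 2x2 averaging filter; HEVC itself codes whatever YCbCr samples it is given, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """RGB in [0, 1] to Y in [0, 1] and Cb/Cr in [-0.5, 0.5].

    Uses the BT.601 luma weights (an assumption; the actual matrix can be
    signaled in the bitstream's video usability information).
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772  # blue-difference, scaled so pure blue gives +0.5
    cr = (r - y) / 1.402  # red-difference, scaled so pure red gives +0.5
    return y, cb, cr


def subsample_420(plane):
    """Halve a chroma plane horizontally and vertically by 2x2 averaging.

    Averaging is one simple choice of downsampling filter; the standard does
    not mandate how an encoder produces 4:2:0 chroma. Even dimensions assumed.
    """
    return [
        [(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
         for x in range(0, len(plane[0]), 2)]
        for y in range(0, len(plane), 2)
    ]


print(rgb_to_ycbcr(0.5, 0.5, 0.5))  # gray: Cb and Cr are (close to) zero
print(subsample_420([[1, 3, 0, 0], [5, 7, 0, 0], [2, 2, 4, 4], [2, 2, 4, 4]]))
```

For a W × H picture, 4:2:0 keeps W·H luma samples but only two (W/2)·(H/2) chroma planes, i.e., 1.5 samples per pixel instead of 3.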
HEVC…
• Video Coding Techniques: Block-based hybrid video coding
– Motion compensation
Quarter-sample precision is used for the MVs
7-tap or 8-tap filters are used for interpolation of fractional-sample
positions
– Intrapicture prediction
33 directional modes, planar (surface fitting), DC (flat)
Modes are encoded by deriving most probable modes (MPMs) based
on those of previously decoded neighboring PBs
– Quantization control
Uniform reconstruction quantization (URQ)
– Entropy coding
Context adaptive binary arithmetic coding (CABAC)
– In-loop deblocking filtering
Similar to the one in H.264, but more friendly to parallel processing
– Sample adaptive offset (SAO)
Nonlinear amplitude mapping
For better reconstruction of amplitude by histogram analysis
HEVC…
• Coding Tree Unit (CTU) - A picture is partitioned into CTUs
– The CTU replaces the macroblock (MB) as the basic processing unit
– Contains luma CTBs and chroma CTBs
A luma CTB covers L × L samples
Two chroma CTBs each cover L/2 × L/2 samples (4:2:0)
– HEVC supports variable-size CTBs
The value of L may be equal to 16, 32, or 64.
Selected by the encoder according to its needs in terms of memory and
computational requirements
Large CTBs are beneficial when encoding high-resolution video content
– CTBs can be used as CBs or can be partitioned into multiple CBs using
quadtree structures
– The quadtree splitting process can be iterated until the size for a luma
CB reaches a minimum allowed luma CB size (8 × 8 or larger).
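The recursive quadtree split of a CTB into CBs can be sketched as follows. The split decision is a hypothetical callback standing in for the encoder's rate-distortion choice (signaled per node in the bitstream); only the recursion and the minimum-size stop rule reflect the text above.

```python
def split_ctb(x, y, size, min_cb, should_split):
    """Recursively partition a luma CTB into CBs with a quadtree.

    should_split(x, y, size) stands in for the encoder's decision; nodes
    that have reached min_cb are never split further.
    Returns (x, y, size) tuples in z-scan order.
    """
    if size > min_cb and should_split(x, y, size):
        half = size // 2
        cbs = []
        for dy in (0, half):          # upper quadrants first,
            for dx in (0, half):      # left before right: z-scan order
                cbs.extend(split_ctb(x + dx, y + dy, half, min_cb, should_split))
        return cbs
    return [(x, y, size)]


def demo_decision(x, y, size):
    # Hypothetical rule: split the 64x64 CTB once, then split only its top-left 32x32.
    return size == 64 or (size == 32 and x == 0 and y == 0)


cbs = split_ctb(0, 0, 64, 8, demo_decision)
print(cbs)
# [(0, 0, 16), (16, 0, 16), (0, 16, 16), (16, 16, 16), (32, 0, 32), (0, 32, 32), (32, 32, 32)]
```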
HEVC…
• Block Structure
– Coding Tree Units (CTU)
Corresponds to macroblocks in earlier coding standards (H.264, MPEG-2, etc.)
Luma and chroma Coding Tree Blocks (CTB)
Quadtree structure to split into Coding Units (CUs)
Size 16x16, 32x32, or 64x64, signaled in the SPS
HEVC…
• A new framework composed of three
new concepts
– Coding Units (CU)
– Prediction Units (PU)
– Transform Units (TU)
• The decision whether to code a
picture area using inter or intra
prediction is made at the CU level
Goal: to be as flexible as possible and to adapt the
compression and prediction to local image characteristics
HEVC…
• Block Structure
– Coding Units (CU)
Luma and chroma Coding Blocks (CB)
Rooted in CTU
Intra or inter coding mode
Split into Prediction Units (PUs) and Transform Units (TUs)
HEVC…
• Block Structure
– Prediction Units (PU)
Luma and chroma Prediction Blocks (PB)
Rooted in CU
Partition and motion info
HEVC…
• Intra Prediction
– 35 intra modes: 33 directional modes +
DC + planar
– For chroma, 5 intra modes: DC, planar,
vertical, horizontal, and luma derived
– Planar prediction (Intra_Planar)
Amplitude surface with a horizontal and
vertical slope derived from boundaries
– DC prediction (Intra_DC)
Flat surface with a value matching the
mean value of the boundary samples
– Directional prediction (Intra_Angular)
33 directional prediction modes are
defined for square TB sizes from 4×4 up
to 32×32
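The DC and planar predictors come down to a few lines of integer arithmetic. This sketch follows the rounding used in H.265 as we understand it, but leaves out reference-sample filtering and boundary smoothing; the function and argument names are hypothetical.

```python
def log2_size(n):
    """log2 of a power-of-two block size."""
    return n.bit_length() - 1


def intra_dc(top, left):
    """Intra_DC: flat block at the rounded mean of the top and left reference samples."""
    n = len(top)
    dc = (sum(top) + sum(left) + n) >> (log2_size(n) + 1)
    return [[dc] * n for _ in range(n)]


def intra_planar(top, left, top_right, bottom_left):
    """Intra_Planar: average of a horizontal and a vertical linear interpolation.

    top[x] and left[y] are the reference samples above and left of the block;
    top_right and bottom_left are the samples just beyond the block's corners.
    Result is indexed [y][x].
    """
    n = len(top)
    shift = log2_size(n) + 1
    return [
        [((n - 1 - x) * left[y] + (x + 1) * top_right
          + (n - 1 - y) * top[x] + (y + 1) * bottom_left + n) >> shift
         for x in range(n)]
        for y in range(n)
    ]


print(intra_dc([10, 20, 30, 40], [10, 10, 10, 10])[0])  # [18, 18, 18, 18]
print(intra_planar([0] * 4, [0] * 4, 64, 0)[0])         # [8, 16, 24, 32]
```

The planar output ramps smoothly toward the bright top-right sample, the "amplitude surface with a horizontal and vertical slope" described above.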
HEVC…
• Intra Prediction
– Adaptive reference sample filtering
3-tap filter: [1 2 1]/4
Not performed for 4x4 blocks
For blocks larger than 4x4, adaptively performed for a subset of modes
Modes except vertical/near-vertical, horizontal/near-horizontal, and DC
– Mode-dependent adaptive scanning
4x4 and 8x8 intra blocks only
All other blocks use only the up-right diagonal scan (left-most scan pattern)
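A sketch of the [1 2 1]/4 reference smoothing and of the mode/size decision, assuming the HEVC version 1 mode numbering (0 = planar, 1 = DC, 10 = horizontal, 26 = vertical) and the per-size thresholds as we understand them from the specification. The strong bilinear smoothing option for 32x32 blocks is not modeled, and the names are hypothetical.

```python
def needs_reference_filtering(n, mode):
    """Decide whether [1 2 1] smoothing is applied (HEVC v1 rule, as we understand it).

    mode: 0 = planar, 1 = DC, 2..34 = angular (10 = horizontal, 26 = vertical).
    """
    if mode == 1 or n == 4:
        return False  # never for DC or for 4x4 blocks
    threshold = {8: 7, 16: 1, 32: 0}[n]
    distance = min(abs(mode - 26), abs(mode - 10))  # closeness to pure vertical/horizontal
    return distance > threshold


def filter_reference(samples):
    """[1 2 1]/4 smoothing with rounding over the 1-D reference array
    (bottom-left ... corner ... top-right); the two end samples are kept."""
    out = list(samples)
    for i in range(1, len(samples) - 1):
        out[i] = (samples[i - 1] + 2 * samples[i] + samples[i + 1] + 2) >> 2
    return out


print(filter_reference([10, 10, 50, 10, 10]))  # [10, 20, 30, 20, 10]
```

Larger blocks filter for more modes (lower threshold), since their predictions extend further from the reference samples.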
HEVC…
• Intra Prediction
– Boundary smoothing
Applied to DC, vertical, and horizontal modes, luma only
Reduces boundary discontinuity
– For DC mode, the first column and first row of samples in the predicted block are
filtered
– For vertical (horizontal) mode, the first column (row) of samples in the predicted block is filtered
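For the DC case, the boundary smoothing can be sketched as follows (luma only; names hypothetical). It computes the DC value and then blends the first row and column toward the neighboring reference samples, following the H.265 formulas as we understand them, which apply this step to blocks smaller than 32x32.

```python
def intra_dc_with_edge_filter(top, left):
    """Intra_DC prediction followed by smoothing of the first row and column."""
    n = len(top)
    shift = n.bit_length()  # equals log2(n) + 1 for power-of-two n
    dc = (sum(top) + sum(left) + n) >> shift
    pred = [[dc] * n for _ in range(n)]
    if n < 32:
        pred[0][0] = (left[0] + 2 * dc + top[0] + 2) >> 2
        for x in range(1, n):
            pred[0][x] = (top[x] + 3 * dc + 2) >> 2   # first row blends toward the top samples
        for y in range(1, n):
            pred[y][0] = (left[y] + 3 * dc + 2) >> 2  # first column blends toward the left samples
    return pred


pred = intra_dc_with_edge_filter([40] * 4, [0] * 4)
print(pred[0], pred[1])  # [20, 25, 25, 25] [15, 20, 20, 20]
```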
HEVC…
• Inter Prediction
– Fractional sample interpolation
¼ pixel precision for luma
– DCT based interpolation filters
8-/7- tap for luma
4-tap for chroma
Supports 16-bit implementation
with non-normative shift
– High precision interpolation and
biprediction
– DCT-IF design
Forward DCT, followed by
inverse DCT
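A one-dimensional sketch of luma fractional-sample interpolation, using the HEVC 8-tap half-sample and 7-tap quarter-sample filter coefficients. It simplifies the intermediate precision handling (rounding and clipping straight to 8 bits, repeating edge samples), so treat it as an illustration of the filtering rather than a bit-exact implementation; the names are hypothetical.

```python
# Luma interpolation filters indexed by quarter-sample phase. Each list holds the
# taps for integer samples at offsets -3..+4; the 7-tap filters are padded with a zero.
LUMA_FILTERS = {
    1: [-1, 4, -10, 58, 17, -5, 1, 0],
    2: [-1, 4, -11, 40, 40, -11, 4, -1],
    3: [0, 1, -5, 17, 58, -10, 4, -1],
}


def interpolate_luma_1d(row, x, frac, bit_depth=8):
    """Sample value at position x + frac/4 along one row (frac in 1..3).

    Simplification: the standard keeps higher-precision intermediates for
    2-D and bi-predicted blocks; here the result is rounded, divided by 64
    and clipped immediately.
    """
    taps = LUMA_FILTERS[frac]
    last = len(row) - 1
    acc = sum(c * row[min(max(x + k - 3, 0), last)]  # edge samples are repeated
              for k, c in enumerate(taps))
    value = (acc + 32) >> 6  # taps sum to 64
    return min(max(value, 0), (1 << bit_depth) - 1)


ramp = [10 * i for i in range(10)]
print(interpolate_luma_1d(ramp, 4, 2))  # 45, halfway between 40 and 50
```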
HEVC…
• Inter Prediction
– Asymmetric Motion Partition (AMP) for Inter PU
– Merge
Derive motion (MV and ref pic) from spatial and
temporal neighbors
The selected spatial/temporal neighbor is identified by
merge_idx
Number of merge candidates (≤ 5) signaled in slice
header
Skip mode = merge mode + no residual
– Advanced Motion Vector Prediction (AMVP)
Use spatial/temporal PUs to predict current MV
HEVC…
• Transforms
– Core transforms: DCT based
4x4, 8x8, 16x16, and 32x32
Square transforms only
Support partial factorization
Near-orthogonal
Nested transforms
– Alternative 4x4 DST
4x4 intra blocks, luma only
– Transform skipping mode
Bypasses the transform stage
Most effective on “screen content”
4x4 TBs only
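The 4x4 core transform and the alternative DST can be written as scaled integer matrices whose rows are the basis functions. The sketch below checks how close each set of basis rows is to orthogonal, which is what "near-orthogonal" refers to; the helper names are hypothetical.

```python
# 4x4 integer core transform (DCT-based) and the alternative 4x4 DST (intra luma).
DCT4 = [[64, 64, 64, 64],
        [83, 36, -36, -83],
        [64, -64, -64, 64],
        [36, -83, 83, -36]]
DST4 = [[29, 55, 74, 84],
        [74, 74, 0, -74],
        [84, -29, -74, 55],
        [55, -84, 74, -29]]


def gram(m):
    """Inner products of every pair of basis rows (M times its transpose)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]


def forward_1d(m, v):
    """One-dimensional forward transform of a 4-sample vector (no scaling shift)."""
    return [sum(c * s for c, s in zip(row, v)) for row in m]


for name, m in (("DCT4", DCT4), ("DST4", DST4)):
    g = gram(m)
    off = max(abs(g[i][j]) for i in range(4) for j in range(4) if i != j)
    print(name, [g[i][i] for i in range(4)], "max off-diagonal:", off)
```

For DCT4 the off-diagonal products are exactly zero while the squared row norms are 16384 (= 128²) and 16370, so the rows are orthogonal but not exactly of equal length. DST4 shows small non-zero off-diagonal products (magnitude 15).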
HEVC…
• Scaling and Quantization
– HEVC uses a uniform reconstruction quantization (URQ)
scheme controlled by a quantization parameter (QP).
– QP values range from 0 to 51 (for 8-bit video)
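A sketch of the QP-to-step-size relationship: the step size roughly doubles for every increase of 6 in QP, with a step of 1 at QP 4. The integer level-scale table reproduces this curve up to a normalization. The round-to-nearest quantizer is only one possible encoder choice (real encoders typically use dead-zone or rate-distortion-optimized quantization), and the function names are hypothetical.

```python
# Dequantization level-scale factors per QP % 6; 64 corresponds to a step of 1 at QP 4.
LEVEL_SCALE = [40, 45, 51, 57, 64, 72]


def qstep(qp):
    """Approximate quantizer step size: 1.0 at QP 4, doubling every 6 QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return 2 ** ((qp - 4) / 6)


def qstep_integer(qp):
    """Step size implied by the integer table: scale << (qp // 6), normalized by 64."""
    return (LEVEL_SCALE[qp % 6] << (qp // 6)) / 64


def quantize(coeff, qp):
    """Plain round-to-nearest quantizer (an encoder choice; not normative)."""
    level = int(abs(coeff) / qstep(qp) + 0.5)
    return level if coeff >= 0 else -level


def reconstruct(level, qp):
    """Uniform reconstruction: levels map to integer multiples of the step size."""
    return level * qstep(qp)


print(qstep(28), quantize(100, 28), reconstruct(6, 28))  # 16.0 6 96.0
```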
HEVC…
• Entropy Coding
– One entropy coder, CABAC
Reuses the H.264 CABAC core algorithm
More friendly to software and hardware
implementations
Easier to parallelize, reduced HW area, increased
throughput
– Context modeling
Reduced number of contexts
Increased use of bypass bins
Reduced data dependency
– Coefficient coding
Adaptive coefficient scanning for intra 4x4 and 8x8
▫ Up-right diagonal, horizontal, vertical
Processed in 4x4 blocks (coefficient groups) for all TU sizes
Sign data hiding:
▫ Sign of the first non-zero coefficient of a 4x4 group conditionally hidden in
the parity of the sum of the group's non-zero coefficient
magnitudes
▫ Conditions: 2 or more non-zero coefficients, and
“distance” in scan order between the first and last non-zero coefficient > 3
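A sketch of the up-right diagonal scan inside one 4x4 coefficient group and of the decoder-side sign inference for sign data hiding. The encoder side, which adjusts one magnitude by 1 when the parity does not match the desired sign, is omitted; the function names are hypothetical.

```python
def diag_scan(n):
    """Up-right diagonal scan of an n x n block as (x, y) = (column, row) pairs:
    each anti-diagonal is walked from bottom-left to top-right."""
    order = []
    for s in range(2 * n - 1):        # s = x + y selects one anti-diagonal
        for x in range(s + 1):
            y = s - x                 # y decreases as x increases: moving up-right
            if x < n and y < n:
                order.append((x, y))
    return order


def sign_is_hidden(abs_levels):
    """abs_levels: magnitudes of one 4x4 group in scan order. The sign is hidden
    when the first and last non-zero positions are more than 3 apart."""
    nz = [i for i, a in enumerate(abs_levels) if a != 0]
    return len(nz) >= 2 and nz[-1] - nz[0] > 3


def inferred_first_sign(abs_levels):
    """Decoder rule for the hidden sign: even sum of magnitudes -> +, odd -> -."""
    return -1 if sum(abs_levels) % 2 else 1


group = [[3, 0, 0, 0],   # magnitudes of one 4x4 coefficient group, indexed [row][column]
         [0, 1, 0, 0],
         [2, 0, 0, 0],
         [0, 0, 0, 0]]
scanned = [group[y][x] for x, y in diag_scan(4)]
print(scanned[:6], sign_is_hidden(scanned), inferred_first_sign(scanned))
# [3, 0, 0, 2, 1, 0] True 1
```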
HEVC…
• Entropy Coding - CABAC
– Binarization: CABAC uses Binary Arithmetic Coding which means that only binary decisions (1 or
0) are encoded. A non-binary-valued symbol (e.g. a transform coefficient or motion vector) is
"binarized" or converted into a binary code prior to arithmetic coding. This process is similar to the
process of converting a data symbol into a variable length code but the binary code is further
encoded (by the arithmetic coder) prior to transmission.
– The following stages (context model selection, arithmetic encoding, probability update) are repeated for each bit (or "bin") of the binarized symbol.
– Context model selection: A "context model" is a probability model for one or more bins of the
binarized symbol. This model may be chosen from a selection of available models depending on
the statistics of recently coded data symbols. The context model stores the probability of each bin
being "1" or "0".
– Arithmetic encoding: An arithmetic coder encodes each bin according to the selected probability
model. Note that there are just two sub-ranges for each bin (corresponding to "0" and "1").
– Probability update: The selected context model is updated based on the actual coded value (e.g. if
the bin value was "1", the frequency count of "1"s is increased)
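The binarization, context modeling, and probability update stages can be illustrated with a frequency-count model as described above. This is a teaching sketch, not HEVC's CABAC engine: it estimates the ideal arithmetic-coding cost (-log2 p per bin) instead of running a range coder, and all names are hypothetical.

```python
import math


def unary(value):
    """Unary binarization: `value` ones followed by a terminating zero."""
    return [1] * value + [0]


class CountContext:
    """Adaptive context model that keeps frequency counts of 0s and 1s.

    Illustrative only: HEVC's CABAC uses a 64-state probability estimator
    and a table-driven range coder rather than explicit counts.
    """

    def __init__(self):
        self.counts = [1, 1]  # start from equal counts, i.e. p(0) = p(1) = 0.5

    def prob(self, bin_val):
        return self.counts[bin_val] / sum(self.counts)

    def update(self, bin_val):
        self.counts[bin_val] += 1  # probability update after coding the bin


def ideal_cost_bits(bins, ctx):
    """Bits an ideal arithmetic coder would spend: sum of -log2 p(bin)."""
    total = 0.0
    for b in bins:
        total += -math.log2(ctx.prob(b))  # cost under the current model
        ctx.update(b)                     # then adapt to the coded value
    return total


print(unary(3))                                              # [1, 1, 1, 0]
print(round(ideal_cost_bits([0] * 20, CountContext()), 2))  # 4.39 bits for 20 bins
```

A highly predictable bin stream costs far less than one bit per bin once the model has adapted, which is the gain over a fixed variable-length code.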
HEVC…
• Parallel Processing Tools
– Slices
– Tiles
– Wavefront parallel processing (WPP)
– Dependent Slices
• Slices
– A slice is a sequence of CTUs processed in raster-scan
order. Slices are self-contained and independently decodable
– Each slice is encapsulated in a separate packet
HEVC…
• Tile
– Self-contained and independently decodable rectangular regions
– Tiles provide parallelism at a coarse level of granularity
Using more tiles than cores is not efficient: each tile boundary breaks prediction dependencies
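To see how a picture is cut into tiles, the sketch below computes tile column and row sizes for uniformly spaced tiles, assuming the integer-division rule we understand H.265 to use when uniform spacing is signaled. The 4 × 2 layout (8 tiles, e.g. one per core on an 8-core machine) and the function names are illustrative.

```python
def ctbs_needed(pixels, ctb_size):
    """Number of CTBs covering `pixels` samples (a partial CTB at the edge counts)."""
    return -(-pixels // ctb_size)  # ceiling division


def uniform_tile_sizes(size_in_ctbs, num_tiles):
    """Widths (or heights) in CTBs of uniformly spaced tile columns (or rows),
    using integer-division boundaries so sizes differ by at most one CTB."""
    return [((i + 1) * size_in_ctbs) // num_tiles - (i * size_in_ctbs) // num_tiles
            for i in range(num_tiles)]


cols, rows = ctbs_needed(1920, 64), ctbs_needed(1080, 64)  # 30 x 17 CTBs for 1080p
print(uniform_tile_sizes(cols, 4), uniform_tile_sizes(rows, 2))  # [7, 8, 7, 8] [8, 9]
```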
HEVC…
• WPP
– A slice is divided into rows of CTUs. Parallel processing of rows
– The decoding of each row can begin as soon as a few decisions have
been made in the preceding row for the adaptation of the entropy coder.
– Better compression than tiles. Parallel processing at a fine level of
granularity.
No WPP combined with tiles in the Main profile!
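The wavefront dependency can be simulated with a small timing model. It assumes every CTU takes one time unit, one thread per CTU row, and that a CTU may start once its left neighbor and the top-right CTU of the previous row are done (the two-CTU lag that lets each row inherit entropy-coder state from the row above). The names are hypothetical.

```python
def wpp_finish_times(width, height):
    """Finish time of every CTU under the two-CTU wavefront lag.

    A CTU waits for its left neighbour and for the top-right CTU of the row
    above (clamped at the right picture edge).
    """
    finish = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            left = finish[y][x - 1] if x > 0 else 0
            above_right = finish[y - 1][min(x + 1, width - 1)] if y > 0 else 0
            finish[y][x] = max(left, above_right) + 1
    return finish


t = wpp_finish_times(10, 5)
print(t[-1][-1], "time units with WPP vs", 10 * 5, "sequentially")
# 18 time units with WPP vs 50 sequentially
```

The picture finishes after width + 2·(height − 1) units, so the speed-up is bounded by the start-up and drain phases of the wavefront, one reason WPP cannot keep a large number of cores fully busy.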
HEVC…
• Dependent Slices
– Separate NAL units, but dependent (can only be decoded after part of
the previous slice has been decoded)
– Dependent slices are mainly useful for ultra-low-delay applications
e.g., remote surgery
– Error resiliency gets worse
– Low delay
– Good efficiency; goes well with WPP
HEVC…
• Slice Vs Tile
– Tiles are a kind of zero-overhead slice
A slice header is sent with every slice, but tile information only once, in the parameter sets
Slices have packet headers too
Each tile can contain a number of slices and vice versa
– Slices are for :
Controlling packet sizes
Error resiliency
– Tiles are for:
Controlling parallelism (multiple core architecture)
Defining regions of interest (ROI)
HEVC…
• Tile Vs WPP
– WPP
Better compression than tiles
Parallel processing at a fine level of granularity
But …
Needs frequent communication between processing units
With a high number of cores, full utilization cannot be achieved
– Good when there is:
A relatively small number of nodes
Good inter-core communication
No need to match the MTU size
A big enough shared cache
HEVC…
• In-Loop Filters
– Two processing steps, a deblocking filter (DBF) followed by a
sample adaptive offset (SAO) filter, are applied to the
reconstructed samples
The DBF is intended to reduce the blocking artifacts due to block-
based coding
The DBF is only applied to the samples located at block
boundaries
The SAO filter is applied adaptively to all samples satisfying
certain conditions, e.g., based on the local gradient.
HEVC…
• Loop Filters: Deblocking
– Applied to all samples adjacent to a PU or TU boundary
Except the case when the boundary is also a picture boundary, or
when deblocking is disabled across slice or tile boundaries
– HEVC only applies the deblocking filter to edges that are
aligned on an 8×8 sample grid
This restriction reduces the worst-case computational complexity
without noticeable degradation of the visual quality
It also improves parallel-processing operation
– The processing order of the deblocking filter is defined as
horizontal filtering for vertical edges for the entire picture first,
followed by vertical filtering for horizontal edges.
HEVC…
• Loop Filters: Deblocking
– Simpler deblocking filter in HEVC (vs. H.264)
– Deblocking filter boundary strength is set according to
Block coding mode
Existence of non-zero coefficients
Motion vector difference
Reference picture difference
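A simplified sketch of the boundary-strength (Bs) decision. It assumes one motion vector per side (uni-prediction) and ignores the bi-prediction cases, in which a different number of motion vectors also gives Bs = 1; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockSide:
    """Simplified description of the block on one side (P or Q) of an edge."""
    intra: bool
    nonzero_coeffs: bool  # the transform block on this side has non-zero coefficients
    ref_pic: int          # reference picture id (uni-prediction assumed)
    mv: tuple             # (mvx, mvy) in quarter-luma-sample units


def on_deblocking_grid(edge_pos):
    """Only edges lying on the 8x8 luma sample grid are considered."""
    return edge_pos % 8 == 0


def boundary_strength(p, q, transform_edge=True):
    """Bs 2: intra on either side; Bs 1: other filtering candidates; Bs 0: edge not filtered.
    Chroma is only filtered when Bs is 2."""
    if p.intra or q.intra:
        return 2
    if transform_edge and (p.nonzero_coeffs or q.nonzero_coeffs):
        return 1
    if p.ref_pic != q.ref_pic:
        return 1
    # An MV difference of at least one integer luma sample (4 quarter-samples) counts.
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1
    return 0


still = BlockSide(intra=False, nonzero_coeffs=False, ref_pic=0, mv=(0, 0))
print(boundary_strength(still, BlockSide(False, False, 0, (4, 0))))  # 1
```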
HEVC…
• Loop Filters: SAO
– A process that modifies the decoded
samples by conditionally adding an
offset value to each sample after the
application of the deblocking filter,
based on values in look-up tables
transmitted by the encoder.
– SAO: Sample Adaptive Offsets
New loop filter in HEVC
Non-linear filter
– For each CTB, signal SAO type and
parameters
– Encoder decides SAO type and
estimates SAO parameters
(rate-distortion optimization)
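A sketch of the SAO edge-offset classification for one row (the horizontal edge class). Offsets are applied per category; in HEVC the offsets of categories 1 and 2 are non-negative and those of categories 3 and 4 non-positive, so edge offset smooths local valleys and peaks. Band offset and CTB-boundary handling are not modeled, and the names are hypothetical.

```python
def eo_category(a, c, b):
    """SAO edge-offset category of sample c given its two neighbours a and b
    along the chosen direction (horizontal, vertical, or one of two diagonals)."""
    if c < a and c < b:
        return 1  # local minimum
    if (c < a and c == b) or (c == a and c < b):
        return 2  # concave corner
    if (c > a and c == b) or (c == a and c > b):
        return 3  # convex corner
    if c > a and c > b:
        return 4  # local maximum
    return 0      # monotonic or flat: no offset


def sao_edge_offset_row(row, offsets, bit_depth=8):
    """Apply horizontal-class edge offsets to one row of deblocked samples.

    offsets maps category -> offset; categories are computed from the input
    (deblocked) samples, and the two end samples are left unchanged here.
    """
    max_val = (1 << bit_depth) - 1
    out = list(row)
    for i in range(1, len(row) - 1):
        cat = eo_category(row[i - 1], row[i], row[i + 1])
        if cat:
            out[i] = min(max(row[i] + offsets[cat], 0), max_val)
    return out


print(sao_edge_offset_row([10, 20, 10, 10, 30], {1: 2, 2: 1, 3: -1, 4: -2}))
# [10, 18, 11, 11, 30]
```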
HEVC…
• Special Coding
– I_PCM mode
The prediction, transform, quantization and entropy coding are bypassed
The samples are directly represented by a pre-defined number of bits
Main purpose is to avoid excessive consumption of bits when the signal
characteristics are extremely unusual and cannot be properly handled by hybrid
coding
– Lossless mode
The transform, quantization, and other processing that affects the decoded picture
are bypassed
The residual signal from inter- or intrapicture prediction is directly fed into the
entropy coder
It allows mathematically lossless reconstruction
SAO and deblocking filtering are not applied to these regions
– Transform skipping mode
Only the transform is bypassed
Improves compression for certain types of video content such as computer-
generated images or graphics mixed with camera-view content
Can be applied to TBs of 4×4 size only
HEVC…
• High Level Parallelism
– Slices
Independently decodable packets
Sequence of CTUs in raster scan
Error resilience
Parallelization
– Tiles
Independently decodable (re-entry)
Rectangular region of CTUs
Parallelization (esp. encoder)
One slice can contain several tiles, or one tile several slices
– WPP
Rows of CTUs
Decoding of each row can be parallelized
Shaded CTU can start when gray CTUs in row above are finished
Main profile does not allow tiles + WPP combination
HEVC…
• Profiles, Levels and Tiers
– Historically, a profile defines a collection of coding
tools, whereas a level constrains decoder
processing load and memory requirements
– The first version of HEVC defined 3 profiles
Main Profile: 8-bit video in YUV4:2:0 format
Main 10 Profile: same as Main, up to 10-bit
Main Still Picture Profile: same as Main, one
picture only
– Levels and Tiers
Levels: max sample rate, max picture size,
max bit rate, DPB and CPB size, etc
Tiers: “main tier” and “high tier” within one
level
HEVC…
• Complexity Analysis
– Software-based HEVC decoder capabilities
(published by NTT Docomo)
Single-threaded: 1080p@30 on ARMv7
(1.3 GHz), 1080p@60 decoding on i5
(2.53 GHz)
Multi-threaded: 4Kx2K@60 on i7 (2.7GHz),
12Mbps, decoding speed up to 100fps
– Other independent software-based HEVC
real-time decoder implementations published
by Samsung and Qualcomm during HEVC
development
– Decoder complexity not substantially higher than H.264/AVC
More complex modules: MC, Transform, Intra
Pred, SAO
Simpler modules: CABAC and deblocking