4. Video coding standardization organisations
− ISO/IEC MPEG = “Moving Picture Experts Group”
(ISO/IEC JTC 1/SC 29/WG 11 = International Organization for Standardization and International Electrotechnical
Commission, Joint Technical Committee 1, Subcommittee 29, Working Group 11)
− ITU-T VCEG = “Video Coding Experts Group”
(ITU-T SG16/Q6 = International Telecommunication Union – Telecommunication Standardization Sector (ITU-T,
a United Nations organization, formerly CCITT), Study Group 16, Working Party 3, Question 6)
− JVT = “Joint Video Team”
Collaborative team of MPEG & VCEG, responsible for developing AVC (discontinued in 2009)
− JCT-VC = “Joint Collaborative Team on Video Coding”
Team of MPEG & VCEG, responsible for developing HEVC (established January 2010)
− JVET = “Joint Video Experts Team”
Exploring potential for new technology beyond HEVC (established Oct. 2015 as Joint Video Exploration Team, renamed
Apr. 2018)
5. The scope of video standardization
Only the specifications of the bitstream syntax and the decoder are standardized:
• Permits optimization beyond the obvious
• Permits complexity reduction for implementability
• Provides no guarantees of quality
[Diagram: Source → Pre-Processing → Encoding → Decoding → Post-Processing & Error Recovery → Destination; the scope of the standard covers only the bitstream and the decoder.]
6. Video coding concept
− Hybrid video coding has been used since the early days of video compression standards (e.g. MPEG-1/-2/-4, H.264/AVC, HEVC) and also in most proprietary codecs (VC-1, VP8, etc.)
17. HEVC spatial coding structures
• Hybrid Video Coding
− Transform coding: DCT-like transform to compact the energy of the signal
− Predictive coding: intra- or inter-picture (motion-compensated) prediction
− Entropy coding: context-adaptive binary arithmetic coding (CABAC)
18. • Blocks and Units
Block: square or rectangular area in a color component array
Unit: collocated blocks of the (three) color components, plus the associated syntax elements and prediction data (e.g. motion vectors)
• Picture partitioning
Coding Tree Units / Coding Tree Blocks (CTUs / CTBs)
Independent slice segment: full header, independently decodable
Dependent slice segment: very short header; relies on the corresponding independent slice segment and inherits its CABAC state
• Slice types
I-slice: intra prediction only
P-slice: intra prediction and motion compensation with one reference picture list
B-slice: intra prediction and motion compensation with one or two reference picture lists
20. HEVC spatial coding structures
[Encoder block diagram: the input video signal is partitioned into coding tree blocks, coding blocks, prediction blocks and transform blocks; the residual passes through transform, scaling & quantization and entropy coding into the bitstream (010110...); the embedded decoder applies scaling & inverse transform, intra-picture prediction, inter-picture prediction with motion estimation, and an in-loop filter to produce the output video signal.]
21. HEVC spatial coding structures
Slices
− A video is split into blocks
− Those blocks are split into smaller blocks
− The prediction and transformation are done on the smallest blocks.
[Figure: CTU partitioning; maximum CU size 64×64]
22. HEVC spatial coding structures
− Prediction block partitioning (Prediction Unit)
Intra prediction: 2N×2N, down to 4×4
Inter prediction: 2N×2N, 2N×N, N×2N, N×N, plus asymmetric motion partitioning (AMP): 2N×nU, 2N×nD, nL×2N, nR×2N
− Transform block partitioning (Transform Unit)
TU sizes 4×4, 8×8, 16×16, 32×32 DCT, and 4×4 DST
[Figure: a CTU split into CUs, the PU partition modes for intra and inter, and the TU quadtree]
23. • Prediction Block (PB) partitioning of a 2N×2N CB
Each prediction block in a coding block uses the same prediction mode (intra or inter)
• Transform Block (TB) partitioning of a CB
Quadtree partitioning of the CB → Residual Quad Tree (RQT)
Transform sizes 4×4 to 32×32
PB boundaries inside TBs are allowed
24. HEVC spatial coding structures
– Coding Tree Unit (CTU)
− Corresponds to macroblocks in earlier coding standards.
− Each CTU belongs to exactly one slice segment
− Maximum CTU size: 64×64 pixels
− Split into Coding Units (CU)
– Coding Unit (CU)
− CU sizes: 64×64, 32×32, 16×16, 8×8
− Carries the intra/inter coding mode decision
− Split into Prediction Units (PUs) and Transform Units (TUs)
– Prediction Unit (PU): the elementary unit for prediction
− Carries partition and motion information
– Transform Unit (TU): the unit for transform and quantization
− TU sizes: 4×4, 8×8, 16×16, 32×32 DCT, and 4×4 DST
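The recursive CTU-to-CU quadtree can be sketched as follows; `split_cost` is a hypothetical stand-in for the encoder's rate-distortion split decision, not part of the standard:

```python
# Illustrative sketch (not HEVC reference code): recursive quadtree
# partitioning of a 64x64 CTU into CUs between 64x64 and 8x8.
# `split_cost` is a hypothetical stand-in for the encoder's RD decision.

def partition_ctu(x, y, size, split_cost, min_size=8):
    """Return the list of (x, y, size) coding units chosen for this CTU."""
    # Leaf: cannot split below the minimum CU size, or splitting is not chosen.
    if size == min_size or not split_cost(x, y, size):
        return [(x, y, size)]
    half = size // 2
    cus = []
    # Quad split: four equally sized sub-CUs, visited in Z-scan order.
    for dy in (0, half):
        for dx in (0, half):
            cus += partition_ctu(x + dx, y + dy, half, split_cost, min_size)
    return cus

# Toy decision rule: keep splitting the top-left quadrant only.
cus = partition_ctu(0, 0, 64, lambda x, y, s: x == 0 and y == 0)
print(cus)  # 10 CUs, from (0, 0, 8) down to (32, 32, 32)
```

A real encoder replaces the toy lambda with a rate-distortion comparison between the unsplit CU and the sum of its four children.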
25. Example (coding quadtree):
− The numbers indicate the coding order of the transform blocks (Z-scan order)
− Transform blocks identical to their corresponding coding blocks are not explicitly marked in this figure
Blue lines: coding tree
Red lines: non-degenerate residual quadtrees
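The Z-scan order used above is equivalent to Morton-order bit interleaving of the block coordinates; a minimal sketch:

```python
# Illustrative sketch: Z-scan (Morton) index of a block at grid position
# (bx, by), obtained by interleaving the coordinate bits. Blocks are
# visited quadrant by quadrant, top-left first.

def z_scan_index(bx, by, bits=8):
    idx = 0
    for i in range(bits):
        idx |= ((bx >> i) & 1) << (2 * i)       # x bit -> even position
        idx |= ((by >> i) & 1) << (2 * i + 1)   # y bit -> odd position
    return idx

# The 4 blocks of a 2x2 grid are scanned (0,0), (1,0), (0,1), (1,1):
order = sorted([(x, y) for y in range(2) for x in range(2)],
               key=lambda p: z_scan_index(*p))
print(order)  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```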
26. HEVC spatial coding structures
[Encoder block diagram with the intra-picture prediction path highlighted: the residual passes through transform, scaling & quantization and entropy coding into the bitstream; the embedded decoder applies scaling & inverse transform, in-loop filtering, intra-/inter-picture prediction and motion estimation.]
29. HEVC spatial coding structures
Example: intra coding of a 64×64 CTB (all-intra).
− a) Original
− b) Applicable intra prediction modes on a CU basis
− c) Prediction signal
− d) Residual with the corresponding CB (solid lines) and TB partitioning (dashed lines)
30. • Intra prediction modes
− Planar prediction: mode 0 (P)
− DC intra prediction: mode 1 (DC)
− Angular modes numbered from diagonal-up to diagonal-down
− Modes 2–18: horizontal; modes 19–34: vertical
− Pure horizontal: mode 10; pure vertical: mode 26
• Intra prediction block size
− Intra prediction mode coded per CU
− Prediction block size derived from residual quadtree
− Boundary samples of neighboring block used for prediction
− Efficient representation
− Local update of prediction source
[Figure: intra prediction modes]
31. HEVC spatial coding structures
[Encoder block diagram with the inter-picture prediction path highlighted: motion estimation and inter-picture prediction provide the prediction signal; the residual passes through transform, scaling & quantization and entropy coding into the bitstream; the embedded decoder applies scaling & inverse transform and in-loop filtering.]
33. • Prediction from reference picture lists
• Uni-prediction
P-slices use only List0; B-slices use List0 or List1
HEVC: minimum PB size 8×4 or 4×8
• Bi-prediction (B-slices only)
One predictor from List0, one predictor from List1
HEVC: minimum prediction block size 8×8
[Figures: picture order count (POC); motion-compensated prediction]
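Uni- and bi-prediction can be sketched as follows; the rounded average is a simplified stand-in for HEVC's weighted sample prediction process:

```python
# Illustrative sketch: uni- vs. bi-prediction for a prediction block.
# pred0/pred1 are motion-compensated blocks fetched from List0/List1
# reference pictures; bi-prediction averages them with rounding.

def uni_prediction(pred0):
    return pred0

def bi_prediction(pred0, pred1):
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)]
            for r0, r1 in zip(pred0, pred1)]

p0 = [[100, 102], [104, 106]]   # predictor from a List0 reference
p1 = [[110, 108], [106, 104]]   # predictor from a List1 reference
print(bi_prediction(p0, p1))    # [[105, 105], [105, 105]]
```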
34. HEVC spatial coding structures
[Encoder block diagram with transform, scaling & quantization highlighted: the quantized transform coefficients are entropy coded into the bitstream; scaling & inverse transform feed the embedded decoder with in-loop filter, intra-/inter-picture prediction and motion estimation.]
35. HEVC spatial coding structures
[Encoder block diagram with the in-loop filter highlighted: the filter operates on the reconstructed signal before it is stored for inter-picture prediction.]
36. • Two in-loop filters to remove coding artifacts while preserving edges (not effective in intra prediction):
− Deblocking filter: operates only on 8×8 block boundaries (not 4×4), in 4-sample units
− Sample Adaptive Offset (SAO) filter: adds corrective offset values to attenuate:
• systematic errors introduced by quantization and phase shifts from inaccurate motion vectors
• ringing artifacts (Gibbs phenomenon), introduced mainly by large transform sizes
[Figure: deblocking filter effect]
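One of SAO's modes, band offset, can be sketched as follows; the sample values and offsets are made up for the demo:

```python
# Illustrative sketch of SAO band offset: the 8-bit sample range is
# divided into 32 bands of width 8; the encoder signals a starting band
# and a few offsets, which the decoder adds to samples falling into
# those bands. Values here are invented for the demo.

def sao_band_offset(samples, start_band, offsets, bit_depth=8):
    shift = bit_depth - 5                 # 32 bands -> band = sample >> shift
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        band = s >> shift
        if start_band <= band < start_band + len(offsets):
            s = min(max(s + offsets[band - start_band], 0), max_val)
        out.append(s)
    return out

# Samples in bands 12..13 (values 96..111) get corrected; others pass through.
print(sao_band_offset([90, 100, 105, 200], 12, [2, -3, 0, 0]))
# [90, 102, 102, 200]
```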
38. • Context-based Adaptive Binary Arithmetic Coding (CABAC):
− Usage of adaptive probability models for most symbols
− Exploiting symbol correlations by using contexts
− Restriction to binary arithmetic coding based on table look-ups and shifts only
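The adaptive probability models can be illustrated with a minimal context model; this captures the general idea (shift-based probability adaptation), not the standard's exact state machine or tables:

```python
# Illustrative sketch of a context-adaptive binary probability model,
# the idea behind CABAC's contexts (NOT the standard's exact state
# machine): the estimated probability of a "1" bin is nudged toward
# each observed bin value using only shifts.

class ContextModel:
    def __init__(self, bits=15, rate=5):
        self.one = 1 << bits        # fixed-point representation of 1.0
        self.p1 = self.one >> 1     # P(bin = 1), initialized to 0.5
        self.rate = rate            # power-of-two adaptation rate

    def update(self, bin_val):
        if bin_val:
            self.p1 += (self.one - self.p1) >> self.rate
        else:
            self.p1 -= self.p1 >> self.rate

ctx = ContextModel()
for b in [1, 1, 1, 1, 0, 1, 1, 1]:
    ctx.update(b)
print(ctx.p1 / ctx.one)   # estimate has drifted above 0.5
```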
39. Motivation for improved video compression: "Spatial Resolution"
− SD (PAL): 720×576, 0.414 MP
− HDTV 720p: 1280×720, 0.922 MP
− HDTV 1080: 1920×1080, 2.027 MP
− Digital Cinema 2K: 2048×1080, 2.21 MP
− UHDTV 1: 3840×2160, 8.3 MP
− 4K: 4096×2160, 8.84 MP
− UHDTV 2: 7680×4320, 33.18 MP
− 8K: 8192×4320, 35.39 MP
Higher resolutions give a wider viewing angle and are more immersive.
40. Motivation for improved video compression: "HFR (High Frame Rate)"
− Conventional frame rates suffer from motion blur and motion judder
− Wider viewing angles increase perceived motion artifacts
− Higher frame rates are needed: 50 fps minimum (100 fps being vetted)
41. Motivation for improved video compression: "WCG (Wide Color Gamut)"
− Deeper colors, more realistic and more colorful pictures
− Wide color space (ITU-R Rec. BT.2020): 75.8% of CIE 1931
− Conventional color space (ITU-R Rec. BT.709): 35.9% of CIE 1931
42. Motivation for improved video compression: "HDR (High Dynamic Range)"
− Standard dynamic range vs. high dynamic range (more vivid, more detail)
43. Motivation for improved video compression: quantization (bit depth)
− 8-bit quantization: 256 levels; 10-bit quantization: 1024 levels
− More bits (10-bit) give more colours
− Avoids banding or contouring
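The banding effect can be illustrated directly: a smooth 10-bit gradient loses most of its distinct levels when re-quantized to 8 bits.

```python
# Illustrative sketch: re-quantizing a smooth 10-bit gradient to 8 bits
# merges groups of four adjacent levels into one, which is what produces
# visible banding/contouring in slowly varying regions.

def to_8bit(sample_10bit):
    return sample_10bit >> 2            # 1024 levels -> 256 levels

gradient_10bit = list(range(512, 524))  # 12 distinct 10-bit levels
gradient_8bit = [to_8bit(s) for s in gradient_10bit]
print(len(set(gradient_10bit)), "->", len(set(gradient_8bit)))  # 12 -> 3
```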
47. JEM and VVC timeline
− Jan 2013: HEVC v1
− Oct 2015: JVET formed
− Mar 2017: Call for Evidence
− Oct 2017: Call for Proposals
− Apr 2018: common base established (VVC)
− Oct 2020: VVC v1
− "Joint Exploration Model" (JEM) developed by JVET
Experimental software to explore new coding tools
Intended to investigate the potential for better compression beyond HEVC
Initially started by extending the HEVC software with additional compression tools or replacing existing tools
• Call for Evidence (CfE): subjective verification of the JEM coding efficiency compared to HEVC
• Call for Proposals (CfP): submission and subjective evaluation of new video coding technologies
48. − 32 companies in 21 proponent groups
− 46 category-specific submissions
22 in SDR video
12 in HDR video
12 in 360° video
− All responses clearly better than HEVC, some evidently better than JEM
− The subjective quality of the best-performing proposals is always at least equal to, and sometimes (~1/3 of cases) better than, HEVC across all categories at approx. 40% lower rate
Very successful Call for Proposals (CfP) (April 2018)
JVET documents available at http://phenix.it-sudparis.eu/jvet
49. − SDR-A: 3840×2160
5 UHD sequences (from 950 kbit/s to 10 Mbit/s)
− SDR-B: 1920×1080
5 HD sequences (from 400 kbit/s to 3.8 Mbit/s)
− HDR (PQ HD, HLG 4K)
4 HD sequences, PQ curve (from 350 kbit/s to 3 Mbit/s)
3 UHD sequences, HLG curve (from 640 kbit/s to 10 Mbit/s)
− 360° video (8K, 6K)
1 sequence 6K×3K (from 2 Mbit/s to 10 Mbit/s)
4 sequences 8K×4K (from 400 kbit/s to 7 Mbit/s)
VVC Call for Proposal test sequences
[Thumbnails: FoodMarket4 60p, CatRobot1 60p, DaylightRoad2 60p, ParkRunning3 50p, Campfire 30p, BasketballDrive 50p, Cactus 50p, BQTerrace 60p, RitualDance 60p, MarketPlace 60p, Market3 HD50p, Hurdles HD50p, Starting HD50p, ShowGirls2 HD25p, Cosmos1 HD24p, DayStreet 60p, PeopleInShop..., SunsetBeach 60p, ChairliftRide 30p, KiteFlite 30p, Harbor 30p, Trolley 30p, Balboa 60p]
50. − New elements (some come with high complexity):
Decoder side estimation for mode/MV derivation
Finer partitioning: Asymmetric, geometric
Neural networks for prediction, loop filtering, upsampling, (encoder control)
Additional non-linear, de-noising and statistics-based loop filters
Additional linear and non-linear elements in prediction
Intra block copy (current picture referencing)
− HDR specific:
New adaptive reshaping and quantization, also in-loop
HDR-specific modifications of existing tools, e.g. deblocking
− 360-video specific:
Variants of projection formats, geometry-corrected face boundary padding
Modification and disabling of existing tools at face boundaries
What was proposed in CfP?
51. − QT/BT/TT (QT: Quadtree. BT: Binary tree. TT: Ternary tree)
− Remove unnecessary partitioning restrictions
− Implicit splitting at picture boundaries
− Separate trees for intra slices
− Position Dependent Prediction Combination (PDPC) in intra prediction: combines values predicted from non-filtered and filtered (smoothed) reference samples, depending on the prediction mode and block size
− Cross-Component Linear Model (CCLM) in intra prediction: chroma component prediction
− 87 intra modes (wide angles included), 3 most probable modes (MPM), TU binarization
− Affine MC (fixed 4×4 subblock size, 4/6-parameter model switching at the CU level)
− Affine MV coding
− MV candidate list construction with spatial/temporal inheritance and derivation
− Improved MV difference coding
Summary of proposed elements (1)
52. − Adaptive motion vector resolution (AMVR): the encoder adaptively selects a sub-pixel precision for the MV
− Local illumination compensation
− Subblock MC (4×4) from advanced temporal motion vector prediction (ATMVP) merge; 8×8-granularity motion vector storage [high precision]
− Multiple transform selection (all DCT/DST types) for intra and inter (only the DCT in HEVC): 4 different separable transforms (DCT/DST)
− Adaptive transform:
− Performed in addition to the DCT-II and 4×4 DST-VII employed in HEVC; the newly introduced transform matrices are DST-VII, DCT-VIII, DST-I and DCT-V
− Maximum QP increased from 51 to 63
− An enhanced rate-distortion-optimized quantization scheme called Dependent Scalar Quantization
− The CABAC coder inherited from AVC has been enhanced and is now even faster
− Modified entropy coding supporting Dependent Scalar Quantization
Summary of proposed elements (2)
53. − Adaptive loop filter
− 4x4 classification based (gradient strength & orientation) for luma
− 7x7 luma, 5x5 chroma filters
− enabling flag at CTU level
− Basic high-level syntax
− SPS (Sequence Parameter Set)
− PPS (Picture Parameter Set)
− Tiles/Slices
− Reference Picture Signaling
− Update of the Benchmark Set (BMS) software contains
− Generalized bi-prediction (a kind of local weighted prediction)
− Decoder-side estimation: bi-directional optical flow (BIO, simplified bilateral matching)
− Current picture referencing (aka intra block copy)
Summary of proposed elements (3)
54. − Root Size 128×128 (64×64 in HEVC)
− 1st Tree
• Quad Split
− 2nd Tree
• Binary Split
• Ternary Split
Quad/binary/ternary partitioning
Block partitioning
55. − 65 intra prediction directions (33 in HEVC)
− Rectangular block prediction (HEVC: square only)
− Larger block sizes, up to 128×128 (HEVC: 32×32)
− New prediction modes with directional interpolation (Position Dependent intra Prediction Combination: PDPC)
− Chroma component prediction (Cross-Component Linear Model: CCLM)
− Luma and chroma blocks can have different block sizes, using a separate tree for the chroma components
[Figure: intra prediction directions; mode 0: planar, mode 1: DC]
Intra-Picture Prediction
56. − Average rate reductions of 4–5% have been reported for neural-network-based intra prediction
− Mostly fully connected networks (FCN) have been used for this purpose (no convolutional layers)
Intra-Picture Prediction by neural networks (NN)
[Figure: a neural network (weights, biases, mode) maps K×K regions of reconstructed samples to an M×N block of predicted samples]
57. − Based on rate-distortion optimization, the encoder locally signals whether motion derivation is used or not
− The motion information is derived at the decoder side instead of being explicitly extracted from the coded bitstream
− To minimize the search complexity at the decoder, only 8 positions around the block are searched in each reference
I. Perform normal bi-prediction
− Search the available reference pictures for the motion-compensated block (referenced by the initial MV0 and MV1)
II. Perform normal bi-prediction again
− Using the updated motion information, a better prediction and motion vector of the current block is obtained (referenced by the updated MV0' and MV1')
Decoder-side search for motion or prediction information
[Figure: normal bi-prediction between a past reference (List 0) and a future reference (List 1); the initial MV0 and MV1 around the current block are refined to MV0' and MV1'.]
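The decoder-side refinement described above can be sketched as a bilateral search over 8 neighboring positions; `fetch_block`, the toy reference data and the SAD cost are hypothetical stand-ins for the real motion-compensated fetch and matching cost:

```python
# Illustrative sketch of decoder-side MV refinement (bilateral matching):
# around the initial MVs, 8 neighboring integer offsets (plus the center)
# are tested, and the one minimizing the matching cost between the List0
# and List1 predictions is kept.

def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def refine_mv(mv0, mv1, fetch_block):
    # Center plus the 8 surrounding search positions.
    candidates = [(0, 0)] + [(dx, dy) for dx in (-1, 0, 1)
                             for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    def cost(d):
        # Mirrored offsets: moving MV0 by d moves MV1 by -d.
        p0 = fetch_block(0, (mv0[0] + d[0], mv0[1] + d[1]))
        p1 = fetch_block(1, (mv1[0] - d[0], mv1[1] - d[1]))
        return sad(p0, p1)
    best = min(candidates, key=cost)
    return ((mv0[0] + best[0], mv0[1] + best[1]),
            (mv1[0] - best[0], mv1[1] - best[1]))

# Toy references: only the offset (1, 0) makes the two predictions match.
blocks = {(0, (3, 0)): [5, 5], (1, (-3, 0)): [5, 5]}
def fetch_block(ref_list, mv):
    return blocks.get((ref_list, mv), [0, 0] if ref_list == 0 else [9, 9])

print(refine_mv((2, 0), (-2, 0), fetch_block))  # ((3, 0), (-3, 0))
```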
58. [Figure: current picture and reference picture (motion source picture)]
Inter-Picture Prediction:
− ATMVP (Alternative Temporal Motion Vector Prediction)
Prediction information (motion information): the motion vector and the associated picture in the reference buffer
Motion information prediction: use the neighboring motion vectors of the current block to obtain its prediction information
ATMVP: predicts the motion vectors of the sub-CUs within a CU
[Figure: each N×N sub-CU takes its MV (MV0/MV1) from the corresponding N×N block in the motion source picture – Alternative Temporal Motion Vector Prediction (by TMVP modification)]
59. [Figure: current picture and reference picture (motion source picture)]
Inter-Picture Prediction:
− Spatial-Temporal Motion Vector Prediction (STMVP)
Combines temporal prediction and spatial prediction (predictions from the neighborhood)
Motion vectors of sub-CUs are derived recursively using the temporal motion vector predictor and spatially neighboring motion vectors
The averaged motion vector is assigned as the motion vector of the current sub-CU
[Figure: temporal neighbors contribute a TMV and spatial neighbors contribute SMVs; their average forms the STMVP]
60. [Figure: overlap regions of the left, upper and current CU]
− Overlapped Block Motion Compensation (OBMC)
− The MV is most reliable in the center of the block
− To process CU boundaries in a uniform fashion, OBMC is performed at the sub-block level for all MC block boundaries
− OBMC overlaps the predictions from multiple nearby MVs and blends them to avoid the sharp edges (blocking artifacts) that typically occur in inter prediction
− When a CU is coded with a sub-CU mode, each sub-block of the CU is an MC block
− OBMC can be switched on and off at the CU level
Inter-Picture Prediction:
[Figure: the current 4×4 MC sub-block in a PU combines its own MV0 with the upper (MVu) and left (MVl) neighbors' MVs into MVnew]
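The boundary blending behind OBMC can be sketched as follows; the weights are made up for the demo and are not the actual OBMC weighting tables:

```python
# Illustrative sketch of OBMC blending: the prediction from the
# sub-block's own MV is blended with the prediction obtained using a
# neighbor's MV, with weights decaying away from the shared boundary.
# Weights are invented for the demo, not taken from any standard.

def obmc_blend(pred_own, pred_up, w_up):
    """Blend two columns of samples row by row (fixed point, /8)."""
    return [(w * u + (8 - w) * o + 4) >> 3
            for o, u, w in zip(pred_own, pred_up, w_up)]

own = [100, 100, 100, 100]       # prediction with the sub-block's own MV
up = [60, 60, 60, 60]            # prediction with the upper neighbor's MV
weights = [4, 2, 1, 0]           # stronger blending near the top boundary
print(obmc_blend(own, up, weights))  # [80, 90, 95, 100]
```

The blended column transitions smoothly from the neighbor's prediction toward the block's own prediction, which is exactly what removes the sharp edge at the boundary.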
62. − In natural images, edges are usually not rectangular; therefore, encoders tend to select very small blocks around edges in order to predict them efficiently
− With more flexible prediction shapes such as diagonal splits, the encoder can use bigger blocks at edges, which is much more efficient
− Non-rectangular partitions
− 2 diagonal options
− More flexible signaling
Non-rectangular partitioning
[Figure: a diagonally split current block, with MV0 pointing into a past reference (List 0) and MV1 into a future reference (List 1)]
63. − Classical motion compensation: 2-dimensional translation of rectangular regions (x, y)
− Affine motion: scaling, rotation, shape changes and shearing
− The full model has 6 degrees of freedom (DOF)
− The motion information per 4×4 block is calculated using the affine motion model
− Affine transformation:
2D translation (2 DOF)
+ rotation and scale (4 DOF)
+ aspect ratio and shear (6 DOF)
Represented by 2 or 3 control-point motion vectors
Affine Motion Vector
[Figure: translation, scaling, rotation, shape change and shearing]
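The per-sub-block MV derivation can be sketched with the 4-parameter affine model, using control-point MVs v0 (top-left corner) and v1 (top-right corner); floating point is used for clarity, whereas real codecs use fixed point:

```python
# Illustrative sketch of the 4-parameter affine model: the MV of each
# 4x4 sub-block is evaluated at the sub-block center from the two
# control-point MVs v0 (top-left) and v1 (top-right) of a w x h block.

def affine_subblock_mvs(v0, v1, w, h, sub=4):
    a = (v1[0] - v0[0]) / w          # scaling/rotation parameters
    b = (v1[1] - v0[1]) / w
    mvs = {}
    for y in range(0, h, sub):
        for x in range(0, w, sub):
            cx, cy = x + sub / 2, y + sub / 2    # sub-block center
            mvs[(x, y)] = (a * cx - b * cy + v0[0],
                           b * cx + a * cy + v0[1])
    return mvs

# Pure zoom example: v0 = (0, 0), v1 = (8, 0) on a 16x16 block gives
# MVs that grow linearly from the top-left toward the bottom-right.
mvs = affine_subblock_mvs((0.0, 0.0), (8.0, 0.0), 16, 16)
print(mvs[(0, 0)], mvs[(12, 12)])   # (1.0, 1.0) (7.0, 7.0)
```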
64. Geometric (GEO) partitioning
− Motivation: towards object-oriented coding
Follow object boundaries more closely
Fewer coding artifacts
GEO available for all block sizes ≥ 8×8 luma samples
− Prediction, transform and coding driven by the actual object shape under an RD (rate-distortion) constraint
Inter- and intra-predicted segments for handling of disocclusions
Overlapped wedge-based filtering at the partition boundary
Shape-adaptive DCT for spatially localized transform coding
65. − Partitioning is represented by two coordinate points P0 and P1 on the block boundary
− P0 and P1 are predicted from 16 pre-defined templates (scaled for non-square blocks)
Alternative: spatial or temporal prediction
Refinement: block-size-dependent offset
− Integration with AMVP (advanced motion vector prediction), MERGE, and FRUC (frame rate up-conversion)
GEO: Partitioning coding and prediction
66. Results for GEO
− Visual improvements at object boundaries
Sharper contours
Less staircase effect
More background details
[Figure: JEM 7.0]
67. Results for GEO
− Objective gains (BD-rate savings)
Against HEVC: 25–33%
[Figure: JEM 7.0 + GEO]
68. − NN and intra coding:
− All methods developed for still image coding could be used for intra coding
− NN and inter coding:
− Motion compensation is a very effective tool and can hardly be trained into a network (or would be tremendously more complex than conventional motion estimation)
− Some work on using CNNs for:
Sub-pel interpolation
Resolution up-conversion
Post-processing
Texture synthesis and inpainting
Loop filtering
Intra coding
Encoder optimization, in particular partitioning, which is basically a segmentation problem
− It is also not as simple to train for perceptual criteria in video
NN for video Coding
69. − Loop filtering
Removes compression artifacts from reconstruction
Improves prediction from reconstructed frames
− Generally, signal-adaptive and non-linear filters
De-blocking, de-ringing, de-banding
Edge-adaptive & Wiener optimized
Bi-lateral filters
...
− CNN reconstruction
Additional gain (3–5% rate); might replace some conventional filters
Can be operated on a block basis; parallel processing possible
CNN for loop filtering
[Figures: the decoded frame of CNNF vs. the original frame]
70. − CNN-based upsampling can generate super-resolution output, sharper edges, etc.
− Basic idea of dynamic resolution coding:
Downsample and coding by lower resolution (less bitrate cost)
Key pictures coded with full resolution
Non-key pictures coded with reduced resolution
Upsample at decoder side to full resolution
Encoder decides using full Res, conventional or CNN-based downsampling and upsampling
− Can be implemented in combination with intra and inter prediction coding
− Operated on block by block basis
− Significant bit rate savings (20–30% on average) while subjective quality is preserved compared to full-resolution coding
CNN for Variable-resolution coding (dynamic texture content)
[Figure: key pictures are coded at full resolution]
71. − New omnidirectional cameras allow acquiring panoramic video (by mosaic stitching)
− Appropriate rendering to a head mounted display allows adapting the viewpoint according to head movements in real-time
− With appropriate projection, the panorama can be packed into a 2D frame
VR / 360° video
73. − Cubemap projection with 3×2 packing (as an example)
− The 6 faces can be treated as rectangular video
− Equirectangular projection
− The whole sphere is projected onto a rectangular picture
− Extreme geometric distortions, in particular at the poles
− Non-uniform sampling is inherent
− Cubemap seems to provide better performance than equirectangular
VR / 360° video: projection formats
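The equirectangular mapping can be sketched directly; it also makes the non-uniform sampling obvious, since every latitude row gets the full picture width:

```python
# Illustrative sketch of the equirectangular projection: longitude and
# latitude map linearly to picture coordinates, so samples near the
# poles (lat = +/-90 deg) are stretched over the full picture width --
# the non-uniform sampling noted above.

def sphere_to_equirect(lon_deg, lat_deg, width, height):
    x = (lon_deg + 180.0) / 360.0 * width    # longitude -> column
    y = (90.0 - lat_deg) / 180.0 * height    # latitude  -> row
    return x, y

# On a 3840x1920 picture: equator center, then the north pole.
print(sphere_to_equirect(0, 0, 3840, 1920))   # (1920.0, 960.0)
print(sphere_to_equirect(0, 90, 3840, 1920))  # (1920.0, 0.0)
```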
74. − Stitching requires registration
− Identification of matching key points, geometric warping of pictures
− Optimum stitching path can be based on
− Minimum sample difference
− Depth cues (for appropriate occlusion handling)
− To mask artifacts
− Some blending/filtering/hole filling may be necessary
− In video: avoid temporal variation of stitching path
VR / 360° video: panorama stitching
75. − 360° video can be supported by signalling mechanisms of SEI messages (and equivalently in MPEG's Omnidirectional Application Format, OMAF)
− Equirectangular & cubemap projections
− Sphere rotation
− Region-wise packing
− Omnidirectional viewport
− Motion constrained tile sets may be used to disallow MC across face boundaries when these are defined as tiles
− Discontinuities in the projection (e.g. boundaries between cube faces) can cause problems that become visible particularly in the presence of compression artifacts
− This can be resolved by extending the faces, padding pixels from spherically neighboring positions with geometry correction, and performing blending in rendering
− This however increases the number of coded samples (pixels close to boundaries are duplicated) and therefore increases the data rate
360° video – current standardization status
76. − According to the CfP results, projection formats from the cubemap family show the best compression performance
− They however suffer from visible face boundaries, which grow more prominent as compression quality decreases
Two problems and proposed solutions:
− Packed/projected neighbors that are not physical/spherical neighbors:
− Solution: disable coding tools across face boundaries (prediction, filtering, ...)
− Physical/spherical neighbors that are not packed/projected neighbors:
− Solution: connect samples from disparate positions in the frame for better prediction, filtering, ...
360° video specific coding tools
[Figure: 6 reference pictures, one per cube face]
77. − Special characteristics of 360 content
360° symmetry not exploited by current codecs
Motion across face boundaries possible
Geometric distortions
Motion compensation suboptimal
Not correctly treated by loop filters
− Two proposals:
I) Face extension for motion estimation and
compensation
II) Loop filtering over continuous boundaries
according to 3D arrangement
360° video coding tools
[Figure: motion across face boundaries]
78. − Compression of 360° video depends on the projection; translational block-wise motion compensation can then cause geometrical errors
− Particular problems at boundaries of formats using several planar-projected faces (e.g. the cube)
− Specific MC tools should be designed
− Face extension: cube padding (padding of samples) allows the receptive field of each face to extend across the adjacent faces
360° video coding tool: face extension
79. 360° video coding tools – first proposal: face extension
− Compression of 360° video depends on the projection
− Traditional block-wise motion compensation can cause geometrical errors (particular problems at face boundaries)
− Solution: face extension – cube padding (padding of samples) allows the receptive field of each face to extend across the adjacent faces
[Figure: the main face with padded extensions]
80. 360° video coding tool: face extension
[Figures: all six original faces of a frame of the "bicyclist" sequence, and the additional reference pictures for faces 1 and 2]
81. 360° video coding tool: face extension
[Figures: the additional reference pictures for faces 3 and 4 of a frame of the "bicyclist" sequence]
82. 360° video coding tool: face extension
[Figures: the additional reference pictures for faces 5 and 6 of a frame of the "bicyclist" sequence]
83. − The reference samples of blocks at face boundaries are changed
Solution
Samples are chosen according to the 3D cube geometry, not just from the top or left neighbors
360° video coding tools – second proposal: corrected deblocking filter
84. − Objective gains (Bjøntegaard delta (BD) rate savings)
Against the HEVC anchor: ~31% in end-to-end WS-PSNR (E2E WS-PSNR)
Gains are higher for sequences with high motion
[Figures: JEM deblocking vs. VVC deblocking]
360° video coding tools – second proposal: corrected deblocking filter