4. Video coding standardization organisations
− ISO/IEC MPEG = “Moving Picture Experts Group”
(ISO/IEC JTC 1/SC 29/WG 11 = International Organization for Standardization and International Electrotechnical
Commission, Joint Technical Committee 1, Subcommittee 29, Working Group 11)
− ITU-T VCEG = “Video Coding Experts Group”
(ITU-T SG16/Q6 = International Telecommunication Union – Telecommunication Standardization Sector (ITU-T,
a United Nations organization, formerly CCITT), Study Group 16, Working Party 3, Question 6)
− JVT = “Joint Video Team”
Collaborative team of MPEG & VCEG, responsible for developing AVC (discontinued in 2009)
− JCT-VC = “Joint Collaborative Team on Video Coding”
Team of MPEG & VCEG, responsible for developing HEVC (established January 2010)
− JVET = “Joint Video Experts Team”
Exploring potential for new technology beyond HEVC (established Oct. 2015 as Joint Video Exploration Team, renamed
Apr. 2018)
5. The scope of video standardization
Only the specifications of the bitstream syntax and the decoder are standardized:
• Permits optimization beyond the obvious
• Permits complexity reduction for implementability
• Provides no guarantees of quality
[Diagram: Source → Pre-Processing → Encoding → Decoding → Post-Processing & Error Recovery → Destination; the scope of the standard covers only the bitstream and the decoder.]
6. Video coding concept
− Hybrid video coding has been used since the early days of video compression standards (e.g. MPEG-1/-2/-4, H.264/AVC, HEVC) and also in most proprietary codecs (VC-1, VP8, etc.)
17. HEVC spatial coding structures
• Hybrid Video Coding
− Transform coding: DCT-like transform to compact the energy of the signal
− Predictive coding: intra- or inter-picture (motion-compensated) prediction
− Entropy coding: context-adaptive binary arithmetic coding (CABAC)
18. • Blocks and Units
Block: square or rectangular area in a color component array
Unit: collocated blocks of the (three) color components, plus the associated syntax elements and prediction data (e.g. motion vectors)
• Picture partitioning
Coding Tree Units / Coding Tree Blocks (CTUs / CTBs)
Independent slice segment: full header, independently decodable
Dependent slice segment: very short header; relies on the corresponding independent slice segment and inherits its CABAC state
• Slice types
I-slice: intra prediction only
P-slice: intra prediction and motion compensation with one reference picture list
B-slice: intra prediction and motion compensation with one or two reference picture lists
20. HEVC spatial coding structures
[Encoder block diagram: the input video signal is partitioned into coding tree blocks, coding blocks, prediction blocks and transform blocks; the residual passes through transform, scaling & quantization and entropy coding into the bitstream (010110...); the embedded decoder applies scaling & inverse transform, intra-picture prediction, inter-picture prediction with motion estimation, and an in-loop filter to produce the output video signal.]
21. HEVC spatial coding structures
Slices
− A video is split into blocks
− Those blocks are split into smaller blocks
− The prediction and transformation are done on the smallest blocks.
[Figure: CTU partitioning; maximum CU size 64×64]
22. HEVC spatial coding structures
− Prediction block partitioning (Prediction Unit)
Intra prediction: 2N×2N, down to 4×4
Inter prediction: 2N×2N, 2N×N, N×2N, N×N, plus asymmetric motion partitioning (AMP): 2N×nU, 2N×nD, nL×2N, nR×2N
− Transform block partitioning (Transform Unit)
TU sizes 4×4, 8×8, 16×16, 32×32 DCT, and 4×4 DST
[Figure: a CTU split into CUs, the PU partition modes for intra and inter, and the TU quadtree]
23. • Prediction Block (PB) partitioning of a 2N×2N CB
Each prediction block in a coding block uses the same prediction mode (intra or inter)
• Transform Block (TB) partitioning of a CB
Quadtree partitioning of the CB → Residual Quad Tree (RQT)
Transform sizes 4×4 to 32×32
PB boundaries inside TBs are allowed
24. HEVC spatial coding structures
– Coding Tree Unit (CTU)
− Corresponds to macroblocks in earlier coding standards.
− Each CTU belongs to exactly one slice segment
− Maximum CTU size: 64×64 pixels
− Split into Coding Units (CU)
– Coding Unit (CU)
− CU sizes: 64×64, 32×32, 16×16, 8×8
− Carries the intra/inter coding mode decision
− Split into Prediction Units (PUs) and Transform Units (TUs)
– Prediction Unit (PU): the elementary unit for prediction
− Carries partition and motion information
– Transform Unit (TU): the unit for transform and quantization
− TU sizes: 4×4, 8×8, 16×16, 32×32 DCT, and 4×4 DST
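The recursive CTU-to-CU quadtree can be sketched as follows; `split_cost` is a hypothetical stand-in for the encoder's rate-distortion split decision, not part of the standard:

```python
# Illustrative sketch (not HEVC reference code): recursive quadtree
# partitioning of a 64x64 CTU into CUs between 64x64 and 8x8.
# `split_cost` is a hypothetical stand-in for the encoder's RD decision.

def partition_ctu(x, y, size, split_cost, min_size=8):
    """Return the list of (x, y, size) coding units chosen for this CTU."""
    # Leaf: cannot split below the minimum CU size, or splitting is not chosen.
    if size == min_size or not split_cost(x, y, size):
        return [(x, y, size)]
    half = size // 2
    cus = []
    # Quad split: four equally sized sub-CUs, visited in Z-scan order.
    for dy in (0, half):
        for dx in (0, half):
            cus += partition_ctu(x + dx, y + dy, half, split_cost, min_size)
    return cus

# Toy decision rule: keep splitting the top-left quadrant only.
cus = partition_ctu(0, 0, 64, lambda x, y, s: x == 0 and y == 0)
print(cus)  # 10 CUs, from (0, 0, 8) down to (32, 32, 32)
```

A real encoder replaces the toy lambda with a rate-distortion comparison between the unsplit CU and the sum of its four children.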
25. Example (coding quadtree):
− The numbers indicate the coding order of the transform blocks (Z-scan order)
− Transform blocks identical to their corresponding coding blocks are not explicitly marked in this figure
Blue lines: coding tree
Red lines: non-degenerate residual quadtrees
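The Z-scan order used above is equivalent to Morton-order bit interleaving of the block coordinates; a minimal sketch:

```python
# Illustrative sketch: Z-scan (Morton) index of a block at grid position
# (bx, by), obtained by interleaving the coordinate bits. Blocks are
# visited quadrant by quadrant, top-left first.

def z_scan_index(bx, by, bits=8):
    idx = 0
    for i in range(bits):
        idx |= ((bx >> i) & 1) << (2 * i)       # x bit -> even position
        idx |= ((by >> i) & 1) << (2 * i + 1)   # y bit -> odd position
    return idx

# The 4 blocks of a 2x2 grid are scanned (0,0), (1,0), (0,1), (1,1):
order = sorted([(x, y) for y in range(2) for x in range(2)],
               key=lambda p: z_scan_index(*p))
print(order)  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```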
26. HEVC spatial coding structures
[Encoder block diagram with the intra-picture prediction path highlighted: the residual passes through transform, scaling & quantization and entropy coding into the bitstream; the embedded decoder applies scaling & inverse transform, in-loop filtering, intra-/inter-picture prediction and motion estimation.]
29. HEVC spatial coding structures
Example: intra coding of a 64×64 CTB (all-intra).
− a) Original
− b) Applicable intra prediction modes on a CU basis
− c) Prediction signal
− d) Residual with the corresponding CB (solid lines) and TB partitioning (dashed lines)
30. • Intra prediction modes
− Planar prediction: mode 0 (P)
− DC intra prediction: mode 1 (DC)
− Angular modes numbered from diagonal-up to diagonal-down
− Modes 2–18: horizontal; modes 19–34: vertical
− Pure horizontal: mode 10; pure vertical: mode 26
• Intra prediction block size
− Intra prediction mode coded per CU
− Prediction block size derived from residual quadtree
− Boundary samples of neighboring block used for prediction
− Efficient representation
− Local update of prediction source
[Figure: intra prediction modes]
31. HEVC spatial coding structures
[Encoder block diagram with the inter-picture prediction path highlighted: motion estimation and inter-picture prediction provide the prediction signal; the residual passes through transform, scaling & quantization and entropy coding into the bitstream; the embedded decoder applies scaling & inverse transform and in-loop filtering.]
33. • Prediction from reference picture lists
• Uni-prediction
P-slices use only List0; B-slices use List0 or List1
HEVC: minimum PB size 8×4 or 4×8
• Bi-prediction (B-slices only)
One predictor from List0, one predictor from List1
HEVC: minimum prediction block size 8×8
[Figures: picture order count (POC); motion-compensated prediction]
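Uni- and bi-prediction can be sketched as follows; the rounded average is a simplified stand-in for HEVC's weighted sample prediction process:

```python
# Illustrative sketch: uni- vs. bi-prediction for a prediction block.
# pred0/pred1 are motion-compensated blocks fetched from List0/List1
# reference pictures; bi-prediction averages them with rounding.

def uni_prediction(pred0):
    return pred0

def bi_prediction(pred0, pred1):
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)]
            for r0, r1 in zip(pred0, pred1)]

p0 = [[100, 102], [104, 106]]   # predictor from a List0 reference
p1 = [[110, 108], [106, 104]]   # predictor from a List1 reference
print(bi_prediction(p0, p1))    # [[105, 105], [105, 105]]
```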
34. HEVC spatial coding structures
[Encoder block diagram with transform, scaling & quantization highlighted: the quantized transform coefficients are entropy coded into the bitstream; scaling & inverse transform feed the embedded decoder with in-loop filter, intra-/inter-picture prediction and motion estimation.]
35. HEVC spatial coding structures
[Encoder block diagram with the in-loop filter highlighted: the filter operates on the reconstructed signal before it is stored for inter-picture prediction.]
36. • Two in-loop filters to remove coding artifacts while preserving edges (not effective in intra prediction):
− Deblocking filter: operates only on 8×8 block boundaries (not 4×4), in 4-sample units
− Sample Adaptive Offset (SAO) filter: adds corrective offset values to attenuate:
• systematic errors introduced by quantization and phase shifts from inaccurate motion vectors
• ringing artifacts (Gibbs phenomenon), introduced mainly by large transform sizes
[Figure: deblocking filter effect]
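One of SAO's modes, band offset, can be sketched as follows; the sample values and offsets are made up for the demo:

```python
# Illustrative sketch of SAO band offset: the 8-bit sample range is
# divided into 32 bands of width 8; the encoder signals a starting band
# and a few offsets, which the decoder adds to samples falling into
# those bands. Values here are invented for the demo.

def sao_band_offset(samples, start_band, offsets, bit_depth=8):
    shift = bit_depth - 5                 # 32 bands -> band = sample >> shift
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        band = s >> shift
        if start_band <= band < start_band + len(offsets):
            s = min(max(s + offsets[band - start_band], 0), max_val)
        out.append(s)
    return out

# Samples in bands 12..13 (values 96..111) get corrected; others pass through.
print(sao_band_offset([90, 100, 105, 200], 12, [2, -3, 0, 0]))
# [90, 102, 102, 200]
```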
38. • Context-based Adaptive Binary Arithmetic Coding (CABAC):
− Usage of adaptive probability models for most symbols
− Exploiting symbol correlations by using contexts
− Restriction to binary arithmetic coding based on table look-ups and shifts only
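The adaptive probability models can be illustrated with a minimal context model; this captures the general idea (shift-based probability adaptation), not the standard's exact state machine or tables:

```python
# Illustrative sketch of a context-adaptive binary probability model,
# the idea behind CABAC's contexts (NOT the standard's exact state
# machine): the estimated probability of a "1" bin is nudged toward
# each observed bin value using only shifts.

class ContextModel:
    def __init__(self, bits=15, rate=5):
        self.one = 1 << bits        # fixed-point representation of 1.0
        self.p1 = self.one >> 1     # P(bin = 1), initialized to 0.5
        self.rate = rate            # power-of-two adaptation rate

    def update(self, bin_val):
        if bin_val:
            self.p1 += (self.one - self.p1) >> self.rate
        else:
            self.p1 -= self.p1 >> self.rate

ctx = ContextModel()
for b in [1, 1, 1, 1, 0, 1, 1, 1]:
    ctx.update(b)
print(ctx.p1 / ctx.one)   # estimate has drifted above 0.5
```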
39. Motivation for improved video compression: "Spatial Resolution"
− SD (PAL): 720×576, 0.414 MP
− HDTV 720p: 1280×720, 0.922 MP
− HDTV 1080: 1920×1080, 2.027 MP
− Digital Cinema 2K: 2048×1080, 2.21 MP
− UHDTV 1: 3840×2160, 8.3 MP
− 4K: 4096×2160, 8.84 MP
− UHDTV 2: 7680×4320, 33.18 MP
− 8K: 8192×4320, 35.39 MP
Higher resolutions give a wider viewing angle and are more immersive.
40. Motivation for improved video compression: "HFR (High Frame Rate)"
− Conventional frame rates suffer from motion blur and motion judder
− Wider viewing angles increase perceived motion artifacts
− Higher frame rates are needed: 50 fps minimum (100 fps being vetted)
41. Motivation for improved video compression: "WCG (Wide Color Gamut)"
− Deeper colors, more realistic and more colorful pictures
− Wide color space (ITU-R Rec. BT.2020): 75.8% of CIE 1931
− Conventional color space (ITU-R Rec. BT.709): 35.9% of CIE 1931
42. Motivation for improved video compression: "HDR (High Dynamic Range)"
− Standard dynamic range vs. high dynamic range (more vivid, more detail)
43. Motivation for improved video compression: quantization (bit depth)
− 8-bit quantization: 256 levels; 10-bit quantization: 1024 levels
− More bits (10-bit) give more colours
− Avoids banding or contouring
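The banding effect can be illustrated directly: a smooth 10-bit gradient loses most of its distinct levels when re-quantized to 8 bits.

```python
# Illustrative sketch: re-quantizing a smooth 10-bit gradient to 8 bits
# merges groups of four adjacent levels into one, which is what produces
# visible banding/contouring in slowly varying regions.

def to_8bit(sample_10bit):
    return sample_10bit >> 2            # 1024 levels -> 256 levels

gradient_10bit = list(range(512, 524))  # 12 distinct 10-bit levels
gradient_8bit = [to_8bit(s) for s in gradient_10bit]
print(len(set(gradient_10bit)), "->", len(set(gradient_8bit)))  # 12 -> 3
```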
47. JEM and VVC timeline
− Jan 2013: HEVC v1
− Oct 2015: JVET formed
− Mar 2017: Call for Evidence
− Oct 2017: Call for Proposals
− Apr 2018: common base established (VVC)
− Oct 2020: VVC v1
− "Joint Exploration Model" (JEM) developed by JVET
Experimental software to explore new coding tools
Intended to investigate the potential for better compression beyond HEVC
Initially started by extending the HEVC software with additional compression tools or replacing existing tools
• Call for Evidence (CfE): subjective verification of the JEM coding efficiency compared to HEVC
• Call for Proposals (CfP): submission and subjective evaluation of new video coding technologies
48. − 32 companies in 21 proponent groups
− 46 category-specific submissions
22 in SDR video
12 in HDR video
12 in 360° video
− All responses clearly better than HEVC, some evidently better than JEM
− The subjective quality of the best-performing proposals is always at least equal to, and sometimes (~1/3 of cases) better than, HEVC across all categories at approx. 40% lower rate
Very successful Call for Proposals (CfP) (April 2018)
JVET documents available at http://phenix.it-sudparis.eu/jvet
49. − SDR-A: 3840×2160
5 UHD sequences (from 950 kbit/s to 10 Mbit/s)
− SDR-B: 1920×1080
5 HD sequences (from 400 kbit/s to 3.8 Mbit/s)
− HDR (PQ HD, HLG 4K)
4 HD sequences, PQ curve (from 350 kbit/s to 3 Mbit/s)
3 UHD sequences, HLG curve (from 640 kbit/s to 10 Mbit/s)
− 360° video (8K, 6K)
1 sequence 6K×3K (from 2 Mbit/s to 10 Mbit/s)
4 sequences 8K×4K (from 400 kbit/s to 7 Mbit/s)
VVC Call for Proposal test sequences
[Thumbnails: FoodMarket4 60p, CatRobot1 60p, DaylightRoad2 60p, ParkRunning3 50p, Campfire 30p, BasketballDrive 50p, Cactus 50p, BQTerrace 60p, RitualDance 60p, MarketPlace 60p, Market3 HD50p, Hurdles HD50p, Starting HD50p, ShowGirls2 HD25p, Cosmos1 HD24p, DayStreet 60p, PeopleInShop..., SunsetBeach 60p, ChairliftRide 30p, KiteFlite 30p, Harbor 30p, Trolley 30p, Balboa 60p]
50. − New elements (some come with high complexity):
Decoder side estimation for mode/MV derivation
Finer partitioning: Asymmetric, geometric
Neural networks for prediction, loop filtering, upsampling, (encoder control)
Additional non-linear, de-noising and statistics-based loop filters
Additional linear and non-linear elements in prediction
Intra block copy (current picture referencing)
− HDR specific:
New adaptive reshaping and quantization, also in-loop
HDR-specific modifications of existing tools, e.g. deblocking
− 360-video specific:
Variants of projection formats, geometry-corrected face boundary padding
Modification and disabling of existing tools at face boundaries
What was proposed in CfP?
51. − QT/BT/TT (QT: Quadtree. BT: Binary tree. TT: Ternary tree)
− Remove unnecessary partitioning restrictions
− Implicit splitting at picture boundaries
− Separate trees for intra slices
− Position Dependent Prediction Combination (PDPC) in intra prediction: combines values predicted from non-filtered and filtered (smoothed) reference samples, depending on the prediction mode and block size
− Cross-Component Linear Model (CCLM) in intra prediction: chroma component prediction
− 87 intra modes (wide angles included), 3 most probable modes (MPM), TU binarization
− Affine MC (fixed 4×4 subblock size, 4/6-parameter model switching at the CU level)
− Affine MV coding
− MV candidate list construction with spatial/temporal inheritance and derivation
− Improved MV difference coding
Summary of proposed elements (1)
52. − Adaptive motion vector resolution (AMVR): the encoder adaptively selects a sub-pixel precision for the MV
− Local illumination compensation
− Subblock MC (4×4) from advanced temporal motion vector prediction (ATMVP) merge; 8×8-granularity motion vector storage [high precision]
− Multiple transform selection (all DCT/DST types) for intra and inter (only the DCT in HEVC): 4 different separable transforms (DCT/DST)
− Adaptive transform:
− Performed in addition to the DCT-II and 4×4 DST-VII employed in HEVC; the newly introduced transform matrices are DST-VII, DCT-VIII, DST-I and DCT-V
− Maximum QP increased from 51 to 63
− An enhanced rate-distortion-optimized quantization scheme called Dependent Scalar Quantization
− The CABAC coder inherited from AVC has been enhanced and is now even faster
− Modified entropy coding supporting Dependent Scalar Quantization
Summary of proposed elements (2)
53. − Adaptive loop filter
− 4x4 classification based (gradient strength & orientation) for luma
− 7x7 luma, 5x5 chroma filters
− enabling flag at CTU level
− Basic high-level syntax
− SPS (Sequence Parameter Set)
− PPS (Picture Parameter Set)
− Tiles/Slices
− Reference Picture Signaling
− Update of the Benchmark Set (BMS) software contains
− Generalized bi-prediction (a kind of local weighted prediction)
− Decoder-side estimation: bi-directional optical flow (BIO, simplified bilateral matching)
− Current picture referencing (aka intra block copy)
Summary of proposed elements (3)
54. − Root Size 128×128 (64×64 in HEVC)
− 1st Tree
• Quad Split
− 2nd Tree
• Binary Split
• Ternary Split
Quad/binary/ternary partitioning
Block partitioning
55. − 65 intra prediction directions (33 in HEVC)
− Rectangular block prediction (HEVC: square only)
− Larger block sizes, up to 128×128 (HEVC: 32×32)
− New prediction modes with directional interpolation (Position Dependent intra Prediction Combination: PDPC)
− Chroma component prediction (Cross-Component Linear Model: CCLM)
− Luma and chroma blocks can have different block sizes, using a separate tree for the chroma components
[Figure: intra prediction directions; mode 0: planar, mode 1: DC]
Intra-Picture Prediction
56. − Average rate reductions of 4–5% have been reported for neural-network-based intra prediction
− Mostly fully connected networks (FCN) have been used for this purpose (no convolutional layers)
Intra-Picture Prediction by neural networks (NN)
[Figure: a neural network (weights, biases, mode) maps K×K regions of reconstructed samples to an M×N block of predicted samples]
57. − Based on rate-distortion optimization, the encoder locally signals whether motion derivation is used or not
− The motion information is derived at the decoder side instead of being explicitly extracted from the coded bitstream
− To minimize the search complexity at the decoder, only 8 positions around the block are searched in each reference
I. Perform normal bi-prediction
− Search the available reference pictures for the motion-compensated block (referenced by the initial MV0 and MV1)
II. Perform normal bi-prediction again
− Using the updated motion information, a better prediction and motion vector of the current block is obtained (referenced by the updated MV0' and MV1')
Decoder-side search for motion or prediction information
[Figure: normal bi-prediction between a past reference (List 0) and a future reference (List 1); the initial MV0 and MV1 around the current block are refined to MV0' and MV1'.]
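The decoder-side refinement described above can be sketched as a bilateral search over 8 neighboring positions; `fetch_block`, the toy reference data and the SAD cost are hypothetical stand-ins for the real motion-compensated fetch and matching cost:

```python
# Illustrative sketch of decoder-side MV refinement (bilateral matching):
# around the initial MVs, 8 neighboring integer offsets (plus the center)
# are tested, and the one minimizing the matching cost between the List0
# and List1 predictions is kept.

def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def refine_mv(mv0, mv1, fetch_block):
    # Center plus the 8 surrounding search positions.
    candidates = [(0, 0)] + [(dx, dy) for dx in (-1, 0, 1)
                             for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    def cost(d):
        # Mirrored offsets: moving MV0 by d moves MV1 by -d.
        p0 = fetch_block(0, (mv0[0] + d[0], mv0[1] + d[1]))
        p1 = fetch_block(1, (mv1[0] - d[0], mv1[1] - d[1]))
        return sad(p0, p1)
    best = min(candidates, key=cost)
    return ((mv0[0] + best[0], mv0[1] + best[1]),
            (mv1[0] - best[0], mv1[1] - best[1]))

# Toy references: only the offset (1, 0) makes the two predictions match.
blocks = {(0, (3, 0)): [5, 5], (1, (-3, 0)): [5, 5]}
def fetch_block(ref_list, mv):
    return blocks.get((ref_list, mv), [0, 0] if ref_list == 0 else [9, 9])

print(refine_mv((2, 0), (-2, 0), fetch_block))  # ((3, 0), (-3, 0))
```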
58. [Figure: current picture and reference picture (motion source picture)]
Inter-Picture Prediction:
− ATMVP (Alternative Temporal Motion Vector Prediction)
Prediction information (motion information): the motion vector and the associated picture in the reference buffer
Motion information prediction: use the neighboring motion vectors of the current block to obtain its prediction information
ATMVP: predicts the motion vectors of the sub-CUs within a CU
[Figure: each N×N sub-CU takes its MV (MV0/MV1) from the corresponding N×N block in the motion source picture – Alternative Temporal Motion Vector Prediction (by TMVP modification)]
59. [Figure: current picture and reference picture (motion source picture)]
Inter-Picture Prediction:
− Spatial-Temporal Motion Vector Prediction (STMVP)
Combines temporal prediction and spatial prediction (predictions from the neighborhood)
Motion vectors of sub-CUs are derived recursively using the temporal motion vector predictor and spatially neighboring motion vectors
The averaged motion vector is assigned as the motion vector of the current sub-CU
[Figure: temporal neighbors contribute a TMV and spatial neighbors contribute SMVs; their average forms the STMVP]
60. [Figure: overlap regions of the left, upper and current CU]
− Overlapped Block Motion Compensation (OBMC)
− The MV is most reliable in the center of the block
− To process CU boundaries in a uniform fashion, OBMC is performed at the sub-block level for all MC block boundaries
− OBMC overlaps the predictions from multiple nearby MVs and blends them to avoid the sharp edges (blocking artifacts) that typically occur in inter prediction
− When a CU is coded with a sub-CU mode, each sub-block of the CU is an MC block
− OBMC can be switched on and off at the CU level
Inter-Picture Prediction:
[Figure: the current 4×4 MC sub-block in a PU combines its own MV0 with the upper (MVu) and left (MVl) neighbors' MVs into MVnew]
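The boundary blending behind OBMC can be sketched as follows; the weights are made up for the demo and are not the actual OBMC weighting tables:

```python
# Illustrative sketch of OBMC blending: the prediction from the
# sub-block's own MV is blended with the prediction obtained using a
# neighbor's MV, with weights decaying away from the shared boundary.
# Weights are invented for the demo, not taken from any standard.

def obmc_blend(pred_own, pred_up, w_up):
    """Blend two columns of samples row by row (fixed point, /8)."""
    return [(w * u + (8 - w) * o + 4) >> 3
            for o, u, w in zip(pred_own, pred_up, w_up)]

own = [100, 100, 100, 100]       # prediction with the sub-block's own MV
up = [60, 60, 60, 60]            # prediction with the upper neighbor's MV
weights = [4, 2, 1, 0]           # stronger blending near the top boundary
print(obmc_blend(own, up, weights))  # [80, 90, 95, 100]
```

The blended column transitions smoothly from the neighbor's prediction toward the block's own prediction, which is exactly what removes the sharp edge at the boundary.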
62. − In natural images, edges are usually not rectangular; therefore, encoders tend to select very small blocks around edges in order to predict them efficiently
− With more flexible prediction shapes such as diagonal splits, the encoder can use bigger blocks at edges, which is much more efficient
− Non-rectangular partitions
− 2 diagonal options
− More flexible signaling
Non-rectangular partitioning
[Figure: a diagonally split current block, with MV0 pointing into a past reference (List 0) and MV1 into a future reference (List 1)]
63. − Classical motion compensation: 2-dimensional translation of rectangular regions (x, y)
− Affine motion: scaling, rotation, shape changes and shearing
− The full model has 6 degrees of freedom (DOF)
− The motion information per 4×4 block is calculated using the affine motion model
− Affine transformation:
2D translation (2 DOF)
+ rotation and scale (4 DOF)
+ aspect ratio and shear (6 DOF)
Represented by 2 or 3 control-point motion vectors
Affine Motion Vector
[Figure: translation, scaling, rotation, shape change and shearing]
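The per-sub-block MV derivation can be sketched with the 4-parameter affine model, using control-point MVs v0 (top-left corner) and v1 (top-right corner); floating point is used for clarity, whereas real codecs use fixed point:

```python
# Illustrative sketch of the 4-parameter affine model: the MV of each
# 4x4 sub-block is evaluated at the sub-block center from the two
# control-point MVs v0 (top-left) and v1 (top-right) of a w x h block.

def affine_subblock_mvs(v0, v1, w, h, sub=4):
    a = (v1[0] - v0[0]) / w          # scaling/rotation parameters
    b = (v1[1] - v0[1]) / w
    mvs = {}
    for y in range(0, h, sub):
        for x in range(0, w, sub):
            cx, cy = x + sub / 2, y + sub / 2    # sub-block center
            mvs[(x, y)] = (a * cx - b * cy + v0[0],
                           b * cx + a * cy + v0[1])
    return mvs

# Pure zoom example: v0 = (0, 0), v1 = (8, 0) on a 16x16 block gives
# MVs that grow linearly from the top-left toward the bottom-right.
mvs = affine_subblock_mvs((0.0, 0.0), (8.0, 0.0), 16, 16)
print(mvs[(0, 0)], mvs[(12, 12)])   # (1.0, 1.0) (7.0, 7.0)
```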
64. Geometric (GEO) partitioning
− Motivation: towards object-oriented coding
Follow object boundaries more closely
Fewer coding artifacts
GEO available for all block sizes ≥ 8×8 luma samples
− Prediction, transform and coding driven by the actual object shape under an RD (rate-distortion) constraint
Inter- and intra-predicted segments for handling of disocclusions
Overlapped wedge-based filtering at the partition boundary
Shape-adaptive DCT for spatially localized transform coding
65. − Partitioning is represented by two coordinate points P0 and P1 on the block boundary
− P0 and P1 are predicted from 16 pre-defined templates (scaled for non-square blocks)
Alternative: spatial or temporal prediction
Refinement: block-size-dependent offset
− Integration with AMVP (advanced motion vector prediction), MERGE, and FRUC (frame rate up-conversion)
GEO: Partitioning coding and prediction
66. Results for GEO
− Visual improvements at object boundaries
Sharper contours
Less staircase effect
More background details
[Figure: JEM 7.0]
67. Results for GEO
− Objective gains (BD-rate savings)
Against HEVC: 25–33%
[Figure: JEM 7.0 + GEO]
68. − NN and intra coding:
− All methods developed for still image coding could be used for intra coding
− NN and inter coding:
− Motion compensation is a very effective tool and can hardly be trained into a network (or would be tremendously more complex than conventional motion estimation)
− Some work on using CNNs for:
Sub-pel interpolation
Resolution up-conversion
Post-processing
Texture synthesis and inpainting
Loop filtering
Intra coding
Encoder optimization, in particular partitioning, which is basically a segmentation problem
− It is also not as simple to train for perceptual criteria in video
NN for video Coding
69. − Loop filtering
Removes compression artifacts from reconstruction
Improves prediction from reconstructed frames
− Generally, signal-adaptive and non-linear filters
De-blocking, de-ringing, de-banding
Edge-adaptive & Wiener optimized
Bi-lateral filters
...
− CNN reconstruction
Additional gain (3–5% rate); might replace some conventional filters
Can be operated on a block basis; parallel processing possible
CNN for loop filtering
[Figures: the decoded frame of CNNF vs. the original frame]
70. − CNN-based upsampling can generate super-resolution output, sharper edges, etc.
− Basic idea of dynamic resolution coding:
Downsample and coding by lower resolution (less bitrate cost)
Key pictures coded with full resolution
Non-key pictures coded with reduced resolution
Upsample at decoder side to full resolution
Encoder decides using full Res, conventional or CNN-based downsampling and upsampling
− Can be implemented in combination with intra and inter prediction coding
− Operated on block by block basis
− Significant bit rate savings (20–30% on average) while subjective quality is preserved compared to full-resolution coding
CNN for Variable-resolution coding (dynamic texture content)
[Figure: key pictures are coded at full resolution]
71. − New omnidirectional cameras allow acquiring panoramic video (by mosaic stitching)
− Appropriate rendering to a head mounted display allows adapting the viewpoint according to head movements in real-time
− With appropriate projection, the panorama can be packed into a 2D frame
VR / 360° video
73. − Cubemap projection with 3×2 packing (as an example)
− The 6 faces can be treated as rectangular video
− Equirectangular projection
− The whole sphere is projected onto a rectangular picture
− Extreme geometric distortions, in particular at the poles
− Non-uniform sampling is inherent
− Cubemap seems to provide better performance than equirectangular
VR / 360° video: projection formats
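The equirectangular mapping can be sketched directly; it also makes the non-uniform sampling obvious, since every latitude row gets the full picture width:

```python
# Illustrative sketch of the equirectangular projection: longitude and
# latitude map linearly to picture coordinates, so samples near the
# poles (lat = +/-90 deg) are stretched over the full picture width --
# the non-uniform sampling noted above.

def sphere_to_equirect(lon_deg, lat_deg, width, height):
    x = (lon_deg + 180.0) / 360.0 * width    # longitude -> column
    y = (90.0 - lat_deg) / 180.0 * height    # latitude  -> row
    return x, y

# On a 3840x1920 picture: equator center, then the north pole.
print(sphere_to_equirect(0, 0, 3840, 1920))   # (1920.0, 960.0)
print(sphere_to_equirect(0, 90, 3840, 1920))  # (1920.0, 0.0)
```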
74. − Stitching requires registration
− Identification of matching key points, geometric warping of pictures
− Optimum stitching path can be based on
− Minimum sample difference
− Depth cues (for appropriate occlusion handling)
− To mask artifacts
− Some blending/filtering/hole filling may be necessary
− In video: avoid temporal variation of stitching path
VR / 360° video: panorama stitching
75. − 360° video can be supported by signalling mechanisms of SEI messages (and equivalently in MPEG's Omnidirectional Application Format, OMAF)
− Equirectangular & cubemap projections
− Sphere rotation
− Region-wise packing
− Omnidirectional viewport
− Motion constrained tile sets may be used to disallow MC across face boundaries when these are defined as tiles
− Discontinuities in the projection (e.g. boundaries between cube faces) can cause problems that become visible particularly in the presence of compression artifacts
− This can be resolved by extending the faces, padding pixels from spherically neighboring positions with geometry correction, and performing blending in rendering
− This however increases the number of coded samples (pixels close to boundaries are duplicated) and therefore increases the data rate
360° video – current standardization status
76. − According to the CfP results, projection formats from the cubemap family show the best compression performance
− They however suffer from visible face boundaries, which grow more prominent as compression quality decreases
Two problems and proposed solutions:
− Packed/projected neighbors that are not physical/spherical neighbors:
− Solution: disable coding tools across face boundaries (prediction, filtering, ...)
− Physical/spherical neighbors that are not packed/projected neighbors:
− Solution: connect samples from disparate positions in the frame for better prediction, filtering, ...
360° video specific coding tools
[Figure: 6 reference pictures, one per cube face]
77. − Special characteristics of 360 content
360° symmetry not exploited by current codecs
Motion across face boundaries possible
Geometric distortions
Motion compensation suboptimal
Not correctly treated by loop filters
− Two proposals:
I) Face extension for motion estimation and
compensation
II) Loop filtering over continuous boundaries
according to 3D arrangement
360° video coding tools
[Figure: motion across face boundaries]
78. − Compression of 360° video depends on the projection; translational block-wise motion compensation can then cause geometrical errors
− Particular problems at boundaries of formats using several planar-projected faces (e.g. the cube)
− Specific MC tools should be designed
− Face extension: cube padding (padding of samples) allows the receptive field of each face to extend across the adjacent faces
360° video coding tool: face extension
79. 360° video coding tools – first proposal: face extension
− Compression of 360° video depends on the projection
− Traditional block-wise motion compensation can cause geometrical errors (particular problems at face boundaries)
− Solution: face extension – cube padding (padding of samples) allows the receptive field of each face to extend across the adjacent faces
[Figure: the main face with padded extensions]
80. 360° video coding tool: face extension
[Figures: all six original faces of a frame of the "bicyclist" sequence, and the additional reference pictures for faces 1 and 2]
81. 360° video coding tool: face extension
[Figures: the additional reference pictures for faces 3 and 4 of a frame of the "bicyclist" sequence]
82. 360° video coding tool: face extension
[Figures: the additional reference pictures for faces 5 and 6 of a frame of the "bicyclist" sequence]
83. − The reference samples of blocks at face boundaries are changed
Solution
Samples are chosen according to the 3D cube geometry, not just from the top or left neighbors
360° video coding tools – second proposal: corrected deblocking filter
84. − Objective gains (Bjøntegaard delta (BD) rate savings)
Against the HEVC anchor: ~31% in end-to-end WS-PSNR (E2E WS-PSNR)
Gains are higher for sequences with high motion
[Figures: JEM deblocking vs. VVC deblocking]
360° video coding tools – second proposal: corrected deblocking filter