HTTP Adaptive Streaming (HAS) is the most common approach for delivering video content over the Internet. The requirement to encode the same content at different quality levels (i.e., representations) in HAS is a challenging problem for content providers. Fast multirate encoding approaches try to accelerate this process by reusing information from previously encoded representations. In this paper, we propose to use convolutional neural networks (CNNs) to speed up the encoding of multiple representations, with a specific focus on parallel encoding. In parallel encoding, the overall time complexity is limited to the maximum time complexity among the representations that are encoded in parallel. Therefore, instead of reducing the time complexity of all representations, the highest time complexities are reduced. Experimental results show that FaME-ML achieves significant time-complexity savings in parallel encoding scenarios (41% on average) with a slight bitrate increase and slight quality degradation compared to the HEVC reference software.
FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning
1. FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning
December 2, 2020
IEEE VCIP
Ekrem Çetinkaya, Hadi Amirpour, Christian Timmerer, and Mohammad Ghanbari
8. Video Streaming
● Share in the Internet traffic: 82%
● Video streamed every second: 1 million minutes
● Content characteristics
● As of 2021*
* Cisco VNI Forecast Highlights (2021)
42. CTU Search Window Bound
[Figure: CTU depth maps at QP 22, QP 30, and QP 38; Depth = [0 1 2 3]]
● Finding: CTUs tend to get higher depth levels as the quality goes up
● Upper bound 1 and lower bound 2 approaches
1 D. Schroeder et al., "Efficient Multi-Rate Video Encoding for HEVC-Based Adaptive HTTP Streaming," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 1, pp. 143-157, 2016.
2 H. Amirpour, E. Çetinkaya, C. Timmerer, and M. Ghanbari, "Fast Multi-Rate Encoding for Adaptive HTTP Streaming," 2020 Data Compression Conference (DCC), Snowbird, UT, USA, 2020, pp. 358-358.
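The finding above is what makes the upper 1 and lower 2 bound approaches work: the CTU depth of an already-encoded reference representation bounds the depth search of the others. A minimal sketch of the idea, where the function name and the `rd_cost` callable are hypothetical stand-ins for the encoder's RD search, not the papers' actual implementations:

```python
def bounded_depth_search(rd_cost, ref_depth, max_depth=3, mode="lower"):
    """Pick the best CTU depth, skipping depths ruled out by a reference encode.

    mode="upper": reference is the highest quality; its depth caps the search [1].
    mode="lower": reference is the lowest quality; its depth floors the search [2].
    `rd_cost` is a hypothetical callable returning the RD cost at a given depth.
    """
    if mode == "upper":
        depths = range(0, ref_depth + 1)          # search only up to the reference depth
    else:
        depths = range(ref_depth, max_depth + 1)  # search only from the reference depth
    return min(depths, key=rd_cost)               # fewer depths evaluated -> faster encode
```

Either variant shrinks the search range and thus the encoding time; only the lower-bound variant can speed up the highest quality (and slowest) representation, which is the focus here.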
43. Problem & Solution
● Existing methods typically use the highest quality representation as the reference
○ Cannot reduce the parallel encoding time
○ The highest quality representation is the bottleneck
● Use the lowest quality representation as the reference
○ Utilize machine learning for better performance
○ Focus on the parallel encoding time
■ Reduce the encoding time of the highest-complexity representations
■ Eliminate the encoding-time bottleneck
[Figure: Normalized time complexity of different quality representations using three different encoding methods 1 2]
52. Features
● RD cost: number of bits to encode the given CU and its four sub-CUs (FRD)
● Variance of pixels: inside the CU (FV)
● Motion vectors: average magnitude of the MVs inside the CU (FMV)
● Depth level: CU split decision for the given depth level (FD)
● Frame-level QP: QP value of the given frame (FQP)
● PU decision: PU split decision for the given CU (FPU)
F = [FRD (5) | FV (5) | FMV (1) | FD (1) | FQP (1) | FPU (1)] → 14 elements in total
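The 14-element feature vector F can be assembled as below; a minimal NumPy sketch where the function name is hypothetical and the five FV values are assumed to follow the FRD layout (the CU plus its four sub-CUs), since the slide's table gives FV size 5:

```python
import numpy as np

def build_feature_vector(f_rd, f_v, f_mv, f_d, f_qp, f_pu):
    """Concatenate the per-CU encoding features into the 14-element vector F:
    F_RD (5): RD costs of the CU and its four sub-CUs
    F_V  (5): pixel variances inside the CU (assumed: CU + four sub-CUs)
    F_MV (1): average motion-vector magnitude inside the CU
    F_D  (1): CU split decision at the given depth level
    F_QP (1): frame-level QP
    F_PU (1): PU split decision for the given CU
    """
    f = np.concatenate([np.asarray(f_rd, float), np.asarray(f_v, float),
                        [f_mv], [f_d], [f_qp], [f_pu]])
    assert f.shape == (14,), "feature vector must have 5+5+1+1+1+1 = 14 elements"
    return f
```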
53. Training Dataset
● 12 test sequences defined in the HEVC CTC 3
● YUV information is extracted for each CU
○ 64x64 size for D0 and 32x32 size for D1
● Sequences are encoded with the HEVC reference software (HM 16.21) 4
○ Encoding information is extracted and saved for QP38
○ Features are individually min-max normalized at the video level
○ Depth values are saved as targets for the remaining QPs
● 90% of frames for the training set (259,200 CTUs)
● 10% of frames for the validation set (28,800 CTUs)
3 F. Bossen et al., "Common Test Conditions and Software Reference Configurations," JCTVC-L1100, vol. 12, p. 7, 2013.
4 https://vcgit.hhi.fraunhofer.de/jct-vc/HM
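The per-video min-max normalization above can be sketched as follows, with NumPy and a hypothetical function name; each feature column is scaled to [0, 1] within each video rather than over the whole dataset:

```python
import numpy as np

def minmax_normalize_per_video(features_by_video):
    """Min-max normalize each feature column independently, within each video
    ('individually min-max normalized at the video level').
    `features_by_video` maps a video name to an (n_ctus, n_features) array."""
    out = {}
    for name, feats in features_by_video.items():
        feats = np.asarray(feats, float)
        lo = feats.min(axis=0)
        span = feats.max(axis=0) - lo
        span[span == 0] = 1.0            # guard against constant columns
        out[name] = (feats - lo) / span  # each column scaled to [0, 1]
    return out
```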
54. Convolutional Neural Network (CNN)
Y, U, and V input sizes are halved and the red part is omitted in the Depth 1 classifier.
59. Overall Method
[Diagram: QP38 is encoded with HEVC; QP22 and QP26 use the CNN; QP30 and QP34 use HEVC]
● Encode the lowest quality representation with the HEVC reference software
○ Save the encoding information
● Pass the YUV information to the texture-processing CNN and get an intermediate decision
● Combine the intermediate decision with the feature vector and pass it through a fully connected layer to get the final decision
● Apply the CNN for the bottleneck quality levels in the parallel encoding scenario
○ Depth 0 and Depth 1 for QP22
○ Depth 0 for QP26
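The steps above can be sketched end to end. This is a schematic in NumPy, not the paper's actual network: the texture-processing CNN is stubbed out as a callable, and the fully connected layer's weights, the function name, and the 0.5 threshold are hypothetical:

```python
import numpy as np

def split_decision(yuv_cu, feature_vec, texture_cnn, fc_w, fc_b):
    """FaME-ML-style CU split decision (schematic):
    1. texture_cnn maps the CU's YUV samples to an intermediate decision vector;
    2. that vector is concatenated with the 14-element encoding feature vector;
    3. a fully connected layer plus sigmoid yields the final split probability."""
    intermediate = texture_cnn(yuv_cu)               # step 1 (CNN stubbed out here)
    x = np.concatenate([intermediate, feature_vec])  # step 2: combine both inputs
    logit = fc_w @ x + fc_b                          # step 3: fully connected layer
    prob = 1.0 / (1.0 + np.exp(-logit))              # sigmoid -> split probability
    return prob > 0.5, prob
```

Replacing the full RD search at the bottleneck depth levels (Depth 0 and 1 for QP22, Depth 0 for QP26) with one such forward pass is what removes the encoding-time bottleneck.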
61. Experiment Settings
● 8 test sequences from the SVT 5 and JVET 6 datasets
● Five QP levels: [38, 34, 30, 26, 22]
● Low-Delay P configuration
● Bjontegaard Delta Rate 7 with PSNR and VMAF 8 is calculated
● Encoding performance is compared with the HEVC reference software (HM 16.21) 4 and the lower bound approach 2
○ Lower bound: the minimum CTU depth search value is limited by the lowest quality representation
5 L. Haglund, "The SVT High Definition Multi Format Test Set," Swedish Television Stockholm, 2006.
6 K. Suehring and X. Li, "JVET Common Test Conditions and Software Reference Configurations," JVET-B1010, 2016.
7 G. Bjontegaard, "Calculation of Average PSNR Differences Between RD-Curves," VCEG-M33, 2001.
8 Z. Li, A. Aaron, I. Katsavounidis, A. Moorthy, and M. Manohara, "Toward a Practical Perceptual Video Quality Metric," [Online] https://netflixtechblog.com/toward-a-practical-perceptual-video-quality-metric-653f208b9652, 2016.
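The Bjontegaard Delta rate 7 used above measures the average bitrate difference between two RD curves at equal quality. A minimal NumPy sketch of the standard calculation (cubic fit of log-rate over the quality metric, integrated over the overlapping quality range); the function name is hypothetical:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard Delta rate in percent: average bitrate difference of the
    test RD curve relative to the anchor at equal quality (PSNR or VMAF)."""
    lr1, lr2 = np.log10(rate_anchor), np.log10(rate_test)
    p1 = np.polyfit(psnr_anchor, lr1, 3)             # cubic fit: quality -> log-rate
    p2 = np.polyfit(psnr_test, lr2, 3)
    lo = max(min(psnr_anchor), min(psnr_test))       # overlapping quality interval
    hi = min(max(psnr_anchor), max(psnr_test))
    int1 = np.polyval(np.polyint(p1), hi) - np.polyval(np.polyint(p1), lo)
    int2 = np.polyval(np.polyint(p2), hi) - np.polyval(np.polyint(p2), lo)
    avg_diff = (int2 - int1) / (hi - lo)             # mean log-rate difference
    return (10 ** avg_diff - 1) * 100                # back to percent bitrate change
```

BDRP and BDRV in the results slide are this quantity computed with PSNR and VMAF as the quality axis, respectively.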
62. Encoding Results
● Compared with the HM
● Calculated over five QP levels
● ΔT is the difference between the maximum time complexities of the methods
● BDRP and BDRV are the Bjontegaard Delta rates with PSNR and VMAF, respectively
● 41% time saving for parallel encoding (difference between the highest time-complexity representations)
65. Conclusion
● Machine-learning-based approach for fast multi-rate encoding
○ Focus on the parallel encoding performance
● The lowest quality representation is used as the reference
● A CNN is used for the CU split decision at a given depth level
● The method is applied to the two highest-complexity representations
○ Bottleneck encoding times are reduced with minimal quality degradation
● 41% time saving for parallel encoding with a 0.88% bitrate increase on average