HTTP Adaptive Streaming (HAS) is the most common approach for delivering video content over the Internet. The requirement to encode the same content at different quality levels (i.e., representations) in HAS is a challenging problem for content providers. Fast multirate encoding approaches try to accelerate this process by reusing information from previously encoded representations. In this paper, we propose to use convolutional neural networks (CNNs) to speed up the encoding of multiple representations, with a specific focus on parallel encoding. In parallel encoding, the overall time complexity is limited to the maximum time complexity among the representations that are encoded in parallel. Therefore, instead of reducing the time complexity of all representations, the highest time complexities are reduced. Experimental results show that FaME-ML achieves significant time-complexity savings in parallel encoding scenarios (41% on average) with a slight bitrate increase and slight quality degradation compared to the HEVC reference software.
FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning
1. FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning
December 2, 2020
IEEE VCIP
Ekrem Çetinkaya, Hadi Amirpour, Christian Timmerer, and Mohammad Ghanbari
8. Video Streaming
● Share in the Internet traffic: 82%
● Video streamed every second: 1 million minutes
● Content characteristics
● As of 2021*
* Cisco VNI Forecast Highlights (2021)
42. CTU Search Window Bound
[Figure: CTU depth maps at QP 22, QP 30, and QP 38; Depth = [0 1 2 3]]
● Finding: CTUs tend to get higher depth levels as the quality goes up
● Upper bound 1 and lower bound 2 approaches
1 D. Schroeder et al., "Efficient Multi-Rate Video Encoding for HEVC-Based Adaptive HTTP Streaming," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 1, pp. 143-157, 2016.
2 H. Amirpour, E. Çetinkaya, C. Timmerer, and M. Ghanbari, "Fast Multi-Rate Encoding for Adaptive HTTP Streaming," 2020 Data Compression Conference (DCC), Snowbird, UT, USA, 2020, pp. 358-358.
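The finding above is what makes the upper 1 and lower 2 bound approaches work: the CTU depth of an already-encoded reference representation bounds the depth search of the others. A minimal sketch of the idea, where the function name and the `rd_cost` callable are hypothetical stand-ins for the encoder's RD search, not the papers' actual implementations:

```python
def bounded_depth_search(rd_cost, ref_depth, max_depth=3, mode="lower"):
    """Pick the best CTU depth, skipping depths ruled out by a reference encode.

    mode="upper": reference is the highest quality; its depth caps the search [1].
    mode="lower": reference is the lowest quality; its depth floors the search [2].
    `rd_cost` is a hypothetical callable returning the RD cost at a given depth.
    """
    if mode == "upper":
        depths = range(0, ref_depth + 1)          # search only up to the reference depth
    else:
        depths = range(ref_depth, max_depth + 1)  # search only from the reference depth
    return min(depths, key=rd_cost)               # fewer depths evaluated -> faster encode
```

Either variant shrinks the search range and thus the encoding time; only the lower-bound variant can speed up the highest quality (and slowest) representation, which is the focus here.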
43. Problem & Solution
● Existing methods typically use the highest quality representation as the reference
○ Cannot reduce the parallel encoding time
○ The highest quality representation is the bottleneck
● Use the lowest quality representation as the reference
○ Utilize machine learning for better performance
○ Focus on the parallel encoding time
■ Reduce the encoding time of the highest-complexity representations
■ Eliminate the encoding-time bottleneck
[Figure: Normalized time complexity of different quality representations using three different encoding methods 1 2]
52. Features
● RD cost: number of bits to encode the given CU and its four sub-CUs (FRD)
● Variance of pixels: inside the CU (FV)
● Motion vectors: average magnitude of the MVs inside the CU (FMV)
● Depth level: CU split decision for the given depth level (FD)
● Frame-level QP: QP value of the given frame (FQP)
● PU decision: PU split decision for the given CU (FPU)
F = [FRD (5) | FV (5) | FMV (1) | FD (1) | FQP (1) | FPU (1)] → 14 elements in total
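The 14-element feature vector F can be assembled as below; a minimal NumPy sketch where the function name is hypothetical and the five FV values are assumed to follow the FRD layout (the CU plus its four sub-CUs), since the slide's table gives FV size 5:

```python
import numpy as np

def build_feature_vector(f_rd, f_v, f_mv, f_d, f_qp, f_pu):
    """Concatenate the per-CU encoding features into the 14-element vector F:
    F_RD (5): RD costs of the CU and its four sub-CUs
    F_V  (5): pixel variances inside the CU (assumed: CU + four sub-CUs)
    F_MV (1): average motion-vector magnitude inside the CU
    F_D  (1): CU split decision at the given depth level
    F_QP (1): frame-level QP
    F_PU (1): PU split decision for the given CU
    """
    f = np.concatenate([np.asarray(f_rd, float), np.asarray(f_v, float),
                        [f_mv], [f_d], [f_qp], [f_pu]])
    assert f.shape == (14,), "feature vector must have 5+5+1+1+1+1 = 14 elements"
    return f
```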
53. Training Dataset
● 12 test sequences defined in the HEVC CTC 3
● YUV information is extracted for each CU
○ 64x64 size for D0 and 32x32 size for D1
● Sequences are encoded with the HEVC reference software (HM 16.21) 4
○ Encoding information is extracted and saved for QP38
○ Features are individually min-max normalized at the video level
○ Depth values are saved as targets for the remaining QPs
● 90% of frames for the training set (259,200 CTUs)
● 10% of frames for the validation set (28,800 CTUs)
3 F. Bossen et al., "Common Test Conditions and Software Reference Configurations," JCTVC-L1100, vol. 12, p. 7, 2013.
4 https://vcgit.hhi.fraunhofer.de/jct-vc/HM
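The per-video min-max normalization above can be sketched as follows, with NumPy and a hypothetical function name; each feature column is scaled to [0, 1] within each video rather than over the whole dataset:

```python
import numpy as np

def minmax_normalize_per_video(features_by_video):
    """Min-max normalize each feature column independently, within each video
    ('individually min-max normalized at the video level').
    `features_by_video` maps a video name to an (n_ctus, n_features) array."""
    out = {}
    for name, feats in features_by_video.items():
        feats = np.asarray(feats, float)
        lo = feats.min(axis=0)
        span = feats.max(axis=0) - lo
        span[span == 0] = 1.0            # guard against constant columns
        out[name] = (feats - lo) / span  # each column scaled to [0, 1]
    return out
```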
54. Convolutional Neural Network (CNN)
Y, U, and V input sizes are halved and the red part is omitted in the Depth 1 classifier.
59. Overall Method
[Diagram: QP38 is encoded with HEVC; QP22 and QP26 use the CNN; QP30 and QP34 use HEVC]
● Encode the lowest quality representation with the HEVC reference software
○ Save the encoding information
● Pass the YUV information to the texture-processing CNN and get an intermediate decision
● Combine the intermediate decision with the feature vector and pass it through a fully connected layer to get the final decision
● Apply the CNN for the bottleneck quality levels in the parallel encoding scenario
○ Depth 0 and Depth 1 for QP22
○ Depth 0 for QP26
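The steps above can be sketched end to end. This is a schematic in NumPy, not the paper's actual network: the texture-processing CNN is stubbed out as a callable, and the fully connected layer's weights, the function name, and the 0.5 threshold are hypothetical:

```python
import numpy as np

def split_decision(yuv_cu, feature_vec, texture_cnn, fc_w, fc_b):
    """FaME-ML-style CU split decision (schematic):
    1. texture_cnn maps the CU's YUV samples to an intermediate decision vector;
    2. that vector is concatenated with the 14-element encoding feature vector;
    3. a fully connected layer plus sigmoid yields the final split probability."""
    intermediate = texture_cnn(yuv_cu)               # step 1 (CNN stubbed out here)
    x = np.concatenate([intermediate, feature_vec])  # step 2: combine both inputs
    logit = fc_w @ x + fc_b                          # step 3: fully connected layer
    prob = 1.0 / (1.0 + np.exp(-logit))              # sigmoid -> split probability
    return prob > 0.5, prob
```

Replacing the full RD search at the bottleneck depth levels (Depth 0 and 1 for QP22, Depth 0 for QP26) with one such forward pass is what removes the encoding-time bottleneck.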
61. Experiment Settings
● 8 test sequences from the SVT 5 and JVET 6 datasets
● Five QP levels: [38, 34, 30, 26, 22]
● Low-Delay P configuration
● Bjontegaard Delta Rate 7 with PSNR and VMAF 8 is calculated
● Encoding performance is compared with the HEVC reference software (HM 16.21) 4 and the lower bound approach 2
○ Lower bound: the minimum CTU depth search value is limited by the lowest quality representation
5 L. Haglund, "The SVT High Definition Multi Format Test Set," Swedish Television Stockholm, 2006.
6 K. Suehring and X. Li, "JVET Common Test Conditions and Software Reference Configurations," JVET-B1010, 2016.
7 G. Bjontegaard, "Calculation of Average PSNR Differences Between RD-Curves," VCEG-M33, 2001.
8 Z. Li, A. Aaron, I. Katsavounidis, A. Moorthy, and M. Manohara, "Toward a Practical Perceptual Video Quality Metric," [Online] https://netflixtechblog.com/toward-a-practical-perceptual-video-quality-metric-653f208b9652, 2016.
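The Bjontegaard Delta rate 7 used above measures the average bitrate difference between two RD curves at equal quality. A minimal NumPy sketch of the standard calculation (cubic fit of log-rate over the quality metric, integrated over the overlapping quality range); the function name is hypothetical:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard Delta rate in percent: average bitrate difference of the
    test RD curve relative to the anchor at equal quality (PSNR or VMAF)."""
    lr1, lr2 = np.log10(rate_anchor), np.log10(rate_test)
    p1 = np.polyfit(psnr_anchor, lr1, 3)             # cubic fit: quality -> log-rate
    p2 = np.polyfit(psnr_test, lr2, 3)
    lo = max(min(psnr_anchor), min(psnr_test))       # overlapping quality interval
    hi = min(max(psnr_anchor), max(psnr_test))
    int1 = np.polyval(np.polyint(p1), hi) - np.polyval(np.polyint(p1), lo)
    int2 = np.polyval(np.polyint(p2), hi) - np.polyval(np.polyint(p2), lo)
    avg_diff = (int2 - int1) / (hi - lo)             # mean log-rate difference
    return (10 ** avg_diff - 1) * 100                # back to percent bitrate change
```

BDRP and BDRV in the results slide are this quantity computed with PSNR and VMAF as the quality axis, respectively.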
62. Encoding Results
● Compared with the HM
● Calculated over five QP levels
● ΔT is the difference between the maximum time complexities of the methods
● BDRP and BDRV are the Bjontegaard Delta rates with PSNR and VMAF, respectively
● 41% time saving for parallel encoding (difference between the highest time-complexity representations)
65. Conclusion
● Machine-learning-based approach for fast multi-rate encoding
○ Focus on the parallel encoding performance
● The lowest quality representation is used as the reference
● A CNN is used for the CU split decision at a given depth level
● The method is applied to the two highest-complexity representations
○ Bottleneck encoding times are reduced with minimal quality degradation
● 41% time saving for parallel encoding with a 0.88% bitrate increase on average