Presentation MM-4094, AMD Video Compression Engine: The Route towards Low-Latency Cloud Gaming Solutions, by Khaled Mammou and Ihab Amer at the AMD Developer Summit (APU13) November 11-13, 2013.
4. A LITTLE BIT OF HISTORY!
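[Figure: Perf/W (vertical axis) versus Year (horizontal axis, marked 2008/9 and 2011/12), showing three stages: CPU-based Video Coding (CPU), GPU-accelerated Video Coding (CPU + GPU), and HW-accelerated Video Coding (CPU + HW IP).]
4 | AMD VCE FOR LOW-LATENCY CLOUD GAMING | NOVEMBER 19, 2013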
5. CPU VS. GPU VS. FIXED-FUNCTION-HW VIDEO COMPRESSION
CPU
Pros
‒ No extra dollars
‒ Higher achievable quality at target bitrates (fewer architectural limitations)
‒ High flexibility and short lead time
‒ Can be optimized with advanced instruction sets (e.g., MMX)
Cons
‒ Limited operations per watt
‒ Monopolizes the CPU
Examples
‒ Handbrake/x264
CPU+GPU
Pros
‒ No extra dollars
‒ Part of CPU available for other tasks
‒ SW-based: still flexible, with a relatively short lead time
Cons
‒ Limited operations per watt
‒ Major design and code changes to leverage parallelism
‒ Massive parallelism impacts quality/bitrate
Examples
‒ GPU-accelerated MainConcept Enc.
‒ GPU-accelerated x264
Fixed-Function HW
Pros
‒ Fast!
‒ Power efficient!
‒ Most of CPU available for other tasks
Cons
‒ Additional area cost
‒ Least flexible (hard coded)
‒ Long lead time
Examples
‒ Applications in the market that support:
  ‒ AMD VCE
  ‒ Intel Quick Sync
  ‒ NVIDIA NVEnc
7. AMD VIDEO CODING ENGINE (VCE)
VCE is AMD’s dedicated fixed-function video coding engine for improved video encoding performance
8. VCE TARGET PLATFORMS
Platform            AMD APUs    AMD Discrete GPUs
Server              Yes         Yes
Desktop             Yes         Yes
All-in-one          Yes         Yes
Premium Notebook    Yes         Yes
Value Notebook      Yes         Possible
Tablet              Yes         N/A
9. VCE MAIN USE CASES
[Figure: VCE use cases. Image credits: (*) Courtesy of Cyberlink, Inc.; (**) Courtesy of CiiNow, Inc.]
10. VCE CAPABILITIES
Up to 3x 1080p@~30fps per instance
Low-power budget
Multi-streaming support
Configurable speed/quality tradeoff
Flexible/programmable to meet various use-cases
12. VCE CREW – OUR EVERYDAY STORY!
“I don’t believe in perfection. I don’t think there is such a thing. But the energy of wanting things to be great is a perfectionist energy!”
Reese Witherspoon
Hollywood Actress & Academy Award Winner
“Have no fear of perfection – you’ll never reach it!”
Salvador Dalí
Spanish Painter/Artist
14. LATENCY IS KEY!
WHAT IS LATENCY?
Latency is the elapsed time between the user’s input and his/her perception of the corresponding game reaction
‒ < 100–150 ms
[Figure: end-to-end pipeline. Game server: Game Engine, Rendering, Encoder, Encoder Buffer. Network. Game client: Coded Picture Buffer, Decoder, Decoded Picture Buffer, Display, User. Data labels: Graphics Commands, Rendered Frame, Compressed Stream, NAL Units, Network Packets, Decoded Frame.]
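To make the latency definition concrete, the sketch below adds up per-stage delays and compares the total with the 100–150 ms target. It is only an illustration: the stage names loosely follow the pipeline on this slide, and every millisecond value, as well as the function names, is a hypothetical placeholder rather than a measurement from the presentation.

```python
def total_latency_ms(stages):
    """Sum the per-stage delays (milliseconds) of one input-to-display round trip."""
    return sum(stages.values())


def within_budget(stages, budget_ms=150.0):
    """True if the end-to-end latency does not exceed the budget."""
    return total_latency_ms(stages) <= budget_ms


# Hypothetical per-stage delays in milliseconds (placeholders, not measured data).
example_stages = {
    "user_input_to_server": 10,
    "game_engine_and_rendering": 17,
    "encoder": 11,
    "encoder_buffer": 17,
    "network": 30,
    "decoder": 8,
    "display": 17,
}

print(total_latency_ms(example_stages), "ms total")
print("within 150 ms budget:", within_budget(example_stages))
print("within 100 ms budget:", within_budget(example_stages, budget_ms=100.0))
```

With these placeholder values the total meets the upper end of the target range but not the lower end, which is why each stage, including encoder buffering, has to be kept short.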
15. LATENCY IS KEY!
WHAT IS ENCODING LATENCY?
Encoding latency is the time elapsed between the moment a frame is rendered on the server and the moment it is decoded on the client
[Figure: the same pipeline as on the previous slide, with the span from Rendered Frame (server) to Decoded Frame (client) highlighted. Game server: Game Engine, Rendering, Encoder, Encoder Buffer. Network. Game client: Coded Picture Buffer, Decoder, Decoded Picture Buffer, Display, User. Data labels: Rendered Frame, Compressed Stream, NAL Units, Network Packets, Decoded Frame.]
16. ENCODE SPEED VS. QUALITY
HOW FAST CAN VCE ENCODE?
VCE Quality Presets    1080p      720p       480p
Speed                  95 fps     215 fps    535 fps
Balanced               80 fps     180 fps    470 fps
Quality                40 fps     90 fps     250 fps
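The preset table can be turned into two quantities that matter for latency: the average time to encode one frame, and how many real-time streams fit into the available throughput. The sketch below assumes that the 480p figures are in fps like the other columns and that throughput divides evenly between streams; the function names are illustrative and not part of any AMD API.

```python
# Encode rates in frames per second, taken from the VCE quality preset table.
# Assumption: the 480p column is in fps like the 1080p and 720p columns.
VCE_FPS = {
    "Speed":    {"1080p": 95, "720p": 215, "480p": 535},
    "Balanced": {"1080p": 80, "720p": 180, "480p": 470},
    "Quality":  {"1080p": 40, "720p": 90,  "480p": 250},
}


def frame_time_ms(preset, resolution):
    """Average time spent encoding one frame, in milliseconds."""
    return 1000.0 / VCE_FPS[preset][resolution]


def realtime_streams(preset, resolution, target_fps=30):
    """Whole number of streams at target_fps, assuming throughput is shared evenly."""
    return VCE_FPS[preset][resolution] // target_fps


for preset in VCE_FPS:
    print(preset,
          round(frame_time_ms(preset, "1080p"), 1), "ms per 1080p frame,",
          realtime_streams(preset, "1080p"), "stream(s) at 30 fps")
```

At the Speed preset, 95 fps corresponds to roughly 10.5 ms per 1080p frame and leaves room for three 30 fps streams, which is consistent with the "up to 3x 1080p@~30fps per instance" capability listed earlier.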
17. BUFFERING DELAY VS. QUALITY
WHY BUFFERING?
Constant Quantization Parameter (CQP)
[Figure: instantaneous bitrate (Mbit/s, 0 to 70) versus frame number (0 to 60) for CQP encoding, with the average bitrate (6 Mbit/s) marked.]
Transmission over a Constant Bitrate (CBR) channel of 6 Mbit/s?
‒ Buffering
‒ Transmission delay (latency!)
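The effect of sending a bursty CQP stream over a fixed-rate channel can be explored with a small simulation. This is a minimal sketch, not taken from the presentation: the frame sizes, the 60 fps frame rate, and the function name `transmission_delays_ms` are assumptions, chosen so that the stream averages 6 Mbit/s.

```python
def transmission_delays_ms(frame_bits, channel_bps, fps):
    """Delay, per frame, between the frame being produced and its last bit
    leaving the encoder buffer over a constant-bitrate channel."""
    frame_interval = 1.0 / fps
    channel_free_at = 0.0  # time at which all earlier frames have been sent
    delays = []
    for index, bits in enumerate(frame_bits):
        produced_at = index * frame_interval
        # A frame waits in the buffer until the channel has sent earlier frames.
        start = max(produced_at, channel_free_at)
        channel_free_at = start + bits / channel_bps
        delays.append((channel_free_at - produced_at) * 1000.0)
    return delays


# Hypothetical CQP-like stream: one large intra frame followed by 59 small
# frames, 6,000,000 bits in total over one second (an average of 6 Mbit/s).
frames = [1_280_000] + [80_000] * 59
delays = transmission_delays_ms(frames, channel_bps=6_000_000, fps=60)

print("worst-case delay: %.1f ms" % max(delays))
print("delay of last frame: %.1f ms" % delays[-1])
```

Although the average bitrate matches the channel exactly, the single large frame delays every frame queued behind it, which is the buffering latency that rate control is meant to bound.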
18. BUFFERING DELAYS VS. QUALITY
WHAT IS CBR?
Constant Bit Rate (CBR) Rate Control
‒ Control transmission delays (i.e., # bits per picture) by adjusting QPs
‒ Leaky bucket model
‒ Defined by the triplet: Output Rate (R), Buffer Size (B), Initial Fullness (F)
‒ Avoid encoder buffer underflow and overflow (i.e., transmission bitrate = encoding bitrate)
Encoder can predict the decoder buffer fullness
Buffering latency is smaller than B/R
[Figure: leaky bucket with Input Rate, Buffer Size (B), Initial Fullness (F) and Output Rate (R), alongside plots of encoder buffer fullness and decoder buffer fullness versus time, showing picture sizes b0, b1, b2 and time instants S0, S0+De, S1, S2, T0-Dd, T0, T1, T2.]
19. LATENCY VS. QUALITY
WHAT IS VBR?
Variable Bit Rate (VBR) Rate Control
‒ Channel can stop transmission without losing synchronization (e.g., packet-based networks)
‒ Leaky bucket model
‒ Defined by the triplet
‒ Avoid only encoder buffer overflow (i.e., transmission bitrate may be higher than the encoding bitrate)
Allows shorter buffering delay
AMD Media SDK exposes two VBR modes: Peak-Constrained VBR and Latency-Constrained VBR
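The difference between the CBR and VBR leaky-bucket constraints can be sketched at frame granularity. The code below is a simplified model under stated assumptions, not the rate control used by VCE or the AMD Media SDK: the function name, the `allow_underflow` flag, and the frame sizes are illustrative. The encoder buffer gains each frame's bits and drains at the output rate; overflow is flagged in both modes, while underflow is flagged only when the channel must transmit continuously (CBR).

```python
def leaky_bucket(frame_bits, rate_bps, buffer_bits, initial_bits, fps,
                 allow_underflow):
    """Track encoder buffer fullness frame by frame.

    Returns (trace, violations): the fullness after each frame interval and
    a list of (frame_index, "overflow" | "underflow") tuples.
    """
    drain_per_frame = rate_bps / fps  # bits the channel removes per interval
    fullness = float(initial_bits)
    trace, violations = [], []
    for index, bits in enumerate(frame_bits):
        fullness += bits
        if fullness > buffer_bits:
            violations.append((index, "overflow"))
        fullness -= drain_per_frame
        if fullness < 0:
            # CBR must keep sending, so an empty buffer is a violation.
            # VBR lets the channel pause, so it is not.
            if not allow_underflow:
                violations.append((index, "underflow"))
            fullness = 0.0
        trace.append(fullness)
    return trace, violations


# Hypothetical frames: two small ones, then one oversized frame.
frames = [50_000, 50_000, 250_000]
params = dict(rate_bps=6_000_000, buffer_bits=100_000, initial_bits=0, fps=60)

_, cbr_violations = leaky_bucket(frames, allow_underflow=False, **params)
_, vbr_violations = leaky_bucket(frames, allow_underflow=True, **params)
print("CBR:", cbr_violations)
print("VBR:", vbr_violations)
```

In this sketch the small frames violate only the CBR constraint, whereas the oversized frame violates both, matching the "avoid only encoder buffer overflow" rule stated for VBR.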
20. LATENCY VS. QUALITY
HOW DOES BUFFERING LATENCY IMPACT QUALITY?
High buffering delay
‒ Peak-Constrained VBR
‒ Target Bitrate = 6 Mbps
‒ Peak Bitrate = 6 Mbps
‒ VBVBufferSize = 6 Mbits (i.e., buffering latency of 1000 ms)
‒ IDRPeriod = 60
[Figure: instantaneous bitrate (Mbit/s, 0 to 70) versus frame number (0 to 60), comparing VBR with high buffering latency, CQP, and the average bitrate (6 Mbit/s).]
Reduce overshoots with limited quality impact
22. LATENCY VS. QUALITY
HOW DOES BUFFERING LATENCY IMPACT QUALITY?
Low buffering delay
[Figure: Frame 300 (IDR)]
Poor IDR frame quality
Replace IDR frames with partially intra-encoded frames
23. ERROR RESILIENCY VS. QUALITY
WHAT IS INTRA-REFRESH?
Intra-refresh principle
[Figure: picture sequence IDR, P, P, P]
24. ERROR RESILIENCY VS. QUALITY
WHAT IS INTRA-REFRESH?
Intra-refresh principle
‒ Spread out Intra Units throughout successive pictures
‒ Constrain inter/intra-prediction to preserve error resiliency (i.e., Dirty/Clean Maps)
[Figure: four successive pictures, each split into four regions (I = intra-refreshed, D = dirty, C = clean), with a restriction on the search region marked for each picture:
Picture 1: I D D D
Picture 2: C I D D
Picture 3: C C I D
Picture 4: C C C I]
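The dirty/clean bookkeeping behind intra-refresh can be sketched as follows. This is an illustrative model only, assuming a column-wise sweep with one intra-coded column per picture; the function names are hypothetical and do not correspond to any VCE or AMD Media SDK interface.

```python
def intra_refresh_maps(num_columns):
    """Return one map per picture for a single refresh sweep.

    Each map marks a column as 'I' (intra-coded in this picture),
    'C' (clean: already refreshed) or 'D' (dirty: not yet refreshed).
    """
    maps = []
    for intra_col in range(num_columns):
        maps.append([
            "C" if col < intra_col else "I" if col == intra_col else "D"
            for col in range(num_columns)
        ])
    return maps


def allowed_reference_columns(previous_map):
    """Columns of the previous picture that clean blocks may predict from.

    Dirty columns are excluded so that a transmission error cannot
    propagate into regions that have already been refreshed.
    """
    return [col for col, state in enumerate(previous_map) if state in ("C", "I")]


maps = intra_refresh_maps(4)
for number, picture_map in enumerate(maps, start=1):
    print("Picture", number, ":", " ".join(picture_map))
print("Clean blocks in picture 3 may reference columns",
      allowed_reference_columns(maps[1]), "of picture 2")
```

Restricting the search region to clean and intra-coded columns is what allows the stream to recover from errors after one full sweep, without the bitrate spike of an IDR frame.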
25. ERROR RESILIENCY VS. QUALITY
HOW DOES INTRA-REFRESH IMPACT QUALITY?
Intra-refresh with low buffering delay
‒ VBR Rate Control
‒ Target Bitrate = 6 Mbps
‒ Peak Bitrate = 6 Mbps
‒ VBVBufferSize = 0.1 Mbits (i.e., buffering latency of 16 ms)
‒ IDRPeriod = 60
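The buffering latencies quoted with these settings follow from the B/R bound given on the CBR slide: the buffer size divided by the bitrate at which it drains. A short worked check, with an illustrative function name:

```python
def max_buffering_latency_ms(vbv_buffer_bits, bitrate_bps):
    """Upper bound on buffering latency: buffer size B over drain rate R."""
    return 1000.0 * vbv_buffer_bits / bitrate_bps


# High buffering delay settings: 6 Mbit buffer at 6 Mbps.
high = max_buffering_latency_ms(6_000_000, 6_000_000)
# Low buffering delay settings: 0.1 Mbit buffer at 6 Mbps.
low = max_buffering_latency_ms(100_000, 6_000_000)

print(high, "ms")
print(round(low, 1), "ms")
```

The first case gives 1000 ms, and the second gives about 16.7 ms, which the slide quotes as 16 ms and which is roughly one frame interval at 60 fps.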
26. ERROR RESILIENCY VS. QUALITY
HOW DOES INTRA-REFRESH IMPACT QUALITY?
Intra-refresh with low buffering delay
[Figure: Frame 300 (IDR)]
27. ERROR RESILIENCY VS. QUALITY
HOW DOES INTRA-REFRESH IMPACT QUALITY?
Intra-refresh with low buffering delay
[Figure: Frame 300 (Intra-Refreshed)]
Improved IDR frame quality
28. COMPARATIVE EVALUATION
HOW GOOD IS VCE?
VCE vs. Software Encoder
‒ VBR Rate Control
‒ Target Bitrate = 6 Mbps
‒ Peak Bitrate = 6 Mbps
‒ VBVBufferSize = 0.1 Mbits (i.e., buffering latency of 16 ms)
‒ IDRPeriod = 60
‒ Intra-refresh
[Figure: Software Encoder]
29. COMPARATIVE EVALUATION
HOW GOOD IS VCE?
VCE vs. Software Encoder
[Figure: VCE versus Software Encoder]
31. CONCLUSIONS
Fixed-Function HW acceleration is the prevalent technology for video compression
AMD Media SDK and RapidFire allow application developers to configure VCE parameters
‒ Latency/Error Resiliency/Quality
‒ Number of encoded streams/Power/Network efficiency
VCE offers an out-of-the-box solution for ultra-low latency cloud gaming
‒ Special rate control settings
‒ Intra-refresh support
Deployed in the solutions of various cloud-gaming partners