8.4: Radeon RX 5700 Series: The AMD 7nm Energy-Efficient High-Performance GPUs, © 2020 IEEE International Solid-State Circuits Conference
AMD Radeon™ RX 5700 Series
7nm Energy-Efficient
High-Performance GPUs
Sal Dasgupta1, Teja Singh2, Ashish Jain2, Samuel Naffziger3, Deepesh John2,
Chetan Bisht4, Pradeep Jayaraman1, Michael Mantor4
1AMD, Santa Clara, CA, 2AMD, Austin, TX, 3AMD, Fort Collins, CO, 4AMD, Orlando, FL
Presented at ISSCC 2020
Outline
• Overview of AMD Radeon™ RX 5700 Series
• AMD RDNA Architecture
• Power Management features
• GDDR6 (G6) PHY
• Physical design
AMD Radeon™ RX 5000 series
• GPUs are everywhere
• GPUs need to service a wide range of form factors and workloads
• Fundamental challenge is to get higher and higher performance at
lower and lower power
[Figure labels: PC Gaming, Content Creation, Console Gaming, Cloud Gaming, Mobile Devices]
Improvements
• Up to 1.5x greater performance per watt than its predecessor
• Up to 1.25x performance per clock compared to previous 14nm processors
• Up to 1.23x higher max frequencies than its predecessor
• Up to 1.23x lower power consumption than its predecessor
Achieved through
• 7nm process
• Higher clocks
• Focused design for lower dynamic power
• Intelligent SOC design with power and performance at the forefront
• Improved power management
• All new AMD RDNA graphics architecture – higher performance for
the same cycles
AMD Radeon™ RX 5700 series
See endnote RX-327, RX-325 and RX-362
AMD Radeon™ RX 5700 Series
Overview
AMD Radeon™ RX 5700 XT
• PCIe® Gen4 x16: 32 GB/s
• Display: DP1.4, HDMI 4K 60fps
• Multimedia: 4K H264 Encode/Decode, H265/HEVC Encode/Decode
• GDDR6: 256b, 14 Gbps
See endnote GD-81
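The 32 GB/s figure quoted for PCIe Gen4 x16 can be sanity-checked with a little arithmetic. The sketch below assumes the per-lane rate of 16 GT/s and the 128b/130b line encoding from the PCIe 4.0 specification; neither value appears on the slide, and the result is per direction.

```python
# Sanity check of the PCIe Gen4 x16 bandwidth quoted on the slide.
# Assumptions (from the PCIe 4.0 specification, not from the slide):
GT_PER_S_PER_LANE = 16.0      # raw transfer rate per lane, in GT/s
LANES = 16                    # x16 link
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding

# One transfer carries one bit per lane; divide by 8 for bytes.
raw_gb_per_s = GT_PER_S_PER_LANE * LANES / 8
effective_gb_per_s = raw_gb_per_s * ENCODING_EFFICIENCY

print(f"raw: {raw_gb_per_s:.1f} GB/s per direction")
print(f"after encoding overhead: {effective_gb_per_s:.2f} GB/s per direction")
```

The slide's 32 GB/s matches the raw figure; the usable rate after encoding overhead is slightly lower.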
AMD Radeon™ RX 5700 XT Floorplan
[Figure: die floorplan with labeled regions: Graphics, Bus Interface, Display, G6 PHY (two blocks), G6 Control]
AMD Radeon™ RX 5700 XT Summary
                        AMD Radeon RX 5700 XT   AMD Radeon RX Vega 64   AMD Radeon RX 580
Compute Units           40                      64                      36
Texture Units           160                     256                     144
ROPs                    64                      64                      32
Memory Clocks           14 Gbps GDDR6           1.89 Gbps HBM2          8 Gbps GDDR5
Memory Bus Width        256 bit                 2048 bit                256 bit
Frame Buffer            8 GB                    8 GB                    8 GB
Boost Clock             1905 MHz                1546 MHz                1257 MHz
Typical Board Power     225W                    295W                    185W
Transistor Count        10.3B                   12.5B                   5.7B
Manufacturing Process   TSMC 7nm                GF 14 nm                GF 14 nm
Architecture            RDNA                    GCN                     GCN
[Figure: Performance Contributors, on a 0% to 100% axis: 7nm process; Performance per Clock Enhancement; Additional Frequency and Power Improvement]
[Figure: Delivered Performance, GCN vs. RDNA, on a 0 to 160 axis: +50% same-power, same-configuration performance gains]
See endnote GD-151, RX-325
AMD Radeon™ RX 5700 Series
RDNA Graphics Architecture
40 RDNA Compute Units
• 80 Scalar Processors
• 2560 Stream Processors
• 160 64b Bilinear Filter units
Multilevel Cache
• 4MB L2, 512KB L1, (V$, I$, K$) L0
• 2x V$L0 Load Bandwidth
• DCC Everywhere
Streamlined Graphics Engine
• Geometry Engine (4 Prim shader out, 8 Prim
shader in)
• 64 Pixel Units
• 4 Asynchronous Compute Engines
Designed for higher frequencies at lower power
AMD Radeon™ RX 5700 XT
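The unit totals on this slide follow from per-CU resources. The sketch below is a rough check rather than an official breakdown: the per-CU figures (two SIMD32 units, one scalar unit per SIMD, four filter units) are inferred by dividing the slide totals by 40, the boost clock is the 1905 MHz listed in the summary table, and counting an FMA as two floating-point operations is a common convention, not something the slide states.

```python
# Derive the slide's unit totals from assumed per-CU resources.
COMPUTE_UNITS = 40
SIMD32_PER_CU = 2          # assumed: two SIMD32 units per RDNA CU
LANES_PER_SIMD = 32        # SIMD32
SCALAR_UNITS_PER_SIMD = 1  # assumed: one scalar unit per SIMD
FILTER_UNITS_PER_CU = 4    # assumed: 160 filter units / 40 CUs
BOOST_CLOCK_GHZ = 1.905    # boost clock from the summary table
FLOPS_PER_FMA = 2          # convention: multiply + add

scalar_processors = COMPUTE_UNITS * SIMD32_PER_CU * SCALAR_UNITS_PER_SIMD
stream_processors = COMPUTE_UNITS * SIMD32_PER_CU * LANES_PER_SIMD
filter_units = COMPUTE_UNITS * FILTER_UNITS_PER_CU

# Peak FP32 rate if every lane retires one FMA per clock at boost.
peak_fp32_tflops = stream_processors * FLOPS_PER_FMA * BOOST_CLOCK_GHZ / 1000

print(scalar_processors, stream_processors, filter_units)
print(f"peak FP32: {peak_fp32_tflops:.2f} TFLOPS")
```

The peak figure is an upper bound at boost clock; sustained throughput depends on workload and power state.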
AMD Radeon™ RX 5700 XT Functional Floorplan
[Figure: functional floorplan. Labeled blocks: Infinity Fabric, PCIe Gen4, Display Engine, Multimedia Engine, Command Processor, Geometry Processor, HWS, DMA, ACE, two Shader Engines (with Prim Units, Rasterizers, RBs, Compute Units, and L1 caches), 16 L2 slices, and four 64-bit Memory Controllers]
RDNA – Work Group Processor
Up to 20 wave controllers
Improved instruction arbitration
10KB scalar register file
• 128 32b registers per wavefront
128 KB Vector register file
~2X instruction rate vs GCN
• Dual SIMD32
Single cycle issue
• Wave32 on SIMD32
Bytes Per Flop
• 128B Load/Store
• 64B Filter Rate
[Figure: RDNA Workgroup Processor block diagram. Labeled blocks: Shader Sequencers, Scalar Units, Scalar Registers, Vector ALUs (SIMD32), Vector Registers, Texture Mapping Units, Texture L0 Cache, Scalar Data Cache, Local Data Share, Shader Instruction Cache]
32 wide single and dual half ALU
• Full rate 32b FMA, Dual 16b FMA
8 wide transcendental ALU
• Single cycle issue
• Multi-cycle co-execution
SIMD Unit WGP
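The register file sizes quoted on this slide are consistent with simple arithmetic. The sketch below assumes that the 20 wave controllers, the 10KB scalar register file, and the 128 KB vector register file are all counted per SIMD, and that a wave32 vector register holds one 32b value per lane; the slide does not spell out either point.

```python
# Check the register file sizes quoted on the slide.
# Assumption: all figures below are per SIMD.
WAVE_CONTROLLERS = 20        # "Up to 20 wave controllers"
SGPRS_PER_WAVE = 128         # "128 32b registers per wavefront"
BYTES_PER_32B_REGISTER = 4

scalar_file_bytes = WAVE_CONTROLLERS * SGPRS_PER_WAVE * BYTES_PER_32B_REGISTER
scalar_file_kb = scalar_file_bytes / 1024

# Assumption: one wave32 vector register = 32 lanes x 4 bytes.
VECTOR_FILE_BYTES = 128 * 1024
WAVE32_LANES = 32
bytes_per_wave32_vgpr = WAVE32_LANES * BYTES_PER_32B_REGISTER
wave32_vgprs_in_file = VECTOR_FILE_BYTES // bytes_per_wave32_vgpr

print(f"scalar register file: {scalar_file_kb:.0f} KB")
print(f"wave32-wide vector registers in 128 KB: {wave32_vgprs_in_file}")
```

With 20 resident waves of 128 scalar registers each, the scalar file comes to exactly 10 KB, which matches the slide.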
[Figure: block diagrams comparing a GCN Compute Unit, an RDNA Compute Unit, and an RDNA WGP. Labeled blocks include Schedulers, Scalar Units, Scalar Registers, Vector Units (Stream Processors), Vector Registers, Local Data Share, Scalar Data Cache, Shader Instruction Cache, Texture Mapping Units, Texture Filter Units, Texture Fetch Load/Store Units, L1 Cache, and Branch & Message Unit]
GCN Instruction Issue
Wave64, 4-cycle issue:
• Cycle N: Lanes 15 - 0
• Cycle N+1: Lanes 31 - 16
• Cycle N+2: Lanes 47 - 32
• Cycle N+3: Lanes 63 - 48
[Figure: SIMD0 to SIMD3, each with VGPR banks and operand gathering with 4 cycle issue, plus one shared scalar unit (S). Vector and shared-scalar issue both rotate: Cycle 0 SIMD0 waves, Cycle 1 SIMD1 waves, Cycle 2 SIMD2 waves, Cycle 3 SIMD3 waves]
• All work-items of a wave64 have an opportunity to do work once every 4 clocks due to hardware interleaving
• Special Function Unit: alternate execution unit running at ¼ rate
• A wave from a SIMD has an opportunity to accomplish a scalar instruction once every 4 clocks
RDNA Instruction Issue
[Figure: SIMD0 and SIMD1, each with its own scalar unit (S), VGPR banks, and operand gathering with 1 cycle issue]
• SIMD0 Wave32 and SIMD1 Wave32: every cycle issue
• Vector instruction issue any cycle, or SFU issue once every 4 cycles
• Vector Units: all work-items of one wave32 have an opportunity to do work every clock
• Special Function Unit uses 1 issue cycle and then executes in parallel
• Each SIMD is equipped with a scalar unit for an instruction execution every cycle
RDNA Instruction Issue Example
Instruction sequence:
s_add_i32 s0, s1, s2
v_mul_f32 v0, v1, s0
v_add_f32 v5, v4, v3
v_sub_f32 v6, v7, v0
Issue timeline by cycle (column headings inferred from the stall annotations):
Cycle  GCN wave64              RDNA wave32                      RDNA wave64
0      s_add_i32 s0, s1, s2    s_add_i32 s0, s1, s2             s_add_i32 s0, s1, s2
1      …                       … (salu dependency stall on S0)  … (salu dependency stall on S0)
2      …                       v_mul_f32 v0, v1, s0             v_mul_f32 v0, v1, s0 (lo)
3      …                       v_add_f32 v5, v4, v3             v_mul_f32 v0, v1, s0 (hi)
4      v_mul_f32 v0, v1, s0    … (valu dependency stall on V0)  v_add_f32 v5, v4, v3 (lo)
5      … (simd busy 4 cycles)  …                                v_add_f32 v5, v4, v3 (hi)
6      …                       …                                … (valu dependency stall on V0 lo)
7      …                       v_sub_f32 v6, v7, v0             v_sub_f32 v6, v7, v0 (lo)
8      v_add_f32 v5, v4, v3                                     v_sub_f32 v6, v7, v0 (hi)
9      …
10     …
11     …
12     v_sub_f32 v6, v7, v0
13     …
14     …
15     …
Issue cycles: 16 (GCN wave64), 8 (RDNA wave32), 9 (RDNA wave64)
• Shortest wave issue latency
• 44% reduction in issue cycles
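The cycle counts in this example can be reproduced with a small in-order issue model. This is a minimal sketch, not AMD's scheduler: the result latencies (2 cycles for a scalar result, 5 cycles for a vector result) are assumptions chosen so the model matches the stalls drawn on the slide, and the function names are hypothetical. Wave64 on RDNA is modeled as two wave32 passes (lo, then hi) per vector instruction, with dependencies tracked per half.

```python
# Minimal in-order issue model for the four-instruction example.
SALU_LATENCY = 2  # assumed: cycles until a scalar result is usable
VALU_LATENCY = 5  # assumed: cycles until a vector result is usable

# (unit, destination, sources); "s" = scalar ALU, "v" = vector ALU
PROGRAM = [
    ("s", "s0", ["s1", "s2"]),  # s_add_i32 s0, s1, s2
    ("v", "v0", ["v1", "s0"]),  # v_mul_f32 v0, v1, s0
    ("v", "v5", ["v4", "v3"]),  # v_add_f32 v5, v4, v3
    ("v", "v6", ["v7", "v0"]),  # v_sub_f32 v6, v7, v0
]

def gcn_issue_cycles(program, cycles_per_instruction=4):
    """GCN wave64: each instruction occupies the wave's slot for 4 cycles."""
    return len(program) * cycles_per_instruction

def rdna_issue_cycles(program, passes):
    """RDNA: one issue per cycle, stalling until source operands are ready.

    passes=1 models wave32; passes=2 models wave64 issued as lo/hi halves.
    """
    ready = {}  # (register, half) -> first cycle the value can be read
    cycle = 0
    for unit, dst, srcs in program:
        for half in range(passes if unit == "v" else 1):
            for src in srcs:
                # Scalar values are shared by both halves (stored as half 0).
                key = (src, half if src.startswith("v") else 0)
                cycle = max(cycle, ready.get(key, 0))
            latency = VALU_LATENCY if unit == "v" else SALU_LATENCY
            ready[(dst, half)] = cycle + latency
            cycle += 1  # this issue slot is consumed
    return cycle

gcn = gcn_issue_cycles(PROGRAM)
wave32 = rdna_issue_cycles(PROGRAM, passes=1)
wave64 = rdna_issue_cycles(PROGRAM, passes=2)
reduction = 1 - wave64 / gcn

print(gcn, wave32, wave64, f"{reduction:.0%}")
```

Under these assumptions the model gives 16, 8, and 9 issue cycles, and 9 versus 16 cycles is the 44% reduction quoted on the slide.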
RDNA Redesigned Cache Hierarchy
New L1 Cache Hierarchy
[Figure: cache hierarchy block diagram. Labeled blocks: Command Interfaces, Geometry, Rasterizer & Render Backends, Async Compute, Shader Complex, Texture, L1, L2, SOC Fabric, GDDR6, PCIe® 4.0]
RDNA Cache Hierarchy
• Doubled the Load Bandwidth
from L0 to ALU
• Improved BW Amplification
• Reduced Latency and Power
• Reduced Congestion at L2 Level
• Reduced Data Movement
[Figure: bandwidth per clock between cache levels, labeled 4x64B/C, 16x32/C, 32B/CLK, 64B/CLK, and 128B/CLK, with 2X callouts]
[Figure: Relative Cache Latency, with reductions of -24%, -21%, and -7%]
See RX-329 in Endnotes.
AMD Radeon™ RX 5700 Series
Power Management
AMD Radeon™ RX 5700 Series
Power Management
• Main functions
• Manage clocks/voltages to maximize performance during active workloads
• Draw minimum power during low activity conditions
• Challenges
• Highly variable GPU workloads
• Parallel work leads to high power/current demand
• Ever increasing needs for memory bandwidth
• Power features include
• AVFS (adaptive voltage and frequency scaling) to choose the optimal per-part voltage
• DVFS (dynamic voltage and frequency scaling) to choose the best operating point given the current environment
• Voltage droop mitigation and impact reduction
• Agile responses to the current draw demands of the moment
• Graceful throttling of the graphics core at thermal limits
• Aggressive clock and power gating
Power Management: Fine-grained DPM
Previous Generation: Coarse-grained DPM
• On previous architectures, operation was limited to
a few (typically 8) pre-defined frequency values
• Limited choice for Power Management Controller to
choose from
• Power/Thermal constrained effective frequency
determined by dithering between neighboring
coarse states
AMD Radeon™ RX 5700 Series: Fine-grained DPM
• Much finer-grained DPM state selection across the
V/F curve
• Improved perf/W efficiency by up to 5% as
compared to previous generation by staying on the
optimal curve
• More accurate frequency selection between what
the workload needs and what it gets
[Figure: V/F curve from Fmin through Fidle to Fmax, comparing eight coarse-grained DPM states (DPM0 to DPM7) with fine-grained DPM]
*Based on AMD internal data.
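Why dithering between two coarse states costs efficiency can be illustrated with a toy power model. Everything numeric below is hypothetical and is not AMD data: a linear V/F curve, dynamic power proportional to f·V², eight evenly spaced coarse states, and a 1.5 GHz target. Because power grows faster than linearly with frequency, time-averaging two neighboring states draws more power than running directly at the same effective frequency.

```python
# Toy comparison: coarse-state dithering vs. fine-grained frequency selection.
# All constants are hypothetical; they only illustrate the convexity argument.

def power_w(freq_ghz):
    """Dynamic power ~ C * f * V^2, with an assumed linear V/F curve."""
    volts = 0.60 + 0.25 * freq_ghz
    return 100.0 * freq_ghz * volts ** 2

def dithered_power_w(target_ghz, states_ghz):
    """Average power when dithering between the two neighboring coarse states
    so that the time-weighted frequency equals target_ghz."""
    lo = max(s for s in states_ghz if s <= target_ghz)
    hi = min(s for s in states_ghz if s >= target_ghz)
    if lo == hi:
        return power_w(lo)
    share_hi = (target_ghz - lo) / (hi - lo)  # fraction of time in the upper state
    return share_hi * power_w(hi) + (1.0 - share_hi) * power_w(lo)

F_MIN_GHZ, F_MAX_GHZ = 0.3, 1.9
coarse_states = [F_MIN_GHZ + i * (F_MAX_GHZ - F_MIN_GHZ) / 7 for i in range(8)]

target = 1.5
fine = power_w(target)                          # fine-grained: run at the target
coarse = dithered_power_w(target, coarse_states)  # coarse: dither around it
saving = 1.0 - fine / coarse

print(f"fine-grained: {fine:.1f} W, dithered: {coarse:.1f} W, saving: {saving:.2%}")
```

The size of the saving depends entirely on the assumed curve and state spacing; the sketch only shows the direction of the effect, not the up-to-5% figure on the slide.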
Power Management – Per Part Fmax
Previous generations
• Fmax determined by slowest part of
distribution
• Lower Cac workloads may leave power on
the table for a large population of parts
AMD Radeon™ RX 5700 Series
• Each individual part allowed to achieve
max potential (up to 15% higher) by
selecting its own Vmax-limited Fmax
based on the speed of the part
• Enables applications with lower Cac to
sustain higher clocks rather than be limited
to artificially low limits set by slowest parts
Based on AMD internal data
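The difference between a fleet-wide Fmax and a per-part Fmax can be shown with a handful of made-up parts. The frequencies below are hypothetical and chosen only so that the fastest part lands at the slide's "up to 15%" figure; they are not measured data.

```python
# Fleet-wide Fmax (set by the slowest part) vs. per-part Fmax.
# Hypothetical Vmax-limited Fmax values for five parts, in MHz.
part_fmax_mhz = [1900, 1950, 2000, 2100, 2185]

# Previous approach: every part is capped at the slowest part's Fmax.
fleet_fmax_mhz = min(part_fmax_mhz)

# Per-part approach: each part runs up to its own Vmax-limited Fmax.
uplift = [f / fleet_fmax_mhz - 1.0 for f in part_fmax_mhz]

for f, u in zip(part_fmax_mhz, uplift):
    print(f"part Fmax {f} MHz: {u:+.1%} vs. fleet-wide limit")
```

Only the slowest part sees no change; every faster part gains headroom that lower-Cac workloads can use to sustain higher clocks.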
AMD Radeon™ RX 5700 Series
GDDR6 Interface
GDDR6 Memory
• 14 Gbps, 256b, 448 GB/s
• Up to 75% BW per pin improvement over GDDR5
• Up to 60% performance/Watt improvement
Based on AMD internal data
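The bandwidth numbers on this slide follow directly from the per-pin data rate and bus width. A short check, taking the 8 Gbps GDDR5 rate from the Radeon RX 580 column of the summary table as the comparison point (that pairing is an assumption about which GDDR5 baseline the slide uses):

```python
# Peak memory bandwidth and per-pin improvement for GDDR6 on the RX 5700 XT.
GDDR6_GBPS_PER_PIN = 14.0   # from the slide
BUS_WIDTH_BITS = 256        # from the slide
GDDR5_GBPS_PER_PIN = 8.0    # assumed baseline: RX 580 in the summary table

# Total bandwidth: per-pin rate x number of data pins, converted to bytes.
peak_gb_per_s = GDDR6_GBPS_PER_PIN * BUS_WIDTH_BITS / 8

# Per-pin improvement of GDDR6 over the GDDR5 baseline.
per_pin_gain = GDDR6_GBPS_PER_PIN / GDDR5_GBPS_PER_PIN - 1.0

print(f"peak bandwidth: {peak_gb_per_s:.0f} GB/s")
print(f"per-pin improvement over GDDR5: {per_pin_gain:.0%}")
```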
G6 PHY
Read Data Strobe (RDQS) mode to save power when high memory bandwidth is not required
T-coil provides bandwidth enhancement and improved return loss, enabling the high data rates on a single-ended interface
• up to 16% in height
• up to 26% in width
[Figure values: 0.300 versus 0.349 (about +16%) and 40.2 versus 50.8 (about +26%)]
Based on AMD internal data
AMD Radeon™ RX 5700 Series
Physical Design
AMD Radeon™ RX 5700 Physical Design
Physical Design challenges
• New architecture
• Frequency uplift beyond natural process uplift
• Large design (10.3B transistors) with wide busses and complex crossbar structures
• Decreasing dynamic switching capacitance while providing that frequency uplift
• Operational logistics of managing such a large design
Approach
• High-performance clock distribution
• Intelligent SRAM generation
• Automated place and route while offering hooks for customization
• Exploiting the benefits of the technology while compensating for the new challenges it brings
• Careful use of mixed VTH cells to close timing gaps while maintaining power requirements
• Balancing resource constraints and the desire for physical reuse against area and performance targets
• Power aware floorplanning and bus planning working in conjunction with logic design teams
Global Clock Distribution Optimizations
Global clock distribution adopts a low-skew, multiple-mesh design style built from highly optimized, configurable clock cells
• Smaller mesh regions and optimized driver design reduce global distribution skew and variability costs by 30% for most synchronous paths
Clock mesh wire power reduced by up to 40% (normalized to area) by optimizing high-level metal usage and reducing parasitic capacitance in clock drivers
[Figure: global clock distribution with seven mesh regions (MESH1 to MESH7) fed by clock spines]
Local Clock Distribution Optimizations
Configurable mixed-depth structured clock tree adopted for local clock distribution
• Reduces median clock insertion by up to 50%, which helps reduce jitter and PVT variability
• Multiple levels of clock gating provide both coarse and fine control
Bottom-up expansion of the clock tree adopted instead of region-based cloning
• Local clock tree CAC decreased by up to 10% with load-based cloning
Bus planning
[Figure: bus widths]
• Large busses (up to 2048b) and complex crossbar structures
• Very large number of physical partitions that need to be managed in an operational cadence
• 60 unique designs with ~1-2M instance count, despite considerable reuse
• Requires prototyping and proving that performance targets are achievable well in advance of netlist drops
AMD Radeon™ RX 5700 Series
Conclusion
AMD Radeon™ RX 5700 XT – RDNA
Performance Improvements
Based on internal testing. See endnote RX-363
56CU vs. 40CU
300W TBP vs. 225W TBP
Conclusion
AMD Radeon™ RX 5700 Series increased frequency while lowering
active power and improving performance per clock
Enabled by
• Performance and power efficient next-generation AMD RDNA architecture
• Increased memory bandwidth while maintaining power envelope and keeping
costs low
• Advanced power management techniques that allowed residency in the
optimum power states while not limiting performance to the worst of the
population
• Attaining timing closure through reducing skew and jitter, improved bus
planning, judicious use of Vt cells, and innovative floorplanning
See endnote RX-327, RX-325 and RX-362
Acknowledgment
We would like to thank our talented AMD design teams
across Austin, Bangalore, Boston, Fort Collins, Hyderabad,
Markham, Santa Clara, and Shanghai.
Notes
• GD-81: HEVC (H.265), H.264, and VP9 acceleration are subject to and not operable without inclusion/installation of compatible HEVC players. GD-81
• GD-151: Boost Clock Frequency is the maximum frequency achievable on the GPU running a bursty workload. Boost clock achievability, frequency, and sustainability will vary based on several factors, including but not limited to: thermal conditions and variation in applications and workloads.
• RX-325: Testing done by AMD performance labs 5/23/19, using the Division 2 @ 25x14 Ultra settings. Performance may vary based on use of latest drivers. RX-325
• RX-327: Testing done by AMD performance labs 5/23/19, showing a geomean of 1.25x perf/clock across 30 different games @ 4K Ultra, 4xAA settings. Performance may vary based on use of latest drivers. RX-327
• RX-329: Testing conducted by AMD Performance Labs as of 05/30/2019 on Radeon RX 5700XT with AMD Driver 19.10 (1902270946) on Intel i7-6900k, and on Radeon Vega Frontier Edition with AMD Driver 19.30 (1904231814) on Intel i7-5960k. Both systems used 2x8GB DDR4 2133 MHz RAM, Asus ROG Rampage V Edition Motherboard, and Windows 10 Enterprise. Performance may vary. RX-329
• RX-362: Testing done by AMD performance labs on June 4, 2019. Systems were tested with: Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz (6 core) with 16GB DDR4 @ 2133 MHz using an Asus X99-E Motherboard running Windows 10 Enterprise 64-bit (Ver. 1809, build 17763.053). Using the following graphics cards: Navi 10 (Driver 19.30_1905161434 (CL# 1784070)) with 40 compute units, versus a Vega 64 (Driver 19.4.1) with 40 compute units enabled. Breakdown based on AMD internal data June 4, 2019. Performance may vary. RX-362
• RX-363: Testing done by AMD performance labs 5/30/2019 on Core i9-9900K (3.6 GHz), 16GB DDR4-3200MHz, GIGABYTE Z390 AORUS ELITE, Win 10 64-bit, AMD Driver 19.30 for RX5700, and 19.10-190502a for Vega 56. Measuring FPS using: Dirt Rally 2, Sid Meier's Civilization 6, Metro Exodus, Tom Clancy's Ghost Recon Wildlands, Shadow of the Tomb Raider, Battlefield 5, Assassin's Creed Odyssey, Call of Duty: Black Ops 4, The Division 2, Far Cry New Dawn. All at max settings. PC manufacturers may vary configurations yielding different results. Performance may vary based on use of latest drivers. RX-363
Disclaimer and Endnotes
DISCLAIMER
The information contained herein is for informational purposes only and is subject to change without notice. While every precaution has been taken in the
preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise
correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of
this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with
respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any
intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD’s products are as set forth in a signed
agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18
All rights reserved. AMD, the AMD Arrow logo, combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this
publication are for identification purposes only and may be trademarks of their respective companies.

 
NVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyNVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyMark Kilgard
 
“Programming Vision Pipelines on AMD’s AI Engines,” a Presentation from AMD
“Programming Vision Pipelines on AMD’s AI Engines,” a Presentation from AMD“Programming Vision Pipelines on AMD’s AI Engines,” a Presentation from AMD
“Programming Vision Pipelines on AMD’s AI Engines,” a Presentation from AMDEdge AI and Vision Alliance
 

Similar to AMD Radeon™ RX 5700 Series 7nm Energy-Efficient High-Performance GPUs (20)

NVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdf
NVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdfNVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdf
NVIDIA DGX User Group 1st Meet Up_30 Apr 2021.pdf
 
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028Amd epyc update_gdep_xilinx_ai_web_seminar_20201028
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028
 
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla MahGS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
GS-4106 The AMD GCN Architecture - A Crash Course, by Layla Mah
 
Jetson AGX Xavier and the New Era of Autonomous Machines
Jetson AGX Xavier and the New Era of Autonomous MachinesJetson AGX Xavier and the New Era of Autonomous Machines
Jetson AGX Xavier and the New Era of Autonomous Machines
 
Amd accelerated computing -ufrj
Amd   accelerated computing -ufrjAmd   accelerated computing -ufrj
Amd accelerated computing -ufrj
 
SDC Server Sao Jose
SDC Server Sao JoseSDC Server Sao Jose
SDC Server Sao Jose
 
計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?計算力学シミュレーションに GPU は役立つのか?
計算力学シミュレーションに GPU は役立つのか?
 
“Flexible Machine Learning Solutions with Lattice FPGAs,” a Presentation from...
“Flexible Machine Learning Solutions with Lattice FPGAs,” a Presentation from...“Flexible Machine Learning Solutions with Lattice FPGAs,” a Presentation from...
“Flexible Machine Learning Solutions with Lattice FPGAs,” a Presentation from...
 
PowerEdge Rack and Tower Server Masters AMD Processors.pptx
PowerEdge Rack and Tower Server Masters AMD Processors.pptxPowerEdge Rack and Tower Server Masters AMD Processors.pptx
PowerEdge Rack and Tower Server Masters AMD Processors.pptx
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous Hardware
 
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
NVIDIA GPUs Power HPC & AI Workloads in Cloud with UnivaNVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
 
Latest HPC News from NVIDIA
Latest HPC News from NVIDIALatest HPC News from NVIDIA
Latest HPC News from NVIDIA
 
“Fast-track Design Cycles Using Lattice’s FPGAs,” a Presentation from Lattice...
“Fast-track Design Cycles Using Lattice’s FPGAs,” a Presentation from Lattice...“Fast-track Design Cycles Using Lattice’s FPGAs,” a Presentation from Lattice...
“Fast-track Design Cycles Using Lattice’s FPGAs,” a Presentation from Lattice...
 
AMD Financial Analyst Day
AMD Financial Analyst DayAMD Financial Analyst Day
AMD Financial Analyst Day
 
Webinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mão
Webinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mãoWebinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mão
Webinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mão
 
Esperanto accelerates machine learning with 1000+ low power RISC-V cores on a...
Esperanto accelerates machine learning with 1000+ low power RISC-V cores on a...Esperanto accelerates machine learning with 1000+ low power RISC-V cores on a...
Esperanto accelerates machine learning with 1000+ low power RISC-V cores on a...
 
ArcGIS Server a Brief Synopsis
ArcGIS Server a Brief SynopsisArcGIS Server a Brief Synopsis
ArcGIS Server a Brief Synopsis
 
High Performance DSP with Xilinx All Programmable Devices (Design Conference ...
High Performance DSP with Xilinx All Programmable Devices (Design Conference ...High Performance DSP with Xilinx All Programmable Devices (Design Conference ...
High Performance DSP with Xilinx All Programmable Devices (Design Conference ...
 
NVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyNVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and Transparency
 
“Programming Vision Pipelines on AMD’s AI Engines,” a Presentation from AMD
“Programming Vision Pipelines on AMD’s AI Engines,” a Presentation from AMD“Programming Vision Pipelines on AMD’s AI Engines,” a Presentation from AMD
“Programming Vision Pipelines on AMD’s AI Engines,” a Presentation from AMD
 

More from AMD

AMD EPYC Family World Record Performance Summary Mar 2022
AMD EPYC Family World Record Performance Summary Mar 2022AMD EPYC Family World Record Performance Summary Mar 2022
AMD EPYC Family World Record Performance Summary Mar 2022AMD
 
AMD EPYC Family of Processors World Record
AMD EPYC Family of Processors World RecordAMD EPYC Family of Processors World Record
AMD EPYC Family of Processors World RecordAMD
 
AMD EPYC Family of Processors World Record
AMD EPYC Family of Processors World RecordAMD EPYC Family of Processors World Record
AMD EPYC Family of Processors World RecordAMD
 
AMD EPYC World Records
AMD EPYC World RecordsAMD EPYC World Records
AMD EPYC World RecordsAMD
 
AMD EPYC 7002 World Records
AMD EPYC 7002 World RecordsAMD EPYC 7002 World Records
AMD EPYC 7002 World RecordsAMD
 
AMD EPYC 7002 World Records
AMD EPYC 7002 World RecordsAMD EPYC 7002 World Records
AMD EPYC 7002 World RecordsAMD
 
AMD EPYC 100 World Records and Counting
AMD EPYC 100 World Records and CountingAMD EPYC 100 World Records and Counting
AMD EPYC 100 World Records and CountingAMD
 
AMD EPYC 7002 Launch World Records
AMD EPYC 7002 Launch World RecordsAMD EPYC 7002 Launch World Records
AMD EPYC 7002 Launch World RecordsAMD
 
AMD Next Horizon
AMD Next HorizonAMD Next Horizon
AMD Next HorizonAMD
 
AMD Next Horizon
AMD Next HorizonAMD Next Horizon
AMD Next HorizonAMD
 
AMD Next Horizon
AMD Next HorizonAMD Next Horizon
AMD Next HorizonAMD
 
Race to Reality: The Next Billion-People Market Opportunity
Race to Reality: The Next Billion-People Market OpportunityRace to Reality: The Next Billion-People Market Opportunity
Race to Reality: The Next Billion-People Market OpportunityAMD
 
GPU Compute in Medical and Print Imaging
GPU Compute in Medical and Print ImagingGPU Compute in Medical and Print Imaging
GPU Compute in Medical and Print ImagingAMD
 
Enabling ARM® Server Technology for the Datacenter
Enabling ARM® Server Technology for the DatacenterEnabling ARM® Server Technology for the Datacenter
Enabling ARM® Server Technology for the DatacenterAMD
 
Lessons From MineCraft: Building the Right SMB Network
Lessons From MineCraft: Building the Right SMB NetworkLessons From MineCraft: Building the Right SMB Network
Lessons From MineCraft: Building the Right SMB NetworkAMD
 

More from AMD (15)

AMD EPYC Family World Record Performance Summary Mar 2022
AMD EPYC Family World Record Performance Summary Mar 2022AMD EPYC Family World Record Performance Summary Mar 2022
AMD EPYC Family World Record Performance Summary Mar 2022
 
AMD EPYC Family of Processors World Record
AMD EPYC Family of Processors World RecordAMD EPYC Family of Processors World Record
AMD EPYC Family of Processors World Record
 
AMD EPYC Family of Processors World Record
AMD EPYC Family of Processors World RecordAMD EPYC Family of Processors World Record
AMD EPYC Family of Processors World Record
 
AMD EPYC World Records
AMD EPYC World RecordsAMD EPYC World Records
AMD EPYC World Records
 
AMD EPYC 7002 World Records
AMD EPYC 7002 World RecordsAMD EPYC 7002 World Records
AMD EPYC 7002 World Records
 
AMD EPYC 7002 World Records
AMD EPYC 7002 World RecordsAMD EPYC 7002 World Records
AMD EPYC 7002 World Records
 
AMD EPYC 100 World Records and Counting
AMD EPYC 100 World Records and CountingAMD EPYC 100 World Records and Counting
AMD EPYC 100 World Records and Counting
 
AMD EPYC 7002 Launch World Records
AMD EPYC 7002 Launch World RecordsAMD EPYC 7002 Launch World Records
AMD EPYC 7002 Launch World Records
 
AMD Next Horizon
AMD Next HorizonAMD Next Horizon
AMD Next Horizon
 
AMD Next Horizon
AMD Next HorizonAMD Next Horizon
AMD Next Horizon
 
AMD Next Horizon
AMD Next HorizonAMD Next Horizon
AMD Next Horizon
 
Race to Reality: The Next Billion-People Market Opportunity
Race to Reality: The Next Billion-People Market OpportunityRace to Reality: The Next Billion-People Market Opportunity
Race to Reality: The Next Billion-People Market Opportunity
 
GPU Compute in Medical and Print Imaging
GPU Compute in Medical and Print ImagingGPU Compute in Medical and Print Imaging
GPU Compute in Medical and Print Imaging
 
Enabling ARM® Server Technology for the Datacenter
Enabling ARM® Server Technology for the DatacenterEnabling ARM® Server Technology for the Datacenter
Enabling ARM® Server Technology for the Datacenter
 
Lessons From MineCraft: Building the Right SMB Network
Lessons From MineCraft: Building the Right SMB NetworkLessons From MineCraft: Building the Right SMB Network
Lessons From MineCraft: Building the Right SMB Network
 

Recently uploaded

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Recently uploaded (20)

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

AMD Radeon™ RX 5700 Series 7nm Energy-Efficient High-Performance GPUs

  • 1. AMD Radeon™ RX 5700 Series 7nm Energy-Efficient High-Performance GPUs
      8.4: Radeon RX 5700 Series: The AMD 7nm Energy-Efficient High-Performance GPUs. © 2020 IEEE International Solid-State Circuits Conference
      Sal Dasgupta¹, Teja Singh², Ashish Jain², Samuel Naffziger³, Deepesh John², Chetan Bisht⁴, Pradeep Jayaraman¹, Michael Mantor⁴
      ¹AMD, Santa Clara, CA; ²AMD, Austin, TX; ³AMD, Fort Collins, CO; ⁴AMD, Orlando, FL
      Presented at ISSCC 2020
  • 2. Outline
      • Overview of AMD Radeon™ RX 5700 Series
      • AMD RDNA Architecture
      • Power Management features
      • GDDR6 (G6) PHY
      • Physical design
  • 3. AMD Radeon™ RX 5000 series
      • GPUs are everywhere
      • GPUs need to service a wide range of form factors and workloads
      • Fundamental challenge is to get higher and higher performance at lower and lower power
      [Graphic labels: PC Gaming, Content Creation, Console Gaming, Cloud Gaming, Mobile Devices]
  • 4. AMD Radeon™ RX 5700 series
      Improvements
      • Up to 1.5x greater performance per watt than its predecessor
      • Up to 1.25x performance per clock compared to previous 14nm processors
      • Up to 1.23x higher max frequencies than its predecessor
      • Up to 1.23x lower power consumption than its predecessor
      Achieved through
      • 7nm process
      • Higher clocks
      • Focused design for lower dynamic power
      • Intelligent SOC design with power and performance at the forefront
      • Improved power management
      • All new AMD RDNA graphics architecture: higher performance for the same cycles
      See endnotes RX-327, RX-325 and RX-362
  • 5. AMD Radeon™ RX 5700 Series Overview
  • 6. AMD Radeon™ RX 5700 XT
      • PCIE® Gen4 x16: 32 GB/s
      • Display: DP1.4, HDMI 4K 60fps
      • Multimedia: 4K H264 Encode/Decode, H265/HEVC Encode/Decode
      • GDDR6: 256b, 14 Gbps
      See endnote GD-81
  • 7. AMD Radeon™ RX 5700 XT Floorplan
      [Figure: die floorplan. Labels: Graphics, BUS Interface, Display, G6 PHY (x2), G6 Control]
  • 8. AMD Radeon™ RX 5700 XT Summary

                               AMD Radeon RX 5700 XT   AMD Radeon RX Vega 64   AMD Radeon RX 580
      Compute Units            40                      64                      36
      Texture Units            160                     256                     144
      ROPs                     64                      64                      32
      Memory Clocks            14 Gbps GDDR6           1.89 Gbps HBM2          8 Gbps GDDR5
      Memory Bus Width         256 bit                 2048 bit                256 bit
      Frame Buffer             8 GB                    8 GB                    8 GB
      Boost Clock              1905 MHz                1546 MHz                1257 MHz
      Typical Board Power      225 W                   295 W                   185 W
      Transistor Count         10.3B                   12.5B                   5.7B
      Manufacturing Process    TSMC 7nm                GF 14nm                 GF 14nm
      Architecture             RDNA                    GCN                     GCN

      [Chart: Performance Contributors (0% to 100%): Performance per Clock Enhancement, 7nm process, Additional Frequency and Power Improvement]
      [Chart: Same-Power, Same-Configuration Performance Gains. Delivered Performance, GCN vs. RDNA: +50%]
      See endnotes GD-151, RX-325
  • 9. AMD Radeon™ RX 5700 Series RDNA Graphics Architecture
  • 10. AMD Radeon™ RX 5700 XT
      40 RDNA Compute Units
      • 80 Scalar Processors
      • 2560 Stream Processors
      • 160 64b Bilinear Filter units
      Multilevel Cache
      • 4MB L2, 512KB L1, (V$, I$, K$) L0
      • 2x V$L0 Load Bandwidth
      • DCC Everywhere
      Streamlined Graphics Engine
      • Geometry Engine (4 Prim shader out, 8 Prim shader in)
      • 64 Pixel Units
      • 4 Asynchronous Compute Engines
      Designed for higher frequencies at lower power
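As a rough cross-check of the counts on slide 10, the per-CU figures and a peak FP32 rate can be derived with simple arithmetic. This is a sketch rather than an AMD-published formula: it assumes one FMA per stream processor per clock, counts an FMA as 2 floating-point operations, and takes the 1905 MHz boost clock from the summary table on slide 8.

```python
# Figures taken from slides 8 and 10 of this deck.
COMPUTE_UNITS = 40
STREAM_PROCESSORS = 2560
SCALAR_PROCESSORS = 80
BILINEAR_FILTER_UNITS = 160
BOOST_CLOCK_MHZ = 1905

# Assumption (not stated on the slide): one FMA per stream processor
# per clock, counted as 2 floating-point operations.
FLOPS_PER_FMA = 2

# Divide the chip-level totals by the number of compute units.
per_cu = {
    "stream_processors": STREAM_PROCESSORS // COMPUTE_UNITS,
    "scalar_processors": SCALAR_PROCESSORS // COMPUTE_UNITS,
    "bilinear_filter_units": BILINEAR_FILTER_UNITS // COMPUTE_UNITS,
}

# MHz * operations per clock gives mega-operations per second; /1e6 -> TFLOPS.
peak_fp32_tflops = STREAM_PROCESSORS * FLOPS_PER_FMA * BOOST_CLOCK_MHZ / 1e6

print(per_cu)
print(f"peak FP32 at boost clock: about {peak_fp32_tflops:.2f} TFLOPS")
```

The result is a theoretical ceiling at the boost clock; sustained throughput depends on the workload and on the power management behaviour described later in the deck.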
  • 11. AMD Radeon™ RX 5700 XT Floorplan
  • 12. AMD Radeon™ RX 5700 XT Functional Floorplan
      [Figure: functional floorplan. Labels: Infinity Fabric, PCIE Gen4, Display Engine, Multimedia Engine, Geometry Processor, Command Processor, HWS, DMA, ACE, Shader Engine (x2), 64-bit Memory Controller (x4), L1 (x4), Prim Unit (x4), Rasterizer (x4), RBs (x4), Compute Units (x4), L2 (x16)]
  • 13. AMD Radeon™ RX 5700 XT Functional Floorplan
      [Figure: same block labels as slide 12]
  • 14. RDNA – Work Group Processor
      • Up to 20 wave controllers
      • Improved instruction arbitration
      • 10KB scalar register file
          • 128 32b registers per wavefront
      • 128KB vector register file
      • ~2X instruction rate vs GCN
          • Dual SIMD32 single cycle issue
          • Wave32 on SIMD32
      • Bytes per flop
          • 128B Load/Store
          • 64B Filter Rate
      • 32 wide single and dual half ALU
          • Full rate 32b FMA, dual 16b FMA
      • 8 wide transcendental ALU
          • Single cycle issue
          • Multi-cycle co-execution
      [Diagram: RDNA Workgroup Processor. Labels: SIMD Unit, WGP, Scalar Registers, Scalar Units, Vector ALUs (SIMD32), Shader Sequencers, Texture Mapping Units, Vector Registers, Texture L0 Cache, Scalar Data Cache, Local Data Share, Shader Instruction Cache]
  • 15. RDNA Compute Unit, RDNA WGP and GCN Compute Unit
      [Diagram labels, in slide order: Schedulers, Local Data Share, Scalar Data Cache, Shader Instruction Cache, Texture Mapping Units, Texture Filter Units, Stream Processors, Vector Registers, Scalar Registers, Scalar Units; Scheduler, Local Data Share, Texture Filter Units, L1 Cache, Vector ALU, Texture Fetch Load/Store Units, Scalar Registers, Scalar Unit, Vector Registers, Vector Units, Branch & Message Unit]
  • 16. GCN Instruction Issue
      • All work-items of a wave64 have an opportunity to do work once every 4 clocks due to hardware interleaving
      • Special Function Unit: alternate execution unit running at ¼ rate
      • A wave from a SIMD has an opportunity to accomplish a scalar instruction once every 4 clocks
      [Figure: Wave64, 4-cycle issue. Cycle N: lanes 15-0; cycle N+1: lanes 31-16; cycle N+2: lanes 47-32; cycle N+3: lanes 63-48. SIMD0 to SIMD3 each have VGPRs and operand gathering with 4-cycle issue. A shared scalar unit serves SIMD0 waves in cycle 0, SIMD1 waves in cycle 1, SIMD2 waves in cycle 2 and SIMD3 waves in cycle 3.]
  • 17. RDNA Instruction Issue
      • Vector Units: all work-items of one wave32 have an opportunity to do work every clock
      • Special Function Unit uses 1 issue cycle and then executes in parallel
      • Each SIMD equipped with a scalar unit for an instruction execution every cycle
      [Figure: SIMD 0 and SIMD 1, each with a scalar unit (S), VGPRs and operand gathering with 1-cycle issue. Per SIMD: Wave32, every cycle issue; vector instruction issue any cycle, or SFU issue once every 4 cycles.]
  • 18. RDNA Instruction Issue Example
      Instruction sequence:
          s_add_i32 s0, s1, s2
          v_mul_f32 v0, v1, s0
          v_add_f32 v5, v4, v3
          v_sub_f32 v6, v7, v0
      [Figure: issue timelines on a cycle axis from 0 to 15, one slot per cycle]
      • s_add_i32 s0, s1, s2 | … | … | … | v_mul_f32 v0, v1, s0 | … (simd busy 4 cycles) | … | … | v_add_f32 v5, v4, v3 | … | … | … | v_sub_f32 v6, v7, v0 | … | … | …
      • s_add_i32 s0, s1, s2 | … (salu dependency stall on S0) | v_mul_f32 v0, v1, s0 | v_add_f32 v5, v4, v3 | … (valu dependency stall on V0) | … | … | v_sub_f32 v6, v7, v0
      • s_add_i32 s0, s1, s2 | … (salu dependency stall on S0) | v_mul_f32 v0, v1, s0 (lo) | v_mul_f32 v0, v1, s0 (hi) | v_add_f32 v5, v4, v3 (lo) | v_add_f32 v5, v4, v3 (hi) | … (valu dependency stall on V0 lo) | v_sub_f32 v6, v7, v0 (lo) | v_sub_f32 v6, v7, v0 (hi)
      Callouts: SHORTEST WAVE ISSUE LATENCY; 44% REDUCTION IN ISSUE CYCLES
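The issue behaviour on slides 16 to 18 can be sanity-checked with a small calculation. The sketch below is an interpretation, not something the slides spell out: it assumes a vector instruction needs one issue pass per group of lanes the SIMD processes per clock (16 lanes per cycle for GCN, from slide 16; 32 for RDNA's SIMD32, from slide 14), and it assumes the 44% callout compares the 9-slot timeline with the 16-slot timeline on slide 18. The slot counts are counted from the timelines as transcribed.

```python
from math import ceil

def issue_passes(wave_size: int, lanes_per_cycle: int) -> int:
    """Issue passes needed for one vector instruction of a wave."""
    return ceil(wave_size / lanes_per_cycle)

# Lane widths taken from slides 14 and 16.
gcn_wave64 = issue_passes(64, 16)    # wave64 on a 16-lane GCN SIMD
rdna_wave32 = issue_passes(32, 32)   # wave32 on an RDNA SIMD32
rdna_wave64 = issue_passes(64, 32)   # wave64 on an RDNA SIMD32 (lo/hi halves)

# Slots counted from the three timelines on slide 18 (assumed mapping).
slots_4_cycle_issue = 16   # timeline with "simd busy 4 cycles"
slots_single_issue = 8     # timeline with single-cycle vector issue
slots_lo_hi_issue = 9      # timeline with lo/hi halves

reduction_lo_hi = 1 - slots_lo_hi_issue / slots_4_cycle_issue
reduction_single = 1 - slots_single_issue / slots_4_cycle_issue

print(gcn_wave64, rdna_wave32, rdna_wave64)
print(f"lo/hi timeline: {reduction_lo_hi:.0%} fewer issue cycles")
print(f"single-issue timeline: {reduction_single:.0%} fewer issue cycles")
```

Under these assumptions the lo/hi timeline accounts for the 44% figure (9 of 16 slots), while the 8-slot timeline is the shortest of the three.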
  • 19. RDNA Redesigned Cache Hierarchy: New L1 Cache Hierarchy
      [Diagram labels: PCIe® 4.0, Async Compute, L2, L1, Texture, Geometry, Rasterizer & Render Backends, SOC Fabric, GDDR6, Command Interfaces, Shader Complex]
  • 20. RDNA Cache Hierarchy
      • Doubled the Load Bandwidth from L0 to ALU
      • Improved BW Amplification
      • Reduced Latency and Power
      • Reduced Congestion at L2 Level
      • Reduced Data Movement
      [Diagram: bandwidth labels 4x64B/C, 16x32/C, 32B/CLK, 32B/CLK, 128B/CLK, 64B/CLK, 128B/CLK, 64B/CLK, 2X, 2X. Relative Cache Latency: -24%, -21%, -7%]
      See RX-329 in Endnotes.
  • 21. AMD Radeon™ RX 5700 Series Power Management
  • 22. AMD Radeon™ RX 5700 Series Power Management
      • Main functions
          • Manage clocks/voltages to maximize performance during active workloads
          • Draw minimum power during low activity conditions
      • Challenges
          • Highly variable GPU workloads
          • Parallel work leads to high power/current demand
          • Ever increasing needs for memory bandwidth
      • Power features include
          • AVFS to choose the most optimal per-part voltage
          • DVFS to choose the best operating point given current environment
          • Voltage droop mitigation and impact reduction
          • Agile responses to the current draw demands of the moment
          • Graceful throttling of the graphics core at thermal limits
          • Aggressive clock and power gating
  • 23. Power Management: Fine-grained DPM
      • Previous generation: coarse-grained DPM
          • On previous architectures, operation was limited to a few (typically 8) pre-defined frequency values
          • The Power Management Controller had a limited set of states to choose from
          • The power/thermal-constrained effective frequency was determined by dithering between neighboring coarse states
      • AMD Radeon™ RX 5700 Series: fine-grained DPM
          • Much finer-grained DPM state selection across the V/F curve
          • Perf/W efficiency improved by up to 5% compared to the previous generation by staying on the optimal curve*
          • Closer match between the frequency the workload needs and the frequency it gets
      [Figure: coarse-grained DPM (states DPM0 to DPM7) compared with fine-grained DPM, with frequency markers Fmin, Fidle, and Fmax.]
      *Based on AMD internal data.
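One way to see why dithering between coarse states costs efficiency: dynamic power grows faster than linearly with frequency when voltage rises along the V/F curve, so time-slicing between two neighboring states draws more average power than running at the target frequency directly. The sketch below illustrates this with a made-up linear V/F curve and arbitrary power units. The curve coefficients, the state spacing, and the function names are all assumptions, not AMD data.

```python
import math


def voltage(freq_mhz):
    """Hypothetical linear V/F curve in volts (illustrative coefficients)."""
    return 0.70 + 0.00025 * freq_mhz


def dynamic_power(freq_mhz, cac=0.1):
    """Dynamic power modeled as Cac * V^2 * f, in arbitrary units."""
    v = voltage(freq_mhz)
    return cac * v * v * freq_mhz


def coarse_states(f_min, f_max, count=8):
    """Evenly spaced coarse DPM frequencies from f_min to f_max."""
    step = (f_max - f_min) / (count - 1)
    return [f_min + i * step for i in range(count)]


def dithered_power(target, states):
    """Average power when time-slicing between the two states around target.

    The time share of the upper state is chosen so that the mean frequency
    equals the target frequency.
    """
    for lo, hi in zip(states, states[1:]):
        if lo <= target <= hi:
            share_hi = (target - lo) / (hi - lo)
            return share_hi * dynamic_power(hi) + (1 - share_hi) * dynamic_power(lo)
    raise ValueError("target frequency is outside the DPM range")


states = coarse_states(500.0, 1900.0)      # 500, 700, ..., 1900 MHz
target = 1600.0                            # falls between two coarse states
power_dithered = dithered_power(target, states)
power_fine = dynamic_power(target)         # fine-grained: run at the target
saving = 1 - power_fine / power_dithered   # fraction of power saved
```

With this model the fine-grained operating point always draws no more power than the dithered one for the same effective frequency. The size of the saving depends entirely on the assumed curve, so the value computed here should not be compared with the "up to 5%" figure on the slide.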
  • 24. Power Management – Per Part Fmax
      • Previous generations
          • Fmax determined by the slowest part of the distribution
          • Workloads with lower Cac (dynamic switching capacitance) may leave power on the table for a large population of parts
      • AMD Radeon™ RX 5700 Series
          • Each individual part is allowed to achieve its maximum potential (up to 15% higher) by selecting its own Vmax-limited Fmax based on the speed of the part
          • Enables applications with lower Cac to sustain higher clocks rather than be held to artificially low limits set by the slowest parts
      Based on AMD internal data
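The difference between a population-wide Fmax and a per-part Fmax can be sketched as follows. The linear speed model, the voltage values, and the three example parts are hypothetical and chosen only for illustration. The point is that a single limit is set by the slowest part, while a per-part limit lets faster silicon run faster at the same Vmax.

```python
V_MAX = 1.20        # assumed maximum allowed voltage, in volts
V_THRESHOLD = 0.40  # assumed voltage below which the model gives no frequency


def fmax_at_vmax(speed_mhz_per_volt):
    """Hypothetical linear model: frequency reachable by a part at V_MAX."""
    return speed_mhz_per_volt * (V_MAX - V_THRESHOLD)


# Illustrative parts from a process distribution (MHz per volt of headroom).
parts = {"slow": 2250.0, "typical": 2400.0, "fast": 2580.0}

# Per-part Fmax: each part uses the frequency it can reach at V_MAX.
per_part_fmax = {name: fmax_at_vmax(speed) for name, speed in parts.items()}

# Population-wide Fmax: every part is held to what the slowest part can reach.
population_fmax = min(per_part_fmax.values())

# Relative frequency headroom each part gains from the per-part scheme.
uplift = {name: f / population_fmax - 1 for name, f in per_part_fmax.items()}
```

In this toy distribution the slowest part gains nothing and the fastest part gains a little under 15%. That figure comes from the chosen example values and is not a reproduction of the slide's measurement.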
  • 25. AMD Radeon™ RX 5700 Series GDDR6 Interface
  • 26. GDDR6 Memory
      • 14 Gb/s per pin, 256-bit interface, 448 GB/s
      • Up to 75% bandwidth-per-pin improvement over GDDR5
      • Up to 60% performance/Watt improvement
      Based on AMD internal data
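The headline bandwidth follows directly from the per-pin data rate and the interface width. The short calculation below reproduces it. The 8 Gb/s GDDR5 baseline is an assumption, chosen because it is consistent with the "up to 75%" per-pin figure, and the function name is hypothetical.

```python
def peak_bandwidth_gb_per_s(rate_gbit_per_pin, bus_width_bits):
    """Peak bandwidth in GB/s: per-pin rate (Gb/s) times pins, 8 bits per byte."""
    return rate_gbit_per_pin * bus_width_bits / 8


GDDR6_RATE = 14.0         # Gb/s per pin, from the slide
BUS_WIDTH = 256           # interface width in bits, from the slide
GDDR5_RATE_ASSUMED = 8.0  # Gb/s per pin; assumed baseline, not stated on the slide

gddr6_bandwidth = peak_bandwidth_gb_per_s(GDDR6_RATE, BUS_WIDTH)  # 448.0 GB/s
per_pin_gain = GDDR6_RATE / GDDR5_RATE_ASSUMED - 1                # 0.75
```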
  • 27. G6 PHY
      • Read Data Strobe (RDQS) mode to save power when high memory bandwidth is not required
      • T-coil provides bandwidth enhancement and improved return loss, enabling the high data rates on a single-ended interface
          • up to 16% in height
          • up to 26% in width
      [Figure values: 0.300 and 0.349 (a difference of about 16%); 40.2 and 50.8 (a difference of about 26%).]
      Based on AMD internal data
  • 28. AMD Radeon™ RX 5700 Series Physical Design
  • 29. AMD Radeon™ RX 5700 Physical Design
      • Physical design challenges
          • New architecture
          • Frequency uplift beyond natural process uplift
          • Large design (10.3B transistors) with wide busses and complex crossbar structures
          • Decreasing dynamic switching capacitance while providing that frequency uplift
          • Operational logistics of managing such a large design
      • Approach
          • High-performance clock distribution
          • Intelligent SRAM generation
          • Automated place and route while offering hooks for customization
          • Exploiting the benefits of the technology while compensating for the new challenges it brings
          • Careful use of mixed-VTH cells to close timing gaps while maintaining power requirements
          • Balancing resource constraints and the desire for physical reuse against area and performance targets
          • Power-aware floorplanning and bus planning working in conjunction with logic design teams
  • 30. Global Clock Distribution Optimizations
      • Global clock distribution adopts a low-skew, multiple-mesh design style built from highly optimized, configurable clock cells
          • Smaller mesh regions and optimized driver design reduce global distribution skew and variability costs by 30% for most synchronous paths
      • Clock mesh wire power reduced by up to 40% (normalized to area) by optimizing high-level metal usage and reducing parasitic capacitance in clock drivers
      [Figure: seven clock mesh regions (MESH1 to MESH7) with SPINE labels.]
  • 31. Local Clock Distribution Optimizations
      • Configurable mixed-depth structured clock tree adopted for local clock distribution
          • Reduces median clock insertion by up to 50%, which helps reduce jitter and PVT variability
          • Multiple levels of clock gating provide both coarse and fine control
      • Bottom-up expansion of the clock tree adopted instead of region-based cloning
          • Local clock tree CAC decreased by up to 10% with load-based cloning
      [Figure: local clock trees with SPINE labels.]
  • 32. Bus planning
      Bus widths
      • Large busses (up to 2048b) and complex crossbar structures
      • Very large number of physical partitions that need to be managed in an operational cadence
      • 60 unique designs with ~1-2M instance count, despite considerable reuse
      • Requires prototyping, and proving that performance targets can be achieved, well in advance of netlist drops
  • 33. AMD Radeon™ RX 5700 Series Conclusion
  • 34. AMD Radeon™ RX 5700 XT – RDNA Performance Improvements
      [Figure: performance comparison; labels: 56CU vs. 40CU, 300W TBP vs. 225W TBP.]
      Based on internal testing. See endnote RX-363.
  • 35. Conclusion
      AMD Radeon™ RX 5700 Series increased frequency while lowering active power and improving performance per clock
      Enabled by
      • Performance- and power-efficient next-generation AMD RDNA architecture
      • Increased memory bandwidth while maintaining the power envelope and keeping costs low
      • Advanced power management techniques that allowed residency in the optimum power states while not limiting performance to the worst of the population
      • Attaining timing closure through reduced skew and jitter, improved bus planning, judicious use of Vt cells, and innovative floorplanning
      See endnotes RX-327, RX-325, and RX-362
  • 36. Acknowledgment
      We would like to thank our talented AMD design teams across Austin, Bangalore, Boston, Fort Collins, Hyderabad, Markham, Santa Clara, and Shanghai.
  • 37. Notes
      • GD-81: HEVC (H.265), H.264, and VP9 acceleration are subject to and not operable without inclusion/installation of compatible HEVC players.
      • GD-151: Boost Clock Frequency is the maximum frequency achievable on the GPU running a bursty workload. Boost clock achievability, frequency, and sustainability will vary based on several factors, including but not limited to: thermal conditions and variation in applications and workloads.
      • RX-325: Testing done by AMD performance labs 5/23/19, using the Division 2 @ 25x14 Ultra settings. Performance may vary based on use of latest drivers.
      • RX-327: Testing done by AMD performance labs 5/23/19, showing a geomean of 1.25x performance per clock across 30 different games @ 4K Ultra, 4xAA settings. Performance may vary based on use of latest drivers.
      • RX-329: Testing conducted by AMD Performance Labs as of 05/30/2019 on Radeon RX 5700XT with AMD Driver 19.10 (1902270946) on Intel i7-6900k, and on Radeon Vega Frontier Edition with AMD Driver 19.30 (1904231814) on Intel i7-5960k. Both systems used 2x8GB DDR4 2133 MHz RAM, Asus ROG Rampage V Edition Motherboard, and Windows 10 Enterprise. Performance may vary.
      • RX-362: Testing done by AMD performance labs on June 4, 2019. Systems were tested with: Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz (6 core) with 16GB DDR4 @ 2133 MHz using an Asus X99-E Motherboard running Windows 10 Enterprise 64-bit (Ver. 1809, build 17763.053). Using the following graphics cards: Navi 10 (Driver 19.30_1905161434 (CL# 1784070)) with 40 compute units, versus a Vega 64 (Driver 19.4.1) with 40 compute units enabled. Breakdown based on AMD internal data June 4, 2019. Performance may vary.
      • RX-363: Testing done by AMD performance labs 5/30/2019 on Core i9-9900K (3.6 GHz), 16GB DDR4-3200MHz, GIGABYTE Z390 AORUS ELITE, Win 10 64-bit, AMD Driver 19.30 for RX5700, and 19.10-190502a for Vega 56. Measuring FPS using: Dirt Rally 2, Sid Meier's Civilization 6, Metro Exodus, Tom Clancy's Ghost Recon Wildlands, Shadow of the Tomb Raider, Battlefield 5, Assassin's Creed Odyssey, Call of Duty: Black Ops 4, The Division 2, Far Cry New Dawn. All at max settings. PC manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers.
  • 38. Disclaimer and Endnotes
      DISCLAIMER
      The information contained herein is for informational purposes only and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD’s products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18
      All rights reserved. AMD, the AMD Arrow logo, combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.