SlideShare a Scribd company logo
1 of 38
Download to read offline
Debugging GPU faults: QoL
tools for your driver
Danylo Piliaiev
2023-10-17
1
Who Am I?
Currently implementing Adreno 7XX GPU generation
in Turnip
My blog:
In the past
Worked on mobile video games
Debugging unruly games since 2018
At Igalia since November 2020
blogs.igalia.com/dpiliaiev
2
The Problem
"What if I was able to quickly edit this GPU packet?"
"What if I was able to dump this buffer here?"
Or "It would have been nice to print that shader
register!"
"I'll implement it later...."
3
Unrecoverable Hangs - Roadblocks
Computer completely locks up and has to be rebooted
Last few seconds of logs/anything else are lost
Existing tooling isn't of much use with such
constraints
GFR (Graphics Flight Recorder):
VK layer for breadcrumbs
Dumps command buffers with commands' status
4
Unrecoverable Hangs - Solution
More BREADCRUMBS!
GFR writes results to the disk
GFR logging is far behind what's actually runs on GPU
GFR could be too high level:
Blits/BeginRenderPass/EndRenderPass could
internally use a lot of different 2d and 3d blits
5
Unrecoverable Hangs - Solution
Observations
Unrecoverable hangs are rarely caused by sync issues
Cannot allow GPU to race ahead of the last know
breadcrumb
A hang may happen asynchronously to the GPU
packet that triggered it, e.g.
A job is scheduled to another GPU unit
That GPU unit hangs some time afterwards
There may not be a way to synchronize it
6
Unrecoverable Hangs - Solution
The current solution in Turnip is:
Breadcrumbs are inserted after each GPU command
GPU writes a breadcrumb and immediately waits for
this value to be acknowledged
CPU in a busy loop checks the breadcrumb value
If new one is found, it is sent over the network
The CPU acks the breadcrumb and GPU continues
execution
7
How Our Breadcrumbs Work
BR1 BR2 BR3
BR1
Send
GPU
CPU
GPU executes this
command and
hangs
Kernel BR1
Send over
network
Wait Wait Wait
Read Write
BR2
Send
BR2
Read Write
8
Asynchronous hangs?
Require explicit input in tty for each breadcrumb
GPU is on breadcrumb 18, continue?y
GPU is on breadcrumb 19, continue?y
GPU is on breadcrumb 20, continue?y
GPU is on breadcrumb 21, continue?
9
Breadcrumbs In Practice
Increase GPU hang timeout
Receive breadcrumbs on another machine via bash
spaghetti
nc -lvup $PORT | stdbuf -o0 xxd -pc -c 4 | 
awk -Wposix '{printf("%u:%un", "0x" $0, a[$0]++)}
Launch workload with TU_BREADCRUMBS envvar
TU_BREADCRUMBS=$IP:$PORT,break=$BREAKPOINT:$BREAKPOI
Launch workload and break on the last known
10
Further Material
https://blogs.igalia.com/dpiliaiev/debugging-unrecoverable-hangs/
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15452
11
Faster Way To Debug Hangs
12
Breadcrumbs Shortcoming
Breadcrumbs are good for finding a command that
hangs
They cannot tell which part of the GPU state caused
it
They are useless for misrenderings
Some issues are not reproducible with breadcrumbs
13
Reproducing Hangs
Drivers are already able to capture command streams
And all used buffers
With this it is trivial to replay the submissions back
Ideally requires user-space specified GPU addresses
14
Replaying - Caveats
Multiple queues
When to re-upload memory?
Just force a single queue?
Timeline semaphores?
Recordings may be huge
15
Editing The Command Stream
Even the most minimalistic editing is useful:
/* pkt4: GRAS_2D_RESOLVE_CNTL_2 = { X = 63 | Y = 63 } */
pkt(cs, 4128831);
/* pkt4: RB_BLIT_SCISSOR_TL = { X = 0 | Y = 0 } */
pkt4(cs, 0x88d1, (2), 0);
/* pkt4: RB_BLIT_SCISSOR_BR = { X = 63 | Y = 63 } */
pkt(cs, 4128831);
pkt7(cs, CP_MEM_WRITE, 20);
/* { ADDR_LO = 0x1f7580 } */
pkt(cs, 2061696);
/* { ADDR_HI = 0x40 } */
pkt(cs, 64);
pkt(cs, 1216352390);
pkt(cs, 1107296256);
16
Editing Shaders
const char *source = R"(
shps #l37
getone #l37
cov.u32f32 r1.w, c504.z
cov.u32f32 r2.x, c504.w
cov.u32f32 r1.y, c504.x
....
end
)";
upload_shader(&ctx, 0x100200d80, source);
emit_shader_iova(&ctx, cs, 0x100200d80);
17
Replaying Edited Command Stream
The decompiler emits C code with raw commands
The replay tool takes original submissions capture:
Finds unused memory range
Emits edited command stream there
Overrides target submission
18
Dumping GPU Memory
Dumping GPU memory is simple to implement
Act of copying may disturb GPU caches
Kernel cooperation is needed to implement it
properly:
GPU interrupts execution e.g. by faulting
Now memory could be read undisturbed
19
Dumping Shader's Registers
print %tmp_regs, %src_reg
%tmp_regs - 3 consecutive free regs
For 64b address and 32b tmp offset
%src_reg - a single register to print
Shader Log Entries: 6
[0] 00000004 0.0000
[1] 00000000 0.0000
[2] 00000000 0.0000
[3] 00000000 0.0000
[4] beadc429 -0.3394
[5] beadc429 -0.3394
20
Dumping Shader's Registers
Want a nicer print? Just print $src_reg?
You still need to allocate temporary registers
What if there are no free regs?
Spilling regs may not be that easy at this stage
Too much trouble for a little gain...
21
Short Summary
A tool to replay command stream submissions
A tool to decompile a command stream into C code
An option to replay edit command stream
Helpers to dump GPU memory from the command
stream
Helpers to dump shader registers
22
Stale Regs In Command Stream
23
Debugging Stale Registers
It could be hard to spot stale reg usage:
It may appear as a random geometry flicker
Game hanging at a random moment
Rare CTS test failure
24
Stomping Registers - Caveats
Could be a bit tricky if a combination of regs causes
an issue
VK pipelines could be set outside a renderpass
Doesn't help if stale regs are between draw calls
Default invalid value may be valid for some registers
25
Stomping Registers
We mark each register with where it is used:
<reg32 offset="0x9600" name="VPC_DBG_ECO_CNTL" usage="cmd"/>
<reg32 offset="0xb987" name="HLSQ_CS_CNTL" variants="A6XX" usage="cmd"/>
<reg32 offset="0xb984" name="HLSQ_CONTROL_3_REG" variants="A6XX" usage="rp
<reg32 offset="0xb985" name="HLSQ_CONTROL_4_REG" variants="A6XX" usage="rp
To stomp register you need to specify:
export TU_DEBUG_STALE_REGS_RANGE=0x0c00,0xbe01
export TU_DEBUG_STALE_REGS_FLAGS=cmdbuf,renderpass
26
Turnip Tooling - Summary
Unique tooling:
Driver breadcrumbs
Command stream replaying and editing
GPU memory dumping
Shader register dumping
Debug option to find stale reg usage
27
Other Drivers and Tooling
28
Generic
GFR - Graphics Flight Recorder
Instruments command buffers with completion tags
Uses VK_AMD_buffer_marker (nothing vendor
specific)
In vkd3d-proton:
Breadcrumbs
Shader printf
Descriptor debugging
29
Other Mesa Drivers
Feature toggles and debug flags
Shader assembly replacement for debugging
GPU submissions decoding
GPU crash dumps decoding
30
Radeon - UMR
GPU register dumps
SGPR / VGPR shader register dumps
Shader wavefront Debugging
Shader disassembly around the crash site
See Maister's blog post for it in action
https://themaister.net/blog/2023/08/20/hardcor
e-vulkan-debugging-digging-deep-on-linux-amd
gpu/
31
Unreleased - Radeon - Shader Debugger
32
Proprietary - Radeon™ GPU Detective
Postmortem analysis of GPU crashes
Information about page faults
Breadcrumbs reflecting done and in-progress GPU
work
Command Buffer ID: 0x107c
=========================
[>] "Frame 1040 CL0"
├─[X] "Depth + Normal + Motion Vector PrePass"
├─[>] "DownSamplePS"
│ ├─[X] Draw
│ ├─[>] Draw
└─[>] "Bloom"
├─[>] "BlurPS"
│ ├─[>] Draw
│ └─[>] Draw
├─[ ] Draw
├─[ ] "BlurPS"
33
Proprietary - NVIDIA Aftermath
34
Proprietary - NVIDIA Aftermath
Collects GPU “mini-dumps”
Visualizes GPU state at the moment of crash
Collects breadcrumbs
Shows crashing shader and it registers
35
Q&A
Any good tools I haven't mentioned?
Maybe you tried something before?
Maybe you have an idea for a tool to implement?
We're hiring!
igalia.com/jobs/
36
37
Debugging GPU faults: QoL tools for your driver – XDC 2023

More Related Content

Similar to Debugging GPU faults: QoL tools for your driver – XDC 2023

eBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceeBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceSUSE Labs Taipei
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUsfcassier
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Akihiro Hayashi
 
Embedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingEmbedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingAnne Nicolas
 
Reverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande ModemReverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande ModemCyber Security Alliance
 
BlueHat v18 || A mitigation for kernel toctou vulnerabilities
BlueHat v18 || A mitigation for kernel toctou vulnerabilitiesBlueHat v18 || A mitigation for kernel toctou vulnerabilities
BlueHat v18 || A mitigation for kernel toctou vulnerabilitiesBlueHat Security Conference
 
Monte Carlo G P U Jan2010
Monte  Carlo  G P U  Jan2010Monte  Carlo  G P U  Jan2010
Monte Carlo G P U Jan2010John Holden
 
Share the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardShare the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardJian-Hong Pan
 
Lrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with rLrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with rFerdinand Jamitzky
 
Make ARM Shellcode Great Again - HITB2018PEK
Make ARM Shellcode Great Again - HITB2018PEKMake ARM Shellcode Great Again - HITB2018PEK
Make ARM Shellcode Great Again - HITB2018PEKSaumil Shah
 
Linux+sensor+device-tree+shell=IoT !
Linux+sensor+device-tree+shell=IoT !Linux+sensor+device-tree+shell=IoT !
Linux+sensor+device-tree+shell=IoT !Dobrica Pavlinušić
 
7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance 7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance AMD
 
Code Red Security
Code Red SecurityCode Red Security
Code Red SecurityAmr Ali
 
The n00bs guide to ovs dpdk
The n00bs guide to ovs dpdkThe n00bs guide to ovs dpdk
The n00bs guide to ovs dpdkmarkdgray
 
Exploiting the Linux Kernel via Intel's SYSRET Implementation
Exploiting the Linux Kernel via Intel's SYSRET ImplementationExploiting the Linux Kernel via Intel's SYSRET Implementation
Exploiting the Linux Kernel via Intel's SYSRET Implementationnkslides
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...Chester Chen
 
LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness Peter Griffin
 
LAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel AwarenessLAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel AwarenessLinaro
 

Similar to Debugging GPU faults: QoL tools for your driver – XDC 2023 (20)

eBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceeBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to Userspace
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
XS Boston 2008 Debugging Xen
XS Boston 2008 Debugging XenXS Boston 2008 Debugging Xen
XS Boston 2008 Debugging Xen
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
 
Embedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingEmbedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debugging
 
Reverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande ModemReverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande Modem
 
BlueHat v18 || A mitigation for kernel toctou vulnerabilities
BlueHat v18 || A mitigation for kernel toctou vulnerabilitiesBlueHat v18 || A mitigation for kernel toctou vulnerabilities
BlueHat v18 || A mitigation for kernel toctou vulnerabilities
 
Monte Carlo G P U Jan2010
Monte  Carlo  G P U  Jan2010Monte  Carlo  G P U  Jan2010
Monte Carlo G P U Jan2010
 
Share the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardShare the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development Board
 
Lrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with rLrz kurs: gpu and mic programming with r
Lrz kurs: gpu and mic programming with r
 
Make ARM Shellcode Great Again - HITB2018PEK
Make ARM Shellcode Great Again - HITB2018PEKMake ARM Shellcode Great Again - HITB2018PEK
Make ARM Shellcode Great Again - HITB2018PEK
 
Linux+sensor+device-tree+shell=IoT !
Linux+sensor+device-tree+shell=IoT !Linux+sensor+device-tree+shell=IoT !
Linux+sensor+device-tree+shell=IoT !
 
7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance 7nm "Navi" GPU - A GPU Built For Performance
7nm "Navi" GPU - A GPU Built For Performance
 
Code Red Security
Code Red SecurityCode Red Security
Code Red Security
 
The n00bs guide to ovs dpdk
The n00bs guide to ovs dpdkThe n00bs guide to ovs dpdk
The n00bs guide to ovs dpdk
 
Exploiting the Linux Kernel via Intel's SYSRET Implementation
Exploiting the Linux Kernel via Intel's SYSRET ImplementationExploiting the Linux Kernel via Intel's SYSRET Implementation
Exploiting the Linux Kernel via Intel's SYSRET Implementation
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 
Writing bios
Writing biosWriting bios
Writing bios
 
LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness
 
LAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel AwarenessLAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel Awareness
 

More from Igalia

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Building End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEBuilding End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEIgalia
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Automated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesAutomated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesIgalia
 
Embedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to MaintenanceEmbedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to MaintenanceIgalia
 
Optimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdfOptimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdfIgalia
 
Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JITIgalia
 
To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!Igalia
 
Implementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamerImplementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamerIgalia
 
8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in MesaIgalia
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIgalia
 
2023 in Chimera Linux
2023 in Chimera                    Linux2023 in Chimera                    Linux
2023 in Chimera LinuxIgalia
 
Building a Linux distro with LLVM
Building a Linux distro        with LLVMBuilding a Linux distro        with LLVM
Building a Linux distro with LLVMIgalia
 
Graphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devicesGraphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devicesIgalia
 
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOSDelegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOSIgalia
 
MessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the webMessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the webIgalia
 
Replacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shadersReplacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shadersIgalia
 
I'm not an AMD expert, but...
I'm not an AMD expert, but...I'm not an AMD expert, but...
I'm not an AMD expert, but...Igalia
 
Status of Vulkan on Raspberry
Status of Vulkan on RaspberryStatus of Vulkan on Raspberry
Status of Vulkan on RaspberryIgalia
 
Enable hardware acceleration for GL applications without glamor on Xorg modes...
Enable hardware acceleration for GL applications without glamor on Xorg modes...Enable hardware acceleration for GL applications without glamor on Xorg modes...
Enable hardware acceleration for GL applications without glamor on Xorg modes...Igalia
 

More from Igalia (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Building End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEBuilding End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPE
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Automated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesAutomated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded Devices
 
Embedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to MaintenanceEmbedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to Maintenance
 
Optimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdfOptimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdf
 
Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JIT
 
To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!
 
Implementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamerImplementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamer
 
8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
 
2023 in Chimera Linux
2023 in Chimera                    Linux2023 in Chimera                    Linux
2023 in Chimera Linux
 
Building a Linux distro with LLVM
Building a Linux distro        with LLVMBuilding a Linux distro        with LLVM
Building a Linux distro with LLVM
 
Graphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devicesGraphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devices
 
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOSDelegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
 
MessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the webMessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the web
 
Replacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shadersReplacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shaders
 
I'm not an AMD expert, but...
I'm not an AMD expert, but...I'm not an AMD expert, but...
I'm not an AMD expert, but...
 
Status of Vulkan on Raspberry
Status of Vulkan on RaspberryStatus of Vulkan on Raspberry
Status of Vulkan on Raspberry
 
Enable hardware acceleration for GL applications without glamor on Xorg modes...
Enable hardware acceleration for GL applications without glamor on Xorg modes...Enable hardware acceleration for GL applications without glamor on Xorg modes...
Enable hardware acceleration for GL applications without glamor on Xorg modes...
 

Recently uploaded

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Recently uploaded (20)

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Debugging GPU faults: QoL tools for your driver – XDC 2023

  • 1. Debugging GPU faults: QoL tools for your driver Danylo Piliaiev 2023-10-17 1
  • 2. Who Am I? Currently implementing Adreno 7XX GPU generation in Turnip My blog: In the past Worked on mobile video games Debugging unruly games since 2018 At Igalia since November 2020 blogs.igalia.com/dpiliaiev 2
  • 3. The Problem "What if I was able to quickly edit this GPU packet?" "What if I was able to dump this buffer here?" Or "It would have been nice to print that shader register!" "I'll implement it later...." 3
  • 4. Unrecoverable Hangs - Roadblocks Computer completely locks up and has to be rebooted Last few seconds of logs/anything else are lost Existing tooling isn't of much use with such constraints GFR (Graphics Flight Recorder): VK layer for breadcrumbs Dumps command buffers with commands' status 4
  • 5. Unrecoverable Hangs - Solution More BREADCRUMBS! GFR writes results to the disk GFR logging is far behind what's actually runs on GPU GFR could be too high level: Blits/BeginRenderPass/EndRenderPass could internally use a lot of different 2d and 3d blits 5
  • 6. Unrecoverable Hangs - Solution Observations Unrecoverable hangs are rarely caused by sync issues Cannot allow GPU to race ahead of the last know breadcrumb A hang may happen asynchronously to the GPU packet that triggered it, e.g. A job is scheduled to another GPU unit That GPU unit hangs some time afterwards There may not be a way to synchronize it 6
  • 7. Unrecoverable Hangs - Solution The current solution in Turnip is: Breadcrumbs are inserted after each GPU command GPU writes a breadcrumb and immediately waits for this value to be acknowledged CPU in a busy loop checks the breadcrumb value If new one is found, it is sent over the network The CPU acks the breadcrumb and GPU continues execution 7
  • 8. How Our Breadcrumbs Work BR1 BR2 BR3 BR1 Send GPU CPU GPU executes this command and hangs Kernel BR1 Send over network Wait Wait Wait Read Write BR2 Send BR2 Read Write 8
  • 9. Asynchronous hangs? Require explicit input in tty for each breadcrumb GPU is on breadcrumb 18, continue?y GPU is on breadcrumb 19, continue?y GPU is on breadcrumb 20, continue?y GPU is on breadcrumb 21, continue? 9
  • 10. Breadcrumbs In Practice Increase GPU hang timeout Receive breadcrumbs on another machine via bash spaghetti nc -lvup $PORT | stdbuf -o0 xxd -pc -c 4 | awk -Wposix '{printf("%u:%un", "0x" $0, a[$0]++)} Launch workload with TU_BREADCRUMBS envvar TU_BREADCRUMBS=$IP:$PORT,break=$BREAKPOINT:$BREAKPOI Launch workload and break on the last known 10
  • 12. Faster Way To Debug Hangs 12
  • 13. Breadcrumbs Shortcoming Breadcrumbs are good for finding a command that hangs They cannot tell which part of the GPU state caused it They are useless for misrenderings Some issues are not reproducible with breadcrumbs 13
  • 14. Reproducing Hangs Drivers are already able to capture command streams And all used buffers With this it is trivial to replay the submissions back Ideally requires user-space specified GPU addresses 14
  • 15. Replaying - Caveats Multiple queues When to re-upload memory? Just force a single queue? Timeline semaphores? Recordings may be huge 15
  • 16. Editing The Command Stream Even the most minimalistic editing is useful: /* pkt4: GRAS_2D_RESOLVE_CNTL_2 = { X = 63 | Y = 63 } */ pkt(cs, 4128831); /* pkt4: RB_BLIT_SCISSOR_TL = { X = 0 | Y = 0 } */ pkt4(cs, 0x88d1, (2), 0); /* pkt4: RB_BLIT_SCISSOR_BR = { X = 63 | Y = 63 } */ pkt(cs, 4128831); pkt7(cs, CP_MEM_WRITE, 20); /* { ADDR_LO = 0x1f7580 } */ pkt(cs, 2061696); /* { ADDR_HI = 0x40 } */ pkt(cs, 64); pkt(cs, 1216352390); pkt(cs, 1107296256); 16
  • 17. Editing Shaders const char *source = R"( shps #l37 getone #l37 cov.u32f32 r1.w, c504.z cov.u32f32 r2.x, c504.w cov.u32f32 r1.y, c504.x .... end )"; upload_shader(&ctx, 0x100200d80, source); emit_shader_iova(&ctx, cs, 0x100200d80); 17
  • 18. Replaying Edited Command Stream The decompiler emits C code with raw commands The replay tool takes original submissions capture: Finds unused memory range Emits edited command stream there Overrides target submission 18
  • 19. Dumping GPU Memory Dumping GPU memory is simple to implement Act of copying may disturb GPU caches Kernel cooperation is needed to implement it properly: GPU interrupts execution e.g. by faulting Now memory could be read undisturbed 19
  • 20. Dumping Shader's Registers print %tmp_regs, %src_reg %tmp_regs - 3 consecutive free regs For 64b address and 32b tmp offset %src_reg - a single register to print Shader Log Entries: 6 [0] 00000004 0.0000 [1] 00000000 0.0000 [2] 00000000 0.0000 [3] 00000000 0.0000 [4] beadc429 -0.3394 [5] beadc429 -0.3394 20
  • 21. Dumping Shader's Registers Want a nicer print? Just print $src_reg? You still need to allocate temporary registers What if there are no free regs? Spilling regs may not be that easy at this stage Too much trouble for a little gain... 21
  • 22. Short Summary A tool to replay command stream submissions A tool to decompile a command stream into C code An option to replay edit command stream Helpers to dump GPU memory from the command stream Helpers to dump shader registers 22
  • 23. Stale Regs In Command Stream 23
  • 24. Debugging Stale Registers It could be hard to spot stale reg usage: It may appear as a random geometry flicker Game hanging at a random moment Rare CTS test failure 24
  • 25. Stomping Registers - Caveats Could be a bit tricky if a combination of regs causes an issue VK pipelines could be set outside a renderpass Doesn't help if stale regs are between draw calls Default invalid value may be valid for some registers 25
  • 26. Stomping Registers We mark each register with where it is used: <reg32 offset="0x9600" name="VPC_DBG_ECO_CNTL" usage="cmd"/> <reg32 offset="0xb987" name="HLSQ_CS_CNTL" variants="A6XX" usage="cmd"/> <reg32 offset="0xb984" name="HLSQ_CONTROL_3_REG" variants="A6XX" usage="rp <reg32 offset="0xb985" name="HLSQ_CONTROL_4_REG" variants="A6XX" usage="rp To stomp register you need to specify: export TU_DEBUG_STALE_REGS_RANGE=0x0c00,0xbe01 export TU_DEBUG_STALE_REGS_FLAGS=cmdbuf,renderpass 26
  • 27. Turnip Tooling - Summary Unique tooling: Driver breadcrumbs Command stream replaying and editing GPU memory dumping Shader register dumping Debug option to find stale reg usage 27
  • 28. Other Drivers and Tooling 28
  • 29. Generic GFR - Graphics Flight Recorder Instruments command buffers with completion tags Uses VK_AMD_buffer_marker (nothing vendor specific) In vkd3d-proton: Breadcrumbs Shader printf Descriptor debugging 29
  • 30. Other Mesa Drivers Feature toggles and debug flags Shader assembly replacement for debugging GPU submissions decoding GPU crash dumps decoding 30
  • 31. Radeon - UMR GPU register dumps SGPR / VGPR shader register dumps Shader wavefront Debugging Shader disassembly around the crash site See Maister's blog post for it in action https://themaister.net/blog/2023/08/20/hardcor e-vulkan-debugging-digging-deep-on-linux-amd gpu/ 31
  • 32. Unreleased - Radeon - Shader Debugger 32
  • 33. Proprietary - Radeon™ GPU Detective Postmortem analysis of GPU crashes Information about page faults Breadcrumbs reflecting done and in-progress GPU work Command Buffer ID: 0x107c ========================= [>] "Frame 1040 CL0" ├─[X] "Depth + Normal + Motion Vector PrePass" ├─[>] "DownSamplePS" │ ├─[X] Draw │ ├─[>] Draw └─[>] "Bloom" ├─[>] "BlurPS" │ ├─[>] Draw │ └─[>] Draw ├─[ ] Draw ├─[ ] "BlurPS" 33
  • 34. Proprietary - NVIDIA Aftermath 34
  • 35. Proprietary - NVIDIA Aftermath Collects GPU “mini-dumps” Visualizes GPU state at the moment of crash Collects breadcrumbs Shows crashing shader and it registers 35
  • 36. Q&A Any good tools I haven't mentioned? Maybe you tried something before? Maybe you have an idea for a tool to implement? We're hiring! igalia.com/jobs/ 36
  • 37. 37