2020/10/22 朱玉婷
CUDA Debugger
OUTLINE
CUDA Programming and Execution Model
CUDA Memory Architecture
CUDA Exception List
CUDA Debugging
CUDA Terminology
BASIC CONCEPT
[Figure: relationships among kernel, device, SM, block, warp, lane, and thread, labeled 1 : 1, 1 : N, and N : 1]
FUNCTION SPECIFIERS
Denote whether a function executes on the host or on the device, and whether
it is callable from the host or from the device
• __global__ void kernel ( )
• __device__ void device ( )
• __host__ int main ( )
Specifier    Executes on   Callable from
__global__   device        host
__device__   device        device
__host__     host          host
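The three specifiers can be seen working together in a minimal sketch. The names `square`, `square_kernel`, and `square_on_gpu` are hypothetical and chosen only for illustration; error handling is reduced to the bare minimum.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __device__: executes on the GPU and is callable only from device code.
__device__ int square(int x) { return x * x; }

// __global__: a kernel. Executes on the GPU and is launched from host code.
__global__ void square_kernel(int *out, int x) { *out = square(x); }

// __host__: executes on the CPU (this is also the default without a specifier).
// Returns x * x computed on the GPU, or -1 if the allocation fails.
__host__ int square_on_gpu(int x) {
    int *d_out = NULL;
    int h_out = -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;
    square_kernel<<<1, 1>>>(d_out, x);   // one block with one thread
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main() {
    printf("square_on_gpu(7) = %d\n", square_on_gpu(7));
    return 0;
}
```

Calling `square` directly from `main` would be rejected by the compiler, because a `__device__` function is not callable from host code.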
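COMPILING PROCESS
• NVCC separates the source code (.cu) into host code and device code
• NVCC continues to process the device code (PTX)
• The host code is passed to the C++ compiler
• Both parts are then combined into one executable file
[Figure: .cu is split into CPU code (.cpp) and GPU code (.ptx); the CPU code goes through the C++ compiler, and the host linker produces the executable]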
PROGRAMMING MODEL
[Figure: the CPU and the GPU (coprocessor) each have their own memory; a CUDA program consists of CPU code and GPU code]
• Allocate GPU memory
• Copy data : CPU to GPU
• Launch kernel on GPU
• Copy data : GPU to CPU
C function   CUDA C function
malloc       cudaMalloc
memcpy       cudaMemcpy
memset       cudaMemset
free         cudaFree
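The four steps and the C-to-CUDA function pairs can be sketched as one small program. The kernel `add_one` and the helper `run_add_one` are hypothetical names used only for this illustration, and the block size of 256 is an arbitrary assumption.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element of the array.
__global__ void add_one(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;   // guard the partially filled last block
}

// Returns the number of wrong elements, or -1 if an allocation or CUDA call fails.
int run_add_one(int n) {
    size_t bytes = (size_t)n * sizeof(float);
    float *h = (float *)malloc(bytes);            // host buffer (malloc)
    if (h == NULL) return -1;
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d = NULL;
    if (cudaMalloc(&d, bytes) != cudaSuccess) {   // allocate GPU memory
        free(h);
        return -1;
    }
    cudaError_t err = cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);  // data: CPU to GPU
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        add_one<<<blocks, threads>>>(d, n);                             // launch kernel on GPU
        err = cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);          // data: GPU to CPU
    }

    int wrong = 0;
    if (err != cudaSuccess) {
        wrong = -1;
    } else {
        for (int i = 0; i < n; ++i)
            if (h[i] != (float)i + 1.0f) ++wrong;
    }
    cudaFree(d);                                  // release GPU memory (free -> cudaFree)
    free(h);
    return wrong;
}

int main() {
    printf("wrong elements: %d\n", run_add_one(1000));
    return 0;
}
```

The device-to-host `cudaMemcpy` also waits for the kernel to finish, so no explicit synchronization call is needed in this sketch.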
GPU HARDWARE ARCHITECTURE
[Figure: a GPU with device memory, a texture cache, and a constant cache; SM0 and SM1 each contain shared memory and SPs (SP0, SP1, SP2); each thread has registers and local memory; a block is a group of threads]
MEMORY TYPE
Declaration            Scope    Lifetime      Location
variable in kernel     thread   kernel        register
array in kernel        thread   kernel        local
__shared__ in kernel   block    kernel        shared
__device__             grid     application   global
__constant__           grid     application   constant
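The rows of the table can be illustrated with one declaration per memory type. This is a minimal sketch: the names `d_offset`, `c_scale`, `memory_spaces`, and `run_memory_spaces` are hypothetical, and whether the compiler actually places the small array in local memory rather than registers depends on the compiler and on how the array is indexed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 128   // threads per block, an assumption for this sketch

__device__ int d_offset = 3;     // global memory: grid scope, application lifetime
__constant__ int c_scale = 2;    // constant memory: grid scope, application lifetime

__global__ void memory_spaces(int *out) {
    int tid = threadIdx.x;       // scalar variable: thread scope, normally a register
    int scratch[4];              // per-thread array: thread scope, may live in local memory
    __shared__ int tile[N];      // shared memory: block scope, kernel lifetime

    for (int k = 0; k < 4; ++k) scratch[k] = tid + k;
    tile[tid] = scratch[tid % 4];
    __syncthreads();             // all tile[] writes finish before any thread reads
    out[tid] = tile[(tid + 1) % N] * c_scale + d_offset;
}

// Returns the number of mismatching elements, or -1 if a CUDA call fails.
int run_memory_spaces() {
    int *d_out = NULL;
    int h_out[N];
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return -1;
    memory_spaces<<<1, N>>>(d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int wrong = 0;
    for (int t = 0; t < N; ++t) {
        int j = (t + 1) % N;     // index of the neighbour whose tile[] entry was read
        if (h_out[t] != (j + j % 4) * 2 + 3) ++wrong;
    }
    return wrong;
}

int main() {
    printf("mismatches: %d\n", run_memory_spaces());
    return 0;
}
```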
EXPERIMENT GPU
CUDA EXCEPTION LIST
• illegal address
• stack overflow
• illegal instruction
• out-of-range address
• misaligned address
• invalid address space
• invalid PC
• warp assert
• syscall error
• invalid managed memory access
CASLAB SM EXCEPTION LIST
• illegal address
• stack overflow
• illegal instruction
• out-of-range address
• misaligned address
• invalid address space
• invalid PC
• warp assert
• syscall error
• invalid managed memory access
INVALID PC
• Warp
This occurs when any thread within a warp advances its PC beyond the
40-bit address space
ILLEGAL INSTRUCTION
• Warp
This occurs when any thread within a warp has executed an illegal
instruction
CASLAB SM EXCEPTION LIST
CUDA EXCEPTION LIST
• illegal address : Lane, Device, Warp
• stack overflow : Lane, Device, Warp
• illegal instruction : Warp
• out-of-range address : Warp (1, 2, 3, 4)
• misaligned address : Warp, Lane
• invalid address space : Warp
• invalid PC : Warp
• warp assert : Warp
• syscall error : Lane
• invalid managed memory access
ILLEGAL ADDRESS
• Device
This occurs when a thread accesses an illegal (out of bounds) global address
• Warp
This occurs when a thread accesses an illegal (out of bounds) global/local/shared
address
• Lane
Precise (requires memcheck to be on)
This occurs when a thread accesses an illegal (out of bounds) global address
STACK OVERFLOW
• Device
This occurs when the application triggers a global hardware stack overflow.
The main cause of this error is a large amount of divergence in the presence of
function calls
• Warp
This occurs when any thread in a warp triggers a hardware stack overflow
• Lane
This occurs when a thread exceeds its stack memory limit
INVALID ADDRESS SPACE
• Warp
This occurs when any thread within a warp executes an instruction that
accesses a memory space not permitted for that instruction
MISALIGNED ADDRESS
• Warp
This occurs when any thread within a warp accesses an address in the local
or shared memory segments that is not correctly aligned
• Lane
This occurs when a thread accesses a global address that is not correctly
aligned
SYSCALL ERROR
• Lane
This occurs when a thread corrupts the heap by invoking free with an
invalid address
(i.e., trying to free the same memory region twice)
INVALID MANAGED MEMORY ACCESS
• Host thread
This occurs when a host thread attempts to access managed memory
currently used by the GPU
WARP ASSERT
• Warp
This occurs when any thread in the warp hits a device-side assertion
#include <assert.h>
__global__ void kernel()
{
    // The assertion fails for every thread whose threadIdx.x is not 0
    assert(threadIdx.x == 0);
}
CUDA DEBUGGING
1. Kernel Debugging
Inspects the flow and state of kernel execution on the fly
2. Memory Debugging
Focuses on discovering odd program behavior related to memory accesses
KERNEL DEBUGGING
Three techniques :
• CUDA-gdb
$ nvcc -g -G foo.cu -o foo
$ cuda-gdb foo
• printf
• assert
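The printf technique can be sketched as follows. The kernel `scale_kernel` and the helper `run_scale` are hypothetical names; only thread 0 of each block prints, which is an arbitrary choice to keep the output short.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles each element and traces one thread per block.
__global__ void scale_kernel(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    out[i] = in[i] * 2;
    if (threadIdx.x == 0)   // print from a single thread per block
        printf("block %d: in[%d] = %d, out[%d] = %d\n",
               blockIdx.x, i, in[i], i, out[i]);
}

// Returns the number of wrong elements, or -1 if an allocation or CUDA call fails.
int run_scale(int n) {
    size_t bytes = (size_t)n * sizeof(int);
    int *h = (int *)malloc(bytes);
    if (h == NULL) return -1;
    for (int i = 0; i < n; ++i) h[i] = i;

    int *d_in = NULL, *d_out = NULL;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess ||
        cudaMalloc(&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        free(h);
        return -1;
    }
    cudaMemcpy(d_in, h, bytes, cudaMemcpyHostToDevice);
    scale_kernel<<<(n + 31) / 32, 32>>>(d_in, d_out, n);
    cudaDeviceSynchronize();   // device-side printf output is flushed at synchronization
    cudaError_t err = cudaMemcpy(h, d_out, bytes, cudaMemcpyDeviceToHost);

    int wrong = 0;
    if (err != cudaSuccess) {
        wrong = -1;
    } else {
        for (int i = 0; i < n; ++i)
            if (h[i] != 2 * i) ++wrong;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    free(h);
    return wrong;
}

int main() {
    printf("wrong elements: %d\n", run_scale(64));
    return 0;
}
```

Printing from every thread quickly becomes unreadable, so restricting the trace to selected threads or blocks is usually the more practical choice.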
CUDA-GDB
Commands : break, print, run, continue, next, step, quit
• A CUDA program contains multiple host threads and many CUDA threads
• We can use cuda-gdb to report information about the current focus
CUDA INFO / FOCUS
(cuda-gdb) cuda thread lane warp block sm grid device kernel
kernel 1, grid 1027, block (0,0,0), thread (64,0,0), device 0, sm 1, warp 2, lane 0
(cuda-gdb) cuda thread (2)
MEMORY DEBUGGING
$ cuda-memcheck [memcheck_options] app [app_options]
• Memcheck tool
• Racecheck tool
• Initcheck tool
• Syncheck tool
[Figure: error types: memory access errors, hardware exceptions, malloc/free errors, CUDA API errors, cudaMalloc memory leaks, device heap memory leaks]
MEMCHECK TOOL
Checks for out-of-bounds and misaligned accesses in CUDA kernels
MEMCHECK TOOL - OUT OF BOUNDS
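A typical source of out-of-bounds accesses is a launch whose thread count is rounded up past the array length. The sketch below shows the guarded form of such a kernel; `fill` and `run_fill` are hypothetical names, and the sizes (1000 elements, 256 threads per block) are assumptions picked so that the last block has surplus threads.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: writes `value` into every element of data[0..n).
__global__ void fill(int *data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Without this guard, threads with i >= n would write past the end of
    // the allocation, which is the kind of access memcheck looks for.
    if (i < n) data[i] = value;
}

// Returns the number of wrong elements, or -1 if an allocation or CUDA call fails.
int run_fill(int n, int value) {
    size_t bytes = (size_t)n * sizeof(int);
    int *h = (int *)malloc(bytes);
    if (h == NULL) return -1;

    int *d = NULL;
    if (cudaMalloc(&d, bytes) != cudaSuccess) {
        free(h);
        return -1;
    }
    int threads = 256;
    int blocks = (n + threads - 1) / threads;   // n = 1000 gives 4 blocks, 1024 threads
    fill<<<blocks, threads>>>(d, n, value);
    cudaError_t err = cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);

    int wrong = 0;
    if (err != cudaSuccess) {
        wrong = -1;
    } else {
        for (int i = 0; i < n; ++i)
            if (h[i] != value) ++wrong;
    }
    cudaFree(d);
    free(h);
    return wrong;
}

int main() {
    printf("wrong elements: %d\n", run_fill(1000, 7));
    return 0;
}
```

With 1000 elements and 1024 launched threads, the last 24 threads rely on the guard to stay inside the allocation.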
MEMCHECK TOOL - MISALIGNED
RACECHECK TOOL
Reports shared memory data access hazards that can cause data races
RACECHECK TOOL - BLOCK
__syncthreads()
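A block-level hazard arises when one thread reads a shared memory element that another thread of the same block writes. The sketch below reverses a block-sized array through shared memory and uses `__syncthreads()` to separate the writes from the reads; `reverse_block` and `run_reverse` are hypothetical names, and the block size of 64 is an assumption.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 64   // threads per block, an assumption for this sketch

// Hypothetical kernel: reverses data[0..BLOCK) in place within one block.
__global__ void reverse_block(int *data) {
    __shared__ int tile[BLOCK];
    int t = threadIdx.x;
    tile[t] = data[t];               // each thread writes one shared element
    __syncthreads();                 // barrier: every write is done before any read
    data[t] = tile[BLOCK - 1 - t];   // reads an element written by another thread
}

// Returns the number of wrong elements, or -1 if a CUDA call fails.
int run_reverse() {
    int h[BLOCK];
    for (int i = 0; i < BLOCK; ++i) h[i] = i;

    int *d = NULL;
    if (cudaMalloc(&d, sizeof(h)) != cudaSuccess) return -1;
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    reverse_block<<<1, BLOCK>>>(d);
    cudaError_t err = cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return -1;

    int wrong = 0;
    for (int i = 0; i < BLOCK; ++i)
        if (h[i] != BLOCK - 1 - i) ++wrong;
    return wrong;
}

int main() {
    printf("wrong elements: %d\n", run_reverse());
    return 0;
}
```

Removing the barrier would leave a read-after-write hazard between threads of different warps, since a thread could read `tile[BLOCK - 1 - t]` before its owner has written it.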
RACECHECK TOOL - WARP
__syncwarp()
INITCHECK TOOL
Reports cases where the GPU performs uninitialized accesses to global memory
INITCHECK TOOL
4 * 2 * 128
SYNCHECK TOOL
Reports cases where the application attempts invalid usage of synchronization
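FUTURE WORK
Rewrite existing functionality so that it better matches hardware behavior
Implement trap handler functionality for handling SM exceptions
Resolve software compatibility issues and hardware/software communication issues
Extend functionality : add new GDB debugging commands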
Appendix : CUDA Terminology
TERMINOLOGY
• Host
The CPU and the system memory
• Device
The GPU and its memory
• Kernel
A function that executes on the device; it is composed of several thread blocks (a grid)
• SM
Streaming Multiprocessor; composed of several SPs and assigned several thread blocks
• SP
Streaming Processor (CUDA core); executes one thread
• Grid
Multiple thread blocks form a grid
• Block
Several threads are grouped into a block; threads in the same block can be
synchronized and can communicate with each other via shared memory
• Warp
A set of threads that execute the same instruction at the same time
• Thread
A CUDA program is executed by many threads. A thread within a warp is called a lane
CUDA GUARANTEES
• All threads in a thread block run on the same SM at the same time
• All threads in a thread block, running on the same SM, may cooperate to solve a sub-problem
• Threads in different thread blocks do not cooperate with each other
• All blocks in a kernel finish before any blocks from the next kernel run
THANKS

More Related Content

What's hot

Linux Kernel Crashdump
Linux Kernel CrashdumpLinux Kernel Crashdump
Linux Kernel CrashdumpMarian Marinov
 
How to Root 10 Million Phones with One Exploit
How to Root 10 Million Phones with One ExploitHow to Root 10 Million Phones with One Exploit
How to Root 10 Million Phones with One ExploitJiahong Fang
 
Debugging linux kernel tools and techniques
Debugging linux kernel tools and  techniquesDebugging linux kernel tools and  techniques
Debugging linux kernel tools and techniquesSatpal Parmar
 
Tegra 186のu-boot & Linux
Tegra 186のu-boot & LinuxTegra 186のu-boot & Linux
Tegra 186のu-boot & LinuxMr. Vengineer
 
Kdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysisKdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysisBuland Singh
 
Troubleshooting Linux Kernel Modules And Device Drivers
Troubleshooting Linux Kernel Modules And Device DriversTroubleshooting Linux Kernel Modules And Device Drivers
Troubleshooting Linux Kernel Modules And Device DriversSatpal Parmar
 
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel FernandesKernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel FernandesAnne Nicolas
 
Kernel Recipes 2019 - Formal modeling made easy
Kernel Recipes 2019 - Formal modeling made easyKernel Recipes 2019 - Formal modeling made easy
Kernel Recipes 2019 - Formal modeling made easyAnne Nicolas
 
Crash Dump Analysis 101
Crash Dump Analysis 101Crash Dump Analysis 101
Crash Dump Analysis 101John Howard
 
Feb14 successful development
Feb14 successful developmentFeb14 successful development
Feb14 successful developmentConnor McDonald
 
Kernel Recipes 2016 - Understanding a Real-Time System (more than just a kernel)
Kernel Recipes 2016 - Understanding a Real-Time System (more than just a kernel)Kernel Recipes 2016 - Understanding a Real-Time System (more than just a kernel)
Kernel Recipes 2016 - Understanding a Real-Time System (more than just a kernel)Anne Nicolas
 
Изучаем миллиард состояний программы на уровне профи. Как разработать быстрый...
Изучаем миллиард состояний программы на уровне профи. Как разработать быстрый...Изучаем миллиард состояний программы на уровне профи. Как разработать быстрый...
Изучаем миллиард состояний программы на уровне профи. Как разработать быстрый...Positive Hack Days
 
Meder Kydyraliev - Mining Mach Services within OS X Sandbox
Meder Kydyraliev - Mining Mach Services within OS X SandboxMeder Kydyraliev - Mining Mach Services within OS X Sandbox
Meder Kydyraliev - Mining Mach Services within OS X SandboxDefconRussia
 
Ищем уязвимости нулевого дня в ядре Linux
Ищем уязвимости нулевого дня в ядре LinuxИщем уязвимости нулевого дня в ядре Linux
Ищем уязвимости нулевого дня в ядре LinuxPositive Hack Days
 
syzkaller: the next gen kernel fuzzer
syzkaller: the next gen kernel fuzzersyzkaller: the next gen kernel fuzzer
syzkaller: the next gen kernel fuzzerDmitry Vyukov
 
ARM Trusted FirmwareのBL31を単体で使う!
ARM Trusted FirmwareのBL31を単体で使う!ARM Trusted FirmwareのBL31を単体で使う!
ARM Trusted FirmwareのBL31を単体で使う!Mr. Vengineer
 

What's hot (20)

Linux Kernel Crashdump
Linux Kernel CrashdumpLinux Kernel Crashdump
Linux Kernel Crashdump
 
How to Root 10 Million Phones with One Exploit
How to Root 10 Million Phones with One ExploitHow to Root 10 Million Phones with One Exploit
How to Root 10 Million Phones with One Exploit
 
Debugging linux kernel tools and techniques
Debugging linux kernel tools and  techniquesDebugging linux kernel tools and  techniques
Debugging linux kernel tools and techniques
 
Tegra 186のu-boot & Linux
Tegra 186のu-boot & LinuxTegra 186のu-boot & Linux
Tegra 186のu-boot & Linux
 
Kdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysisKdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysis
 
Troubleshooting Linux Kernel Modules And Device Drivers
Troubleshooting Linux Kernel Modules And Device DriversTroubleshooting Linux Kernel Modules And Device Drivers
Troubleshooting Linux Kernel Modules And Device Drivers
 
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel FernandesKernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
 
Kernel Recipes 2019 - Formal modeling made easy
Kernel Recipes 2019 - Formal modeling made easyKernel Recipes 2019 - Formal modeling made easy
Kernel Recipes 2019 - Formal modeling made easy
 
Crash Dump Analysis 101
Crash Dump Analysis 101Crash Dump Analysis 101
Crash Dump Analysis 101
 
Linux : PSCI
Linux : PSCILinux : PSCI
Linux : PSCI
 
Feb14 successful development
Feb14 successful developmentFeb14 successful development
Feb14 successful development
 
Kernel Recipes 2016 - Understanding a Real-Time System (more than just a kernel)
Kernel Recipes 2016 - Understanding a Real-Time System (more than just a kernel)Kernel Recipes 2016 - Understanding a Real-Time System (more than just a kernel)
Kernel Recipes 2016 - Understanding a Real-Time System (more than just a kernel)
 
Изучаем миллиард состояний программы на уровне профи. Как разработать быстрый...
Изучаем миллиард состояний программы на уровне профи. Как разработать быстрый...Изучаем миллиард состояний программы на уровне профи. Как разработать быстрый...
Изучаем миллиард состояний программы на уровне профи. Как разработать быстрый...
 
Alta disponibilidad en GNU/Linux
Alta disponibilidad en GNU/LinuxAlta disponibilidad en GNU/Linux
Alta disponibilidad en GNU/Linux
 
Meder Kydyraliev - Mining Mach Services within OS X Sandbox
Meder Kydyraliev - Mining Mach Services within OS X SandboxMeder Kydyraliev - Mining Mach Services within OS X Sandbox
Meder Kydyraliev - Mining Mach Services within OS X Sandbox
 
Ищем уязвимости нулевого дня в ядре Linux
Ищем уязвимости нулевого дня в ядре LinuxИщем уязвимости нулевого дня в ядре Linux
Ищем уязвимости нулевого дня в ядре Linux
 
syzkaller: the next gen kernel fuzzer
syzkaller: the next gen kernel fuzzersyzkaller: the next gen kernel fuzzer
syzkaller: the next gen kernel fuzzer
 
Debugging linux
Debugging linuxDebugging linux
Debugging linux
 
ARM Trusted FirmwareのBL31を単体で使う!
ARM Trusted FirmwareのBL31を単体で使う!ARM Trusted FirmwareのBL31を単体で使う!
ARM Trusted FirmwareのBL31を単体で使う!
 
Device tree
Device treeDevice tree
Device tree
 

Similar to Cuda debugger

gpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsngpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsnARUNACHALAM468781
 
LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness Peter Griffin
 
LAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel AwarenessLAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel AwarenessLinaro
 
CUDA lab's slides of "parallel programming" course
CUDA lab's slides of "parallel programming" courseCUDA lab's slides of "parallel programming" course
CUDA lab's slides of "parallel programming" courseShuai Yuan
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Akihiro Hayashi
 
[CB20] Vulnerabilities of Machine Learning Infrastructure by Sergey Gordeychik
[CB20] Vulnerabilities of Machine Learning Infrastructure by Sergey Gordeychik[CB20] Vulnerabilities of Machine Learning Infrastructure by Sergey Gordeychik
[CB20] Vulnerabilities of Machine Learning Infrastructure by Sergey GordeychikCODE BLUE
 
Kernel Recipes 2015 - Kernel dump analysis
Kernel Recipes 2015 - Kernel dump analysisKernel Recipes 2015 - Kernel dump analysis
Kernel Recipes 2015 - Kernel dump analysisAnne Nicolas
 
Hands on Virtualization with Ganeti
Hands on Virtualization with GanetiHands on Virtualization with Ganeti
Hands on Virtualization with GanetiOSCON Byrum
 
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScyllaDB
 
ARM Cortex-A53 Errata on Andoid
ARM Cortex-A53 Errata on AndoidARM Cortex-A53 Errata on Andoid
ARM Cortex-A53 Errata on Andoidhidenorly
 
A beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDAA beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDAPiyush Mittal
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.J On The Beach
 
Hadoop: Code Injection, Distributed Fault Injection
Hadoop: Code Injection, Distributed Fault InjectionHadoop: Code Injection, Distributed Fault Injection
Hadoop: Code Injection, Distributed Fault InjectionCloudera, Inc.
 
CUDA by Example : Introduction to CUDA C : Notes
CUDA by Example : Introduction to CUDA C : NotesCUDA by Example : Introduction to CUDA C : Notes
CUDA by Example : Introduction to CUDA C : NotesSubhajit Sahu
 
Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Rob Gillen
 
The power of linux advanced tracer [POUG18]
The power of linux advanced tracer [POUG18]The power of linux advanced tracer [POUG18]
The power of linux advanced tracer [POUG18]Mahmoud Hatem
 
Share the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardShare the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardJian-Hong Pan
 

Similar to Cuda debugger (20)

gpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsngpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsn
 
LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness LAS16-403 - GDB Linux Kernel Awareness
LAS16-403 - GDB Linux Kernel Awareness
 
LAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel AwarenessLAS16-403: GDB Linux Kernel Awareness
LAS16-403: GDB Linux Kernel Awareness
 
Defense_Presentation
Defense_PresentationDefense_Presentation
Defense_Presentation
 
CUDA lab's slides of "parallel programming" course
CUDA lab's slides of "parallel programming" courseCUDA lab's slides of "parallel programming" course
CUDA lab's slides of "parallel programming" course
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
 
[CB20] Vulnerabilities of Machine Learning Infrastructure by Sergey Gordeychik
[CB20] Vulnerabilities of Machine Learning Infrastructure by Sergey Gordeychik[CB20] Vulnerabilities of Machine Learning Infrastructure by Sergey Gordeychik
[CB20] Vulnerabilities of Machine Learning Infrastructure by Sergey Gordeychik
 
Genode Compositions
Genode CompositionsGenode Compositions
Genode Compositions
 
FreeBSD: Dev to Prod
FreeBSD: Dev to ProdFreeBSD: Dev to Prod
FreeBSD: Dev to Prod
 
Kernel Recipes 2015 - Kernel dump analysis
Kernel Recipes 2015 - Kernel dump analysisKernel Recipes 2015 - Kernel dump analysis
Kernel Recipes 2015 - Kernel dump analysis
 
Hands on Virtualization with Ganeti
Hands on Virtualization with GanetiHands on Virtualization with Ganeti
Hands on Virtualization with Ganeti
 
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
 
ARM Cortex-A53 Errata on Andoid
ARM Cortex-A53 Errata on AndoidARM Cortex-A53 Errata on Andoid
ARM Cortex-A53 Errata on Andoid
 
A beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDAA beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDA
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.
 
Hadoop: Code Injection, Distributed Fault Injection
Hadoop: Code Injection, Distributed Fault InjectionHadoop: Code Injection, Distributed Fault Injection
Hadoop: Code Injection, Distributed Fault Injection
 
CUDA by Example : Introduction to CUDA C : Notes
CUDA by Example : Introduction to CUDA C : NotesCUDA by Example : Introduction to CUDA C : Notes
CUDA by Example : Introduction to CUDA C : Notes
 
Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)
 
The power of linux advanced tracer [POUG18]
The power of linux advanced tracer [POUG18]The power of linux advanced tracer [POUG18]
The power of linux advanced tracer [POUG18]
 
Share the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardShare the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development Board
 

Recently uploaded

Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdfSuman Jyoti
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringmulugeta48
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 

Recently uploaded (20)

Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 

Cuda debugger

  • 2. OUTLINE CUDA Programming and Execution Model CUDA Memory Architecture CUDA Exception List CUDA Debugging CUDA Terminology
  • 3. OUTLINE CUDA Programming and Execution Model CUDA Memory Architecture CUDA Exception List CUDA Debugging CUDA Terminology
  • 5. FUNCTION SPECIFIERS Denote whether a function executes on the host or on the device and whether it is callable from the host or from the device  __global__ void kernel ( )  __device__ void device ( )  __host__ void main ( ) host device __global__ callable execute __device__ callable __host__ execute
  • 6. COMPILING PROCESS  Separate source code to host code and device code  NVCC continue deal with device code (PTX)  Host code is pass to c++ compiler  Combine then into executable file CPU code GPU code .cu .cpp .ptxC++ compiler Host linker executable
  • 7. PROGRAMMING MODEL CPU GPU MemoryMemory coprocessor CPU code GPU code CUDA program  Data : CPU to GPU  Allocate GPU memory  Launch kernel on GPU  Data : GPU to CPU C funtion CUDA C funtion malloc cudaMalloc memcpy cudaMemcpy memset cudaMemset free cudaFree
  • 9. GPU HARDWARE ARCHITECTURE (diagram labels): Device Memory, Texture cache, Constant cache, SM0, SM1, Shared memory, SP0, SP1, SP2, Register, Local, thread, block, thread blocks
  • 10. MEMORY TYPE
      Declaration              Scope    Lifetime      Location
      variable in kernel       thread   kernel        register
      array in kernel          thread   kernel        local
      __shared__ in kernel     block    kernel        shared
      __device__               grid     application   global
      __constant__             grid     application   constant
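A short sketch can place several of these memory types in one kernel. The names `scale`, `total`, `block_sum`, and `scaled_sum` are hypothetical, and the fixed size of 64 elements in a single block is an assumption made to keep the example small.

```cuda
#include <cuda_runtime.h>

__constant__ int scale;  // constant memory: grid scope, application lifetime
__device__ int total;    // global memory: grid scope, application lifetime

__global__ void block_sum(const int *in) {
    __shared__ int partial[64];  // shared memory: block scope, kernel lifetime
    int t = threadIdx.x;         // scalar variable: thread scope, normally a register
    partial[t] = in[t] * scale;
    __syncthreads();             // make every partial[] write visible to thread 0
    if (t == 0) {
        int sum = 0;
        for (int i = 0; i < 64; ++i) sum += partial[i];
        total = sum;
    }
}

// Returns host_scale * (0 + 1 + ... + 63) computed on the GPU, or -1 on error.
int scaled_sum(int host_scale) {
    int h_in[64];
    for (int i = 0; i < 64; ++i) h_in[i] = i;
    int *d_in = nullptr;
    int result = -1;
    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return -1;
    bool ok = cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpyToSymbol(scale, &host_scale, sizeof(int)) == cudaSuccess;
    if (ok) {
        block_sum<<<1, 64>>>(d_in);  // one block of 64 threads
        ok = cudaMemcpyFromSymbol(&result, total, sizeof(int)) == cudaSuccess;
    }
    cudaFree(d_in);
    return ok ? result : -1;
}
```

`__device__` and `__constant__` variables are reached from the host through `cudaMemcpyToSymbol` and `cudaMemcpyFromSymbol` rather than through pointers returned by `cudaMalloc`.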
  • 13. CUDA EXCEPTION LIST
      • illegal address
      • stack overflow
      • illegal instruction
      • out-of-range address
      • misaligned address
      • invalid address space
      • invalid PC
      • warp assert
      • syscall error
      • invalid managed memory access
  • 14. CASLAB SM EXCEPTION LIST
      • illegal address
      • stack overflow
      • illegal instruction
      • out-of-range address
      • misaligned address
      • invalid address space
      • invalid PC
      • warp assert
      • syscall error
      • invalid managed memory access
  • 15. INVALID PC (Warp): This occurs when any thread within a warp advances its PC beyond the 40-bit address space.
  • 16. ILLEGAL INSTRUCTION (Warp): This occurs when any thread within a warp has executed an illegal instruction.
  • 18. CUDA EXCEPTION LIST (with the scope at which each exception is reported)
      • illegal address: Lane, Device, Warp
      • stack overflow: Lane, Device, Warp
      • illegal instruction: Warp
      • out-of-range address: Warp
      • misaligned address: Warp, Lane
      • invalid address space: Warp
      • invalid PC: Warp
      • warp assert: Warp
      • syscall error: Lane
      • invalid managed memory access
  • 19. ILLEGAL ADDRESS
      • Device: This occurs when a thread accesses an illegal (out of bounds) global address.
      • Warp: This occurs when a thread accesses an illegal (out of bounds) global/local/shared address.
      • Lane: Precise (requires memcheck on). This occurs when a thread accesses an illegal (out of bounds) global address.
  • 20. STACK OVERFLOW
      • Device: This occurs when the application triggers a global hardware stack overflow. The main cause of this error is large amounts of divergence in the presence of function calls.
      • Warp: This occurs when any thread in a warp triggers a hardware stack overflow.
      • Lane: This occurs when a thread exceeds its stack memory limit.
  • 21. INVALID ADDRESS SPACE (Warp): This occurs when any thread within a warp executes an instruction that accesses a memory space not permitted for that instruction.
  • 22. MISALIGNED ADDRESS
      • Warp: This occurs when any thread within a warp accesses an address in the local or shared memory segments that is not correctly aligned.
      • Lane: This occurs when a thread accesses a global address that is not correctly aligned.
  • 23. SYSCALL ERROR (Lane): This occurs when a thread corrupts the heap by invoking free with an invalid address (for example, trying to free the same memory region twice).
  • 24. INVALID MANAGED MEMORY ACCESS (Host thread): This occurs when a host thread attempts to access managed memory currently used by the GPU.
  • 25. WARP ASSERT (Warp): This occurs when any thread in the warp hits a device-side assertion.
      #include <assert.h>
      __global__ void kernel() { assert(threadIdx.x == 0); }
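On the host side, a device-side assertion surfaces as an error code once the kernel has completed. The sketch below uses a hypothetical kernel `check_positive` and a hypothetical helper `run_check`, and assumes `n > 0`.

```cuda
#include <assert.h>
#include <cuda_runtime.h>

// Each thread asserts that its element is positive.
__global__ void check_positive(const int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) assert(data[i] > 0);
}

// Copies `host` to the GPU, runs the check, and returns the resulting status.
cudaError_t run_check(const int *host, int n) {
    int *dev = nullptr;
    cudaError_t err = cudaMalloc(&dev, n * sizeof(int));
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        check_positive<<<(n + 127) / 128, 128>>>(dev, n);
        err = cudaGetLastError();                             // launch errors
        if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    }
    cudaFree(dev);
    return err;
}
```

If any element were zero or negative, `cudaDeviceSynchronize` would return `cudaErrorAssert`, after which the device context can no longer be used by the process; under cuda-gdb the same event is reported as a warp assert.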
  • 27. CUDA DEBUGGING
      1. Kernel Debugging: inspects the flow and state of kernel execution on the fly
      2. Memory Debugging: focuses on discovering odd program behavior related to memory locations
  • 29. KERNEL DEBUGGING: three techniques
      • cuda-gdb
          $ nvcc -g -G foo.cu -o foo
          $ cuda-gdb foo
      • printf
      • assert
  • 30. CUDA-GDB
      • Commands: break, print, run, continue, next, step, quit
      • A CUDA program contains multiple host threads and many CUDA threads
      • cuda-gdb can report information about the current focus
  • 31. CUDA INFO / FOCUS
      (cuda-gdb) cuda thread lane warp block sm grid device kernel
      kernel 1, grid 1027, block (0,0,0), thread (64,0,0), device 0, sm 1, warp 2, lane 0
      (cuda-gdb) cuda thread (2)
  • 33. MEMORY DEBUGGING
      $ cuda-memcheck [memcheck_options] app [app_options]
      • Tools: Memcheck, Racecheck, Initcheck, Synccheck
      • Errors reported by the Memcheck tool: memory access errors, hardware exceptions, malloc/free errors, CUDA API errors, cudaMalloc memory leaks, device heap memory leaks
  • 34. MEMCHECK TOOL: Checks for out-of-bounds and misaligned accesses in CUDA kernels.
      $ cuda-memcheck [memcheck_options] app [app_options]
  • 35-38. MEMCHECK TOOL - OUT OF BOUNDS
  • 39-40. MEMCHECK TOOL - MISALIGNED
  • 41. RACECHECK TOOL: Reports shared memory data access hazards that can cause data races.
      $ cuda-memcheck [memcheck_options] app [app_options]
  • 42. RACECHECK TOOL - BLOCK: __syncthreads()
  • 43. RACECHECK TOOL - WARP: __syncwarp()
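A minimal sketch of the block-level case: each thread writes one shared-memory slot and then reads a slot written by a different thread, so a barrier is needed between the two steps. The names `kN`, `reverse_in_block`, and `reverse_on_gpu` are hypothetical, and a single block of 64 threads is assumed.

```cuda
#include <cuda_runtime.h>

constexpr int kN = 64;  // assumed element count; one block of kN threads

// Reverses kN elements in place using a shared-memory tile.
__global__ void reverse_in_block(int *data) {
    __shared__ int tile[kN];
    int t = threadIdx.x;
    tile[t] = data[t];
    __syncthreads();  // without this barrier, the read below races with other threads' writes
    data[t] = tile[kN - 1 - t];
}

// Reverses the kN-element array `host` on the GPU; returns true on success.
bool reverse_on_gpu(int *host) {
    int *dev = nullptr;
    const size_t bytes = kN * sizeof(int);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        reverse_in_block<<<1, kN>>>(dev);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}
```

Removing the `__syncthreads()` call produces the kind of shared-memory hazard that the Racecheck tool is designed to report.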
  • 44. INITCHECK TOOL: Reports cases where the GPU performs uninitialized accesses to global memory.
      $ cuda-memcheck [memcheck_options] app [app_options]
  • 46. SYNCCHECK TOOL: Reports cases where the application attempts invalid uses of synchronization.
      $ cuda-memcheck [memcheck_options] app [app_options]
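  • 48. FUTURE WORK
      • Rewrite existing functionality so that it better matches hardware behavior: trap handler handling of SM exceptions
      • Implementation: address software compatibility issues and software/hardware communication issues
      • Feature extension: add new GDB debugging commands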
  • 49. OUTLINE: CUDA Programming and Execution Model; CUDA Memory Architecture; CUDA Exception List; CUDA Debugging; Appendix: CUDA Terminology
  • 50. TERMINOLOGY
      • Host: the CPU and the system memory
      • Device: the GPU and its memory
      • Kernel: a function that executes on the device, composed of several thread blocks (a grid)
      • SM: Streaming Multiprocessor, composed of several SPs; several thread blocks are assigned to it
      • SP: Streaming Processor (CUDA core); executes one thread
  • 51. TERMINOLOGY
      • Grid: multiple thread blocks form a grid
      • Block: several threads are grouped into a block; threads in the same block can be synchronized and can communicate with each other via shared memory
      • Warp: a set of threads that execute the same instruction at the same time
      • Thread: a CUDA program is executed by many threads; a thread within a warp is called a lane
  • 52. CUDA GUARANTEES
      • All threads in a thread block run on the same SM at the same time
      • All threads in a thread block may cooperate to solve a sub-problem
      • Threads in different thread blocks do not cooperate with each other
      • All blocks in a kernel finish before any blocks from the next kernel run