SlideShare a Scribd company logo
1 of 19
© Copyright Khronos Group 2014 - Page 1
OpenGL Efficiency: AZDO
Cass Everitt
OpenGL Engineer, NVIDIA
GDC, San Francisco, March 2014
© Copyright Khronos Group 2014 - Page 2
AZDO?
• Approaching Zero Driver Overhead
© Copyright Khronos Group 2014 - Page 3
Why do you care about driver overhead?
•Because driver overhead == cost
•Costs
- CPU cycles from app
- CPU cache from app
- power / battery
- GPU throughput
© Copyright Khronos Group 2014 - Page 4
OpenGL Fallacy: Old and Inefficient
Immediate
Mode Fixed
Function
Ancient
crufty stuff
Feedback
Selection
Evaluators
Display Lists
Selectors
© Copyright Khronos Group 2014 - Page 5
OpenGL Reality: Modern & Efficient
Bindless
ARB
SSBO
GL4.3
Multi-Draw
Indirect
GL4.3
UBO
GL3.1
Texture
Arrays
GL3.0
Buffer
Storage
GL4.4
© Copyright Khronos Group 2014 - Page 6
Plus, OpenGL has all the features
Compute
Tessellation
Geometry
Shaders
Sparse
Textures
Image
Load/Store
© Copyright Khronos Group 2014 - Page 7
indirect draw
buffer object
buffer object
texture object
buffer object
buffer object
texture object
buffer object
buffer object
buffer object
render target
buffer object
Classic OpenGL Model
CPU
GPU
…
Memory
cmd cmd cmdcmd
Direct Drawing Commands
(via the command fifo)
© Copyright Khronos Group 2014 - Page 8
Classic Model Pros / Cons
• Pro
- Very stable – 20+ year old code still “just works”
- Simple
- driver handles hazards, sync, allocation
- Empowered the GPU revolution
- Many classes of applications well served
• Cons
- Demanding apps are not so well served
- Intense games, VR
- Doesn’t scale with high scene complexity
- Threading model
- Hardware abstraction showing age
© Copyright Khronos Group 2014 - Page 9
Aspirational Goal
• Can we address the cons within the framework of the
existing API?
- That is, can we fix the cons without tossing the pros?
• Good question!
- As it turns out, Smart People in Khronos have actually
been working on this question for a while now
- And they’ve developed an efficient, modern OpenGL that
- Gives amazing perf improvements, and lives within the
existing framework
• And here’s what it looks like…
© Copyright Khronos Group 2014 - Page 10
indirect draw
buffer object
indirect draw
buffer object
texture object
buffer object
indirect draw
buffer object
texture object
buffer object
buffer object
buffer object
render target
buffer object
Efficient OpenGL Model
CPU
CPU
CPU
CPU
GPU
…
Memory
© Copyright Khronos Group 2014 - Page 11
CPU and GPU decoupled
CPU
CPU
CPU
CPU
GPU
…
Memory
© Copyright Khronos Group 2014 - Page 12
indirect draw
buffer object
indirect draw
buffer object
texture object
buffer object
indirect draw
buffer object
texture object
buffer object
buffer object
buffer object
render target
buffer object
CPU Writes Memory – multi-threaded (no API)!
CPU
CPU
CPU
CPU
GPU
…
Memory
© Copyright Khronos Group 2014 - Page 13
indirect draw
buffer object
indirect draw
buffer object
texture object
buffer object
indirect draw
buffer object
texture object
buffer object
buffer object
buffer object
render target
buffer object
And/Or GPU Writes Memory
CPU
CPU
CPU
CPU
GPU
…
Memory
GPU Work Creation
Still no API – the magic of communicating through memory…
© Copyright Khronos Group 2014 - Page 14
indirect draw
buffer object
indirect draw
buffer object
texture object
buffer object
indirect draw
buffer object
texture object
buffer object
buffer object
buffer object
render target
buffer object
GPU Reads Commands from Memory
CPU
CPU
CPU
CPU
GPU
…
Memory
Minimal CPU / driver involvement…
© Copyright Khronos Group 2014 - Page 15
Results
•Integer multiple speedups ~5x – ~15x
- This is not a typo
- On driver limited cases, obviously
•Works TODAY on existing drivers!
- Mostly GL4.2+
- Extensions are at least EXT
© Copyright Khronos Group 2014 - Page 16
Bonuses
• Enables scalable multi-threading with no new API
- Cores just write to memory
• Enables GPU Work Creation
- Compute job or similar
- Builds buffers, constructs MDI commands
• Does not require a new object model
• Does not require breaking existing applications
© Copyright Khronos Group 2014 - Page 17
Results
•
- This is not a typo
- On driver limited cases, obviously
•
- Mostly GL4.2+
- Extensions are at least EXT
© Copyright Khronos Group 2014 - Page 18
Results
•Integer multiple speedups ~5x – ~15x
- This is not a typo
- On driver limited cases, obviously
•Works TODAY on existing drivers!
- Mostly GL4.2+
- Extensions are at least EXT
© Copyright Khronos Group 2014 - Page 19
Results
•Integer multiple speedups ~5x – ~15x
- This is not a typo
- On driver limited cases, obviously
•Works TODAY on existing drivers!
- Mostly GL4.2+
- Extensions are at least EXT

More Related Content

What's hot

Smedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphicsSmedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphicschangehee lee
 
GFX Part 5 - Introduction to Object Transformations in OpenGL ES
GFX Part 5 - Introduction to Object Transformations in OpenGL ESGFX Part 5 - Introduction to Object Transformations in OpenGL ES
GFX Part 5 - Introduction to Object Transformations in OpenGL ESPrabindh Sundareson
 
Discover the technology behind "The Heretic" – Unite Copenhagen 2019
Discover the technology behind "The Heretic" – Unite Copenhagen 2019Discover the technology behind "The Heretic" – Unite Copenhagen 2019
Discover the technology behind "The Heretic" – Unite Copenhagen 2019Unity Technologies
 
"Deep Learning on Arm Cortex-M Microcontrollers," a Presentation from Arm
"Deep Learning on Arm Cortex-M Microcontrollers," a Presentation from Arm"Deep Learning on Arm Cortex-M Microcontrollers," a Presentation from Arm
"Deep Learning on Arm Cortex-M Microcontrollers," a Presentation from ArmEdge AI and Vision Alliance
 
WT-4064, Build Rich Applications with HTML5 and WebGL, by Tony Parisi
WT-4064, Build Rich Applications with HTML5 and WebGL, by Tony ParisiWT-4064, Build Rich Applications with HTML5 and WebGL, by Tony Parisi
WT-4064, Build Rich Applications with HTML5 and WebGL, by Tony ParisiAMD Developer Central
 
Android High performance in GPU using opengles and renderscript
Android High performance in GPU using opengles and renderscriptAndroid High performance in GPU using opengles and renderscript
Android High performance in GPU using opengles and renderscriptArvind Devaraj
 
【Unite 2017 Tokyo】スクリプタブル・レンダーパイプラインのカスタマイズと拡張
【Unite 2017 Tokyo】スクリプタブル・レンダーパイプラインのカスタマイズと拡張【Unite 2017 Tokyo】スクリプタブル・レンダーパイプラインのカスタマイズと拡張
【Unite 2017 Tokyo】スクリプタブル・レンダーパイプラインのカスタマイズと拡張Unite2017Tokyo
 
NVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyNVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyMark Kilgard
 
Gdc 14 bringing unreal engine 4 to open_gl
Gdc 14 bringing unreal engine 4 to open_glGdc 14 bringing unreal engine 4 to open_gl
Gdc 14 bringing unreal engine 4 to open_glchangehee lee
 
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for ArtistsOwen Wu
 
Rendering Techniques for Augmented Reality and a Look Ahead at AR Foundation
Rendering Techniques for Augmented Reality and a Look Ahead at AR FoundationRendering Techniques for Augmented Reality and a Look Ahead at AR Foundation
Rendering Techniques for Augmented Reality and a Look Ahead at AR FoundationUnity Technologies
 
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option..."APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...Edge AI and Vision Alliance
 
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門Unity Technologies Japan K.K.
 
[TGDF 2020] Mobile Graphics Best Practices for Artist
[TGDF 2020] Mobile Graphics Best Practices for Artist[TGDF 2020] Mobile Graphics Best Practices for Artist
[TGDF 2020] Mobile Graphics Best Practices for ArtistOwen Wu
 
Il test pres_wanims
Il test pres_wanimsIl test pres_wanims
Il test pres_wanimsrrintala
 
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray TracingSyysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray TracingElectronic Arts / DICE
 
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu FengHC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu FengAMD Developer Central
 
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...Electronic Arts / DICE
 

What's hot (20)

Smedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphicsSmedberg niklas bringing_aaa_graphics
Smedberg niklas bringing_aaa_graphics
 
GFX Part 5 - Introduction to Object Transformations in OpenGL ES
GFX Part 5 - Introduction to Object Transformations in OpenGL ESGFX Part 5 - Introduction to Object Transformations in OpenGL ES
GFX Part 5 - Introduction to Object Transformations in OpenGL ES
 
Discover the technology behind "The Heretic" – Unite Copenhagen 2019
Discover the technology behind "The Heretic" – Unite Copenhagen 2019Discover the technology behind "The Heretic" – Unite Copenhagen 2019
Discover the technology behind "The Heretic" – Unite Copenhagen 2019
 
Bending the Graphics Pipeline
Bending the Graphics PipelineBending the Graphics Pipeline
Bending the Graphics Pipeline
 
"Deep Learning on Arm Cortex-M Microcontrollers," a Presentation from Arm
"Deep Learning on Arm Cortex-M Microcontrollers," a Presentation from Arm"Deep Learning on Arm Cortex-M Microcontrollers," a Presentation from Arm
"Deep Learning on Arm Cortex-M Microcontrollers," a Presentation from Arm
 
WT-4064, Build Rich Applications with HTML5 and WebGL, by Tony Parisi
WT-4064, Build Rich Applications with HTML5 and WebGL, by Tony ParisiWT-4064, Build Rich Applications with HTML5 and WebGL, by Tony Parisi
WT-4064, Build Rich Applications with HTML5 and WebGL, by Tony Parisi
 
Android High performance in GPU using opengles and renderscript
Android High performance in GPU using opengles and renderscriptAndroid High performance in GPU using opengles and renderscript
Android High performance in GPU using opengles and renderscript
 
【Unite 2017 Tokyo】スクリプタブル・レンダーパイプラインのカスタマイズと拡張
【Unite 2017 Tokyo】スクリプタブル・レンダーパイプラインのカスタマイズと拡張【Unite 2017 Tokyo】スクリプタブル・レンダーパイプラインのカスタマイズと拡張
【Unite 2017 Tokyo】スクリプタブル・レンダーパイプラインのカスタマイズと拡張
 
NVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyNVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and Transparency
 
Gdc 14 bringing unreal engine 4 to open_gl
Gdc 14 bringing unreal engine 4 to open_glGdc 14 bringing unreal engine 4 to open_gl
Gdc 14 bringing unreal engine 4 to open_gl
 
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists
 
Rendering Techniques for Augmented Reality and a Look Ahead at AR Foundation
Rendering Techniques for Augmented Reality and a Look Ahead at AR FoundationRendering Techniques for Augmented Reality and a Look Ahead at AR Foundation
Rendering Techniques for Augmented Reality and a Look Ahead at AR Foundation
 
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option..."APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
 
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門
【Unite 2018 Tokyo】スクリプタブルレンダーパイプライン入門
 
[TGDF 2020] Mobile Graphics Best Practices for Artist
[TGDF 2020] Mobile Graphics Best Practices for Artist[TGDF 2020] Mobile Graphics Best Practices for Artist
[TGDF 2020] Mobile Graphics Best Practices for Artist
 
Il test pres_wanims
Il test pres_wanimsIl test pres_wanims
Il test pres_wanims
 
#PDR15 - Pebble Graphics
#PDR15 - Pebble Graphics#PDR15 - Pebble Graphics
#PDR15 - Pebble Graphics
 
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray TracingSyysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
 
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu FengHC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
 
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu...
 

Viewers also liked

Oculus Rift Developer Kit 2 and Latency Mitigation techniques
Oculus Rift Developer Kit 2 and Latency Mitigation techniquesOculus Rift Developer Kit 2 and Latency Mitigation techniques
Oculus Rift Developer Kit 2 and Latency Mitigation techniquesCass Everitt
 
CS 354 Shadows (cont'd) and Scene Graphs
CS 354 Shadows (cont'd) and Scene GraphsCS 354 Shadows (cont'd) and Scene Graphs
CS 354 Shadows (cont'd) and Scene GraphsMark Kilgard
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadTristan Lorach
 
Approaching zero driver overhead
Approaching zero driver overheadApproaching zero driver overhead
Approaching zero driver overheadCass Everitt
 

Viewers also liked (6)

Oculus Rift Developer Kit 2 and Latency Mitigation techniques
Oculus Rift Developer Kit 2 and Latency Mitigation techniquesOculus Rift Developer Kit 2 and Latency Mitigation techniques
Oculus Rift Developer Kit 2 and Latency Mitigation techniques
 
Robasics
RobasicsRobasics
Robasics
 
CS 354 Shadows (cont'd) and Scene Graphs
CS 354 Shadows (cont'd) and Scene GraphsCS 354 Shadows (cont'd) and Scene Graphs
CS 354 Shadows (cont'd) and Scene Graphs
 
Chapter 06
Chapter 06Chapter 06
Chapter 06
 
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver OverheadOpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
 
Approaching zero driver overhead
Approaching zero driver overheadApproaching zero driver overhead
Approaching zero driver overhead
 

Similar to Gl efficiency

Automated perf optimization - jQuery Conference
Automated perf optimization - jQuery ConferenceAutomated perf optimization - jQuery Conference
Automated perf optimization - jQuery ConferenceMatthew Lancaster
 
Jun Heider - Flex Application Profiling By Example
Jun Heider - Flex Application Profiling By ExampleJun Heider - Flex Application Profiling By Example
Jun Heider - Flex Application Profiling By Example360|Conferences
 
Node.js meetup 17.05.2017 ember.js - escape the javascript fatigue
Node.js meetup 17.05.2017   ember.js - escape the javascript fatigueNode.js meetup 17.05.2017   ember.js - escape the javascript fatigue
Node.js meetup 17.05.2017 ember.js - escape the javascript fatigueTobias Braner
 
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati..."The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...Edge AI and Vision Alliance
 
Better Code: Concurrency
Better Code: ConcurrencyBetter Code: Concurrency
Better Code: ConcurrencyPlatonov Sergey
 
Droidcon 2013 automotive quality dunca_czol_garmin
Droidcon 2013 automotive quality dunca_czol_garminDroidcon 2013 automotive quality dunca_czol_garmin
Droidcon 2013 automotive quality dunca_czol_garminDroidcon Berlin
 
Dictionary Within the Cloud
Dictionary Within the CloudDictionary Within the Cloud
Dictionary Within the Cloudgueste4978b94
 
Beautiful code instead of callback hell using ES6 Generators, Koa, Bluebird (...
Beautiful code instead of callback hell using ES6 Generators, Koa, Bluebird (...Beautiful code instead of callback hell using ES6 Generators, Koa, Bluebird (...
Beautiful code instead of callback hell using ES6 Generators, Koa, Bluebird (...andreaslubbe
 
Web Performance Part 3 "Server-side tips"
Web Performance Part 3  "Server-side tips"Web Performance Part 3  "Server-side tips"
Web Performance Part 3 "Server-side tips"Binary Studio
 
Implementing Parallelism in PostgreSQL - PGCon 2014
Implementing Parallelism in PostgreSQL - PGCon 2014Implementing Parallelism in PostgreSQL - PGCon 2014
Implementing Parallelism in PostgreSQL - PGCon 2014EDB
 
PAC 2019 virtual Mark Tomlinson
PAC 2019 virtual Mark TomlinsonPAC 2019 virtual Mark Tomlinson
PAC 2019 virtual Mark TomlinsonNeotys
 
DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.Vlad Fedosov
 
Undo tech overview_201410
Undo tech overview_201410Undo tech overview_201410
Undo tech overview_201410gregthelaw
 
Qualcomm Snapdragon Processors: A Super Gaming Platform
Qualcomm Snapdragon Processors: A Super Gaming Platform Qualcomm Snapdragon Processors: A Super Gaming Platform
Qualcomm Snapdragon Processors: A Super Gaming Platform Qualcomm Developer Network
 
UplinQ - qualcomm® snapdragon™ processors a super gaming platform
UplinQ - qualcomm® snapdragon™ processors a super gaming platformUplinQ - qualcomm® snapdragon™ processors a super gaming platform
UplinQ - qualcomm® snapdragon™ processors a super gaming platformSatya Harish
 
2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with BlackfireMarko Mitranić
 
Mongo Seattle - The Business of MongoDB
Mongo Seattle - The Business of MongoDBMongo Seattle - The Business of MongoDB
Mongo Seattle - The Business of MongoDBJustin Smestad
 
Pender presentation 2.0
Pender presentation 2.0 Pender presentation 2.0
Pender presentation 2.0 PhoneGap
 
Performance as UX with Justin Howlett
Performance as UX with Justin HowlettPerformance as UX with Justin Howlett
Performance as UX with Justin HowlettFITC
 
Raising ux bar with offline first design
Raising ux bar with offline first designRaising ux bar with offline first design
Raising ux bar with offline first designKyrylo Reznykov
 

Similar to Gl efficiency (20)

Automated perf optimization - jQuery Conference
Automated perf optimization - jQuery ConferenceAutomated perf optimization - jQuery Conference
Automated perf optimization - jQuery Conference
 
Jun Heider - Flex Application Profiling By Example
Jun Heider - Flex Application Profiling By ExampleJun Heider - Flex Application Profiling By Example
Jun Heider - Flex Application Profiling By Example
 
Node.js meetup 17.05.2017 ember.js - escape the javascript fatigue
Node.js meetup 17.05.2017   ember.js - escape the javascript fatigueNode.js meetup 17.05.2017   ember.js - escape the javascript fatigue
Node.js meetup 17.05.2017 ember.js - escape the javascript fatigue
 
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati..."The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...
 
Better Code: Concurrency
Better Code: ConcurrencyBetter Code: Concurrency
Better Code: Concurrency
 
Droidcon 2013 automotive quality dunca_czol_garmin
Droidcon 2013 automotive quality dunca_czol_garminDroidcon 2013 automotive quality dunca_czol_garmin
Droidcon 2013 automotive quality dunca_czol_garmin
 
Dictionary Within the Cloud
Dictionary Within the CloudDictionary Within the Cloud
Dictionary Within the Cloud
 
Beautiful code instead of callback hell using ES6 Generators, Koa, Bluebird (...
Beautiful code instead of callback hell using ES6 Generators, Koa, Bluebird (...Beautiful code instead of callback hell using ES6 Generators, Koa, Bluebird (...
Beautiful code instead of callback hell using ES6 Generators, Koa, Bluebird (...
 
Web Performance Part 3 "Server-side tips"
Web Performance Part 3  "Server-side tips"Web Performance Part 3  "Server-side tips"
Web Performance Part 3 "Server-side tips"
 
Implementing Parallelism in PostgreSQL - PGCon 2014
Implementing Parallelism in PostgreSQL - PGCon 2014Implementing Parallelism in PostgreSQL - PGCon 2014
Implementing Parallelism in PostgreSQL - PGCon 2014
 
PAC 2019 virtual Mark Tomlinson
PAC 2019 virtual Mark TomlinsonPAC 2019 virtual Mark Tomlinson
PAC 2019 virtual Mark Tomlinson
 
DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.
 
Undo tech overview_201410
Undo tech overview_201410Undo tech overview_201410
Undo tech overview_201410
 
Qualcomm Snapdragon Processors: A Super Gaming Platform
Qualcomm Snapdragon Processors: A Super Gaming Platform Qualcomm Snapdragon Processors: A Super Gaming Platform
Qualcomm Snapdragon Processors: A Super Gaming Platform
 
UplinQ - qualcomm® snapdragon™ processors a super gaming platform
UplinQ - qualcomm® snapdragon™ processors a super gaming platformUplinQ - qualcomm® snapdragon™ processors a super gaming platform
UplinQ - qualcomm® snapdragon™ processors a super gaming platform
 
2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire
 
Mongo Seattle - The Business of MongoDB
Mongo Seattle - The Business of MongoDBMongo Seattle - The Business of MongoDB
Mongo Seattle - The Business of MongoDB
 
Pender presentation 2.0
Pender presentation 2.0 Pender presentation 2.0
Pender presentation 2.0
 
Performance as UX with Justin Howlett
Performance as UX with Justin HowlettPerformance as UX with Justin Howlett
Performance as UX with Justin Howlett
 
Raising ux bar with offline first design
Raising ux bar with offline first designRaising ux bar with offline first design
Raising ux bar with offline first design
 

Gl efficiency

  • 1. © Copyright Khronos Group 2014 - Page 1 OpenGL Efficiency: AZDO Cass Everitt OpenGL Engineer, NVIDIA GDC, San Francisco, March 2014
  • 2. © Copyright Khronos Group 2014 - Page 2 AZDO? • Approaching Zero Driver Overhead
  • 3. © Copyright Khronos Group 2014 - Page 3 Why do you care about driver overhead? •Because driver overhead == cost •Costs - CPU cycles from app - CPU cache from app - power / battery - GPU throughput
  • 4. © Copyright Khronos Group 2014 - Page 4 OpenGL Fallacy: Old and Inefficient Immediate Mode Fixed Function Ancient crufty stuff Feedback Selection Evaluators Display Lists Selectors
  • 5. © Copyright Khronos Group 2014 - Page 5 OpenGL Reality: Modern & Efficient Bindless ARB SSBO GL4.3 Multi-Draw Indirect GL4.3 UBO GL3.1 Texture Arrays GL3.0 Buffer Storage GL4.4
  • 6. © Copyright Khronos Group 2014 - Page 6 Plus, OpenGL has all the features Compute Tessellation Geometry Shaders Sparse Textures Image Load/Store
  • 7. © Copyright Khronos Group 2014 - Page 7 indirect draw buffer object buffer object texture object buffer object buffer object texture object buffer object buffer object buffer object render target buffer object Classic OpenGL Model CPU GPU … Memory cmd cmd cmdcmd Direct Drawing Commands (via the command fifo)
  • 8. © Copyright Khronos Group 2014 - Page 8 Classic Model Pros / Cons • Pro - Very stable – 20+ year old code still “just works” - Simple - driver handles hazards, sync, allocation - Empowered the GPU revolution - Many classes of applications well served • Cons - Demanding apps are not so well served - Intense games, VR - Doesn’t scale with high scene complexity - Threading model - Hardware abstraction showing age
  • 9. © Copyright Khronos Group 2014 - Page 9 Aspirational Goal • Can we address the cons within the framework of the existing API? - That is, can we fix the cons without tossing the pros? • Good question! - As it turns out, Smart People in Khronos have actually been working on this question for a while now - And they’ve developed an efficient, modern OpenGL that - Gives amazing perf improvements, and lives within the existing framework • And here’s what it looks like…
  • 10. © Copyright Khronos Group 2014 - Page 10 indirect draw buffer object indirect draw buffer object texture object buffer object indirect draw buffer object texture object buffer object buffer object buffer object render target buffer object Efficient OpenGL Model CPU CPU CPU CPU GPU … Memory
  • 11. © Copyright Khronos Group 2014 - Page 11 CPU and GPU decoupled CPU CPU CPU CPU GPU … Memory
  • 12. © Copyright Khronos Group 2014 - Page 12 indirect draw buffer object indirect draw buffer object texture object buffer object indirect draw buffer object texture object buffer object buffer object buffer object render target buffer object CPU Writes Memory – multi-threaded (no API)! CPU CPU CPU CPU GPU … Memory
  • 13. © Copyright Khronos Group 2014 - Page 13 indirect draw buffer object indirect draw buffer object texture object buffer object indirect draw buffer object texture object buffer object buffer object buffer object render target buffer object And/Or GPU Writes Memory CPU CPU CPU CPU GPU … Memory GPU Work Creation Still no API – the magic of communicating through memory…
  • 14. © Copyright Khronos Group 2014 - Page 14 indirect draw buffer object indirect draw buffer object texture object buffer object indirect draw buffer object texture object buffer object buffer object buffer object render target buffer object GPU Reads Commands from Memory CPU CPU CPU CPU GPU … Memory Minimal CPU / driver involvement…
  • 15. © Copyright Khronos Group 2014 - Page 15 Results •Integer multiple speedups ~5x – ~15x - This is not a typo - On driver limited cases, obviously •Works TODAY on existing drivers! - Mostly GL4.2+ - Extensions are at least EXT
  • 16. © Copyright Khronos Group 2014 - Page 16 Bonuses • Enables scalable multi-threading with no new API - Cores just write to memory • Enables GPU Work Creation - Compute job or similar - Builds buffers, constructs MDI commands • Does not require a new object model • Does not require breaking existing applications
  • 17. © Copyright Khronos Group 2014 - Page 17 Results • - This is not a typo - On driver limited cases, obviously • - Mostly GL4.2+ - Extensions are at least EXT
  • 18. © Copyright Khronos Group 2014 - Page 18 Results •Integer multiple speedups ~5x – ~15x - This is not a typo - On driver limited cases, obviously •Works TODAY on existing drivers! - Mostly GL4.2+ - Extensions are at least EXT
  • 19. © Copyright Khronos Group 2014 - Page 19 Results •Integer multiple speedups ~5x – ~15x - This is not a typo - On driver limited cases, obviously •Works TODAY on existing drivers! - Mostly GL4.2+ - Extensions are at least EXT