More Related Content
Similar to Gl efficiency (20)
Gl efficiency
- 1. © Copyright Khronos Group 2014 - Page 1
OpenGL Efficiency: AZDO
Cass Everitt
OpenGL Engineer, NVIDIA
GDC, San Francisco, March 2014
- 3. © Copyright Khronos Group 2014 - Page 3
Why do you care about driver overhead?
•Because driver overhead == cost
•Costs
- CPU cycles from app
- CPU cache from app
- power / battery
- GPU throughput
- 4. © Copyright Khronos Group 2014 - Page 4
OpenGL Fallacy: Old and Inefficient
Immediate
Mode Fixed
Function
Ancient
crufty stuff
Feedback
Selection
Evaluators
Display Lists
Selectors
- 5. © Copyright Khronos Group 2014 - Page 5
OpenGL Reality: Modern & Efficient
Bindless
ARB
SSBO
GL4.3
Multi-Draw
Indirect
GL4.3
UBO
GL3.1
Texture
Arrays
GL3.0
Buffer
Storage
GL4.4
- 6. © Copyright Khronos Group 2014 - Page 6
Plus, OpenGL has all the features
Compute
Tessellation
Geometry
Shaders
Sparse
Textures
Image
Load/Store
- 7. © Copyright Khronos Group 2014 - Page 7
indirect draw
buffer object
buffer object
texture object
buffer object
buffer object
texture object
buffer object
buffer object
buffer object
render target
buffer object
Classic OpenGL Model
CPU
GPU
…
Memory
cmd cmd cmdcmd
Direct Drawing Commands
(via the command fifo)
- 8. © Copyright Khronos Group 2014 - Page 8
Classic Model Pros / Cons
• Pro
- Very stable – 20+ year old code still “just works”
- Simple
- driver handles hazards, sync, allocation
- Empowered the GPU revolution
- Many classes of applications well served
• Cons
- Demanding apps are not so well served
- Intense games, VR
- Doesn’t scale with high scene complexity
- Threading model
- Hardware abstraction showing age
- 9. © Copyright Khronos Group 2014 - Page 9
Aspirational Goal
• Can we address the cons within the framework of the
existing API?
- That is, can we fix the cons without tossing the pros?
• Good question!
- As it turns out, Smart People in Khronos have actually
been working on this question for a while now
- And they’ve developed an efficient, modern OpenGL that
- Gives amazing perf improvements, and lives within the
existing framework
• And here’s what it looks like…
- 10. © Copyright Khronos Group 2014 - Page 10
indirect draw
buffer object
indirect draw
buffer object
texture object
buffer object
indirect draw
buffer object
texture object
buffer object
buffer object
buffer object
render target
buffer object
Efficient OpenGL Model
CPU
CPU
CPU
CPU
GPU
…
Memory
- 11. © Copyright Khronos Group 2014 - Page 11
CPU and GPU decoupled
CPU
CPU
CPU
CPU
GPU
…
Memory
- 12. © Copyright Khronos Group 2014 - Page 12
indirect draw
buffer object
indirect draw
buffer object
texture object
buffer object
indirect draw
buffer object
texture object
buffer object
buffer object
buffer object
render target
buffer object
CPU Writes Memory – multi-threaded (no API)!
CPU
CPU
CPU
CPU
GPU
…
Memory
- 13. © Copyright Khronos Group 2014 - Page 13
indirect draw
buffer object
indirect draw
buffer object
texture object
buffer object
indirect draw
buffer object
texture object
buffer object
buffer object
buffer object
render target
buffer object
And/Or GPU Writes Memory
CPU
CPU
CPU
CPU
GPU
…
Memory
GPU Work Creation
Still no API – the magic of communicating through memory…
- 14. © Copyright Khronos Group 2014 - Page 14
indirect draw
buffer object
indirect draw
buffer object
texture object
buffer object
indirect draw
buffer object
texture object
buffer object
buffer object
buffer object
render target
buffer object
GPU Reads Commands from Memory
CPU
CPU
CPU
CPU
GPU
…
Memory
Minimal CPU / driver involvement…
- 15. © Copyright Khronos Group 2014 - Page 15
Results
•Integer multiple speedups ~5x – ~15x
- This is not a typo
- On driver limited cases, obviously
•Works TODAY on existing drivers!
- Mostly GL4.2+
- Extensions are at least EXT
- 16. © Copyright Khronos Group 2014 - Page 16
Bonuses
• Enables scalable multi-threading with no new API
- Cores just write to memory
• Enables GPU Work Creation
- Compute job or similar
- Builds buffers, constructs MDI commands
• Does not require a new object model
• Does not require breaking existing applications
- 17. © Copyright Khronos Group 2014 - Page 17
Results
•
- This is not a typo
- On driver limited cases, obviously
•
- Mostly GL4.2+
- Extensions are at least EXT
- 18. © Copyright Khronos Group 2014 - Page 18
Results
•Integer multiple speedups ~5x – ~15x
- This is not a typo
- On driver limited cases, obviously
•Works TODAY on existing drivers!
- Mostly GL4.2+
- Extensions are at least EXT
- 19. © Copyright Khronos Group 2014 - Page 19
Results
•Integer multiple speedups ~5x – ~15x
- This is not a typo
- On driver limited cases, obviously
•Works TODAY on existing drivers!
- Mostly GL4.2+
- Extensions are at least EXT