Review state-of-the-art techniques that use neural networks to synthesize motion, such as mode-adaptive neural networks and phase-functioned neural networks, and see how next-generation CPUs can deliver better performance for reinforcement learning workloads.
3. Agenda
• Overview of Reinforcement Learning (RL)
• Reinforcement Learning in Gaming
• Training RL Algorithms
• Intelligent Motion Use Case
• Performance Optimization on Intel® CPU
• Inferencing RL Algorithms
• Understanding Motion Models
• Using DirectML* to Leverage Intel GPUs
• Summary
4. Overview of Machine Learning
• Supervised: data + labels → class (task driven)
• Unsupervised: data → cluster
• Reinforcement: state → action (learn from mistakes)
6. High-Level Reinforcement Learning Overview
SIGGRAPH 2019 | LOS ANGELES | 28 JULY - 1 AUGUST
Agent gets state (s) from environment
Agent takes action (a) using policy (π)
Agent receives reward (r)
Goal: Maximize the expected cumulative future reward (the return, R)
https://unity3d.com/machine-learning
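The agent-environment loop above (state, action, reward, return) can be sketched in a few lines of Python. This is a minimal illustration, not ML-Agents or DeepMimic code: `LineWorld` is a hypothetical toy environment, `run_episode` is a hypothetical helper, and the discount factor `gamma` is an assumption, since the slide does not say how future rewards are weighted.

```python
import random


class LineWorld:
    """Hypothetical toy environment: walk right from position 0 to reach the goal."""

    def __init__(self, goal: int = 5):
        self.goal = goal
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos  # initial state s

    def step(self, action: int):
        # action is -1 (left) or +1 (right); the position cannot go below 0
        self.pos = max(0, self.pos + action)
        done = self.pos == self.goal
        reward = 1.0 if done else -0.1  # small step penalty favors short paths
        return self.pos, reward, done


def run_episode(env, policy, gamma=0.99, max_steps=100):
    """Run one episode and return the discounted return R = sum_t gamma^t * r_t."""
    state = env.reset()
    ret, discount = 0.0, 1.0
    for _ in range(max_steps):
        action = policy(state)                  # agent picks a = pi(s)
        state, reward, done = env.step(action)  # environment returns s', r
        ret += discount * reward
        discount *= gamma
        if done:
            break
    return ret


def always_right(state):
    return +1


def random_policy(state):
    return random.choice([-1, +1])


print(run_episode(LineWorld(), always_right))
```

A better policy earns a larger return, which is exactly the quantity the agent is trained to maximize.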
7. Examples of RL Algorithms
• Actor-critic algorithms (combine policy-based and value-based learning)*
  • Reduce the variance of the policy gradient by pairing an actor (the policy) with a critic (a value function)
• Value-based
  • Q-learning
  • Find the best action in the current state
• Policy-based
  • Trust Region Policy Optimization (TRPO)
  • Generalized Advantage Estimation (GAE)
http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_3_rl_intro.pdf
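To make the actor-critic and GAE bullets concrete, here is a minimal sketch of how a critic's value estimates turn rewards into lower-variance advantages with Generalized Advantage Estimation. The function name `gae_advantages` and the default `gamma`/`lam` values are illustrative assumptions, and the sketch handles only a single finished episode.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one finished episode.

    rewards[t] is r_t and values[t] is the critic's estimate V(s_t).
    values carries one extra entry V(s_T) for the state after the last
    step (use 0.0 if that state is terminal).
    """
    assert len(values) == len(rewards) + 1
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD error: how much better this step went than the critic predicted
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors (weight gamma * lam)
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam=0` each advantage is just the one-step TD error (low variance, more bias); with `lam=1` and a zero critic it reduces to the plain discounted return used by vanilla policy gradients.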
8. The Brain Behind the Algorithms
• Value functions
  • Estimate how good a state or an action is by predicting the total future reward (return)
• Policy methods
  • Find the best action directly
  • Optimize the policy (behavior) directly
• Vanilla policy gradients
  • For every episode with a positive reward, use the gradient to increase the probability of the actions taken in that episode
• Improved policy gradients
  • Multiple gradient steps per episode
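A minimal sketch of the vanilla policy gradient idea (REINFORCE) on a hypothetical two-armed bandit: each one-step "episode" samples an action from a softmax policy, and its reward scales the gradient of log π(a). The helper names, learning rate, and episode count are assumptions chosen for illustration, and no baseline or multi-step update (as in the improved variants) is used.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_rewards, episodes=2000, lr=0.1, seed=0):
    """Vanilla policy gradient on a one-step bandit; returns final action probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(arm_rewards)
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        ret = arm_rewards[action]  # the episode's return
        for k in range(len(logits)):
            # d log pi(action) / d logit_k = 1[k == action] - pi(k)
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            # A positive return pushes up the probability of the chosen action
            logits[k] += lr * ret * grad_log_pi
    return softmax(logits)


print(reinforce_bandit([0.0, 1.0]))
```

With `arm_rewards=[0.0, 1.0]`, only episodes that pick the rewarding arm produce an update, so the probability of that arm climbs toward 1.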
9. Popular Paths to Bring Machine Learning into Games
• Microsoft*
  • DirectML (DML) framework
• Ubisoft* – LaForge
  • Bringing research into industry
  • Access to game engines and data
• Unity*
  • First-party support via ML-Agents
  • Interface between research and gaming
  • DML backend coming soon
10. Motion With Reinforcement Learning
• Understanding the path- or motion-planning problem is crucial in unstructured environments
• Combining data-driven input with physics-based character animation creates smooth and robust animation
• RL offers a convenient framework for learning different strategies without a mountain of data
• RL helps address generalization problems in path and motion planning
Deep Q-Networks: Volodymyr Mnih, Deep RL Bootcamp, Berkeley, DeepMind*
11. Equations to Framework: Q-Learning → DQN
• Q-learning learns a function Q: State × Action → expected return. If we knew the value of taking each action in a given state, we could easily construct a policy that maximizes our rewards:
  • $a^* = \arg\max_{a} Q(s, a)$
• A neural network can stand in for Q because neural networks are universal function approximators
  • $Q(s, a) = r + \gamma \max_{a'} Q(s', a')$
[Figure: DQN architecture. State input → three convolutional layers (Layer-1 to Layer-3) → two fully connected (FC) layers, with activation functions between layers → Q values for the actions Straight, Left, and Right]
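The Q-learning rule can be sketched in tabular form before a neural network replaces the table. This is an illustrative sketch, not DQN itself: the dictionary `Q`, the helpers `greedy` and `q_update`, and the step size `alpha` are assumptions, and the action names reuse the Straight/Left/Right outputs from the figure.

```python
from collections import defaultdict

ACTIONS = ["straight", "left", "right"]  # action names taken from the DQN figure


def greedy(Q, state):
    """Greedy policy: pick argmax_a Q(s, a) (ties go to the first action listed)."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def q_update(Q, s, a, r, s_next, done, alpha=0.5, gamma=0.9):
    """Move Q(s, a) toward the Bellman target r + gamma * max_a' Q(s', a')."""
    target = r if done else r + gamma * max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
q_update(Q, 0, "left", 1.0, 1, done=True)
print(greedy(Q, 0))
```

In a DQN the table lookup becomes a forward pass, and the same target r + γ max Q(s′, a′) serves as the regression label for the network.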
12. Evaluating Motion Algorithms on Intel® Core Processors
https://github.com/xbpeng/DeepMimic
[Figure: training time (minutes) vs. training iterations (millions, 5 to 60) for the TensorFlow baseline]
~52 hours of training on an 8-core platform → can we do better?
Testing by Intel as of June 28th, 2019: Intel® i9-9900K, 95W TDP, 8C16T; Frequency: 4.3 GHz, Turbo enabled; Graphics: NVIDIA* GTX 2080; Memory: 4x8GB @ 2133 MHz; Storage: Intel SSD 545 Series 240GB; OS: Windows* 10 RS5; BIOS build: CFLSFX1.R00.X151B01. All data was collected with TensorFlow* 1.12 and the DeepMimic branch dated June 28th, 2019.
13. Analyzing the Software Stack
Only ~20% of the time is spent in actual compute; the rest is overhead
[Figure: Intel® VTune™ Amplifier XE profile showing actual compute vs. inefficiency due to spinning threads]
14. Optimizing the Software Stack - 1
• Re-evaluating the libraries included in the DeepMimic software stack
  • Recompiling TensorFlow* with Intel® MKLDNN
      bazel --output_base=output_dir build --config=mkl --config=opt \
          //tensorflow/tools/pip_package:build_pip_package
      python -c "import tensorflow; print(tensorflow.pywrap_tensorflow.IsMklEnabled())"
    Result: True
  • Evaluating different threading parameters to reduce spin time
      import os
      import tensorflow  # importing TensorFlow sets KMP_BLOCKTIME and OMP_PROC_BIND
      # Delete the values TensorFlow set (pop() does not fail if a variable is absent)
      os.environ.pop('OMP_PROC_BIND', None)
      os.environ.pop('KMP_BLOCKTIME', None)
• Moving the Python installation → optimized Intel Python libraries
  • A simple optimization: replacing the stock NumPy libraries with Intel's more efficient NumPy libraries
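The "evaluate different threading parameters" step can be automated with a small sweep. This is a hypothetical sketch, not part of DeepMimic or TensorFlow: it assumes the workload can be launched as a separate command (the `cmd` below is a placeholder) and varies the standard OpenMP/Intel OpenMP variables `OMP_NUM_THREADS` and `KMP_BLOCKTIME`, timing each run in a fresh process because the OpenMP runtime reads them when it initializes.

```python
import itertools
import os
import subprocess
import sys
import time


def build_env(blocktime_ms, num_threads):
    """Copy the current environment with the OpenMP threading knobs overridden."""
    env = dict(os.environ)
    env["KMP_BLOCKTIME"] = str(blocktime_ms)   # ms an idle worker spins before sleeping
    env["OMP_NUM_THREADS"] = str(num_threads)  # size of the OpenMP thread pool
    env.pop("OMP_PROC_BIND", None)             # leave thread binding to the runtime
    return env


def sweep(cmd, blocktimes, thread_counts):
    """Time one run of cmd for each (KMP_BLOCKTIME, OMP_NUM_THREADS) pair."""
    timings = {}
    for bt, nt in itertools.product(blocktimes, thread_counts):
        start = time.perf_counter()
        subprocess.run(cmd, env=build_env(bt, nt), check=True)
        timings[(bt, nt)] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Placeholder workload: substitute the real training command here.
    cmd = [sys.executable, "-c", "pass"]
    results = sweep(cmd, blocktimes=[0, 1, 200], thread_counts=[4, 8, 16])
    for (bt, nt), secs in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"KMP_BLOCKTIME={bt:>3} OMP_NUM_THREADS={nt:>2}: {secs:.3f} s")
```

Sorting by wall-clock time puts the least spin-heavy configuration first; a real sweep would time a fixed number of training iterations rather than process startup.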
15. Optimizing the Software Stack - 2
• Optimizing math libraries to use the FP32 datatype and parallel code instead of double precision and scalar code
  • Mapping libraries from scalar Eigen to Eigen with MKL
  • Compiling Eigen with MKL, and Bullet3 (physics SDK: real-time collision library), to use the AVX2 code path
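Why FP32 and AVX2 help can be checked with simple arithmetic: an AVX2 register is 256 bits wide, so it holds 8 float32 values but only 4 float64 values, and FP32 data needs half the memory bandwidth. The NumPy sketch below does only that arithmetic plus a float64-to-float32 conversion; `simd_lanes` and `footprint_bytes` are illustrative helpers, not Eigen or MKL APIs.

```python
import numpy as np

AVX2_REGISTER_BITS = 256  # width of one AVX2 (YMM) register


def simd_lanes(dtype, register_bits=AVX2_REGISTER_BITS):
    """How many values of `dtype` fit in one SIMD register."""
    return register_bits // (np.dtype(dtype).itemsize * 8)


def footprint_bytes(n_elements, dtype):
    """Bytes needed to store n_elements values of `dtype`."""
    return n_elements * np.dtype(dtype).itemsize


# Converting double-precision data (NumPy's default float) to FP32
positions64 = np.linspace(0.0, 1.0, 1_000_000)  # float64 by default
positions32 = positions64.astype(np.float32)    # half the bytes

for dt in (np.float32, np.float64):
    print(np.dtype(dt).name, "lanes per AVX2 register:", simd_lanes(dt),
          "MB per 1M values:", footprint_bytes(1_000_000, dt) / 1e6)
```

Twice as many lanes per instruction and half the bytes per value is the upper bound on what the FP32 switch can buy; the actual gain depends on how much of the code vectorizes.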
16. Optimization Results
[Figure: VTune profiles, baseline vs. after optimizations]
Putting CPUs to Work
• The application now spends its time on useful compute instead of spinning
• TensorFlow with MKLDNN removes most of the spinning from OpenMP and threading
• The Eigen MKL library in the DeepMimic core takes advantage of intrinsic (vectorized) code
17. Training Result with Optimized Stack
• Optimizing training is the first step toward deployment
• The correct libraries and datatypes are important for deep learning training performance
Training time was reduced by 2.6x, from 50 hours to 19 hours, by enabling multithreading and using MKLDNN instead of Eigen
[Figure: Timing After Optimizations. Training time (minutes, 0 to 4,000) vs. iterations (millions, 5 to 60) for TensorFlow baseline, TensorFlow + MKLDNN, and TensorFlow + MKLDNN + Eigen libs; 2.6x better training performance]
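As a quick check of the reported numbers, using the 50-hour baseline and 19-hour optimized training times quoted on this slide:

```latex
\text{speedup} = \frac{T_{\text{baseline}}}{T_{\text{optimized}}} = \frac{50\ \text{h}}{19\ \text{h}} \approx 2.63,
\qquad
1 - \frac{T_{\text{optimized}}}{T_{\text{baseline}}} = 1 - \frac{19}{50} = 0.62
```

That is roughly 62% less wall-clock training time.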
18. Take-away
Using optimized libraries to train machine learning algorithms helps boost performance and reduce training time
19. Bringing Motion to Production
20. Understanding the Inference Model
[Figure: training checkpoint → inference model]
How can a developer read the model?
21. Unity® ML-Agents
Bridging the Gap Between Research and Game Integration
22. Overview: Unity ML-Agents
[Figure: Unity ML-Agents overview. Unity environment with Agent (collect observations, agent action, vector action), Brain, and Academy; Unity Inference Engine backends: DirectML, CS (compute shader), CPU]
23. Puppo Motion Using Unity ML-Agents
• Goal: the puppy runs for the bone
• Agent: Corgi
• About 50 float32 inputs
• Three hidden layers of 512 nodes
• About 20 float outputs
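A sketch of the Puppo policy network as described (about 50 inputs, three hidden layers of 512 nodes, about 20 outputs), showing its parameter count and a batched forward pass for several agents. The exact sizes of 50 and 20, the ReLU activation, and the random initialization are assumptions, since the slide does not specify them, and this is not the Unity Inference Engine.

```python
import numpy as np

# Assumed sizes: the slide gives "about 50" inputs and "about 20" outputs.
LAYER_SIZES = [50, 512, 512, 512, 20]


def count_params(sizes):
    """Weights plus biases of a fully connected network with these layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


def init_mlp(sizes, seed=0):
    """Random float32 weights (scaled by 1/sqrt(fan_in)) and zero biases."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = (rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)).astype(np.float32)
        b = np.zeros(n_out, dtype=np.float32)
        params.append((w, b))
    return params


def forward(params, obs):
    """Batched inference: obs is (num_agents, 50), the result is (num_agents, 20)."""
    x = obs.astype(np.float32)
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:     # hidden layers only
            x = np.maximum(x, 0.0)  # ReLU: an assumption, the activation is not stated
    return x


params = init_mlp(LAYER_SIZES)
print(count_params(LAYER_SIZES))                  # 561684
print(forward(params, np.zeros((10, 50))).shape)  # (10, 20)
```

At roughly 560K float32 parameters (about 2.2 MB) the model is small, so stacking many agents' observations into one `forward` call amortizes per-call overhead across agents.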
24. Analyzing Inference Performance → 1 Agent
Without metacommands: 1.8 seconds/inference
With metacommands: 0.8 seconds/inference
https://devblogs.microsoft.com/pix/download/
Execution time is reduced by ~2x with metacommands at the kernel level
25. Microsoft® PIX Tool: Benefits of Using Metacommands
[Figure: PIX timings, 3.064 msec vs. 1.364 msec]
The more agents, the greater the benefit of metacommands
26. Results
[Figure: Scaling with Multiple Agents. Execution time in msec (lower is better) for 1, 10, and 50 agents using compute shader vs. metacommands, with gain (%) plotted alongside]
Metacommands give a significant performance boost by leveraging Intel® Graphics driver optimizations
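The gain figures can be reproduced from the timings on the earlier slides. This sketch assumes "gain" means the percent reduction in execution time and that the larger PIX timing (3.064 msec) is the run without metacommands; both are assumptions, since the chart does not define them. `speedup` and `gain_percent` are illustrative helpers.

```python
def speedup(t_baseline, t_optimized):
    """How many times faster the optimized path is."""
    return t_baseline / t_optimized


def gain_percent(t_baseline, t_optimized):
    """Percent reduction in execution time (one plausible definition of 'gain')."""
    return 100.0 * (t_baseline - t_optimized) / t_baseline


# Slide 25 PIX timings (msec), assuming 3.064 is without and 1.364 is with metacommands
pix_without, pix_with = 3.064, 1.364
print(f"PIX: {speedup(pix_without, pix_with):.2f}x faster, "
      f"{gain_percent(pix_without, pix_with):.1f}% less time")

# Slide 24 single-agent timings (seconds/inference as reported)
print(f"1 agent: {speedup(1.8, 0.8):.2f}x faster")
```

Both measurements land near the "~2x" quoted on slide 24.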
27. Intel® Graphics Performance Analyzer (GPA) DX12 Profiling Preview
DX12 DirectML profiling in Intel® GPA
28. Summary
• TensorFlow* with Intel® MKLDNN is now available on Windows*
• Leverages new instruction sets on Intel® Xeon® and Core™ processors
• Training gets a performance boost because reinforcement learning use cases are CPU-friendly
• Using optimized pre- and post-processing libraries gives an end-to-end (E2E) performance boost
• DirectML from Microsoft leverages metacommands, which give a good performance boost for games with deep learning-infused workloads