SlideShare a Scribd company logo
1 of 27
Download to read offline
USING SIGOPT TO TUNE DEEP LEARNING
MODELS WITH NERVANA CLOUD
Scott Clark
Co-founder and CEO of SigOpt
scott@sigopt.com @DrScottClark
TRIAL AND ERROR WASTES EXPERT TIME
● Deep Learning is extremely powerful
● Tuning Deep Learning systems is extremely non-intuitive
UNRESOLVED PROBLEM IN ML
https://www.quora.com/What-is-the-most-important-unresolved-problem-in-machine-learning-3
What is the most important unresolved problem in machine learning?
“...we still don't really know why some configurations of deep neural networks work
in some case and not others, let alone having a more or less automatic approach
to determining the architectures and the hyperparameters.”
Xavier Amatriain, VP Engineering at Quora
(former Director of Research at Netflix)
TUNING DEEP LEARNING MODELS
Big Data
Deep Learning
System
With tunable parameters
Expertise
TUNING DEEP LEARNING MODELS
Big Data
Metics
Optimally suggests
new parameters
Objective
New parameters
Expertise
Deep Learning
System
With tunable parameters
TUNING DEEP LEARNING MODELS
Big Data
Metics
Optimally suggests
new parameters
Objective
New parameters
Better
Results
Expertise
Deep Learning
System
With tunable parameters
COMMON APPROACH
Random Search for Hyper-Parameter Optimization, James Bergstra et al., 2012
1. Random search or grid search
2. Expert defined grid search near “good” points
3. Refine domain and repeat steps - “grad student descent”
COMMON APPROACH
● Expert intensive
● Computationally intensive
● Finds potentially local optima
● Does not fully exploit useful information
Random Search for Hyper-Parameter Optimization, James Bergstra et al., 2012
1. Random search or grid search
2. Expert defined grid search near “good” points
3. Refine domain and repeat steps - “grad student descent”
… the challenge of how to collect information as efficiently
as possible, primarily for settings where collecting information
is time consuming and expensive.
Prof. Warren Powell - Princeton
What is the most efficient way to collect information?
Prof. Peter Frazier - Cornell
How do we make the most money, as fast as possible?
Me - @DrScottClark
OPTIMAL LEARNING
● Optimize some Overall Evaluation Criterion (OEC)
○ Loss, Accuracy, Likelihood, Revenue
● Given tunable parameters
○ Hyperparameters, feature parameters
● In an efficient way
○ Sample function as few times as possible
○ Training on big data is expensive
BAYESIAN GLOBAL OPTIMIZATION
Details at https://sigopt.com/research
EXAMPLE: TUNING DNN CLASSIFIERS
CIFAR10 Dataset
● Photos of objects
● 10 classes
● Metric: Accuracy
○ [0.1, 1.0]
Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009.
● All convolutional neural network
● Multiple convolutional and dropout layers
● Hyperparameter optimization mixture of
domain expertise and grid search (brute force)
USE CASE: ALL CONVOLUTIONAL
http://arxiv.org/pdf/1412.6806.pdf
EXAMPLE: NCLOUD/NEON
● epochs: “number of epochs to run fit” - int [1,∞]
● learning rate: influence on current value of weights at each step - double (0, 1]
● momentum coefficient: “the coefficient of momentum” - double (0, 1]
● weight decay: parameter affecting how quickly weight decays - double (0, 1]
● depth: parameter affecting number of layers in net - int [1, 20(?)]
● gaussian scale: standard deviation of initialization normal dist. - double (0,∞]
● momentum step change: mul. amount to decrease momentum - double (0, 1]
● momentum step schedule start: epoch to start decreasing momentum - int [1,∞]
● momentum schedule width: epoch stride for decreasing momentum - int [1,∞]
Many tunable parameters...
...optimal values non-intuitive
COMPARATIVE PERFORMANCE
● Expert baseline: 0.8995
○ (using neon)
● SigOpt best: 0.9011
○ 1.6% reduction in
error rate
○ No expert time
wasted in tuning
USE CASE: DEEP RESIDUAL
http://arxiv.org/pdf/1512.03385v1.pdf
● Explicitly reformulate the layers as learning residual functions with
reference to the layer inputs, instead of learning unreferenced functions
● Variable depth
● Hyperparameter optimization mixture of domain expertise and grid
search (brute force)
COMPARATIVE PERFORMANCE
Standard Method
● Expert baseline: 0.9339
○ (from paper)
● SigOpt best: 0.9343
○ Found after 17 trials
○ 0.61% reduction in
error rate
○ No expert time
wasted in tuning
Questions?
scott@sigopt.com
@DrScottClark
https://sigopt.com
@SigOpt
TRY OUT SIGOPT FOR FREE
https://sigopt.com/get_started
● Quick example and intro to SigOpt
● No signup required
● Visual and code examples
https://sigopt.com/text-classifier
● Jupyter Notebook
● Use SigOpt to tune feature and model parameters
● Detailed walkthrough with code
MORE EXAMPLES
https://github.com/sigopt/sigopt-examples
Examples of using SigOpt in a variety of languages and contexts.
Tuning Machine Learning Models (with code)
A comparison of different hyperparameter optimization methods.
Using Model Tuning to Beat Vegas (with code)
Using SigOpt to tune a model for predicting basketball scores.
Learn more about the technology behind SigOpt at
https://sigopt.com/research
HOW DOES IT WORK?
1. User reports data
2. SigOpt builds statistical model
(Gaussian Process)
3. SigOpt finds the points of
highest Expected Improvement
4. SigOpt suggests best
parameters to test next
5. User tests those parameters
and reports results to SigOpt
6. Repeat
HOW DOES IT WORK?
1. User reports data
2. SigOpt builds statistical model
(Gaussian Process)
3. SigOpt finds the points of
highest Expected Improvement
4. SigOpt suggests best
parameters to test next
5. User tests those parameters
and reports results to SigOpt
6. Repeat
HOW DOES IT WORK?
1. User reports data
2. SigOpt builds statistical model
(Gaussian Process)
3. SigOpt finds the points of
highest Expected Improvement
4. SigOpt suggests best
parameters to test next
5. User tests those parameters
and reports results to SigOpt
6. Repeat
HOW DOES IT WORK?
1. User reports data
2. SigOpt builds statistical model
(Gaussian Process)
3. SigOpt finds the points of
highest Expected Improvement
4. SigOpt suggests best
parameters to test next
5. User tests those parameters
and reports results to SigOpt
6. Repeat
HOW DOES IT WORK?
1. User reports data
2. SigOpt builds statistical model
(Gaussian Process)
3. SigOpt finds the points of
highest Expected Improvement
4. SigOpt suggests best
parameters to test next
5. User tests those parameters
and reports results to SigOpt
6. Repeat
HOW DOES IT WORK?
1. User reports data
2. SigOpt builds statistical model
(Gaussian Process)
3. SigOpt finds the points of
highest Expected Improvement
4. SigOpt suggests best
parameters to test next
5. User tests those parameters
and reports results to SigOpt
6. Repeat
GPs: FUNCTIONAL VIEW
overfit good fit underfit
GPs: FITTING THE GP

More Related Content

What's hot

Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearnPratap Dangeti
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature EngineeringHJ van Veen
 
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...MLconf
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsMark Peng
 
Jay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AIJay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AIAI Frontiers
 
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI AI Frontiers
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature EngineeringSri Ambati
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingTed Xiao
 
Introduction of Feature Hashing
Introduction of Feature HashingIntroduction of Feature Hashing
Introduction of Feature HashingWush Wu
 
Semi-supervised learning with GANs
Semi-supervised learning with GANsSemi-supervised learning with GANs
Semi-supervised learning with GANsterek47
 
Pelee: a real time object detection system on mobile devices Paper Review
Pelee: a real time object detection system on mobile devices Paper ReviewPelee: a real time object detection system on mobile devices Paper Review
Pelee: a real time object detection system on mobile devices Paper ReviewLEE HOSEONG
 
Why biased matrix factorization works well?
Why biased matrix factorization works well?Why biased matrix factorization works well?
Why biased matrix factorization works well?Joonyoung Yi
 
Model remodeling with modern deep learning frameworks
Model remodeling with modern deep learning frameworksModel remodeling with modern deep learning frameworks
Model remodeling with modern deep learning frameworksrosentep
 
A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide
A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide
A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide Joonyoung Yi
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentationHJ van Veen
 
QCon Rio - Machine Learning for Everyone
QCon Rio - Machine Learning for EveryoneQCon Rio - Machine Learning for Everyone
QCon Rio - Machine Learning for EveryoneDhiana Deva
 
Kernel, RKHS, and Gaussian Processes
Kernel, RKHS, and Gaussian ProcessesKernel, RKHS, and Gaussian Processes
Kernel, RKHS, and Gaussian ProcessesSungjoon Choi
 
Introduction to Machine Learning in Python using Scikit-Learn
Introduction to Machine Learning in Python using Scikit-LearnIntroduction to Machine Learning in Python using Scikit-Learn
Introduction to Machine Learning in Python using Scikit-LearnAmol Agrawal
 

What's hot (19)

Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
C3 w1
C3 w1C3 w1
C3 w1
 
Jay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AIJay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AI
 
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
 
Introduction of Feature Hashing
Introduction of Feature HashingIntroduction of Feature Hashing
Introduction of Feature Hashing
 
Semi-supervised learning with GANs
Semi-supervised learning with GANsSemi-supervised learning with GANs
Semi-supervised learning with GANs
 
Pelee: a real time object detection system on mobile devices Paper Review
Pelee: a real time object detection system on mobile devices Paper ReviewPelee: a real time object detection system on mobile devices Paper Review
Pelee: a real time object detection system on mobile devices Paper Review
 
Why biased matrix factorization works well?
Why biased matrix factorization works well?Why biased matrix factorization works well?
Why biased matrix factorization works well?
 
Model remodeling with modern deep learning frameworks
Model remodeling with modern deep learning frameworksModel remodeling with modern deep learning frameworks
Model remodeling with modern deep learning frameworks
 
A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide
A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide
A Neural Autoregressive Approach to Collaborative Filtering (CF-NADE) Slide
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentation
 
QCon Rio - Machine Learning for Everyone
QCon Rio - Machine Learning for EveryoneQCon Rio - Machine Learning for Everyone
QCon Rio - Machine Learning for Everyone
 
Kernel, RKHS, and Gaussian Processes
Kernel, RKHS, and Gaussian ProcessesKernel, RKHS, and Gaussian Processes
Kernel, RKHS, and Gaussian Processes
 
Introduction to Machine Learning in Python using Scikit-Learn
Introduction to Machine Learning in Python using Scikit-LearnIntroduction to Machine Learning in Python using Scikit-Learn
Introduction to Machine Learning in Python using Scikit-Learn
 

Similar to Using SigOpt to Tune Deep Learning Models with Nervana Cloud

Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsSigOpt
 
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016MLconf
 
Tuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques WebinarTuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques WebinarSigOpt
 
Tuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep LearningTuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep LearningSigOpt
 
Adopting software design practices for better machine learning
Adopting software design practices for better machine learningAdopting software design practices for better machine learning
Adopting software design practices for better machine learningMLconf
 
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...StormForge .io
 
Using Optimal Learning to Tune Deep Learning Pipelines
Using Optimal Learning to Tune Deep Learning PipelinesUsing Optimal Learning to Tune Deep Learning Pipelines
Using Optimal Learning to Tune Deep Learning PipelinesSigOpt
 
SigOpt for Machine Learning and AI
SigOpt for Machine Learning and AISigOpt for Machine Learning and AI
SigOpt for Machine Learning and AISigOpt
 
Elasticsearch Performance Testing and Scaling @ Signal
Elasticsearch Performance Testing and Scaling @ SignalElasticsearch Performance Testing and Scaling @ Signal
Elasticsearch Performance Testing and Scaling @ SignalJoachim Draeger
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningDatabricks
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeIdo Shilon
 
SigOpt at GTC - Reducing operational barriers to optimization
SigOpt at GTC - Reducing operational barriers to optimizationSigOpt at GTC - Reducing operational barriers to optimization
SigOpt at GTC - Reducing operational barriers to optimizationSigOpt
 
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
 Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D... Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...Databricks
 
Advanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise WebinarAdvanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise WebinarSigOpt
 
Model selection and tuning at scale
Model selection and tuning at scaleModel selection and tuning at scale
Model selection and tuning at scaleOwen Zhang
 
Hyperparameter optimization landscape Berlin ML Group meetup 8/2019
Hyperparameter optimization landscape Berlin ML Group meetup 8/2019Hyperparameter optimization landscape Berlin ML Group meetup 8/2019
Hyperparameter optimization landscape Berlin ML Group meetup 8/2019Jakub Czakon
 
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Universitat Politècnica de Catalunya
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comSimon Hughes
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning InfrastructureSigOpt
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for BeginnersSanghamitra Deb
 

Similar to Using SigOpt to Tune Deep Learning Models with Nervana Cloud (20)

Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
 
Tuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques WebinarTuning 2.0: Advanced Optimization Techniques Webinar
Tuning 2.0: Advanced Optimization Techniques Webinar
 
Tuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep LearningTuning for Systematic Trading: Talk 2: Deep Learning
Tuning for Systematic Trading: Talk 2: Deep Learning
 
Adopting software design practices for better machine learning
Adopting software design practices for better machine learningAdopting software design practices for better machine learning
Adopting software design practices for better machine learning
 
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...
 
Using Optimal Learning to Tune Deep Learning Pipelines
Using Optimal Learning to Tune Deep Learning PipelinesUsing Optimal Learning to Tune Deep Learning Pipelines
Using Optimal Learning to Tune Deep Learning Pipelines
 
SigOpt for Machine Learning and AI
SigOpt for Machine Learning and AISigOpt for Machine Learning and AI
SigOpt for Machine Learning and AI
 
Elasticsearch Performance Testing and Scaling @ Signal
Elasticsearch Performance Testing and Scaling @ SignalElasticsearch Performance Testing and Scaling @ Signal
Elasticsearch Performance Testing and Scaling @ Signal
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine Learning
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ waze
 
SigOpt at GTC - Reducing operational barriers to optimization
SigOpt at GTC - Reducing operational barriers to optimizationSigOpt at GTC - Reducing operational barriers to optimization
SigOpt at GTC - Reducing operational barriers to optimization
 
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
 Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D... Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
 
Advanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise WebinarAdvanced Optimization for the Enterprise Webinar
Advanced Optimization for the Enterprise Webinar
 
Model selection and tuning at scale
Model selection and tuning at scaleModel selection and tuning at scale
Model selection and tuning at scale
 
Hyperparameter optimization landscape Berlin ML Group meetup 8/2019
Hyperparameter optimization landscape Berlin ML Group meetup 8/2019Hyperparameter optimization landscape Berlin ML Group meetup 8/2019
Hyperparameter optimization landscape Berlin ML Group meetup 8/2019
 
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.com
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
 

More from SigOpt

Optimizing BERT and Natural Language Models with SigOpt Experiment Management
Optimizing BERT and Natural Language Models with SigOpt Experiment ManagementOptimizing BERT and Natural Language Models with SigOpt Experiment Management
Optimizing BERT and Natural Language Models with SigOpt Experiment ManagementSigOpt
 
Experiment Management for the Enterprise
Experiment Management for the EnterpriseExperiment Management for the Enterprise
Experiment Management for the EnterpriseSigOpt
 
Efficient NLP by Distilling BERT and Multimetric Optimization
Efficient NLP by Distilling BERT and Multimetric OptimizationEfficient NLP by Distilling BERT and Multimetric Optimization
Efficient NLP by Distilling BERT and Multimetric OptimizationSigOpt
 
Detecting COVID-19 Cases with Deep Learning
Detecting COVID-19 Cases with Deep LearningDetecting COVID-19 Cases with Deep Learning
Detecting COVID-19 Cases with Deep LearningSigOpt
 
Metric Management: a SigOpt Applied Use Case
Metric Management: a SigOpt Applied Use CaseMetric Management: a SigOpt Applied Use Case
Metric Management: a SigOpt Applied Use CaseSigOpt
 
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric StrategyTuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric StrategySigOpt
 
Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1SigOpt
 
Tuning Data Augmentation to Boost Model Performance
Tuning Data Augmentation to Boost Model PerformanceTuning Data Augmentation to Boost Model Performance
Tuning Data Augmentation to Boost Model PerformanceSigOpt
 
Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019SigOpt
 
SigOpt at Ai4 Finance—Modeling at Scale
SigOpt at Ai4 Finance—Modeling at Scale SigOpt at Ai4 Finance—Modeling at Scale
SigOpt at Ai4 Finance—Modeling at Scale SigOpt
 
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...SigOpt
 
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...SigOpt
 
SigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
SigOpt at O'Reilly - Best Practices for Scaling Modeling PlatformsSigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
SigOpt at O'Reilly - Best Practices for Scaling Modeling PlatformsSigOpt
 
SigOpt at GTC - Tuning the Untunable
SigOpt at GTC - Tuning the UntunableSigOpt at GTC - Tuning the Untunable
SigOpt at GTC - Tuning the UntunableSigOpt
 
Lessons for an enterprise approach to modeling at scale
Lessons for an enterprise approach to modeling at scaleLessons for an enterprise approach to modeling at scale
Lessons for an enterprise approach to modeling at scaleSigOpt
 
Modeling at scale in systematic trading
Modeling at scale in systematic tradingModeling at scale in systematic trading
Modeling at scale in systematic tradingSigOpt
 
SigOpt at MLconf - Reducing Operational Barriers to Model Training
SigOpt at MLconf - Reducing Operational Barriers to Model TrainingSigOpt at MLconf - Reducing Operational Barriers to Model Training
SigOpt at MLconf - Reducing Operational Barriers to Model TrainingSigOpt
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning InfrastructureSigOpt
 
Tuning the Untunable - Insights on Deep Learning Optimization
Tuning the Untunable - Insights on Deep Learning OptimizationTuning the Untunable - Insights on Deep Learning Optimization
Tuning the Untunable - Insights on Deep Learning OptimizationSigOpt
 
Tips and techniques for hyperparameter optimization
Tips and techniques for hyperparameter optimizationTips and techniques for hyperparameter optimization
Tips and techniques for hyperparameter optimizationSigOpt
 

More from SigOpt (20)

Optimizing BERT and Natural Language Models with SigOpt Experiment Management
Optimizing BERT and Natural Language Models with SigOpt Experiment ManagementOptimizing BERT and Natural Language Models with SigOpt Experiment Management
Optimizing BERT and Natural Language Models with SigOpt Experiment Management
 
Experiment Management for the Enterprise
Experiment Management for the EnterpriseExperiment Management for the Enterprise
Experiment Management for the Enterprise
 
Efficient NLP by Distilling BERT and Multimetric Optimization
Efficient NLP by Distilling BERT and Multimetric OptimizationEfficient NLP by Distilling BERT and Multimetric Optimization
Efficient NLP by Distilling BERT and Multimetric Optimization
 
Detecting COVID-19 Cases with Deep Learning
Detecting COVID-19 Cases with Deep LearningDetecting COVID-19 Cases with Deep Learning
Detecting COVID-19 Cases with Deep Learning
 
Metric Management: a SigOpt Applied Use Case
Metric Management: a SigOpt Applied Use CaseMetric Management: a SigOpt Applied Use Case
Metric Management: a SigOpt Applied Use Case
 
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric StrategyTuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
Tuning for Systematic Trading: Talk 3: Training, Tuning, and Metric Strategy
 
Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1Tuning for Systematic Trading: Talk 1
Tuning for Systematic Trading: Talk 1
 
Tuning Data Augmentation to Boost Model Performance
Tuning Data Augmentation to Boost Model PerformanceTuning Data Augmentation to Boost Model Performance
Tuning Data Augmentation to Boost Model Performance
 
Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019Modeling at Scale: SigOpt at TWIMLcon 2019
Modeling at Scale: SigOpt at TWIMLcon 2019
 
SigOpt at Ai4 Finance—Modeling at Scale
SigOpt at Ai4 Finance—Modeling at Scale SigOpt at Ai4 Finance—Modeling at Scale
SigOpt at Ai4 Finance—Modeling at Scale
 
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
Interactive Tradeoffs Between Competing Offline Metrics with Bayesian Optimiz...
 
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
SigOpt at Uber Science Symposium - Exploring the spectrum of black-box optimi...
 
SigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
SigOpt at O'Reilly - Best Practices for Scaling Modeling PlatformsSigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
SigOpt at O'Reilly - Best Practices for Scaling Modeling Platforms
 
SigOpt at GTC - Tuning the Untunable
SigOpt at GTC - Tuning the UntunableSigOpt at GTC - Tuning the Untunable
SigOpt at GTC - Tuning the Untunable
 
Lessons for an enterprise approach to modeling at scale
Lessons for an enterprise approach to modeling at scaleLessons for an enterprise approach to modeling at scale
Lessons for an enterprise approach to modeling at scale
 
Modeling at scale in systematic trading
Modeling at scale in systematic tradingModeling at scale in systematic trading
Modeling at scale in systematic trading
 
SigOpt at MLconf - Reducing Operational Barriers to Model Training
SigOpt at MLconf - Reducing Operational Barriers to Model TrainingSigOpt at MLconf - Reducing Operational Barriers to Model Training
SigOpt at MLconf - Reducing Operational Barriers to Model Training
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
 
Tuning the Untunable - Insights on Deep Learning Optimization
Tuning the Untunable - Insights on Deep Learning OptimizationTuning the Untunable - Insights on Deep Learning Optimization
Tuning the Untunable - Insights on Deep Learning Optimization
 
Tips and techniques for hyperparameter optimization
Tips and techniques for hyperparameter optimizationTips and techniques for hyperparameter optimization
Tips and techniques for hyperparameter optimization
 

Recently uploaded

Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
An introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxAn introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxPurva Nikam
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 

Recently uploaded (20)

Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
An introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxAn introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptx
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 

Using SigOpt to Tune Deep Learning Models with Nervana Cloud

  • 1. USING SIGOPT TO TUNE DEEP LEARNING MODELS WITH NERVANA CLOUD Scott Clark Co-founder and CEO of SigOpt scott@sigopt.com @DrScottClark
  • 2. TRIAL AND ERROR WASTES EXPERT TIME ● Deep Learning is extremely powerful ● Tuning Deep Learning systems is extremely non-intuitive
  • 3. UNRESOLVED PROBLEM IN ML https://www.quora.com/What-is-the-most-important-unresolved-problem-in-machine-learning-3 What is the most important unresolved problem in machine learning? “...we still don't really know why some configurations of deep neural networks work in some case and not others, let alone having a more or less automatic approach to determining the architectures and the hyperparameters.” Xavier Amatriain, VP Engineering at Quora (former Director of Research at Netflix)
  • 4. TUNING DEEP LEARNING MODELS Big Data Deep Learning System With tunable parameters Expertise
  • 5. TUNING DEEP LEARNING MODELS Big Data Metics Optimally suggests new parameters Objective New parameters Expertise Deep Learning System With tunable parameters
  • 6. TUNING DEEP LEARNING MODELS Big Data Metics Optimally suggests new parameters Objective New parameters Better Results Expertise Deep Learning System With tunable parameters
  • 7. COMMON APPROACH Random Search for Hyper-Parameter Optimization, James Bergstra et al., 2012 1. Random search or grid search 2. Expert defined grid search near “good” points 3. Refine domain and repeat steps - “grad student descent”
  • 8. COMMON APPROACH ● Expert intensive ● Computationally intensive ● Finds potentially local optima ● Does not fully exploit useful information Random Search for Hyper-Parameter Optimization, James Bergstra et al., 2012 1. Random search or grid search 2. Expert defined grid search near “good” points 3. Refine domain and repeat steps - “grad student descent”
  • 9. … the challenge of how to collect information as efficiently as possible, primarily for settings where collecting information is time consuming and expensive. Prof. Warren Powell - Princeton What is the most efficient way to collect information? Prof. Peter Frazier - Cornell How do we make the most money, as fast as possible? Me - @DrScottClark OPTIMAL LEARNING
  • 10. ● Optimize some Overall Evaluation Criterion (OEC) ○ Loss, Accuracy, Likelihood, Revenue ● Given tunable parameters ○ Hyperparameters, feature parameters ● In an efficient way ○ Sample function as few times as possible ○ Training on big data is expensive BAYESIAN GLOBAL OPTIMIZATION Details at https://sigopt.com/research
  • 11. EXAMPLE: TUNING DNN CLASSIFIERS CIFAR10 Dataset ● Photos of objects ● 10 classes ● Metric: Accuracy ○ [0.1, 1.0] Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009.
  • 12. ● All convolutional neural network ● Multiple convolutional and dropout layers ● Hyperparameter optimization mixture of domain expertise and grid search (brute force) USE CASE: ALL CONVOLUTIONAL http://arxiv.org/pdf/1412.6806.pdf
  • 13. EXAMPLE: NCLOUD/NEON ● epochs: “number of epochs to run fit” - int [1,∞] ● learning rate: influence on current value of weights at each step - double (0, 1] ● momentum coefficient: “the coefficient of momentum” - double (0, 1] ● weight decay: parameter affecting how quickly weight decays - double (0, 1] ● depth: parameter affecting number of layers in net - int [1, 20(?)] ● gaussian scale: standard deviation of initialization normal dist. - double (0,∞] ● momentum step change: mul. amount to decrease momentum - double (0, 1] ● momentum step schedule start: epoch to start decreasing momentum - int [1,∞] ● momentum schedule width: epoch stride for decreasing momentum - int [1,∞] Many tunable parameters... ...optimal values non-intuitive
  • 14. COMPARATIVE PERFORMANCE ● Expert baseline: 0.8995 ○ (using neon) ● SigOpt best: 0.9011 ○ 1.6% reduction in error rate ○ No expert time wasted in tuning
  • 15. USE CASE: DEEP RESIDUAL http://arxiv.org/pdf/1512.03385v1.pdf ● Explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions ● Variable depth ● Hyperparameter optimization mixture of domain expertise and grid search (brute force)
  • 16. COMPARATIVE PERFORMANCE Standard Method ● Expert baseline: 0.9339 ○ (from paper) ● SigOpt best: 0.9343 ○ Found after 17 trials ○ 0.61% reduction in error rate ○ No expert time wasted in tuning
  • 18. TRY OUT SIGOPT FOR FREE https://sigopt.com/get_started ● Quick example and intro to SigOpt ● No signup required ● Visual and code examples https://sigopt.com/text-classifier ● Jupyter Notebook ● Use SigOpt to tune feature and model parameters ● Detailed walkthrough with code
  • 19. MORE EXAMPLES https://github.com/sigopt/sigopt-examples Examples of using SigOpt in a variety of languages and contexts. Tuning Machine Learning Models (with code) A comparison of different hyperparameter optimization methods. Using Model Tuning to Beat Vegas (with code) Using SigOpt to tune a model for predicting basketball scores. Learn more about the technology behind SigOpt at https://sigopt.com/research
  • 20. HOW DOES IT WORK? 1. User reports data 2. SigOpt builds statistical model (Gaussian Process) 3. SigOpt finds the points of highest Expected Improvement 4. SigOpt suggests best parameters to test next 5. User tests those parameters and reports results to SigOpt 6. Repeat
  • 21. HOW DOES IT WORK? 1. User reports data 2. SigOpt builds statistical model (Gaussian Process) 3. SigOpt finds the points of highest Expected Improvement 4. SigOpt suggests best parameters to test next 5. User tests those parameters and reports results to SigOpt 6. Repeat
  • 22. HOW DOES IT WORK? 1. User reports data 2. SigOpt builds statistical model (Gaussian Process) 3. SigOpt finds the points of highest Expected Improvement 4. SigOpt suggests best parameters to test next 5. User tests those parameters and reports results to SigOpt 6. Repeat
  • 23. HOW DOES IT WORK? 1. User reports data 2. SigOpt builds statistical model (Gaussian Process) 3. SigOpt finds the points of highest Expected Improvement 4. SigOpt suggests best parameters to test next 5. User tests those parameters and reports results to SigOpt 6. Repeat
  • 24. HOW DOES IT WORK? 1. User reports data 2. SigOpt builds statistical model (Gaussian Process) 3. SigOpt finds the points of highest Expected Improvement 4. SigOpt suggests best parameters to test next 5. User tests those parameters and reports results to SigOpt 6. Repeat
  • 25. HOW DOES IT WORK? 1. User reports data 2. SigOpt builds statistical model (Gaussian Process) 3. SigOpt finds the points of highest Expected Improvement 4. SigOpt suggests best parameters to test next 5. User tests those parameters and reports results to SigOpt 6. Repeat
  • 27. overfit good fit underfit GPs: FITTING THE GP