Winning Kaggle
Competitions
Hendrik Jacob van Veen - Nubank Brasil
About Kaggle
Biggest platform for competitive data science in the
world
Currently 500k+ competitors
Great platform to learn about the latest techniques and
about avoiding overfitting
Great platform to share and meet up with other data
freaks
Approach
Get a good score as fast as possible
Using versatile libraries
Model ensembling
Get a good score as fast as
possible
Get the raw data into a universal format like SVMlight or
Numpy arrays.
Failing fast and failing often / Agile sprint / Iteration
Sub-linear debugging: 

“output enough intermediate information as a
calculation is progressing to determine before it
finishes whether you've injected a major defect or
a significant improvement.” Paul Mineiro
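One way to read the quote above is to print a running metric at geometrically spaced intervals, so that a defect or an improvement shows up within seconds of starting a long run. The sketch below is only an illustration under stated assumptions: `train_online`, the logistic-loss SGD update, and the synthetic data are hypothetical and not taken from the talk or from Mineiro's code.

```python
import math
import random


def train_online(examples, n_features, lr=0.1, report_base=2):
    """Online logistic regression that reports progressive validation loss.

    Each example is scored BEFORE the model learns from it, so the running
    average loss is an honest estimate that is available long before the
    pass over the data finishes.
    """
    w = [0.0] * n_features
    total_loss, next_report, history = 0.0, 1, []
    for i, (x, y) in enumerate(examples, start=1):
        z = sum(wj * xj for wj, xj in zip(w, x))
        z = max(min(z, 30.0), -30.0)              # guard against overflow in exp
        p = 1.0 / (1.0 + math.exp(-z))
        p = min(max(p, 1e-12), 1 - 1e-12)
        total_loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for j in range(n_features):               # SGD step on the logistic loss
            w[j] -= lr * (p - y) * x[j]
        if i == next_report:                      # report at 1, 2, 4, 8, ... examples
            history.append((i, total_loss / i))
            print(f"seen={i:>6}  avg_logloss={total_loss / i:.4f}")
            next_report *= report_base
    return w, history


# Synthetic, linearly separable data (an assumption made for the demo).
random.seed(0)
data = []
for _ in range(2000):
    x = [1.0, random.gauss(0, 1), random.gauss(0, 1)]
    data.append((x, 1 if x[1] + 0.5 * x[2] > 0 else 0))

weights, history = train_online(data, n_features=3)
```

If the reported loss stays flat or rises over the first few reports, the run can be stopped long before it completes.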
Using versatile libraries
Scikit-learn
Vowpal Wabbit
XGBoost
Keras
Other tools get Scikit-learn API wrappers
Model Ensembling
Voting
Averaging
Bagging
Boosting
Binning
Blending
Stacking
General Strategy
Try to create “machine learning”-learning algorithms with optimized
pipelines that are:
Data agnostic (Sparse, dense, missing values, larger than memory)
Problem agnostic (Classification, regression, clustering)
Solution agnostic (Production-ready, PoC, latency)
Automated (Turn on and go to bed)
Memory-friendly (Don’t want to pay for AWS)
Robust (Good generalization, concept drift, consistent)
First Overview I
Classification? Regression?
Evaluation Metric
Description
Benchmark code
“Predict human activities based on their smartphone usage. Predict
if a user is sitting, walking etc.” - Smartphone User Activity Prediction
“Given the HTML of ~337k websites served to users of
StumbleUpon, identify the paid content disguised as real content.” -
Dato Truly Native?
First Overview II
Counts
Images
Text
Categorical
Floats
Dates
0.28309984, -0.025501173, … , -0.11118051, 0.37447712
<!Doctype html><html><head><meta charset=utf-8> … </html>
First Overview III
Data size?
Dimensionality?
Number of train samples & test samples?
Online or offline learning?
Linear problem or non-linear problem?
Previous competitions that were similar?
Branch
If: Issues with the data -> Tedious clean-up
Join JSON tables, Impute missing values, Curse Kaggle
and join another competition
Else: Get data into Numpy arrays, we want:
X_train, y, X_test
Local Evaluation
Set up local evaluation according to competition metric
Create a simple benchmark (useful for exploration and
discarding models)
5-fold stratified cross-validation usually does the trick
Very important step for fast iteration and saving submissions,
yet easy to be lazy and use leaderboard.
Area Under the Curve, Multi-Class Classification
Accuracy
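A local evaluation loop of the kind described above can be sketched with NumPy alone. The helper names (`stratified_kfold_indices`, `majority_class_benchmark`) and the toy labels are assumptions made for illustration; in practice scikit-learn's `StratifiedKFold` is the usual tool for the splitting step.

```python
import numpy as np


def stratified_kfold_indices(y, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs with class ratios preserved per fold."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        # Deal the shuffled members of this class round-robin over the folds.
        fold_of[idx] = np.arange(len(idx)) % n_splits
    for k in range(n_splits):
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)


def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def majority_class_benchmark(y_train, n_valid):
    """Simplest possible benchmark: always predict the most frequent class."""
    classes, counts = np.unique(y_train, return_counts=True)
    return np.full(n_valid, classes[np.argmax(counts)])


# Toy labels with a 70/30 class imbalance.
y = np.array([0] * 70 + [1] * 30)
scores = []
for train_idx, valid_idx in stratified_kfold_indices(y, n_splits=5):
    pred = majority_class_benchmark(y[train_idx], len(valid_idx))
    scores.append(accuracy(y[valid_idx], pred))

print("fold accuracies:", scores)
print("mean CV accuracy:", np.mean(scores))
```

Any model that cannot beat this benchmark locally can be discarded without spending a leaderboard submission on it.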
Data Exploration
Min, Max, Mean, Percentiles, Std, Plotting
Can detect: leakage, golden features, feature
engineering tricks, data health issues.
Caveat: At least one top 50 Kaggler used to not look at
the data at all:
“It’s called machine learning for a reason.”
Feature Engineering I
Log-transform count features, tf-idf transform text features
Unsupervised transforms / dimensionality reduction
Manual inspection of data
Dates -> day of month, is_holiday, season, etc.
Create histograms and cluster similar features
Using VW-varinfo or XGBfi to check 2-3-way interactions
Row stats: mean, max, min, number of NA’s.
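Three of the transforms listed above (log-transforming counts, expanding dates, and row statistics) can be sketched as follows. The function names and the season convention are assumptions made for illustration; `is_holiday` is left out because it needs a country-specific calendar.

```python
import datetime as dt

import numpy as np

# Log-transform a heavy-tailed count feature; log1p keeps zero counts at zero.
counts = np.array([0, 1, 9, 99, 999])
log_counts = np.log1p(counts)


def date_features(d):
    """Expand a date into simple calendar features."""
    return {
        "day_of_month": d.day,
        "day_of_week": d.weekday(),            # Monday == 0
        "month": d.month,
        "is_weekend": int(d.weekday() >= 5),
        # Assumed convention: 0 = Dec-Feb, 1 = Mar-May, 2 = Jun-Aug, 3 = Sep-Nov
        "season": (d.month % 12) // 3,
    }


def row_stats(X):
    """Per-row mean, max, min and number of missing values (NaN)."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([
        np.nanmean(X, axis=1),
        np.nanmax(X, axis=1),
        np.nanmin(X, axis=1),
        np.isnan(X).sum(axis=1),
    ])


X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, 6.0]])
print(log_counts)
print(date_features(dt.date(2016, 12, 25)))
print(row_stats(X))
```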
Feature Engineering II
Bin numerical features to categorical features
Bayesian encoding of categorical features to likelihood
Genetic programming
Random-swap feature elimination
Time binning (customer bought in last week, last month, last year …)
Expand data (Coates & Ng, Random Bit Regression)
Automate all of this
Feature Engineering III
Categorical features need some special treatment
Onehot-encode for linear models (sparsity)
Colhot-encode for tree-based models (density)
Counthot-encode for large cardinality features
Likelihood-encode for experts…
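The slides do not define these encodings in detail, so the following is a sketch under assumptions: one-hot and count encoding in their common form, and likelihood encoding as an out-of-fold, smoothed mean of the target per category. The fold assignment, the `prior_weight` smoothing, and all function names are hypothetical choices.

```python
import numpy as np


def one_hot(values):
    """Dense one-hot matrix plus the category order used for the columns."""
    cats, codes = np.unique(values, return_inverse=True)
    return np.eye(len(cats), dtype=int)[codes], cats


def count_encode(values):
    """Replace each category by how often it occurs."""
    _, codes, counts = np.unique(values, return_inverse=True, return_counts=True)
    return counts[codes]


def likelihood_encode_oof(values, y, n_splits=2, prior_weight=1.0):
    """Out-of-fold, smoothed mean of the target per category.

    Each row is encoded using target statistics from the OTHER folds only,
    so a row never sees its own label.
    """
    values, y = np.asarray(values), np.asarray(y, dtype=float)
    fold_of = np.arange(len(values)) % n_splits
    encoded = np.empty(len(values))
    for k in range(n_splits):
        train, valid = fold_of != k, fold_of == k
        prior = y[train].mean()
        for i in np.flatnonzero(valid):
            same = train & (values == values[i])
            # Shrink the category mean towards the global prior.
            encoded[i] = (y[same].sum() + prior_weight * prior) / (same.sum() + prior_weight)
    return encoded


colors = np.array(["red", "blue", "red", "green", "red", "blue"])
cat = np.array(["a", "a", "b", "b", "a", "b", "a", "b"])
target = np.array([1, 1, 0, 0, 1, 0, 0, 1])

print(one_hot(colors)[0])
print(count_encode(colors))
print(likelihood_encode_oof(cat, target))
```

Computing the likelihood encoding out of fold is what keeps it from leaking the label into the feature; encoding with in-fold target means is a common source of overfitting.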
Algorithms I
A bias-variance trade-off between simple and complex models
Algorithms II
There is No Free Lunch in statistical inference
We show that all algorithms that search for an extremum of a cost
function perform exactly the same, when averaged over all possible
cost functions. – Wolpert & Macready, No free lunch theorems for search
Practical Solution for low-bias low-variance models:
Use prior knowledge / experience to limit search (Let algo’s play to their
known strengths for particular problems)
Remove or avoid their weaknesses
Combine/Bag their predictions
Random Forests I
A Random Forest is an ensemble of decision trees.
"Random forests are a combination of tree
predictors such that each tree depends on the
values of a random vector sampled
independently and with the same distribution for
all trees in the forest. […] More robust to noise." -
"Random Forests", Breiman
Random Forests II
Strengths
Fast
Easy to tune
Easy to inspect
Easy to explore data with
Good Benchmark
Very wide applicability
Can introduce randomness / Diversity
Weaknesses
Memory Hungry
Popular
Slower at test time
GBM I
A GBM trains weak models sequentially; each new model
focuses on what the previous models got wrong
"A method is described for converting a weak
learning algorithm [the learner can produce an
hypothesis that performs only slightly better
than random guessing] into one that achieves
arbitrarily high accuracy." - “The Strength of Weak
Learnability" Schapire
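To make the idea concrete, here is a minimal sketch of gradient boosting for squared loss, where each weak learner is a one-split regression stump fitted to the current residuals. It is an illustration under assumptions (1-D feature, synthetic data, hypothetical function names), not the implementation used by XGBoost or any other library.

```python
import numpy as np


def fit_stump(x, residual):
    """Best single-split regression stump on a 1-D feature (squared error)."""
    best = None
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        left_value, right_value = residual[left].mean(), residual[~left].mean()
        sse = (((residual[left] - left_value) ** 2).sum()
               + ((residual[~left] - right_value) ** 2).sum())
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


def fit_gbm(x, y, n_rounds=50, learning_rate=0.3):
    """Each round fits a stump to the residuals of the ensemble so far."""
    base = float(y.mean())
    pred = np.full(len(y), base)
    stumps, losses = [], []
    for _ in range(n_rounds):
        residual = y - pred              # negative gradient of 0.5 * (y - pred)^2
        stump = fit_stump(x, residual)
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
        losses.append(float(np.mean((y - pred) ** 2)))
    return (base, learning_rate, stumps), losses


def predict_gbm(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, size=200))
y = np.sin(x) + rng.normal(0, 0.1, size=200)
model, losses = fit_gbm(x, y)
print(f"train MSE after 1 round: {losses[0]:.4f}, after 50 rounds: {losses[-1]:.4f}")
```

The training loss never increases from one round to the next, which is also why a GBM needs a validation set or early stopping to avoid overfitting.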
GBM II
Strengths
Can achieve very good results
Can model complex problems
Works on wide variety of
problems
Use custom loss functions
No need to scale data
Weaknesses
Slower to train
Easier to overfit than RF
Weak learner assumption is
broken along the way
Tricky to tune
Popular
SVM I
Classification and Regression using Support Vectors
"Nothing is more practical than a good theory."
‘The Nature of Statistical Learning Theory’, Vapnik
SVM II
Strengths
Strong theoretical guarantees
Tuning regularization parameter
helps prevent overfit
Kernel Trick: Use custom kernels,
turn linear kernel into non-linear
kernel
Achieve state-of-the-art on
select problems
Weaknesses
Slower to train
Memory heavy
Requires a tedious grid-search
for best performance
Will probably time-out on large
datasets
Nearest Neighbours I
Look at the distance to other samples
"The nearest neighbor decision rule assigns to
an unclassified sample point the classification of
the nearest of a set of previously classified
points." ‘Nearest neighbor pattern classification’, Cover
& Hart
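The decision rule in the quote fits in a few lines of NumPy. This is a brute-force sketch for small data; the function name and the toy activity labels are assumptions made for illustration.

```python
import numpy as np


def nearest_neighbour_predict(X_train, y_train, X_query):
    """Assign each query point the label of its closest training point (1-NN)."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    diff = X_query[:, None, :] - X_train[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)
    nearest = dist2.argmin(axis=1)
    distances = np.sqrt(dist2[np.arange(len(X_query)), nearest])
    return np.asarray(y_train)[nearest], distances


X_train = [[0, 0], [0, 1], [5, 5], [6, 5]]
y_train = ["sitting", "sitting", "walking", "walking"]
labels, distances = nearest_neighbour_predict(X_train, y_train, [[0.2, 0.1], [5, 5]])
print(labels, distances)
```

A returned distance of zero means the query row also appears in the training set, which is how a nearest-neighbour model ends up finding duplicates.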
Nearest Neighbours II
Strengths
Simple
Unpopular
Non-linear
Easy to tune
Detect near-duplicates
Weaknesses
Simple
Does not work well on
average
Depending on data size:
Slow
Perceptron I
Update weights when wrong prediction, else do nothing
The embryo of an electronic computer that [the
Navy] expects will be able to walk, talk, see,
write, reproduce itself and be conscious of its
existence. ‘New York Times’, Rosenblatt
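The update rule on this slide (change the weights only on a mistake) can be sketched as below. Labels are assumed to be -1 or +1, and the logical-AND data set is a toy example chosen because it is linearly separable.

```python
import numpy as np


def train_perceptron(X, y, n_epochs=10):
    """Classic perceptron; labels must be -1 or +1."""
    X = np.asarray(X, dtype=float)
    w, b, mistakes_per_epoch = np.zeros(X.shape[1]), 0.0, []
    for _ in range(n_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:     # wrong (or on the boundary): update
                w += yi * xi
                b += yi
                mistakes += 1
            # correct prediction: do nothing
        mistakes_per_epoch.append(mistakes)
        if mistakes == 0:                  # a full clean pass means convergence
            break
    return w, b, mistakes_per_epoch


# Logical AND, which is linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])
w, b, mistakes = train_perceptron(X, y, n_epochs=20)
pred = np.where(X @ w + b > 0, 1, -1)
print("weights:", w, "bias:", b, "mistakes per epoch:", mistakes)
```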
Perceptron II
Strengths
Cool / Street Cred
Extremely Simple
Fast / Sparse updates
Online Learning
Works well with text
Weaknesses
Other linear algo’s usually
beat it
Does not work well on
average
No regularization
Neural Networks I
Inspired by biological systems (Connected neurons firing
when threshold is reached)
Because of the "all-or-none" character of nervous
activity, neural events and the relations among
them can be treated by means of propositional
logic. […] for any logical expression satisfying
certain conditions, one can find a net behaving in
the fashion it describes. ‘A Logical Calculus of the
Ideas Immanent in Nervous Activity’, McCulloch & Pitts
Neural Networks II
Strengths
The best for images
Can model any function
End-to-end Training
Amortizes feature
representation
Weaknesses
Can be difficult to set up
Not very interpretable
Requires specialized
hardware
Underfit / Overfit
Vowpal Wabbit I
Online learning while optimizing a loss function
We present a system and a set of techniques for
learning linear predictors with convex losses on
terascale datasets, with trillions of features,
billions of training examples and millions of
parameters in an hour using a cluster of 1000
machines. ‘A Reliable Effective Terascale Linear
Learning System’, Agarwal et al.
Vowpal Wabbit II
Strengths
Fixed memory constraint
Extremely fast
Feature expansion
Difficult to overfit
Versatile
Weaknesses
Different API
Manual feature engineering
Loses against boosting
Requires practice
Hashing can obscure
Others
Factorization Machines
PCA
t-SNE
SVD / LSA
Ridge Regression
GLMNet
Genetic Algorithms
Bayesian
Logistic Regression
Quantile Regression
AdaBoosting
SGD
Ensembles I
Combine models in a way that outperforms individual
models.
“That’s how almost all ML competitions are won” -
‘Dark Knowledge’ Hinton et al.
Ensembles reduce the chance of overfit.
Bagging / Averaging -> Lower variance, slightly lower bias
Blending / Stacking -> Remove biases of base models
Ensembles II
Practical tips:
Use diverse models
Use diverse feature sets
Use many models
Do not leak any information
Stacked Generalization I
Train one model on the predictions of another model
A scheme for minimizing the generalization error rate of
one or more generalizers. Stacked generalization works
by deducing the biases of the generalizer(s) with
respect to a provided learning set. This deduction
proceeds by generalizing in a second space whose
inputs are (for example) the guesses of the original
generalizers when taught with part of the learning set
and trying to guess the rest of it, and whose output is
(for example) the correct guess. - ‘Stacked Generalization’,
Wolpert
Stacked Generalization II
Train one model on the predictions of another model
Stacked Generalization III
Using weak base models vs. using strong base models
Using average of out-of-fold predictors vs. One model
for testing
One can also stack features when these are not
available in test set.
Can share train set predictions based on different folds
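A minimal sketch of stacking with out-of-fold predictions follows, using the "average of out-of-fold predictors" option for the test set. Everything here is an assumption made for illustration: the base models are closed-form ridge regressions on different feature subsets, the stacker is ordinary least squares, and the data is synthetic.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept column."""
    Xb = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def ridge_predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def out_of_fold_predictions(X, y, X_test, n_splits=5):
    """OOF predictions for the train set, fold-averaged predictions for the test set."""
    fold_of = np.arange(len(X)) % n_splits
    oof, test_pred = np.zeros(len(X)), np.zeros(len(X_test))
    for k in range(n_splits):
        train, valid = fold_of != k, fold_of == k
        coef = ridge_fit(X[train], y[train])
        oof[valid] = ridge_predict(coef, X[valid])   # never trained on these rows
        test_pred += ridge_predict(coef, X_test) / n_splits
    return oof, test_pred


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X[:, 0] + 2 * X[:, 1] - X[:, 2] + 0.5 * X[:, 3] + rng.normal(0, 0.1, size=300)
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

# Two deliberately weak base models that each see a different feature subset.
feature_sets = [[0, 1], [2, 3]]
oof_cols, test_cols = zip(*[
    out_of_fold_predictions(X_train[:, f], y_train, X_test[:, f]) for f in feature_sets
])
meta_train, meta_test = np.column_stack(oof_cols), np.column_stack(test_cols)

# The stacker is trained on the base models' OOF predictions only.
stacker = ridge_fit(meta_train, y_train, alpha=0.0)
stacked_pred = ridge_predict(stacker, meta_test)

base_rmse = [rmse(meta_test[:, j], y_test) for j in range(2)]
print("base RMSE:", base_rmse, "stacked RMSE:", rmse(stacked_pred, y_test))
```

Training the stacker on in-fold predictions instead would leak the labels into the second level, which is the information leak the ensembling tips warn against.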
StackNet
We need to go deeper:
Splitting node: x1 > 5? 1 else 0
Decision tree: x1 > 5 AND x2 < 12?
Random forest: avg ( x1 > 5 AND x2 < 12?, x3 > 2? )
Stacking-1: avg ( RF1_pred > 0.9?, RF2_pred > 0.92? )
Stacking-2: avg ( S1_pred > 0.93?, S2_pred < 0.77? )
Stacking-3: avg ( SS1_pred > 0.98?, SS2_pred > 0.97? )
Bagging Predictors I
Averaging submissions to reduce variance
"Bagging predictors is a method for generating
multiple versions of a predictor and using these
to get an aggregated predictor." - "Bagging
Predictors". Breiman
Bagging Predictors II
Train models with:
Different data sets
Different algorithms
Different features subsets
Different sample subsets
Then average / vote aggregate these
Bagging Predictors III
One can average with:
Plain average
Geometric mean
Rank mean
Harmonic mean
KazAnova’s brute-force weighted averaging
Caruana’s forward greedy model selection
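The first four averaging schemes can be sketched in NumPy as below. The `blend` and `rank_normalise` helpers are hypothetical names; the geometric and harmonic means assume strictly positive predictions such as probabilities.

```python
import numpy as np


def rank_normalise(p):
    """Map predictions to ranks scaled into [0, 1] (ties broken by position)."""
    ranks = np.argsort(np.argsort(p))
    return ranks / (len(p) - 1)


def blend(preds, method="mean"):
    """Combine a (n_models, n_samples) array of positive predictions."""
    preds = np.asarray(preds, dtype=float)
    if method == "mean":
        return preds.mean(axis=0)
    if method == "geometric":
        return np.exp(np.log(preds).mean(axis=0))
    if method == "harmonic":
        return len(preds) / (1.0 / preds).sum(axis=0)
    if method == "rank":
        return np.mean([rank_normalise(p) for p in preds], axis=0)
    raise ValueError(f"unknown method: {method}")


preds = np.array([[0.2, 0.8, 0.5],
                  [0.4, 0.9, 0.1]])
for method in ("mean", "geometric", "harmonic", "rank"):
    print(method, blend(preds, method))
```

Rank averaging only preserves the ordering of predictions, so it suits ranking metrics such as AUC and not metrics that need calibrated probabilities.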
Brute-Force Weighted
Average
Create out-of-fold predictions for train set for n models
Pick a stepsize s, and set n weights
Try every possible weight with stepsize s
Look which set of n weights improves the train set score
the most
Can do in cross-validation-style manner for extra
robustness.
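The steps above can be sketched as a grid search over weight vectors that sum to 1. This is an illustration under assumptions and not KazAnova's actual code: RMSE stands in for the competition metric, the weights are constrained to be non-negative, and the out-of-fold predictions are simulated.

```python
import itertools

import numpy as np


def brute_force_weights(oof_preds, y, step=0.1):
    """Try every weight vector on a grid with the given step that sums to 1.

    oof_preds has shape (n_models, n_samples); lower RMSE is better here.
    """
    oof_preds = np.asarray(oof_preds, dtype=float)
    n_models = len(oof_preds)
    n_steps = int(round(1.0 / step))
    best_score, best_weights = np.inf, None
    for combo in itertools.product(range(n_steps + 1), repeat=n_models - 1):
        if sum(combo) > n_steps:
            continue
        # The last weight is whatever remains so that the weights sum to 1.
        weights = np.array(combo + (n_steps - sum(combo),)) / n_steps
        score = np.sqrt(np.mean((weights @ oof_preds - y) ** 2))
        if score < best_score:
            best_score, best_weights = score, weights
    return best_weights, float(best_score)


# Simulated out-of-fold predictions: the truth plus noise of varying strength.
rng = np.random.default_rng(0)
y = rng.normal(size=500)
oof = np.array([y + rng.normal(0, s, size=500) for s in (0.3, 0.5, 1.0)])

weights, score = brute_force_weights(oof, y, step=0.1)
single = [float(np.sqrt(np.mean((p - y) ** 2))) for p in oof]
print("weights:", weights, "blend RMSE:", score, "single-model RMSE:", single)
```

The number of candidate weight vectors grows quickly with the number of models and with a smaller step size, so this is only practical for a handful of models.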
Greedy forward model
selection (Caruana)
Create out-of-fold predictions for the train set
Start with a base ensemble of 3 best models
Loop: Add every model from library to ensemble and pick 4
models that give best train score performance
Using place-back of models, models can be picked multiple times
(weighing them)
Using random subset selection from library in loop avoids
overfitting to single best model.
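A compact sketch of greedy forward selection with replacement is given below. It is an approximation under assumptions: it adds one model per round, uses RMSE as the score, stops when no candidate improves the ensemble, and omits the random-subset step; the model library is simulated.

```python
import numpy as np


def rmse(pred, y):
    return float(np.sqrt(np.mean((pred - y) ** 2)))


def greedy_forward_selection(oof_preds, y, n_init=3, n_rounds=20):
    """Greedy ensemble selection with replacement on out-of-fold predictions.

    Returns the selected model indices (repeats act as weights) and the
    ensemble score after each addition.
    """
    oof_preds = np.asarray(oof_preds, dtype=float)
    single_scores = [rmse(p, y) for p in oof_preds]
    selected = [int(i) for i in np.argsort(single_scores)[:n_init]]  # best models first
    running_sum = oof_preds[selected].sum(axis=0)
    history = [rmse(running_sum / len(selected), y)]
    for _ in range(n_rounds):
        # Score the ensemble that would result from adding each candidate once more.
        candidate_scores = [
            rmse((running_sum + p) / (len(selected) + 1), y) for p in oof_preds
        ]
        best = int(np.argmin(candidate_scores))
        if candidate_scores[best] >= history[-1]:
            break                                   # no candidate helps any more
        selected.append(best)
        running_sum = running_sum + oof_preds[best]
        history.append(candidate_scores[best])
    return selected, history


# Simulated library of six models of decreasing quality.
rng = np.random.default_rng(0)
y = rng.normal(size=400)
library = np.array([y + rng.normal(0, s, size=400) for s in (0.4, 0.5, 0.6, 0.8, 1.0, 1.5)])

selected, history = greedy_forward_selection(library, y)
weights = np.bincount(selected, minlength=len(library)) / len(selected)
print("selected:", selected)
print("implied weights:", weights)
print("ensemble RMSE per step:", history)
```

Because a model can be picked more than once, the selection counts act as ensemble weights without any explicit weight optimisation.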
Automated Stack ’n Bag I
Automatically train 1000s of models and 100s of
stackers, then average everything.
“Hodor!” - Hodor
Automated Stack ’n Bag II
Generalization
Train random models, random parameters, random data set transforms,
random feature sets, random sample sets.
Stacking
Train random models, random parameters, random base models, with and
without original features, random feature sets, random sample sets.
Bagging
Average random selection of Stackers and Generalizers. Either pick best
model, or create more random bags and keep averaging, ‘till no increase.
Automated Stack ’n Bag III
Strengths
Wins Kaggle competitions
Best generalization
No tuning
No selection
No human bias
Weaknesses
Extremely slow
Redundant
Inelegant
Very complex
Bad for environment
Leakage I
“The introduction of information about the data
mining target, which should not be legitimately
available to mine from.” - ‘Leakage in Data Mining:
Formulation, Detection, and Avoidance’, Kaufman et al.
“one of the top ten data mining mistakes” -
‘Handbook of Statistical Analysis and Data Mining
Applications’, Nisbet et al.
Leakage II
Exploiting Leakage:
In predictive modeling competitions: Allowed and
beneficial for results
In Science and Business: A very big NO NO!
In both: Accidental (Complex algo’s find leakage
automatically, or KNN finds duplicates)
Leakage III
Dato Truly Native?
This task suffered from data collection leakage:
Dates and certain keywords (Trump) were indicative, and generalized
to the private LB (but would not generalize to future data).
Smartphone activity prediction
This task did not have enough randomization (the order of samples in the train and
test sets was indicative)
Could manually change predictions, because classes were clustered.
Winning Dato Truly Native? I
Invented StackNet
“Data science is a team sport”: it helps to join up with #1 Kaggler :)
We used basic NLP: Cleaning, lowercasing, stemming, ngrams, chargrams, tf-
idf, SVD.
Trained a lot of different models on different datasets.
Started ensembling in the last 2 weeks.
Doing research and fun stuff, while waiting for models to complete.
XGBoost the big winner (somewhat rare to use boosting for sparse text)
Winning Dato Truly Native?
II
Winning Smartphone
Activity Prediction I
Prototyped Automated Stack ’n Bag (Kaggle Killer).
Let computer run for two days
Automatically inferred feature types
Did not look at the data
Beat very stiff competition
Winning Smartphone
Activity Prediction II
General strategy
Being #1 during competition sucks.
Team up
Go crazy with ensembling
Do not worry so much about replication that it freezes progress
Check previous competitions
Be patient and persistent (don't run out of steam)
Automate a lot
Stay up-to-date with State-of-the-art algorithms and tools
Complexity vs. Practicality I
Most Kaggle winner models are useless for production. It’s about
hyper-optimization. Top 10% probably good enough for business.
But what if we could use some Top 1% principles from Kaggle
models for business?
1-5% increase in accuracy can matter a lot!
Batch jobs allow us to overcome latency constraints
Ensembles are technically brittle, but give good generalization.
Leave no model behind!
Complexity vs. Practicality II
Future
Use re-usable holdout set
Use contextual bandits for training the ensemble
Find more models to add to library
Ensemble pruning / compression
Interpretable black box models

More Related Content

What's hot

Winning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen ZhangWinning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen ZhangVivian S. Zhang
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature EngineeringSri Ambati
 
Tips for data science competitions
Tips for data science competitionsTips for data science competitions
Tips for data science competitionsOwen Zhang
 
Kaggle and data science
Kaggle and data scienceKaggle and data science
Kaggle and data scienceAkira Shibata
 
Tips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitionsTips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitionsDarius Barušauskas
 
Explainable AI (XAI) - A Perspective
Explainable AI (XAI) - A Perspective Explainable AI (XAI) - A Perspective
Explainable AI (XAI) - A Perspective Saurabh Kaushik
 
Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Krishnaram Kenthapadi
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models BootcampData Science Dojo
 
GenAI in Research with Responsible AI
GenAI in Researchwith Responsible AIGenAI in Researchwith Responsible AI
GenAI in Research with Responsible AILiming Zhu
 
Explainability and bias in AI
Explainability and bias in AIExplainability and bias in AI
Explainability and bias in AIBill Liu
 
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Mihai Criveti
 
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1DianaGray10
 
A Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptxA Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptxSaiPragnaKancheti
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfPo-Chuan Chen
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningLior Rokach
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationDataWorks Summit
 
How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? HackerEarth
 
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...Po-Chuan Chen
 

What's hot (20)

Winning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen ZhangWinning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen Zhang
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Tips for data science competitions
Tips for data science competitionsTips for data science competitions
Tips for data science competitions
 
Kaggle and data science
Kaggle and data scienceKaggle and data science
Kaggle and data science
 
Tips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitionsTips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitions
 
Explainable AI (XAI) - A Perspective
Explainable AI (XAI) - A Perspective Explainable AI (XAI) - A Perspective
Explainable AI (XAI) - A Perspective
 
Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models Bootcamp
 
GenAI in Research with Responsible AI
GenAI in Researchwith Responsible AIGenAI in Researchwith Responsible AI
GenAI in Research with Responsible AI
 
Explainability and bias in AI
Explainability and bias in AIExplainability and bias in AI
Explainability and bias in AI
 
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
 
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
 
A Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptxA Comprehensive Review of Large Language Models for.pptx
A Comprehensive Review of Large Language Models for.pptx
 
Xgboost
XgboostXgboost
Xgboost
 
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdfRetrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to Implementation
 
How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ?
 
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
 
Deep learning
Deep learningDeep learning
Deep learning
 

Similar to Kaggle presentation

Smartphone Activity Prediction
Smartphone Activity PredictionSmartphone Activity Prediction
Smartphone Activity PredictionTriskelion_Kaggle
 
MEME – An Integrated Tool For Advanced Computational Experiments
MEME – An Integrated Tool For Advanced Computational ExperimentsMEME – An Integrated Tool For Advanced Computational Experiments
MEME – An Integrated Tool For Advanced Computational ExperimentsGIScRG
 
Brief Tour of Machine Learning
Brief Tour of Machine LearningBrief Tour of Machine Learning
Brief Tour of Machine Learningbutest
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learningAkshay Kanchan
 
in5490-classification (1).pptx
in5490-classification (1).pptxin5490-classification (1).pptx
in5490-classification (1).pptxMonicaTimber
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.butest
 
Unexpected Challenges in Large Scale Machine Learning by Charles Parker
 Unexpected Challenges in Large Scale Machine Learning by Charles Parker Unexpected Challenges in Large Scale Machine Learning by Charles Parker
Unexpected Challenges in Large Scale Machine Learning by Charles ParkerBigMine
 
Introduction
IntroductionIntroduction
Introductionbutest
 
Introduction
IntroductionIntroduction
Introductionbutest
 
Introduction
IntroductionIntroduction
Introductionbutest
 
Mis End Term Exam Theory Concepts
Mis End Term Exam Theory ConceptsMis End Term Exam Theory Concepts
Mis End Term Exam Theory ConceptsVidya sagar Sharma
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.pptyang947066
 
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Sri Ambati
 
ML crash course
ML crash courseML crash course
ML crash coursemikaelhuss
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Researchjim
 
Data Mining in Market Research
Data Mining in Market ResearchData Mining in Market Research
Data Mining in Market Researchbutest
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Researchkevinlan
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for EveryoneAly Abdelkareem
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data MiningKai Koenig
 

Similar to Kaggle presentation (20)

Smartphone Activity Prediction
Smartphone Activity PredictionSmartphone Activity Prediction
Smartphone Activity Prediction
 
MEME – An Integrated Tool For Advanced Computational Experiments
MEME – An Integrated Tool For Advanced Computational ExperimentsMEME – An Integrated Tool For Advanced Computational Experiments
MEME – An Integrated Tool For Advanced Computational Experiments
 
Brief Tour of Machine Learning
Brief Tour of Machine LearningBrief Tour of Machine Learning
Brief Tour of Machine Learning
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
in5490-classification (1).pptx
in5490-classification (1).pptxin5490-classification (1).pptx
in5490-classification (1).pptx
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
Unexpected Challenges in Large Scale Machine Learning by Charles Parker
 Unexpected Challenges in Large Scale Machine Learning by Charles Parker Unexpected Challenges in Large Scale Machine Learning by Charles Parker
Unexpected Challenges in Large Scale Machine Learning by Charles Parker
 
Introduction
IntroductionIntroduction
Introduction
 
Introduction
IntroductionIntroduction
Introduction
 
Introduction
IntroductionIntroduction
Introduction
 
Mis End Term Exam Theory Concepts
Mis End Term Exam Theory ConceptsMis End Term Exam Theory Concepts
Mis End Term Exam Theory Concepts
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.ppt
 
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
 
ML crash course
ML crash courseML crash course
ML crash course
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
 
Data Mining in Market Research
Data Mining in Market ResearchData Mining in Market Research
Data Mining in Market Research
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
 
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdfTop Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 

Recently uploaded

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Kaggle presentation

  • 1. Winning Kaggle Competitions Hendrik Jacob van Veen - Nubank Brasil
  • 2. About Kaggle Biggest platform for competitive data science in the world Currently 500k + competitors Great platform to learn about the latest techniques and avoiding overfit Great platform to share and meet up with other data freaks
  • 3. Approach Get a good score as fast as possible Using versatile libraries Model ensembling
  • 4. Get a good score as fast as possible Get the raw data into a universal format like SVMlight or NumPy arrays. Failing fast and failing often / Agile sprint / Iteration Sub-linear debugging: “output enough intermediate information as a calculation is progressing to determine before it finishes whether you've injected a major defect or a significant improvement.” - Paul Mineiro
  • 5. Using versatile libraries Scikit-learn Vowpal Wabbit XGBoost Keras Other tools get Scikit-learn API wrappers
  • 7. General Strategy Try to create “machine learning”-learning algorithms with optimized pipelines that are: Data agnostic (Sparse, dense, missing values, larger than memory) Problem agnostic (Classification, regression, clustering) Solution agnostic (Production-ready, PoC, latency) Automated (Turn on and go to bed) Memory-friendly (Don’t want to pay for AWS) Robust (Good generalization, concept drift, consistent)
  • 8. First Overview I Classification? Regression? Evaluation Metric Description Benchmark code “Predict human activities based on their smartphone usage. Predict if a user is sitting, walking etc.” - Smartphone User Activity Prediction Given the HTML of ~337k websites served to users of StumbleUpon, identify the paid content disguised as real content. - Dato Truly Native?
  • 9. First Overview II Counts Images Text Categorical Floats Dates 0.28309984, -0.025501173, … , -0.11118051, 0.37447712 <!Doctype html><html><head><meta charset=utf-8> … </html>
  • 10. First Overview III Data size? Dimensionality? Number of train samples & test samples? Online or offline learning? Linear problem or non-linear problem? Previous competitions that were similar?
  • 11. Branch If: Issues with the data -> Tedious clean-up Join JSON tables, Impute missing values, Curse Kaggle and join another competition Else: Get data into Numpy arrays, we want: X_train, y, X_test
  • 12. Local Evaluation Set up local evaluation according to competition metric Create a simple benchmark (useful for exploration and discarding models) 5-fold stratified cross-validation usually does the trick Very important step for fast iteration and saving submissions, yet easy to be lazy and use leaderboard. Area Under the Curve, Multi-Class Classification Accuracy
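The local evaluation step above can be sketched as follows. This is a minimal illustration, assuming scikit-learn is available, that the competition metric is AUC, and that a synthetic dataset from `make_classification` stands in for real competition data. The helper name `local_cv_auc` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def local_cv_auc(model, X, y, n_splits=5, seed=0):
    """Return the AUC of `model` on each fold of a stratified k-fold split."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, valid_idx in skf.split(X, y):
        model.fit(X[train_idx], y[train_idx])
        proba = model.predict_proba(X[valid_idx])[:, 1]
        scores.append(roc_auc_score(y[valid_idx], proba))
    return np.array(scores)


# Synthetic stand-in for X_train, y (an assumption, not competition data).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
scores = local_cv_auc(LogisticRegression(max_iter=1000), X, y)
print("AUC per fold:", scores.round(3), "mean:", round(scores.mean(), 3))
```

Comparing the fold mean (and its spread) between two models is usually enough to discard a weak idea without spending a leaderboard submission.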
  • 13. Data Exploration Min, Max, Mean, Percentiles, Std, Plotting Can detect: leakage, golden features, feature engineering tricks, data health issues. Caveat: At least one top 50 Kaggler used to not look at the data at all: “It’s called machine learning for a reason.”
  • 14. Feature Engineering I Log-transform count features, tf-idf transform text features Unsupervised transforms / dimensionality reduction Manual inspection of data Dates -> day of month, is_holiday, season, etc. Create histograms and cluster similar features Using VW-varinfo or XGBfi to check 2-3-way interactions Row stats: mean, max, min, number of NA’s.
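A few of the transforms listed above, sketched with NumPy and the standard library. The function names are hypothetical, and `is_weekend` is used here as a simple stand-in for calendar features such as `is_holiday`, which would need an external holiday calendar.

```python
from datetime import date

import numpy as np


def log_transform_counts(counts):
    """log(1 + x) compresses heavy-tailed count features and keeps 0 at 0."""
    return np.log1p(np.asarray(counts, dtype=float))


def row_stats(X):
    """Per-row mean, max, min (ignoring NaN) and the number of missing values."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([
        np.nanmean(X, axis=1),
        np.nanmax(X, axis=1),
        np.nanmin(X, axis=1),
        np.isnan(X).sum(axis=1),
    ])


def date_features(d):
    """Expand a date into simple calendar features."""
    return {
        "day_of_month": d.day,
        "day_of_week": d.weekday(),  # Monday == 0
        "month": d.month,
        "is_weekend": int(d.weekday() >= 5),
    }


print(log_transform_counts([0, 9, 99]))
print(row_stats([[1.0, np.nan, 3.0], [4.0, 5.0, 6.0]]))
print(date_features(date(2024, 3, 2)))
```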
  • 15. Feature Engineering II Bin numerical features to categorical features Bayesian encoding of categorical features to likelihood Genetic programming Random-swap feature elimination Time binning (customer bought in last week, last month, last year …) Expand data (Coates & Ng, Random Bit Regression) Automate all of this
  • 16. Feature Engineering III Categorical features need some special treatment Onehot-encode for linear models (sparsity) Colhot-encode for tree-based models (density) Counthot-encode for large cardinality features Likelihood-encode for experts…
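Two of these encodings can be sketched in plain Python. The interpretation is an assumption: "counthot" is taken to mean replacing each category by its frequency, and "likelihood" encoding is taken to mean the smoothed mean target per category. The function names and the `prior_weight` smoothing parameter are hypothetical.

```python
from collections import Counter, defaultdict


def count_encode(values):
    """Replace each category by how often it occurs in the column."""
    counts = Counter(values)
    return [counts[v] for v in values]


def likelihood_encode(values, targets, prior_weight=1.0):
    """Map each category to its mean target, shrunk towards the global mean.

    prior_weight acts like that many pseudo-observations at the global mean,
    so rare categories are not encoded from one or two samples alone.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    n = Counter()
    for v, t in zip(values, targets):
        sums[v] += t
        n[v] += 1
    return {v: (sums[v] + prior_weight * global_mean) / (n[v] + prior_weight)
            for v in n}


colors = ["red", "red", "blue", "blue"]
clicked = [1, 1, 0, 1]
print(count_encode(colors))
print(likelihood_encode(colors, clicked))
```

Because the likelihood encoding uses the target, it should be fitted on training folds only and then applied to the held-out fold; fitting it on all rows leaks the label into the feature.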
  • 17. Algorithms I A bias-variance trade-off between simple and complex models
  • 18. Algorithms II There is No Free Lunch in statistical inference We show that all algorithms that search for an extremum of a cost function perform exactly the same, when averaged over all possible cost functions. – Wolpert & Macready, No free lunch theorems for search Practical Solution for low-bias low-variance models: Use prior knowledge / experience to limit search (Let algo’s play to their known strengths for particular problems) Remove or avoid their weaknesses Combine/Bag their predictions
  • 19. Random Forests I A Random Forest is an ensemble of decision trees. “Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. […] More robust to noise” - ‘Random Forests’, Breiman
  • 20. Random Forests II Strengths Fast Easy to tune Easy to inspect Easy to explore data with Good Benchmark Very wide applicability Can introduce randomness / Diversity Weaknesses Memory Hungry Popular Slower for test time
  • 21. GBM I A GBM trains weak models on samples that previous models got wrong "A method is described for converting a weak learning algorithm [the learner can produce an hypothesis that performs only slightly better than random guessing] into one that achieves arbitrarily high accuracy." - “The Strength of Weak Learnability" Schapire
  • 22. GBM II Strengths Can achieve very good results Can model complex problems Works on wide variety of problems Use custom loss functions No need to scale data Weaknesses Slower to train Easier to overfit than RF Weak learner assumption is broken along the way Tricky to tune Popular
  • 23. SVM I Classification and Regression using Support Vectors "Nothing is more practical than a good theory." ‘The Nature of Statistical Learning Theory’, Vapnik
  • 24. SVM II Strengths Strong theoretical guarantees Tuning regularization parameter helps prevent overfit Kernel Trick: Use custom kernels, turn linear kernel into non-linear kernel Achieve state-of-the-art on select problems Weaknesses Slower to train Memory heavy Requires a tedious grid-search for best performance Will probably time-out on large datasets
  • 25. Nearest Neighbours I Look at the distance to other samples "The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points." ‘Nearest neighbor pattern classification’, Cover & Hart
  • 26. Nearest Neighbours II Strengths Simple Unpopular Non-linear Easy to tune Detect near-duplicates Weaknesses Simple Does not work well on average Depending on data size: Slow
  • 27. Perceptron I Update weights when wrong prediction, else do nothing The embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence. ‘New York Times’, Rosenblatt
  • 28. Perceptron II Strengths Cool / Street Cred Extremely Simple Fast / Sparse updates Online Learning Works well with text Weaknesses Other linear algo’s usually beat it Does not work well on average No regularization
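The update rule from the Perceptron slides ("update weights when wrong prediction, else do nothing") fits in a few lines of plain Python. This is a minimal sketch with labels in {-1, +1}; the function names are hypothetical.

```python
def train_perceptron(samples, labels, epochs=50):
    """Classic perceptron: weights change only when a sample is misclassified."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # wrong side of (or on) the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: converged
            break
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Logical AND is linearly separable, so the perceptron converges on it.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])
```

Each update touches only the features present in the misclassified sample, which is why the slide lists fast, sparse updates and online learning among the strengths.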
  • 29. Neural Networks I Inspired by biological systems (Connected neurons firing when threshold is reached) Because of the "all-or-none" character of nervous activity, neural events and the relations among them can be treated by means of propositional logic. […] for any logical expression satisfying certain conditions, one can find a net behaving in the fashion it describes. ‘A Logical Calculus of the Ideas Immanent in Nervous Activity’, McCulloch & Pitts
  • 30. Neural Networks II Strengths The best for images Can model any function End-to-end Training Amortizes feature representation Weaknesses Can be difficult to set up Not very interpretable Requires specialized hardware Underfit / Overfit
  • 31. Vowpal Wabbit I Online learning while optimizing a loss function We present a system and a set of techniques for learning linear predictors with convex losses on terascale datasets, with trillions of features, billions of training examples and millions of parameters in an hour using a cluster of 1000 machines. ‘A Reliable Effective Terascale Linear Learning System’, Agarwal et al.
  • 32. Vowpal Wabbit II Strengths Fixed memory constraint Extremely fast Feature expansion Difficult to overfit Versatile Weaknesses Different API Manual feature engineering Loses against boosting Requires practice Hashing can obscure
  • 33. Others Factorization Machines PCA t-SNE SVD / LSA Ridge Regression GLMNet Genetic Algorithms Bayesian Logistic Regression Quantile Regression AdaBoosting SGD
  • 34. Ensembles I Combine models in a way that outperforms individual models. “That’s how almost all ML competitions are won” - ‘Dark Knowledge’ Hinton et al. Ensembles reduce the chance of overfit. Bagging / Averaging -> Lower variance, slightly lower bias Blending / Stacking -> Remove biases of base models
  • 35. Ensembles II Practical tips: Use diverse models Use diverse feature sets Use many models Do not leak any information
  • 36. Stacked Generalization I Train one model on the predictions of another model A scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. - ‘Stacked Generalization’, Wolpert
  • 37. Stacked Generalization II Train one model on the predictions of another model
  • 38. Stacked Generalization III Using weak base models vs. using strong base models Using average of out-of-fold predictors vs. One model for testing One can also stack features when these are not available in test set. Can share train set predictions based on different folds
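A minimal two-level stack, assuming scikit-learn and a synthetic dataset. It uses the "one model for testing" variant named above: the base models are refit on the full train set to produce test-set meta-features, while the stacker is trained on out-of-fold predictions only. The choice of base models and stacker is illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (StratifiedKFold, cross_val_predict,
                                     train_test_split)
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for X_train, y, X_test (an assumption).
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

base_models = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    KNeighborsClassifier(n_neighbors=15),
]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold predictions: every train row is predicted by a model that
# never saw it, so the stacker does not learn from leaked labels.
oof = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=cv, method="predict_proba")[:, 1]
    for m in base_models
])

# Test meta-features from base models refit on the full train set.
test_meta = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])

stacker = LogisticRegression().fit(oof, y_train)
stack_auc = roc_auc_score(y_test, stacker.predict_proba(test_meta)[:, 1])
print("stacked AUC:", round(stack_auc, 3))
```

The alternative mentioned on the slide keeps the five fold models and averages their test predictions instead of refitting once on all of the train set.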
  • 39. StackNet We need to go deeper: Splitting node: x1 > 5? 1 else 0 Decision tree: x1 > 5 AND x2 < 12? Random forest: avg ( x1 > 5 AND x2 < 12?, x3 > 2? ) Stacking-1: avg ( RF1_pred > 0.9?, RF2_pred > 0.92? ) Stacking-2: avg ( S1_pred > 0.93?, S2_pred < 0.77? ) Stacking-3: avg ( SS1_pred > 0.98?, SS2_pred > 0.97? )
  • 40. Bagging Predictors I Averaging submissions to reduce variance "Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor." - "Bagging Predictors". Breiman
  • 41. Bagging Predictors II Train models with: Different data sets Different algorithms Different features subsets Different sample subsets Then average / vote aggregate these
  • 42. Bagging Predictors III One can average with: Plain average Geometric mean Rank mean Harmonic mean KazAnova’s brute-force weighted averaging Caruana’s forward greedy model selection
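The four simple averaging schemes can be sketched with NumPy. The function names are hypothetical, the rank mean here ignores ties, and the geometric and harmonic means assume strictly positive predictions such as probabilities.

```python
import numpy as np


def rank_normalize(p):
    """Map predictions to ranks scaled into [0, 1] (ties broken by position)."""
    ranks = np.argsort(np.argsort(p))
    return ranks / (len(p) - 1)


def average_predictions(preds, method="mean"):
    """Combine predictions of shape (n_models, n_samples) into one vector."""
    P = np.asarray(preds, dtype=float)
    if method == "mean":
        return P.mean(axis=0)
    if method == "geometric":
        return np.exp(np.log(P).mean(axis=0))
    if method == "harmonic":
        return P.shape[0] / (1.0 / P).sum(axis=0)
    if method == "rank":
        return np.vstack([rank_normalize(p) for p in P]).mean(axis=0)
    raise ValueError(f"unknown method: {method}")


preds = [[0.1, 0.4, 0.9],
         [0.4, 0.1, 0.9]]
for method in ("mean", "geometric", "harmonic", "rank"):
    print(method, average_predictions(preds, method))
```

Rank averaging discards calibration and keeps only the ordering, which suits rank-based metrics such as AUC when the models output scores on different scales.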
  • 43. Brute-Force Weighted Average Create out-of-fold predictions for train set for n models Pick a stepsize s, and set n weights Try every possible weight with stepsize s Look which set of n weights improves the train set score the most Can do in cross-validation-style manner for extra robustness.
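The search above can be sketched as an exhaustive grid over weight vectors. Two assumptions not stated on the slide: the weights are constrained to be non-negative and to sum to 1, and higher scores are better (so an error metric is negated). The names `brute_force_weights` and `neg_mse` are hypothetical.

```python
import itertools

import numpy as np


def neg_mse(y, p):
    """Negated mean squared error, so that larger is better."""
    return -np.mean((np.asarray(y, dtype=float) - p) ** 2)


def brute_force_weights(oof_preds, y, score_fn, step=0.1):
    """Try every weight vector on a grid of size `step` that sums to 1."""
    P = np.asarray(oof_preds, dtype=float)  # shape (n_models, n_samples)
    n_models = P.shape[0]
    steps = int(round(1 / step))
    best_w, best_score = None, -np.inf
    for combo in itertools.product(range(steps + 1), repeat=n_models):
        if sum(combo) != steps:  # keep only weights that sum to 1
            continue
        w = np.array(combo) / steps
        score = score_fn(y, w @ P)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score


y = [0, 1, 1, 0]
oof_preds = [[0, 1, 1, 0],   # a perfect model
             [1, 0, 0, 1]]   # a model that is always wrong
print(brute_force_weights(oof_preds, y, neg_mse))
```

The number of grid points grows exponentially with the number of models, so this is only practical for a handful of models or a coarse step size.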
  • 44. Greedy forward model selection (Caruana) Create out-of-fold predictions for the train set Start with a base ensemble of 3 best models Loop: Add every model from library to ensemble and pick 4 models that give best train score performance Using place-back of models, models can be picked multiple times (weighing them) Using random subset selection from library in loop avoids overfitting to single best model.
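A simplified sketch of greedy forward selection with replacement. It departs from the slide in two labeled ways: it adds one model per round rather than several, and it omits the random subsetting of the library. The initial ensemble size is a parameter, and all names are hypothetical.

```python
import numpy as np


def neg_mse(y, p):
    """Negated mean squared error, so that larger is better."""
    return -np.mean((np.asarray(y, dtype=float) - p) ** 2)


def greedy_forward_selection(oof_preds, y, score_fn, n_init=1, n_rounds=10):
    """Grow an ensemble by repeatedly adding the model that helps most.

    Models may be picked more than once, which weights them implicitly.
    """
    P = np.asarray(oof_preds, dtype=float)  # shape (n_models, n_samples)
    individual = [score_fn(y, p) for p in P]
    selected = [int(i) for i in np.argsort(individual)[::-1][:n_init]]
    best = score_fn(y, P[selected].mean(axis=0))
    for _ in range(n_rounds):
        candidates = [score_fn(y, P[selected + [j]].mean(axis=0))
                      for j in range(len(P))]
        j = int(np.argmax(candidates))
        if candidates[j] <= best:  # no model improves the ensemble
            break
        selected.append(j)
        best = candidates[j]
    return selected, best


y = [1.0, 2.0, 3.0, 4.0]
library = [
    [1.5, 1.5, 3.5, 3.5],  # errors +0.5, -0.5, +0.5, -0.5
    [0.5, 2.5, 2.5, 4.5],  # the opposite errors, so the two cancel out
    [2.5, 2.5, 2.5, 2.5],  # a constant baseline
]
print(greedy_forward_selection(library, y, neg_mse))
```

Scoring on out-of-fold predictions keeps the selection honest, although a long greedy search can still overfit to the train set score.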
  • 45. Automated Stack ’n Bag I Automatically train 1000s of models and 100s of stackers, then average everything. “Hodor!” - Hodor
  • 46. Automated Stack ’n Bag II Generalization Train random models, random parameters, random data set transforms, random feature sets, random sample sets. Stacking Train random models, random parameters, random base models, with and without original features, random feature sets, random sample sets. Bagging Average random selection of Stackers and Generalizers. Either pick best model, or create more random bags and keep averaging, ‘till no increase.
  • 47. Automated Stack ’n Bag III Strengths Wins Kaggle competitions Best generalization No tuning No selection No human bias Weaknesses Extremely slow Redundant Inelegant Very complex Bad for environment
  • 48. Leakage I “The introduction of information about the data mining target, which should not be legitimately available to mine from.” - ‘Leakage in Data Mining: Formulation, Detection, and Avoidance’, Kaufman et al. “one of the top ten data mining mistakes” - ‘Handbook of Statistical Analysis and Data Mining Applications’, Nisbet et al.
  • 49. Leakage II Exploiting Leakage: In predictive modeling competitions: Allowed and beneficial for results In Science and Business: A very big NO NO! In both: Accidental (Complex algo’s find leakage automatically, or KNN finds duplicates)
  • 50. Leakage III Dato Truly Native? This task suffered from data collection leakage: Dates and certain keywords (Trump) were indicative, and generalized to the private LB (but would not generalize to future data). Smartphone activity prediction This task did not have enough randomization (order of samples in train and test set was indicative) Could manually change predictions, because classes were clustered.
  • 51. Winning Dato Truly Native? I Invented StackNet “Data science is a team sport”: it helps to join up with #1 Kaggler :) We used basic NLP: Cleaning, lowercasing, stemming, ngrams, chargrams, tf-idf, SVD. Trained a lot of different models on different datasets. Started ensembling in the last 2 weeks. Doing research and fun stuff, while waiting for models to complete. XGBoost the big winner (somewhat rare to use boosting for sparse text)
  • 52. Winning Dato Truly Native? II
  • 53. Winning Smartphone Activity Prediction I Prototyped Automated Stack ’n Bag (Kaggle Killer). Let computer run for two days Automatically inferred feature types Did not look at the data Beat very stiff competition
  • 55. General strategy Being #1 during competition sucks. Team up Go crazy with ensembling Do not worry so much about replication that it freezes progress Check previous competitions Be patient and persistent (don’t run out of steam) Automate a lot Stay up-to-date with State-of-the-art algorithms and tools
  • 56. Complexity vs. Practicality I Most Kaggle winner models are useless for production. It’s about hyper-optimization. Top 10% probably good enough for business. But what if we could use some Top 1% principles from Kaggle models for business? 1-5% increase in accuracy can matter a lot! Batch jobs allow us to overcome latency constraints Ensembles are technically brittle, but give good generalization. Leave no model behind!
  • 58. Future Use re-usable holdout set Use contextual bandits for training the ensemble Find more models to add to library Ensemble pruning / compression Interpretable black box models