EIGENAUTS
Bernard Ong (Lead)
Emma (Jielei) Zhu | Nanda Rajarathinam | Trinity (Miaozhi) Yu
NYC Data Science Academy Capstone Project
September 2016
CONSUMER
CREDIT DEFAULT
KAGGLE CHALLENGE
Improve on the state of the art in credit scoring by
predicting the probability that somebody will experience
financial distress in the next two years.
Credit scoring algorithms, which make a guess at the
probability of default, are the method banks use to
determine whether or not a loan should be granted. This
challenge seeks to improve on the state of the art in credit
scoring, by predicting the probability that somebody will
experience financial distress in the next two years.
The goal of this challenge is to build a model that borrowers
can use to help make the best financial decisions.
Features
• Revolving Utilization of Unsecured Lines
• Age
• Number of Times 30-59 Days Past Due
• Number of Times 60-89 Days Past Due
• Number of Times 90 Days Late
• Debt Ratio
• Monthly Income
• Number of Open Credit Lines & Loans
• Number of Real Estate Loans or Lines
• Number of Dependents
Predict: Serious Delinquency in 2 Years (Default Risk)
GOALS
• Achieve the highest possible Area Under the Curve (AUC) score on the Kaggle
Private Leaderboard
• Achieve a Top 5% placement on the Kaggle Private Leaderboard
DATASET SIZE
• Training Dataset = 150,000 Observations
• Test Dataset = 101,530 Observations
STRENGTHS
Leveraging what we
already know
A SWOT analysis lets our team organize our thoughts: leverage our greatest
strengths, understand our weaknesses, take advantage of opportunities, and
stay aware of any threats to the undertaking.
SWOT puts us on the right track from the start and saves us from a lot of
headaches later on.
SWOT
ANALYSIS
WEAKNESSES
Areas that we need
to improve upon
OPPORTUNITIES
Chance to apply &
learn from experience
THREATS
Risks that we need to
mitigate and manage
STRENGTHS
• Experience to leverage advanced stacking techniques and Agile
• A synergistic team with very complementary skills and experiences
• Lessons learned from a previous Kaggle challenge can be applied
• Experience in the use of Agile process to parallel track work queues
WEAKNESSES
• Extreme time constraints would limit the scope/depth of exploration
• Unfamiliarity with new tools and models will limit firepower
• Learning while executing will slow down the entire process
• Sparse resources on some of the newer technologies employed
OPPORTUNITIES
• Learn how to strategize, optimize, and fine-tune models, algorithms,
and parameters to achieve the best default prediction possible
• Gain real time experience using Bayesian Optimizers
• Explore Deep Learning (Theano/Keras) in predicting default
THREATS
• Small dataset size presents tremendous challenges for generalization that
impact modeling and, ultimately, prediction accuracy
• Extremely tight tolerances in AUC scores – differences on the order of 0.0001 (the fourth decimal place)
• The Top 5% target presents a high risk – we have no idea how feasible it is
With time and resources constrained, it is critical to use the right process to maximize both our throughput and our
output. We employed an Agile process adapted for machine learning that let us parallelize the model build, train,
validate, test, predict, and scoring cycles and iterate quickly.
AGILE PROCESS
MACHINE LEARNING
© 2016 Bernard Ong
MODEL EXECUTION
STRATEGY
MANAGING COMPLEXITY
Voting and
Stacking Blended
Models
Single and
Ensemble Models
Voting Classifier
Models
Advanced
Stacking Models
We employed a parallel tracking process in which all modeling methods
were run at the same time. Every time automated optimization found a better
setting, it fed back into the entire process cycle, so gains were shared
across all tracks immediately.
Single Model
Logistic Regression
Naïve Bayes
Ensemble Model
Gradient Boosting
XGBoost
2-Layer Stacking Model
Base Classifiers
(1) Gradient Boosting
(1) XGBoost
Meta Classifiers
(1) Logistic Regression
12-Classifier Voting Model
(7) Gradient Boosting
(1) Naïve Bayes
(3) Random Forest
(1) Adaboost
6-Model Combination
(2) Gradient Boosting
(2) Random Forest
(1) Adaboost
(1) Restricted Boltzmann Machine
© 2016 Bernard Ong
FEATURE ANALYSIS
• “Monthly Income“ has
29,731 NAs (~20%)
• “Number of
Dependents” has 3,924
NAs (~3%)
• PCA suggests choosing 5
principal components, which
explain only 66% of the total
variance
• We only have 10 features, so
PCA may not work very well
(a pandas / scikit-learn sketch follows below)
Outliers
To filter or not to filter,
that is the question…
almost all variables
have outlier problems
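The exploration code behind these counts is not in the deck; a minimal pandas / scikit-learn sketch that would reproduce the NA counts and the PCA variance check looks roughly like this (the file name and the median fill before PCA are assumptions):

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the Kaggle "Give Me Some Credit" training data (file name assumed).
train = pd.read_csv("cs-training.csv", index_col=0)

# Missing-value counts per feature: MonthlyIncome (~20%) and
# NumberOfDependents (~3%) are the only columns with NAs.
print(train.isnull().sum())

# PCA on the standardized features (target dropped, NAs median-filled);
# the cumulative explained variance of the first 5 components is ~66%.
X = train.drop(columns="SeriousDlqin2yrs").fillna(train.median())
X_scaled = StandardScaler().fit_transform(X)
print(PCA().fit(X_scaled).explained_variance_ratio_.cumsum())
```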
FEATURE CORRELATION
HEATMAP
• Correlation among the past-due count variables (Number of 30-59 Days Past Due,
Number of 60-89 Days Past Due, Number of 90 Days Past Due) no longer
exists after the outliers are filtered (a heatmap sketch follows below)
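A correlation heatmap of this kind can be drawn with pandas and seaborn; the sketch below is illustrative rather than the team's original code, and the specific outlier filter (dropping the sentinel past-due values) is an assumption:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

train = pd.read_csv("cs-training.csv", index_col=0)          # file name assumed

# The past-due count columns contain sentinel values (96/98); treating
# those rows as outliers and dropping them is an illustrative filter.
filtered = train[train["NumberOfTimes90DaysLate"] < 90]

fig, axes = plt.subplots(1, 2, figsize=(16, 6))
sns.heatmap(train.corr(), ax=axes[0], cmap="coolwarm", center=0)
sns.heatmap(filtered.corr(), ax=axes[1], cmap="coolwarm", center=0)
axes[0].set_title("Raw data")
axes[1].set_title("Outliers filtered")
plt.tight_layout()
plt.show()
```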
VARIABLE IMPORTANCE
(3) Most Important Variables (a feature-importance sketch follows below):
• “RevolvingUtilizationOfUnsecuredLines”
• “DebtRatio”
• “MonthlyIncome”
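Importance rankings like these typically come from a fitted tree ensemble; the sketch below uses a random forest as a stand-in, which is not necessarily the model the team used for this ranking:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

train = pd.read_csv("cs-training.csv", index_col=0)          # file name assumed
X = train.drop(columns="SeriousDlqin2yrs").fillna(train.median())
y = train["SeriousDlqin2yrs"]

# Fit a tree ensemble and rank features by impurity-based importance.
rf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=42)
rf.fit(X, y)
print(pd.Series(rf.feature_importances_, index=X.columns)
        .sort_values(ascending=False))
```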
FEATURE ENGINEERING
• Created 7 training
datasets and 7
matching test datasets
(an illustrative sketch
follows below)
• Results: Naive
Bayes and Logistic
Regression AUC
scores improved
from ~0.7 to ~0.85
• However, for tree-
based methods,
the new features
did not improve
the scores
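The deck does not list the exact transformations behind the seven engineered datasets. Purely as an illustration, one such dataset could add log and ratio features like the following (every derived column here is hypothetical; the file and column names follow the Kaggle dataset):

```python
import numpy as np
import pandas as pd

def engineer(df: pd.DataFrame) -> pd.DataFrame:
    """One hypothetical engineered dataset: log transforms and ratio features."""
    out = df.copy()
    out["LogMonthlyIncome"] = np.log1p(out["MonthlyIncome"].fillna(0))
    out["LogDebtRatio"] = np.log1p(out["DebtRatio"])
    out["IncomePerDependent"] = out["MonthlyIncome"] / (out["NumberOfDependents"].fillna(0) + 1)
    out["TotalPastDue"] = (out["NumberOfTime30-59DaysPastDueNotWorse"]
                           + out["NumberOfTime60-89DaysPastDueNotWorse"]
                           + out["NumberOfTimes90DaysLate"])
    return out

train_fe = engineer(pd.read_csv("cs-training.csv", index_col=0))   # file names assumed
test_fe = engineer(pd.read_csv("cs-test.csv", index_col=0))
```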
• Machine learning algorithms require careful
tuning of learning parameters and model
hyperparameters.
• This tuning is often a 'black art' requiring
expert experience, rules of thumb, or
sometimes brute-force search.
• Automated approaches help expedite finding
the optimal combination of parameters to fit
a given learning algorithm/model to the data
at hand.
• Four types of hyperparameter optimization
(a scikit-learn sketch of the first two follows below):
• grid search
• random search
• gradient-based optimization
• Bayesian optimization
Bayesian
Optimization
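Grid search and random search are the usual baselines among the four approaches listed above; a minimal scikit-learn sketch (parameter values are illustrative, not the settings used in the project) looks like this:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

param_grid = {                       # illustrative values only
    "n_estimators": [100, 300, 500],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.05, 0.1],
}

# Grid search tries every combination, scored by cross-validated AUC.
grid = GridSearchCV(GradientBoostingClassifier(), param_grid,
                    scoring="roc_auc", cv=5, n_jobs=-1)

# Random search samples a fixed number of combinations instead.
rand = RandomizedSearchCV(GradientBoostingClassifier(), param_grid,
                          n_iter=10, scoring="roc_auc", cv=5, n_jobs=-1)

# grid.fit(X, y); rand.fit(X, y)     # X, y: training features and target
```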
PARAMETERS vs HYPERPARAMETERS
Model | Parameters (learned from fitting the model) | Hyperparameters (fixed during model fitting)
Ridge & Lasso Regression | Beta coefficients | Regularization parameter (λ)
Support Vector Machines (SVM) | Beta coefficients | C
Tree-Based Methods | Which feature to split on at each internal node, and at what threshold, … | Criterion (“gini” vs “entropy”), number of trees, maximum depth, …
BAYESIAN OPTIMIZATION – bayes_opt
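The code shown on the original slide did not survive the text export. A minimal sketch of tuning a gradient boosting model with the bayes_opt package, by maximizing cross-validated AUC, is given below; the search bounds and the -999 fill are illustrative assumptions, and the snippet targets recent versions of bayes_opt:

```python
import pandas as pd
from bayes_opt import BayesianOptimization
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

train = pd.read_csv("cs-training.csv", index_col=0)          # file name assumed
X = train.drop(columns="SeriousDlqin2yrs").fillna(-999)      # one of the imputation options tried
y = train["SeriousDlqin2yrs"]

def gb_cv_auc(n_estimators, max_depth, learning_rate, subsample):
    """Objective: mean cross-validated AUC for one hyperparameter setting."""
    model = GradientBoostingClassifier(
        n_estimators=int(n_estimators),
        max_depth=int(max_depth),
        learning_rate=learning_rate,
        subsample=subsample,
        random_state=42,
    )
    return cross_val_score(model, X, y, scoring="roc_auc", cv=5).mean()

# Search bounds are illustrative assumptions, not the team's actual settings.
optimizer = BayesianOptimization(
    f=gb_cv_auc,
    pbounds={
        "n_estimators": (100, 1000),
        "max_depth": (3, 8),
        "learning_rate": (0.01, 0.2),
        "subsample": (0.5, 1.0),
    },
    random_state=42,
)
optimizer.maximize(init_points=5, n_iter=30)
print(optimizer.max)   # best AUC found and the hyperparameters that produced it
```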
STACKING
• Stacking combines the base
classifiers in a non-linear fashion.
The combining component, called a
meta learner, integrates the
independently trained base
classifiers into a higher-level
classifier, called a meta classifier,
by learning over a meta-level
training set (see the sketch below)
• A stacking model typically uses
multiple layers to combine the
probabilities, and requires more
careful tuning of the parameters
as the number of layers increases
• Stacking generally works well
with various combinations of
algorithms (including XGBoost,
Random Forest, and boosting)
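A two-layer stack in the spirit of the one described above (gradient boosting and XGBoost base classifiers, logistic regression meta classifier) can be wired up with out-of-fold predictions roughly as follows; fold counts, hyperparameters, and the -999 fill are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from xgboost import XGBClassifier

train = pd.read_csv("cs-training.csv", index_col=0)          # file name assumed
X = train.drop(columns="SeriousDlqin2yrs").fillna(-999)
y = train["SeriousDlqin2yrs"]

base_models = [
    GradientBoostingClassifier(random_state=42),
    XGBClassifier(n_estimators=300, max_depth=4, random_state=42),
]

# Level 0: out-of-fold probabilities from each base classifier become the
# meta-level training set, so the meta classifier never sees leaked labels.
meta_features = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Level 1: a logistic regression meta classifier learns how to blend them.
meta_clf = LogisticRegression()
meta_clf.fit(meta_features, y)

# For a test-set prediction, each base model would be refit on all of X
# and its test probabilities fed through meta_clf.predict_proba(...).
```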
VOTING
• Voting classifies an unlabeled
instance according to the class
that receives the highest number of
votes (the most frequent vote)
• The concept of layering does not
apply to a Voting model; the
final probability / label prediction
is obtained by majority voting
• Voting may not work with certain
combinations of algorithms
• For example, in our scenario, the
Voting classifier raised an error
when XGBoost was used as one
of the classifiers, because
XGBoost is not part of the Python
scikit-learn library (a scikit-learn-only
sketch follows below)
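For comparison, a voting ensemble built only from scikit-learn estimators, like the 12-classifier model described earlier but reduced to a handful of estimators, might look like this; soft voting is shown because AUC scoring needs averaged probabilities rather than hard labels:

```python
import pandas as pd
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

train = pd.read_csv("cs-training.csv", index_col=0)          # file name assumed
X = train.drop(columns="SeriousDlqin2yrs").fillna(-999)
y = train["SeriousDlqin2yrs"]

# Soft voting averages the estimators' predicted probabilities, which is
# what an AUC-scored competition needs (hard voting returns labels only).
voter = VotingClassifier(
    estimators=[
        ("gb1", GradientBoostingClassifier(random_state=1)),
        ("gb2", GradientBoostingClassifier(random_state=2)),
        ("rf", RandomForestClassifier(n_estimators=300, random_state=3)),
        ("ada", AdaBoostClassifier(random_state=4)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)
print(cross_val_score(voter, X, y, scoring="roc_auc", cv=5).mean())
```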
THEANO / KERAS
NEURAL NETWORK
• Tried a sequential neural network (see the sketch below)
• Tuned
• Number of epochs
• Batch size
• Initialization functions
• Optimization functions
Deep Learning
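A sequential network of the kind described on this slide might look like the sketch below; it uses the current tf.keras API rather than the 2016 Theano backend, and the layer sizes, initializers, and optimizer are illustrative assumptions, not the team's final settings:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Binary classifier on the 10 credit features; the sigmoid output gives
# the default probability on which the AUC metric is computed.
model = keras.Sequential([
    layers.Input(shape=(10,)),
    layers.Dense(64, activation="relu", kernel_initializer="glorot_uniform"),
    layers.Dropout(0.3),
    layers.Dense(32, activation="relu", kernel_initializer="glorot_uniform"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC(name="auc")])

# Tunables mentioned on the slide: epochs, batch size, initializers, optimizer.
# model.fit(X, y, epochs=50, batch_size=256, validation_split=0.2)
```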
PATHWAY TO THE TOP SCORE
USE AGILE CHUNKING | FAIL FAST | KEEP ACCURATE LOGS | FOLLOW THE PROCESS
ONE
BASIC SINGLE & ENSEMBLE MODEL
TESTING TO UNDERSTAND DATA
HEALTH & HEARTBEAT, &
INTERACTION WITH BEST MODEL
TWO
PARALLEL FEATURE ANALYSIS &
ENGINEERING RESULTS FEEDING
TRAINING & TESTING PIPELINE
THREE
PARALLEL BAYESIAN OPTIMIZATION
AGAINST TRAINING DATA TO GET
RELATIVELY BEST PARAMS &
HYPERPARAMETERS
SIX
USE BEST CROSS-VALIDATION
RESULTS TO PRODUCE SUBMISSION
FILE FOR KAGGLE SCORING, THEN
REPEAT CYCLE UNTIL TOP SCORE
ACHIEVED
FIVE
INTENSIVE CROSS VALIDATION
PROCESS TO ENSURE SETTINGS
PRODUCE CONSISTENT HIGH SCORES
WITHOUT OVERFITTING, PERFORM
FINE-GRAINED MANUAL TUNING
FOUR
PARALLEL TRACK MODEL STACKING &
VOTING CLASSIFIER TRAINING &
TESTING TO VALIDATE INITIAL
HYPERPARAMETER SETTINGS
SUCCESS
© 2016 Bernard Ong
PARALLEL MODELING PROCESS
AGILE MODELING PROCESS : WORK IN PROGRESS : DESIGNED BY BERNARD ONG
• Missing Data Imputation – Mean Imputation | Median Imputation | Most Frequent Value Imputation | Replace with Zeros | Replace with -999 (see the imputation sketch after this slide)
• Feature Engineering – Engineered Datasets #1 through #7
• Single Models & Ensembles – Naïve Bayes (Gaussian) | Logistic Regression | Gradient Boosting | Random Forest | Adaboost | XGBoost | Deep Learning Model
• Stacking Models – Lvl-0: XGBoost Random Seed Stack | Gradient Boosting Random Seed Stack | Adaboost Random Seed Stack | Random Forest | Restricted Boltzmann Machine Neural Net Stack; Lvl-1: Logistic Regression
• Voting Classifiers – Gradient Boosting Random Seed Voting Incumbents | Adaboost Random Voting Incumbents | Random Forest Voting Incumbents → Voting Classifier
• Blended Stacking & Voting Classifiers – Voting Classifier (Gradient Boosting / Adaboost / Random Forest voting incumbents) | Logistic Regression Stack (Lvl-1) | Naïve Bayes (Gaussian) → Best AUC Score Model
• Cross-cutting tracks: Bayesian Optimization – Best Hyperparameters Cross-Pollination; Cross-Validation Testing and Final Manual Tuning
© 2016 Bernard Ong
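The five imputation strategies in the left-hand track of the diagram above map naturally onto scikit-learn's SimpleImputer plus constant fills; the sketch below is illustrative, not the original pipeline code:

```python
import pandas as pd
from sklearn.impute import SimpleImputer

train = pd.read_csv("cs-training.csv", index_col=0)          # file name assumed
X = train.drop(columns="SeriousDlqin2yrs")

def impute(strategy):
    """Return a copy of X with NAs filled by a SimpleImputer strategy."""
    return pd.DataFrame(SimpleImputer(strategy=strategy).fit_transform(X),
                        columns=X.columns, index=X.index)

# The five imputation variants explored in the parallel track above.
imputed = {
    "mean": impute("mean"),
    "median": impute("median"),
    "most_frequent": impute("most_frequent"),
    "zeros": X.fillna(0),
    "minus_999": X.fillna(-999),
}
```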
The ROC curve shows the tradeoff between sensitivity (TPR) and 1 − specificity (FPR). At AUC = 0.88912 the result is quite good (almost borderline excellent).
The curve follows the left-hand and top borders of the ROC space closely, indicating a fairly accurate test.
FINAL KAGGLE AUC ROC CURVE
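An ROC curve and AUC like the one reported here can be reproduced from out-of-fold predicted probabilities; the sketch below uses a plain gradient boosting model as a stand-in for the final blended model:

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import cross_val_predict

train = pd.read_csv("cs-training.csv", index_col=0)          # file name assumed
X = train.drop(columns="SeriousDlqin2yrs").fillna(-999)
y = train["SeriousDlqin2yrs"]

# Out-of-fold default probabilities from a stand-in model (not the final blend).
proba = cross_val_predict(GradientBoostingClassifier(), X, y,
                          cv=5, method="predict_proba")[:, 1]

fpr, tpr, _ = roc_curve(y, proba)            # 1 - specificity vs. sensitivity
print("AUC:", roc_auc_score(y, proba))       # the final Kaggle submission scored 0.88912

plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle="--")     # chance diagonal
plt.xlabel("False positive rate (1 - specificity)")
plt.ylabel("True positive rate (sensitivity)")
plt.show()
```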
• The models seemed to perform well without the
missing values being imputed
• Stacking, Voting, and the combination of the two
tend to have better predictive power than
plain ensembles
• Feature engineering improved the AUC score for
single models (Naive Bayes and Logistic
Regression) from ~0.7 to ~0.85 but did not have
much impact on the tree-based methods
• The incremental increase in predictive accuracy
(AUC) is on the order of 0.0001 as we move higher
in the Kaggle leaderboard – the tuning gets much harder
RESULTS &
FINDINGS
• Hyperparameter tuning is a very time-consuming
process
• Cross-validation is a critical factor in
determining model accuracy
• The model needs to be tuned at a much more
granular level as the dataset gets smaller
(i.e., fewer features and observations)
• Following an Agile process continued to
be a proven factor for maximizing success
LESSONS &
INSIGHTS
FUTURE
STEPS
Explore how Theano / Keras
Deep Learning can be
integrated better into the Agile
modeling process.
Extend feature engineering to
add new polynomial and
transformed features.
TOP SCORE
ACHIEVED
We not only surpassed our original goal of a Top 5%
placement – the team beat the top AUC ranking and
achieved the #1 spot on the Kaggle challenge
within 2 weeks.