Ensembling & Boosting
An Introduction to the Concepts
Wayne Chen
201608
Purpose of This Deck
Sharpen your sense of the data analysis field
So that when someone claims to be a competition veteran, you don't get that sinking "wow, you must be amazing" feeling
Maybe you'll never use it, but even then the concepts are worth borrowing
If Deep Learning changed the rules of the ML game...
XGBoost: Kaggle Winning Solution
Giuliano Janson: Won two games and retired from Kaggle
Persistence: every Kaggler nowadays can put up a great model in a few hours
and usually achieve 95% of final score. Only persistence will get you the
remaining 5%.
Ensembling: need to know how to do it "like a pro". Forget about averaging
models. Nowadays many Kagglers build meta-models, and meta-meta-models.
Why is an Ensemble needed?
Occam's Razor
● An explanation of the data should be made as simple as possible, but no simpler.
Simple methods beat complex ones. Simple is good; any waste is bad.
Yet combining several simple models often works better than a single complex one.
● Training data might not provide sufficient information for choosing a single best learner.
● The search processes of the learning algorithms might be imperfect (it is
difficult to reach the unique best hypothesis)
● Hypothesis space being searched might not contain the true target function.
What counts as a "simple method"?
ID3, C4.5, CART ... tree-based methods
Entropy
e.g., find the big spenders, splitting on gender: 5 spenders (1M, 4F), 9 non-spenders (6M, 3F)
● E_all → -5/14 * log2(5/14) - 9/14 * log2(9/14)
● Entropy is 1 for a 50%-50% split, 0 for 100%-0%
Information Gain
● How much the entropy drops from the original after choosing attribute a as the split attribute
● E_gender → P(M) * E(1,6) + P(F) * E(4,3); Gain = E_all - E_gender
http://www.saedsayad.com/decision_tree.htm
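A quick sketch of the arithmetic above in plain Python (log base 2, matching the "entropy is 1 at 50%-50%" convention):

import math

def entropy(counts):
    # Entropy (base 2) of a class-count distribution, e.g. [5, 9]
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# 14 people: 5 spenders (1M, 4F), 9 non-spenders (6M, 3F)
e_all = entropy([5, 9])                       # ~0.940
e_male = entropy([1, 6])                      # 7 males: 1 spender, 6 non-spenders
e_female = entropy([4, 3])                    # 7 females: 4 spenders, 3 non-spenders
e_gender = 7/14 * e_male + 7/14 * e_female    # P(M)*E(1,6) + P(F)*E(4,3)
gain = e_all - e_gender                       # information gain of the gender split
print(round(e_all, 3), round(e_gender, 3), round(gain, 3))  # 0.94 0.788 0.152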
What can go wrong with this?
The more precise a single model gets, the more biased it may become
http://blogs.sas.com/content/jmp/2013/03/25/partitioning-a-quadratic-in-jmp/
Boosting Ensembles in one sentence
To err and then mend is the greatest of virtues.
Learning means going over your mistakes again and again, weighting them more heavily in memory, and improving.
What's done wrong can't be undone; take the lesson and work to avoid repeating it.
1. A mistake is a mistake: don't discard it, but don't obsess over it either
2. Remember where you went wrong, and weight it more heavily in the next round
3. Keep learning until every exam comes back 100% (kidding)
Learn to use Ensembling in one second
You've probably already tried a few different models
● Decision tree, NN, SVM, Regression ..
Ensemble your Kaggle submission CSV files → it works!
Majority Voting
● Three models: 70%, 70%, 70%
● A majority-vote ensemble will reach ~78%.
● Averaging predictions often reduces overfitting.
http://mlwave.com/kaggle-ensembling-guide/
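Where does the ~78% come from? Assuming the three 70% models err independently, the majority vote is correct whenever at least two of the three are correct; a one-line check of the binomial arithmetic:

p = 0.7
# exactly two correct (3 ways) + all three correct
p_majority = 3 * p**2 * (1 - p) + p**3
print(p_majority)  # 0.784 -- the "~78%" quoted above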
The pitfalls of Ensembling
Put Kobe, Curry, and LBJ on one team: does that guarantee a championship?
Uncorrelated models usually perform better
Make each model as accurate as possible, and the set as diverse as possible
Common mechanisms: Majority Vote, Weighted Averaging
Voting Ensemble → RandomForest → GradientBoostingMachine
Highly correlated models (ground truth 1111111111):
1111111100 = 80% accuracy
1111111100 = 80% accuracy
1011111100 = 70% accuracy
→ majority vote: 1111111100 = 80% accuracy (no gain)
Less correlated models:
1111111100 = 80% accuracy
0111011101 = 70% accuracy
1000101111 = 60% accuracy
→ majority vote: 1111111101 = 90% accuracy
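The same effect can be checked mechanically with a column-wise majority vote over the 0/1 prediction strings above (the strings come from the example; the small helper below is ours):

from collections import Counter

def majority_vote(predictions):
    # Column-wise majority vote over equal-length 0/1 prediction strings
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*predictions))

truth = "1111111111"
correlated = ["1111111100", "1111111100", "1011111100"]
diverse = ["1111111100", "0111011101", "1000101111"]

for models in (correlated, diverse):
    vote = majority_vote(models)
    acc = sum(v == t for v, t in zip(vote, truth)) / len(truth)
    print(vote, acc)  # 1111111100 0.8, then 1111111101 0.9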
The Ensemble method you've definitely heard of: Random Forest
● Randomly samples not only data but also features
● Majority vote
● Minimal tuning
● Performance surpasses many far more complex methods
n: subsample size
m: subfeature set size
tree size, tree number
http://www.slideshare.net/0xdata/jan-vitek-distributedrandomforest522013
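The deck doesn't tie this to a library, but a minimal scikit-learn sketch of the same idea (synthetic data, illustrative parameter values) looks like this:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bootstrap-sampled rows, a random feature subset at every split,
# majority vote across trees -- strong results with minimal tuning.
rf = RandomForestClassifier(
    n_estimators=200,     # tree number
    max_features="sqrt",  # m: size of the random feature subset per split
    max_depth=None,       # tree size: grow each tree fully by default
    random_state=0,
)
print(cross_val_score(rf, X, y, cv=5).mean())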
Ensemble keywords
Base learner: the basic model being ensembled, e.g., a single tree or a simple neural network
● Trained by a base learning algorithm (e.g., decision tree, neural network ..)
Three main branches of training methods:
● Boosting - boosts weak learners into strong learners (sequential learners)
● Bagging - like RandomForest: sampling from data or features
● Stacking - the idea of packing learners together (parallel learners)
○ Employs different learning algorithms to train individual learners
○ Individual learners are then combined by a second-level learner called the meta-learner.
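As a sketch of stacking with scikit-learn's StackingClassifier (the particular base learners and meta-learner here are illustrative assumptions, not the deck's):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Different learning algorithms train the individual learners;
# a second-level meta-learner combines their predictions.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # the meta-learner
    cv=5,  # meta-learner trains on out-of-fold predictions to limit leakage
)
print(cross_val_score(stack, X, y, cv=5).mean())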
Bagging Ensemble: Bootstrap Aggregating
Each round, draw m data points (a bootstrap sample) and train a base learner by
calling the base learning algorithm
● The sampling ratio is a craft in itself
● You can even train different models on sub-datasets built from different features
○ Cherkauer (1996): volcano identification, 32 NNs split across different input features
● Inject an element of randomness
○ random initialization in backpropagation; random feature selection in trees
● Majority voting
Advantage: preserves the diversity of the overall hypothesis set
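A minimal scikit-learn sketch of bootstrap aggregating (synthetic data; the sampling ratios are illustrative, since the right ratio is, as noted, a craft in itself):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Each tree sees a bootstrap sample of the rows and a random subset of
# the features; predictions are combined by majority vote.
bag = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=0.8,   # the sampling ratio
    max_features=0.8,  # sample features as well as data
    random_state=0,
)
print(cross_val_score(bag, X, y, cv=5).mean())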
Boost Family
● AdaBoost (Adaptive Boosting)
● Gradient Tree Boosting
● XGBoost
Combination of Additive Models
Learns and converges efficiently,
but carries the risk of amplifying noise
● Bagging can significantly reduce the variance
● Boosting can significantly reduce the bias
http://slideplayer.com/slide/4816467/
Assigns equal weights to all the training examples, then
increases the weights of incorrectly classified examples.
AdaBoost characteristics
In most situations it performs very well, but the way it
amplifies noise is the weakness it must overcome.
In each round, we raise the probability that previously
misclassified points are classified correctly next time,
and lower the probability that they are misclassified again.
http://www.37steps.com/exam/adaboost_comp/html/adaboost_comp.html
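A compact sketch of that reweighting loop, i.e. discrete AdaBoost over decision stumps (a minimal illustration of the algorithm, not the deck's code):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y = np.where(y == 0, -1, 1)        # AdaBoost convention: labels in {-1, +1}

w = np.full(len(y), 1 / len(y))    # start with equal weights
learners, alphas = [], []
for _ in range(50):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = max(w[pred != y].sum(), 1e-12)     # weighted training error
    alpha = 0.5 * np.log((1 - err) / err)    # this learner's vote weight
    w = w * np.exp(-alpha * y * pred)        # misclassified points gain weight
    w = w / w.sum()
    learners.append(stump)
    alphas.append(alpha)

F = sum(a * s.predict(X) for a, s in zip(alphas, learners))
print((np.sign(F) == y).mean())    # training accuracy of the weighted vote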
Gradient Boosting
Additive training
● New predictor is optimized by moving in the opposite direction of the
gradient to minimize the loss function.
In GBDT the decision trees are kept shallow: depth generally at most 5, and no more than 10 leaf nodes
● Boosted Tree: GBDT, GBRT, MART, LambdaMART
Gradient Boosting Model Steps
● Leaf weighted cost score
● Additive training: add a new model to the ensemble → pick the
one whose addition reduces the cost error the most
● Greedy algorithm to build new tree from a single leaf
● Gradient update weight
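A minimal sketch of additive training with squared loss, where the negative gradient is just the residual each new shallow tree is fit to (illustrative code under those assumptions):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

F = np.full_like(y, y.mean())      # additive training starts from a constant model
lr, trees = 0.1, []
for _ in range(100):
    residual = y - F               # negative gradient of the loss 1/2 (y - F)^2
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)  # shallow tree
    F = F + lr * tree.predict(X)   # move opposite the gradient, scaled by lr
    trees.append(tree)

print(np.mean((y - F) ** 2))       # training MSE after 100 additive steps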
Training Tips
Shrinkage
● Reduces the influence of each individual tree and leaves space for
future trees to improve the model.
● Better to improve the model by many small steps than by large steps.
Subsampling, Early Stopping, Post-Pruning
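As an illustrative sketch of these tips with scikit-learn's GradientBoostingClassifier (the parameter values are assumptions, not tuned recommendations):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(
    learning_rate=0.05,       # shrinkage: many small steps, not large ones
    n_estimators=1000,        # leave room for the small steps
    subsample=0.8,            # subsampling adds stochasticity
    n_iter_no_change=10,      # early stopping once the validation score plateaus
    validation_fraction=0.2,
    random_state=0,
)
gbm.fit(X_tr, y_tr)
print(gbm.n_estimators_, gbm.score(X_val, y_val))  # trees actually used, accuracy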
Why is XGBoost so strong?
● In 2015, of 29 challenge-winning solutions, 17 used XGBoost (deep neural
nets: 11)
● Every KDD Cup 2015 winning solution mentioned it.
● Use it and you go straight into the leaderboard top 10
Scalability enables data scientists to process hundreds of millions of examples
on a desktop.
● OpenMP CPU multi-threading
● DMatrix
● Cache-aware and sparsity-aware design
Column Block for Parallel Learning
The most time-consuming part of tree learning is getting the data into sorted
order. XGBoost stores the data in in-memory blocks, in a compressed column
format with each column sorted by the corresponding feature value. Block
Compression, Block Sharding.
Results
Use it in Python
from xgboost import XGBClassifier

xgb_model = XGBClassifier(learning_rate=0.1, n_estimators=1000,
    max_depth=5, min_child_weight=1, gamma=0, subsample=0.8,
    colsample_bytree=0.8, objective='binary:logistic', nthread=8,
    scale_pos_weight=1, seed=27)
● gamma : Minimum loss reduction required to make a further partition on a
leaf node of the tree.
● min_child_weight : Minimum sum of instance weight(hessian) needed in a
child.
● colsample_bytree : Subsample ratio of columns when constructing each
tree.
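A minimal usage sketch continuing from the snippet above (hypothetical synthetic data; XGBClassifier follows the scikit-learn estimator API):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

xgb_model.fit(X_tr, y_tr)
print(xgb_model.score(X_te, y_te))           # held-out accuracy
print(xgb_model.predict_proba(X_te)[:5, 1])  # positive-class probabilities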
Ensemble in Kaggle
Voting ensembles, Weighted majority vote, Bagged Perceptrons, Rank
averaging, Historical ranks, Stacked & Blending (Netflix)
Ensemble in Kaggle: an image classification competition
● Voting ensemble of around 30 convnets. The best single model scored
0.93170; the final ensemble scored 0.94120.
No Free Lunch
An ensemble is usually much better than a single learner.
Bias-variance tradeoff → tackle it with boosting or with average voting.
● Not interpretable -- like DNNs and non-linear SVMs
● There is no ensemble method which outperforms other ensemble methods
consistently
Selecting some base learners instead of using all of them to compose an
ensemble is a better choice -- selective ensembles
XGBoost (tabular data) vs. Deep Learning (larger, more complex data; harder tuning)
Reference
● Gradient boosting machines, a tutorial - Alexey Natekin and Alois Knoll
● XGBoost: A Scalable Tree Boosting System - Tianqi Chen
● NTU cmlab http://www.cmlab.csie.ntu.edu.tw/~cyy/learning/tutorials/
● http://mlwave.com/kaggle-ensembling-guide/
