Big Data Modeling Tools
SEPIA
3 Dec 2013
Pr Michel Béra
Chaire de Modélisation statistique du Risque
CNAM/SITI/IMATH
1
• Outline
– Vapnik's inequality and the foundations of a new theory of robustness (1971 and 1995)
– Perspectives on classical methods (NN, decision trees, factor analysis)
– The notion of data geometry and extended spaces – the kernel trick – categorical vs. numerical: an outdated battle
– Big Data and the Vapnikian world, utopias and realities – notions of computational complexity
– Modern modeling: a chain of approaches, from blind Machine Learning to the subtleties of Evidence-Based Policy
2
Statistical history (timeline)
• 1930 – Fisher; Kolmogorov–Smirnov: theoretical statistics («data are as they are»)
• 1950 – Cramér: applied statistics («modeling data, then testing»)
• 1960 – Mainframe era; huge datasets start appearing; theory of ill-posed problems; empirical methods of conjuration (PCA, NN, Bayes); the curse of high-dimensional problems
• 1974 – VC dimension
• 1980 – SRM (Vapnik)
• 1995 – Support Vector Machines (Vapnik)
• 2001 – Start of the internet era: millions of records and thousands of variables
3
1. Vapnik's world
– 1995 lecture at Bell Labs (New Jersey)
4
Consistency: definition
1) A learning process (model) is said to be consistent if the model error, measured on new data sampled from the same underlying probability law as the original sample, converges, as the original sample size increases, towards the model error measured on the original sample.
2) A consistent model is also said to generalize well, or to be robust.
5
[Figure: two panels plotting % error against the number of training examples, each with a test-error and a training-error curve – consistent training is the case where the two curves converge as the sample grows.]
6
Generalization: definition
• The generalization capacity of a model describes how the model (e.g. its error function) will perform on data it has never seen before (i.e. outside its training set).
• Good generalization means that the model's error on new, unknown data will be of the same «size» as the known error on its training set. Such a model is also called «robust».
7
Overfitting
[Figure: a degree-10 polynomial fit oscillating between noisy data points, x ∈ [-10, 10], y ∈ [-0.5, 1.5].]
Example: polynomial regression
Target: a 10th-degree polynomial + noise
Learning machine: y = w0 + w1·x + w2·x² + … + w10·x¹⁰
8
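The behaviour on this slide can be reproduced with a short sketch (the smooth stand-in target and all names are illustrative assumptions, not the slide's actual data): a degree-10 polynomial fitted to a handful of noisy points drives the training error far below the test error.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Hypothetical smooth stand-in for the slide's noisy 10th-degree target
    return np.sin(x) / (1.0 + 0.1 * x**2)

x_train = rng.uniform(-10, 10, 15)
y_train = target(x_train) + rng.normal(0.0, 0.1, x_train.size)
x_test = rng.uniform(-10, 10, 1000)
y_test = target(x_test) + rng.normal(0.0, 0.1, x_test.size)

# Learning machine: y = w0 + w1 x + ... + w10 x^10
w = np.polyfit(x_train, y_train, deg=10)

mse_train = np.mean((np.polyval(w, x_train) - y_train) ** 2)
mse_test = np.mean((np.polyval(w, x_test) - y_test) ** 2)
print(f"train MSE = {mse_train:.4f}, test MSE = {mse_test:.4f}")
```

With only 15 points for 11 free parameters, the fit interpolates the noise: the training MSE is tiny while the test MSE blows up between training points.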
Overfitting Avoidance
[Figure: eleven panels of the same degree-10 polynomial fit on x ∈ [-10, 10], y ∈ [-0.5, 1.5], with the ridge parameter r swept over 0.01, 0.1, 1, 10, 1e+002, …, 1e+008 – increasing r progressively smooths the fitted curve.]
Example: polynomial regression
Target: a 10th-degree polynomial + noise
Learning machine: y = w0 + w1·x + w2·x² + … + w10·x¹⁰
9
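A minimal sketch of the ridge sweep above (the sine target and all names are assumptions for illustration): as r grows, the weight norm shrinks and the fit goes from wiggly interpolation towards an over-smoothed flat curve.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-10, 10, 30)
y = np.sin(x) + rng.normal(0.0, 0.2, x.size)     # hypothetical noisy target

def design(x, d=10):
    # Degree-10 Vandermonde matrix on x/10, so the columns stay well scaled
    return np.vander(x / 10.0, d + 1)

A = design(x)
norms = []
for r in [0.01, 0.1, 1, 10, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8]:
    # Ridge solution: w = (A'A + r I)^{-1} A'y
    w = np.linalg.solve(A.T @ A + r * np.eye(A.shape[1]), A.T @ y)
    norms.append(np.linalg.norm(w))

print("weight norms from r=0.01 to r=1e8:", [round(n, 3) for n in norms])
```

The weight norm is non-increasing in r, which is exactly the capacity control the next slides formalize.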
Vapnik's approach to modeling (1)
• Vapnik's approach is based on a family of functions S = {f(X, w), w ∈ W}, in which a model is chosen as one specific function, described by a specific w.
• For Vapnik, the model function must properly answer, for a given row X, the question described by the target Y (i.e. predict Y), the quality of the answer being measured by a cost function Q.
• Different families of functions may provide the same «quality» of answer.
10
Vapnik's approach to modeling (2)
• The whole trick is then to find a good family of functions S that not only answers the question described by target Y in a «good way», but is also easy to understand, i.e. provides a good description, making it easy to explain what underlies the data behaviour of the problem.
• VC dimension will be the key to understanding and controlling model robustness.
11
VC dimension - definition (1)
• Consider a sample (x1, …, xL) from Rⁿ.
• There are 2^L different ways to separate the sample into two sub-samples.
• A set S of functions f(X, w) shatters the sample if all 2^L separations can be realized by different functions f(X, w) from the family S.
12
VC dimension - definition (2)
A function family S has VC dimension h (an integer) if:
1) there is at least one sample of h vectors from Rⁿ that can be shattered by functions from S, and
2) no sample of h+1 vectors can be shattered by functions from S.
13
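For tiny families, shattering can be checked by brute force. The sketch below (all names hypothetical; a 1-D family rather than the slide's Rⁿ setting) uses threshold functions sign(x − t) with both orientations: every 2-point sample is shattered, no 3-point sample is, so h = 2.

```python
def shatters(points, classifiers):
    """True if the classifier family realizes all 2^L labelings of the points."""
    realized = {tuple(c(x) for x in points) for c in classifiers}
    return len(realized) == 2 ** len(points)

def threshold_family(points):
    """Thresholds sign(x - t), both orientations, one t per gap between points."""
    ts = sorted(points)
    cuts = [ts[0] - 1] + [(a + b) / 2 for a, b in zip(ts, ts[1:])] + [ts[-1] + 1]
    fams = []
    for t in cuts:
        fams.append(lambda x, t=t: 1 if x > t else -1)   # + on the right of t
        fams.append(lambda x, t=t: -1 if x > t else 1)   # + on the left of t
    return fams

pts2 = [0.0, 1.0]
pts3 = [0.0, 1.0, 2.0]
print(shatters(pts2, threshold_family(pts2)))  # True: 2 points shattered
print(shatters(pts3, threshold_family(pts3)))  # False: (+,-,+) is unreachable
```

The same exhaustive check, applied to hyperplanes sign(⟨w|x⟩ + b) in R², would confirm the n+1 = 3 result of the next slide.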
Example: VC dimension
VC dimension:
- Measures the complexity of a
solution (function).
- Is not directly related to the
number of variables
14
Other examples
• The VC dimension of hyperplanes of Rⁿ is n+1.
• The VC dimension of the set of functions
f(x, w) = sign(sin(w·x)), c ≤ x ≤ 1, c > 0,
where w is a free parameter, is infinite.
– Conclusion: the VC dimension is not always equal to the number n of parameters (X1, …, Xn) of a given family S of functions from Rⁿ to {-1, +1}.
15
Key example: linear models y = ⟨w|x⟩ + b
• The VC dimension of the family S of linear models under a norm constraint (the slide's formula image is lost; the usual form is ‖w‖ ≤ C, with the data contained in a ball of radius R) depends on C and can take any value between 0 and n.
This is the basis for Machine Learning approaches such as SVM (Support Vector Machines) and Ridge Regression.
16
VC dimension : interpretation
• The VC dimension of S is an integer that measures the shattering (separating) power – the «complexity» – of the function family S.
• We shall now see that, by a major theorem of Vapnik, the VC dimension gives a powerful handle on model consistency, hence on «robustness».
17
What is a Risk Functional?
• A function of the parameters of the learning machine, assessing how much the machine is expected to fail on a given task.
[Figure: the risk R[f(x, w)] plotted over the parameter space (w), with its minimizer w*.]
18
Examples of Risk Functionals
• Classification:
– Error rate
– AUC
• Regression:
– Mean square error
19
Lift Curve
[Figure: lift curve – fraction of good customers selected vs. fraction of customers selected, both axes from 0 to 100%, with the customers ordered according to f(x) and the top-ranking customers selected; the ideal lift is also shown.]
Gini index: KI = M / O, where M is the area between the model's lift curve and the diagonal and O the area between the ideal lift and the diagonal; 0 ≤ KI ≤ 1.
20
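Under the reading KI = M/O above, the index can be computed as below; the function names and the tiny score table are illustrative assumptions.

```python
import numpy as np

def lift_curve(scores, good):
    """Fraction of good customers captured vs. fraction of customers selected."""
    order = np.argsort(-np.asarray(scores))            # top-ranking customers first
    g = np.asarray(good, dtype=float)[order]
    xs = np.concatenate([[0.0], np.arange(1, g.size + 1) / g.size])
    ys = np.concatenate([[0.0], np.cumsum(g) / g.sum()])
    return xs, ys

def ki_index(scores, good):
    xs, ys = lift_curve(scores, good)
    area = np.sum((ys[1:] + ys[:-1]) / 2 * np.diff(xs))  # area under the lift curve
    m = area - 0.5                                       # M: lift above the diagonal
    p = np.mean(good)                                    # proportion of good customers
    o = (1.0 - p) / 2                                    # O: ideal lift above diagonal
    return m / o

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
good   = [1,   1,   0,   1,   0,   0]
print(round(ki_index(scores, good), 3))  # → 0.778
```

A perfect ranking (all goods scored above all bads) gives KI = 1; a random ranking gives KI ≈ 0.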
Statistical Learning: Theoretical Foundations
• Structural Risk Minimization
• Regularization
• Weight decay
• Feature selection
• Data compression
21
Learning Theory Problem (1)
• A model computes a function f(X, w).
• Problem: minimize over w the Risk Expectation
R(w) = ∫ Q(z, w) dP(z)
– w: a parameter that specifies the chosen model
– z = (X, y): possible values for the attributes (variables) and target
– Q: measures (quantifies) the model's error cost
– P(z): the underlying probability law (unknown) of the data z
22
Learning Theory Problem (2)
• We are given L data points (z1, …, zL) from the learning sample, assumed i.i.d. from the law P(z).
• To minimize R(w), we start by minimizing the Empirical Risk over this sample:
E(w) = (1/L) Σᵢ Q(zᵢ, w)
• Examples of classical cost functions:
– classification (e.g. Q can be a cost function based on the cost of misclassified points)
– regression (e.g. Q can be a least-squares-type cost function)
23
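With a squared-error cost Q, minimizing the empirical risk E(w) = (1/L) Σᵢ Q(zᵢ, w) for a linear model has a closed-form least-squares solution; the sketch below (synthetic data, illustrative names) checks that the minimizer beats an arbitrary w.

```python
import numpy as np

rng = np.random.default_rng(2)
L = 200
X = rng.normal(size=(L, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(0.0, 0.1, L)   # z_i = (X_i, y_i), i.i.d. sample

def empirical_risk(w, X, y):
    # E(w) = (1/L) sum_i Q(z_i, w), with Q the squared-error cost
    return np.mean((X @ w - y) ** 2)

# Minimize E(w) over w: ordinary least squares
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("E(w_hat) =", round(empirical_risk(w_hat, X, y), 4))
```

Whether this empirical minimizer also has a small expected risk R(w) is exactly the question the next slides address.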
Learning Theory Problem (3)
• The central problem of Statistical Learning Theory: what is the relation between the Risk Expectation R(w) and the Empirical Risk E(w)?
• How can we define and measure a generalization capacity («robustness») for a model?
24
Four Pillars for SLT (1 and 2)
• Consistency (guarantees generalization)
– Under what conditions will a model be consistent ?
• Model convergence speed (a measure for
generalization capacity)
– How does generalization capacity improve when
sample size L grows?
25
Four Pillars for SLT (3 and 4)
• Generalization capacity control
– How to control in an efficient way model generalization
starting with the only given information we have: our
sample data?
• A strategy for good learning algorithms
– Is there a strategy that guarantees, measures and controls
our learning model generalization capacity ?
26
Vapnik's main theorem
• Q: Under which conditions will a learning process (model) be consistent?
• A: A model will be consistent if and only if the function f that defines the model comes from a family of functions S with finite VC dimension h.
• A finite VC dimension h not only guarantees generalization capacity (consistency): picking f in a family S of finite VC dimension h is the only way to build a model that generalizes.
27
Model convergence speed (generalization capacity)
• Q: What is the nature of the difference in model risk between learning data (empirical risk on the sample) and test data (expected risk), for a sample of finite size L?
• A: This difference is no greater than a bound that depends only on the ratio h/L between the VC dimension h of the model's function family S and the sample size L.
This is a new theorem in the Kolmogorov–Smirnov tradition, i.e. a theorem that does not depend on the data's underlying probability law.
28
Empirical risk minimization in the LS case
• With probability 1−q, the following inequality holds:
R(w₀) ≤ E(w₀) + √( [h·(ln(2L/h) + 1) − ln(q/4)] / L )
where w₀ is the parameter value that minimizes the Empirical Risk E(w).
29
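Plugging numbers into a Vapnik-style confidence term (in its usual textbook form, an assumption here since the slide's formula is an image) shows it shrinking as L grows and growing with h:

```python
import math

def confidence_term(h, L, q=0.05):
    # sqrt([h(ln(2L/h) + 1) - ln(q/4)] / L), the usual Vapnik-style interval
    return math.sqrt((h * (math.log(2 * L / h) + 1) - math.log(q / 4)) / L)

for L in [100, 1000, 10000]:
    print(f"h=10, L={L:>6}: confidence term = {confidence_term(10, L):.3f}")
```

As the next slide argues, the absolute numbers matter less than the monotone behaviour in h/L.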
Model convergence speed
[Figure: % error vs. sample size L – the expected risk (test data) and empirical risk (learning sample) curves approach each other as L grows; the gap between them is the confidence interval.]
30
“SRM” methodology: how to control model generalization capacity
Expected Risk ≤ Empirical Risk + Confidence Interval
• Minimizing the Empirical Risk alone will not always yield good generalization capacity: one wants to minimize the sum of the Empirical Risk and the Confidence Interval.
• What matters is not the numerical value of Vapnik's bound, most often too large to be of any practical use; it is the fact that this bound is a non-decreasing function of the «richness» (shattering power) of the model's function family.
31
SRM strategy (1)
• With probability 1−q,
R(w) ≤ E(w) + √( [h·(ln(2L/h) + 1) − ln(q/4)] / L )
• When h/L is too large, the second term of the inequality becomes large.
• The basic idea of SRM is to minimize simultaneously both terms on the right-hand side of this bound on R(w).
• To do this, one must make h a controlled parameter.
32
SRM strategy (2)
• Consider a nested sequence S1 ⊂ S2 ⊂ … ⊂ Sn of model function families, with respective growing VC dimensions h1 < h2 < … < hn.
• For each family Sᵢ of the sequence, the inequality
R(w) ≤ E(w) + √( [hᵢ·(ln(2L/hᵢ) + 1) − ln(q/4)] / L )
is valid.
33
SRM strategy (3)
SRM: find i such that the expected risk R(w) becomes minimal for a specific h* = hᵢ, corresponding to a specific family Sᵢ of the sequence; build the model using an f from Sᵢ.
[Figure: risk vs. model complexity (h/L) – the empirical risk decreases, the confidence interval increases, and the total risk is minimized at the best model, h*.]
34
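The strategy can be sketched over nested polynomial families S_d (degree d, taking h ≈ d + 1 in one dimension, and reusing the textbook confidence term): pick the degree minimizing empirical risk plus confidence term. All data and names here are illustrative assumptions.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 60)
y = 5 * (x**3 - x) + rng.normal(0.0, 0.1, x.size)    # hypothetical cubic target

def confidence_term(h, L, q=0.05):
    return math.sqrt((h * (math.log(2 * L / h) + 1) - math.log(q / 4)) / L)

best_d, best_bound = None, math.inf
for d in range(11):                       # nested families S_0 ⊂ S_1 ⊂ ... ⊂ S_10
    w = np.polyfit(x, y, d)
    emp = np.mean((np.polyval(w, x) - y) ** 2)        # empirical risk E(w)
    bound = emp + confidence_term(h=d + 1, L=x.size)  # guaranteed risk
    if bound < best_bound:
        best_d, best_bound = d, bound
print("selected family: degree d =", best_d)
```

Low degrees lose on empirical risk, high degrees lose on the confidence term; the sum is minimized at the true complexity of the target.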
How to choose h*: cross-validation
• The learning sample of size L is divided in two: a basic learning set of size L1 and a validation set of size L2.
• For a given meta-parameter that controls the richness of the model family S (hence its h), a model is built on the basic learning set and its actual risk is measured on the validation set.
• The meta-parameter is chosen so that the model's actual risk on the validation set is minimal: this identifies the best family, i.e. h*.
• The final model is computed from this optimal family: by construction, the best trade-off between fit and robustness is achieved.
35
Some Learning Machines
• Linear models
• Polynomial models
• Kernel methods
• Neural networks
• Decision trees
36
Learning Process
• Learning machines include:
– Linear discriminants (including Naïve Bayes)
– Kernel methods
– Neural networks
– Decision trees, Random Forests
• Learning is tuning:
– Parameters (weights w or α, threshold b)
– Hyperparameters (basis functions, kernels, number of units, number of features/attributes)
37
Industrial Data Mining: implementation example
[Diagram: a system with inputs x1 … xn and outputs y1 … yp feeding a modeling pipeline – Data Preparation, Data Encoding (descriptors; hyper-parameters κ, σ), Class of Models (polynomials), Learning Algorithm (ridge regression, parameter γ, weights w), Loss Criterion (KI, Gini index) – with the hyper-parameters tuned automatically via SRM.]
38
Data Encoding/Compression
• Encodes nominal and ordinal variables numerically.
• Encodes continuous variables non-linearly.
• Compresses variables into robust categories.
• Handles missing values and outliers.
• This process includes adjustable hyper-parameters.
39
Multiple Structures
S1 ⊂ S2 ⊂ … ⊂ SN
• Weight decay / ridge regression:
Sk = { w : ‖w‖² < ωk }, ω1 < ω2 < … < ωk
γ1 > γ2 > … > γk (γ is the ridge parameter)
• Feature selection:
Sk = { w : ‖w‖₀ < σk }, σ1 < σ2 < … < σk (σ is the number of features)
• Data compression:
κ1 < κ2 < … < κk (κ may be the number of clusters)
40
Hyper-parameter selection
• w = parameter vector; γ, σ, κ = hyper-parameters.
• Cross-validation with K folds – for various values of γ, σ, κ:
– Adjust w on (K−1)/K of the training examples.
– Test on the remaining 1/K of the examples.
– Rotate the folds and average the test results (CV error).
– Select γ, σ, κ to minimize the CV error.
– Re-compute w on all training examples using the optimal γ, σ, κ.
[Diagram: data (X, y) split into training data (made into K folds) and test data, the latter kept for a prospective study / «real» validation.]
41
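The K-fold procedure above can be sketched for the ridge hyper-parameter γ alone (data and names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.0, 0.5]                        # only 3 informative features
y = X @ w_true + rng.normal(0.0, 0.5, 100)

def ridge_fit(X, y, gamma):
    # w = (X'X + gamma I)^{-1} X'y
    return np.linalg.solve(X.T @ X + gamma * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, gamma, K=5):
    folds = np.array_split(np.arange(X.shape[0]), K)
    errs = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        w = ridge_fit(X[train], y[train], gamma)     # adjust w on (K-1)/K of data
        errs.append(np.mean((X[test] @ w - y[test]) ** 2))
    return float(np.mean(errs))                      # rotate folds, average

gammas = [0.01, 0.1, 1.0, 10.0, 100.0]
best_gamma = min(gammas, key=lambda g: cv_error(X, y, g))
w_final = ridge_fit(X, y, best_gamma)                # re-fit on all data
print("gamma selected by CV:", best_gamma)
```

The final re-fit on all training examples with the optimal γ mirrors the last bullet of the slide.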
SRM put to work: campaign optimization
[Figure: cross-validated lift curve – fraction of good customers selected vs. fraction of customers selected (customers ordered according to f(x), top-ranking customers selected), with the ideal lift shown; KI = M/O as before, and the area G between the ideal lift and the CV lift gives KR = 1 − G/O.]
42
Summary
• Weight decay is a powerful means of avoiding overfitting.
• It is also known as «ridge regression».
• It is grounded in SRM theory.
• Multiple structures are used by most current DM engines: ridge, feature selection, data compression.
43
Some concrete examples
• Census: explaining what makes someone earn more or less than $50,000/year
• Biostatistics data: feature reduction
44
Ockham’s Razor
• Principle proposed by William of
Ockham in the fourteenth
century: “Pluralitas non est
ponenda sine neccesitate”.
• Of two theories providing
similarly good predictions, prefer
the simplest one.
• Shave off unnecessary
parameters of your models.
45
Vision: the predictive-modeling workshop
• Data mining / machine learning intervenes upstream to select, from a large set of variables and for a given problem, the «good» variables likely to support useful inference. This step can be «automated».
• One then sets up the appropriate stratification, randomization and RCTs, based on these «particularly interesting» variables.
• One finishes with tests on the results (a step that can also be automated).
• => an accelerator for producing results, for an ever more effective Evidence-Based Policy
46
Tableau de comparaison bilan S1 et bilan S2
 
Optimal discretization of hedging strategies rosenbaum
Optimal discretization of hedging strategies   rosenbaumOptimal discretization of hedging strategies   rosenbaum
Optimal discretization of hedging strategies rosenbaum
 
Machine learning pour les données massives algorithmes randomis´es, en ligne ...
Machine learning pour les données massives algorithmes randomis´es, en ligne ...Machine learning pour les données massives algorithmes randomis´es, en ligne ...
Machine learning pour les données massives algorithmes randomis´es, en ligne ...
 
Détection de profils, application en santé et en économétrie geissler
Détection de profils, application en santé et en économétrie   geisslerDétection de profils, application en santé et en économétrie   geissler
Détection de profils, application en santé et en économétrie geissler
 
Loi hamon sébastien bachellier
Loi hamon sébastien bachellierLoi hamon sébastien bachellier
Loi hamon sébastien bachellier
 
Eurocroissance arnaud cohen
Eurocroissance arnaud cohenEurocroissance arnaud cohen
Eurocroissance arnaud cohen
 
From data and information to knowledge : the web of tomorrow - Serge abitboul...
From data and information to knowledge : the web of tomorrow - Serge abitboul...From data and information to knowledge : the web of tomorrow - Serge abitboul...
From data and information to knowledge : the web of tomorrow - Serge abitboul...
 

Recently uploaded

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 

Recently uploaded (20)

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 

Les outils de modélisation des Big Data

  • 1. Modeling tools for Big Data. SEPIA, 3 Dec 2013. Pr Michel Béra, Chair of Statistical Risk Modeling (Chaire de Modélisation statistique du Risque), CNAM/SITI/IMATH 1
  • 2. • Outline of the talk – The Vapnik inequality and the foundations of a new theory of robustness (1971 and 1995) – Perspectives on the classical methods (NN, decision trees, factor analysis) – The notion of data geometry and extended spaces, the kernel trick; qualitative vs. quantitative: an outdated battle – Big Data and the Vapnik world, utopias and realities; notions of computational complexity – Modern modeling: a chain of approaches, from blind Machine Learning to the finer points of Evidence-Based Policy 2
  • 3. Statistical history. [Timeline diagram: theoretical statistics (“data are as they are”) vs. applied statistics (“modeling data, then testing”); the theory of ill-posed problems; empirical methods of conjuration (PCA, NN, Bayes). Milestones: 1930 Fisher, Kolmogorov-Smirnov; 1950 Cramér; 1960 mainframes, huge datasets start appearing; 1974 VC dimension; 1980 SRM (Vapnik); 1995 Support Vector Machines (Vapnik); 2001 start of the internet era, millions of records and thousands of variables. The curse of high-dimensional problems: STOP! / Watch out! / GO!] 3
  • 4. 1. Vapnik's world - The 1995 Bell Labs (New Jersey) lecture 4
  • 5. Consistency: definition 1) A learning process (model) is said to be consistent if model error, measured on new data sampled from the same underlying probability laws as our original sample, converges, as the original sample size increases, towards the model error measured on the original sample. 2) A model that is consistent is also said to generalize well, or to be robust 5
  • 6. [Two learning curves: % error vs. number of training examples, each showing a test-error and a training-error curve; caption: “Consistent training?”] 6
  • 7. Generalization: definition • The generalization capacity of a model describes how the model (e.g. its error function) will perform on data it has never seen before (data not in its training set) • Good generalization means that the model's errors on new, unknown data will be of the same “size” as the known error on its training set. The model is then called “robust”. 7
  • 8. Overfitting. [Plot: a wiggly degree-10 fit of noisy data over x in [-10, 10].] Example: polynomial regression. Target: a 10th-degree polynomial + noise. Learning machine: y = w0 + w1x + w2x² + … + w10x¹⁰ 8
  • 9. Overfitting avoidance. [Grid of fits for d = 10 and ridge r = 0.01, 0.1, 1, 10, 10², …, 10⁸: the fitted curve becomes smoother as r grows.] Example: polynomial regression. Target: a 10th-degree polynomial + noise. Learning machine: y = w0 + w1x + w2x² + … + w10x¹⁰ 9
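The ridge sweep on this slide can be reproduced in a few lines. A minimal numpy sketch, using a stand-in noisy target (the deck's exact degree-10 polynomial is not given):

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy target sampled on [-10, 10]; sin(x/3) is a stand-in, not the deck's polynomial.
x = np.linspace(-10, 10, 50)
y = np.sin(x / 3) + rng.normal(0, 0.1, x.shape)

# Degree-10 polynomial features; x is rescaled so the features stay comparable.
Phi = np.vander(x / 10, 11, increasing=True)

def ridge_fit(Phi, y, r):
    """w = argmin ||Phi w - y||^2 + r ||w||^2 (closed-form ridge solution)."""
    n = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + r * np.eye(n), Phi.T @ y)

for r in [0.01, 1.0, 100.0]:
    w = ridge_fit(Phi, y, r)
    print(f"r={r:g}  train MSE={np.mean((Phi @ w - y) ** 2):.4f}")
```

As on the slide, increasing r trades training fit for smoothness: the training error grows monotonically with the ridge penalty.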
  • 10. Vapnik's approach to modeling (1) • Vapnik's approach is based on a family of functions S = {f(X,w), w ∈ W}, in which a model is chosen as a specific function, described by a specific w • For Vapnik, the model function must properly answer, for a given row X, the question described by the target Y, i.e. predict Y, the quality of the answer being measured by a cost function Q • Different families of functions may provide the same “quality” of answer 10
  • 11. Vapnik's approach to modeling (2) • The whole trick is then to find a good family of functions S that not only answers the question described by the target Y in a “good way”, but is also easy to understand, i.e. provides a good description, allowing one to explain easily what underlies the data behaviour of the problem • The VC dimension will be the key to understanding and controlling model robustness 11
  • 12. VC dimension - definition (1) • Let us consider a sample (x1, …, xL) from Rn • There are 2^L different ways to separate the sample into two sub-samples • A set S of functions f(X,w) shatters the sample if all 2^L separations can be realized by different f(X,w) from family S 12
  • 13. VC dimension - definition (2) A function family S has VC dimension h (h is an integer) if: 1) There is at least one sample of h vectors from Rn that can be shattered by functions from S 2) No sample of h+1 vectors can be shattered by any function from S 13
  • 14. Example: VC dimension. VC dimension: - measures the complexity of a solution (function); - is not directly related to the number of variables. 14
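The shattering definition can be checked numerically by brute force. A sketch, assuming scipy is available, that tests whether half-planes in R² shatter a given sample: each of the 2^L labelings becomes a feasibility LP in (w, b):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def separable(X, y):
    """Feasibility LP: does some (w, b) satisfy y_i (w.x_i + b) >= 1 for all i?"""
    # Constraints -y_i (w.x_i + b) <= -1 in the free variables (w1, w2, b).
    A = -y[:, None] * np.hstack([X, np.ones((len(X), 1))])
    res = linprog(c=np.zeros(3), A_ub=A, b_ub=-np.ones(len(X)),
                  bounds=[(None, None)] * 3, method="highs")
    return res.status == 0   # status 0 = feasible (optimal found)

def shattered(X):
    """True if half-planes realize all 2^L labelings of the sample X."""
    return all(separable(X, np.array(s))
               for s in itertools.product([-1.0, 1.0], repeat=len(X)))

tri = np.array([[0., 0.], [1., 0.], [0., 1.]])             # 3 points in general position
quad = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])  # 4 points on a square
print(shattered(tri))    # half-planes shatter 3 such points
print(shattered(quad))   # the XOR labeling of the square is not separable
```

This illustrates the next slide's statement that hyperplanes in Rn have VC dimension n+1: in R² some sample of 3 points is shattered, but no sample of 4 is.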
  • 15. Other examples • The VC dimension of hyperplanes of Rn is n+1 • The VC dimension of the set of functions f(x,w) = sign(sin(w·x)), c ≤ x ≤ 1, c > 0, where w is a free parameter, is infinite. – Conclusion: the VC dimension is not always equal to the number n of variables (X1, …, Xn) of a given family S of functions from Rn to {-1,+1}. 15
  • 16. Key example: linear models y = ⟨w|x⟩ + b • The VC dimension of the family S of linear models f(x,w) = sign(⟨w|x⟩ + b) with ‖w‖ ≤ C, on data contained in a sphere of radius R, depends on C (h ≤ min(R²C², n) + 1) and can take any value between 0 and n. This is the basis for Machine Learning approaches such as SVM (Support Vector Machines) or Ridge Regression. 16
  • 17. VC dimension: interpretation • The VC dimension of S is an integer that measures the shattering (separating) power, i.e. the “complexity”, of the function family S. • We shall now show (a major theorem of Vapnik) that the VC dimension gives a powerful handle on model consistency, hence “robustness”. 17
  • 18. What is a Risk Functional? • A function of the parameters of the learning machine, assessing how much it is expected to fail on a given task. [Diagram: the risk functional R[f(x,w)] over parameter space (w), with its minimum at w*.] 18
  • 19. Examples of Risk Functionals • Classification: – Error rate – AUC • Regression: – Mean square error 19
  • 20. Lift Curve. [Plot: fraction of good customers selected vs. fraction of customers selected; customers are ordered according to f(x) and the top-ranking customers are selected; the model's lift curve lies between the diagonal and the ideal lift, both reaching 100%.] Gini index: KI is the area between the lift curve and the diagonal, divided by the same area for the ideal lift, with 0 ≤ KI ≤ 1. 20
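The KI statistic can be computed directly from model scores. A minimal sketch, assuming the area-ratio reading of the Gini index (the slide's geometric formula does not survive the transcript):

```python
import numpy as np

def _area(xs, ys):
    # Trapezoidal area under a piecewise-linear curve.
    return float(np.sum((ys[1:] + ys[:-1]) * np.diff(xs)) / 2)

def ki_index(scores, y):
    """Gini index KI: area between the model's lift curve and the diagonal,
    normalized by the same area for the ideal (perfect) ranking."""
    order = np.argsort(-scores)                     # customers ordered by decreasing f(x)
    frac = np.arange(len(y) + 1) / len(y)           # fraction of customers selected
    hits = np.concatenate([[0.0], np.cumsum(y[order]) / y.sum()])
    ideal = np.concatenate([[0.0], np.cumsum(np.sort(y)[::-1]) / y.sum()])
    return (_area(frac, hits) - 0.5) / (_area(frac, ideal) - 0.5)

y = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
print(ki_index(np.linspace(1.0, 0.1, 10), y))   # perfect ranking -> 1.0
```

A perfect ranking traces the ideal lift and gives KI = 1; a random ranking hugs the diagonal and gives KI near 0.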
  • 21. Statistical Learning Theoretical Foundations • Structural Risk Minimization • Regularization • Weight decay • Feature selection • Data compression 21
  • 22. Learning Theory Problem (1) • A model computes a function: y = f(X, w) • Problem: minimize in w the Risk Expectation R(w) = ∫ Q(z, w) dP(z) – w: a parameter that specifies the chosen model – z = (X, y): the possible values of the attributes (variables) – Q: measures (quantifies) the model error cost – P(z): the underlying probability law (unknown) for the data z 22
  • 23. Learning Theory Problem (2) • We get L data points (z1, …, zL) from the learning sample, and we suppose them i.i.d. sampled from the law P(z). • To minimize R(w), we start by minimizing the Empirical Risk over this sample: E(w) = (1/L) Σᵢ Q(zᵢ, w) • Examples of classical cost functions: – classification (e.g. Q can be a cost function based on the cost of misclassified points) – regression (e.g. Q can be a cost function of least-squares type) 23
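The two classic costs mentioned here, written out as empirical risks. A trivial numpy sketch with made-up predictions:

```python
import numpy as np

# Empirical risk E(w) = (1/L) * sum_i Q(z_i, w) for the two classic costs.

# Classification: 0/1 cost on labels in {-1, +1} (predictions are invented).
y_true = np.array([1, -1, 1, 1, -1])
y_pred = np.array([1, 1, 1, -1, -1])
misclassification = np.mean(y_true != y_pred)    # fraction of misclassified points
print(misclassification)                         # -> 0.4

# Regression: least-squares cost (values are invented).
t_true = np.array([1.0, 2.0, 3.0])
t_pred = np.array([1.1, 1.8, 3.3])
least_squares = np.mean((t_true - t_pred) ** 2)  # mean squared error
print(round(float(least_squares), 4))            # -> 0.0467
```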
  • 24. Learning Theory Problem (3) • Central problem for Statistical Learning Theory: what is the relation between the Risk Expectation R(w) and the Empirical Risk E(w)? • How to define and measure a generalization capacity (“robustness”) for a model? 24
  • 25. Four Pillars for SLT (1 and 2) • Consistency (guarantees generalization) – Under what conditions will a model be consistent ? • Model convergence speed (a measure for generalization capacity) – How does generalization capacity improve when sample size L grows? 25
  • 26. Four Pillars for SLT (3 and 4) • Generalization capacity control – How to control in an efficient way model generalization starting with the only given information we have: our sample data? • A strategy for good learning algorithms – Is there a strategy that guarantees, measures and controls our learning model generalization capacity ? 26
  • 27. Vapnik main theorem • Q: Under which conditions will a learning process (model) be consistent? • A: A model will be consistent if and only if the function f that defines the model comes from a family of functions S with finite VC dimension h • A finite VC dimension h not only guarantees a generalization capacity (consistency): picking f in a family S with finite VC dimension h is the only way to build a model that generalizes. 27
  • 28. Model convergence speed (generalization capacity) • Q: What is the nature of the model risk difference between learning data (sample: empirical risk) and test data (expected risk), for a sample of finite size L? • A: This difference is no greater than a limit that depends only on the ratio between the VC dimension h of the model's function family S and the sample size L, i.e. h/L. This statement is a new theorem in the Kolmogorov-Smirnov line of results, i.e. theorems that do not depend on the data's underlying probability law. 28
  • 29. Empirical risk minimization in the LS case • With probability 1-q, the following inequality is true: R(w0) ≤ E(w0) + sqrt( (h (ln(2L/h) + 1) - ln(q/4)) / L ), where w0 is the parameter value w that minimizes the Empirical Risk E(w). 29
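One standard classification form of this confidence interval can be evaluated numerically. A sketch (the exact constants vary across statements of the theorem):

```python
import numpy as np

def vc_confidence(h, L, q):
    """Confidence-interval term of a standard VC bound:
    sqrt((h * (ln(2L/h) + 1) - ln(q/4)) / L)."""
    return float(np.sqrt((h * (np.log(2 * L / h) + 1) - np.log(q / 4)) / L))

# The term shrinks as the sample size L grows, and grows with VC dimension h.
for L in [100, 1000, 10000, 100000]:
    print(L, round(vc_confidence(h=50, L=L, q=0.05), 3))
```

This makes the slide's point concrete: the gap between empirical and expected risk is governed by the ratio h/L, not by the unknown law P(z).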
  • 30. Model convergence speed. [Plot: % error vs. sample size L; the expected risk (test data) and the empirical risk (learning sample) converge, separated by the confidence interval.] 30
  • 31. “SRM” methodology: how to control model generalization capacity • Expected Risk = Empirical Risk + Confidence Interval • Minimizing the Empirical Risk alone will not always give a good generalization capacity: one wants to minimize the sum of the Empirical Risk and the Confidence Interval • What matters is not the numerical value of the Vapnik limit, most often too large to be of any practical use; it is the fact that this limit is a non-decreasing function of the “richness”, i.e. the shattering power, of the model's function family 31
  • 32. SRM strategy (1) • With probability 1-q, the inequality of slide 29 holds • When h/L is too large, its second term becomes large • SRM's basic idea is to minimize simultaneously both terms on the right-hand side of this majorizing inequality for R(w) • To do this, one has to make h a controlled parameter 32
  • 33. SRM strategy (2) • Let us consider a sequence S1 < S2 < … < Sn of model function families, with respective growing VC dimensions h1 < h2 < … < hn • For each family Si of our sequence, the inequality is valid with h = hi 33
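The SRM sweep over the nested sequence can be sketched numerically. An illustrative example: the empirical risks below are invented for the demonstration, and a standard form of the confidence term is assumed:

```python
import numpy as np

# Nested families S_1 < S_2 < ... with growing VC dimensions; the empirical
# risks are invented for the illustration (the fit keeps improving with h).
hs = np.array([5, 10, 20, 40, 80, 160])
emp = np.array([0.30, 0.22, 0.17, 0.15, 0.14, 0.135])
L, q = 1000, 0.05

# Guaranteed-risk bound = empirical risk + confidence interval, per family.
conf = np.sqrt((hs * (np.log(2 * L / hs) + 1) - np.log(q / 4)) / L)
total = emp + conf
h_star = int(hs[np.argmin(total)])
print(h_star)   # -> 10: the best fit/complexity trade-off for these numbers
```

Past h*, the confidence term grows faster than the empirical risk falls, which is exactly the U-shaped total-risk curve of the next slide.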
  • 34. SRM strategy (3) SRM: find i such that the expected risk R(w) becomes minimum, for a specific h* = hi corresponding to a specific family Si of our sequence; build the model using f from Si. [Diagram: risk vs. ln(h/L); the empirical risk decreases with model complexity while the confidence interval grows, and the total risk is minimized at the best model, h*.] 34
  • 35. How to choose h*: cross-validation • The learning sample of size L is divided in two: a basic learning set of size L1 and a validation set of size L2 • For a given meta-parameter that controls the richness of the model family S, hence its h, a model is built on the basic learning set, and its actual risk is measured on the validation set • The meta-parameter is chosen so that the model's actual risk is minimum on the validation set: this leads to the best family, i.e. h* • The final model is computed from this optimal family: the best trade-off between fit and robustness is achieved by construction 35
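The procedure on this slide can be sketched end to end. A numpy-only example, where the noisy target and the grid of ridge values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-10, 10, 120)
y = np.sin(x / 3) + rng.normal(0, 0.1, x.size)   # stand-in noisy target
Phi = np.vander(x / 10, 11, increasing=True)     # degree-10 polynomial features

def ridge_fit(A, t, r):
    return np.linalg.solve(A.T @ A + r * np.eye(A.shape[1]), A.T @ t)

def cv_error(r, K=5):
    """Mean validation MSE over K folds for ridge meta-parameter r."""
    folds = np.array_split(np.arange(len(y)), K)
    errs = []
    for k in range(K):
        trn = np.concatenate([folds[j] for j in range(K) if j != k])
        w = ridge_fit(Phi[trn], y[trn], r)
        errs.append(np.mean((Phi[folds[k]] @ w - y[folds[k]]) ** 2))
    return float(np.mean(errs))

grid = [1e-4, 1e-2, 1.0, 1e2, 1e4]      # each r indexes one family S_i, hence one h
best_r = min(grid, key=cv_error)        # meta-parameter minimizing CV error, i.e. h*
w_final = ridge_fit(Phi, y, best_r)     # final model re-fit on all training data
print(best_r)
```

The ridge r plays the role of the meta-parameter: larger r means a poorer (smaller-h) family, so sweeping r walks the nested SRM sequence.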
  • 36. Some Learning Machines • Linear models • Polynomial models • Kernel methods • Neural networks • Decision trees 36
  • 37. Learning Process • Learning machines include: – Linear discriminant (including Naïve Bayes) – Kernel methods – Neural networks – Decision trees, Random Forests • Learning is tuning: – Parameters (weights w or α, threshold b) – Hyperparameters (basis functions, kernels, number of units, number of features/attributes) 37
  • 38. Industrial Data Mining: implementation example. [Diagram: inputs x1, x2, …, xn enter a system that outputs y1, …, yp; the modeling pipeline chains data preparation, data encoding, a class of models (polynomials), a loss criterion (KI, the Gini index), and a learning algorithm (ridge regression, automatic via SRM), with hyper-parameters κ, σ, γ and weights w.] 38
  • 39. Data Encoding/Compression • Encodes nominal and ordinal variables numerically. • Encodes continuous variables non-linearly. • Compresses variables in robust categories. • Handles missing values and outliers. • This process includes adjustable hyper-parameters. 39
  • 40. Multiple Structures S1 ⊂ S2 ⊂ … ⊂ SN • Weight decay/Ridge regression: Sk = { w | ‖w‖² < ωk }, ω1 < ω2 < … < ωk, γ1 > γ2 > … > γk (γ is the ridge) • Feature selection: Sk = { w | ‖w‖₀ < σk }, σ1 < σ2 < … < σk (σ is the number of features) • Data compression: κ1 < κ2 < … < κk (κ may be the number of clusters) 40
  • 41. Hyper-parameter selection • w = parameter vector; γ, σ, κ = hyper-parameters. • Cross-validation with K folds, for various values of γ, σ, κ: – Adjust w on (K-1)/K of the training examples. – Test on the remaining 1/K. – Rotate the folds and average the test results (CV error). – Select γ, σ, κ to minimize the CV error. – Re-compute w on all training examples using the optimal γ, σ, κ. [Diagram: data (X, y) split into K training folds and a held-out test set for a prospective study / “real” validation.] 41
  • 42. SRM put to work: campaign optimization. [Lift-curve plot as on slide 20: fraction of good customers selected vs. fraction of customers selected, with the ideal lift reaching 100%; customers are ordered according to f(x) and the top-ranking customers are selected; the Gini index KI is read off the cross-validated lift curve.] 42
  • 43. Summary • Weight decay is a powerful means of overfitting avoidance. • It is also known as “ridge regression”. • It is grounded in SRM theory. • Multiple structures are used by most current DM engines: ridge, feature selection, data compression. 43
  • 44. Some concrete examples • Census: explain what makes someone earn more or less than $50,000/year • Biostatistical data: feature reduction 44
  • 45. Ockham's Razor • Principle proposed by William of Ockham in the fourteenth century: “Pluralitas non est ponenda sine necessitate”. • Of two theories providing similarly good predictions, prefer the simplest one. • Shave off unnecessary parameters of your models. 45
  • 46. Vision: the predictive-modeling workshop • Data mining / machine learning intervenes upstream to select, from a large set of variables and for a given problem, the “good” variables likely to support useful inference. This step can be “automated” • One then sets up the stratification, randomization and appropriate RCTs, starting from these “particularly interesting” variables • One finishes with tests on the results (a step that can also be automated) • => an accelerator for producing results, for an ever more effective Evidence-Based Policy 46