The Higgs Boson Machine Learning Challenge was one of the largest data analysis competitions in the world. To succeed in it, Cheng applied his knowledge of Computer Science, Mathematics, Statistics, and Physics, along with problem-solving habits developed during his training in Civil Engineering.
In this presentation, Cheng draws on his experience in the competition to illustrate some important elements of big data analytics and why they matter. The content spans several disciplines, including physics, statistics, and mathematics, but no background in these areas is required to follow the essence of the presentation.
In brief, the presentation covers the following content:
An effective framework for general data mining projects,
An introduction to the competition and its physics background,
Various techniques in data exploration and some traps to avoid,
Various ways of feature enhancement,
Model building and selection, and
Optimization of model performance
2. PRESENTER
Ohio State University, Tongji University
Ph.D. Civil Engineering
M.S. Applied Statistics
Minor: Computer Science
Advanced training:
City and Regional Planning
Industrial and Systems Engineering
Mathematics
Passion: (this) machine learning
3. HIGGS BOSON MACHINE LEARNING CHALLENGE
• Goal: improve the procedure that produces the selection region of the Higgs Boson
• 4-month duration
• 1,785 teams
• Many machine learning experts, statisticians, and physicists
• The top 5 finishers came from 5 different countries: Hungary, Netherlands, France, Russia, U.S.A./China
http://www.kaggle.com/c/higgs-boson/leaderboard
7. HIGGS BOSON
• a.k.a. the God Particle (explains some mass)
• A fundamental particle theorized in 1964 in the Standard Model of Particle Physics
• "Considered" discovered in 2011–2013 at the LHC by CERN
• A number of prestigious awards in 2013, including a Nobel Prize

A "definitive" answer might require "another few years" after the collider's 2015 restart.
— deputy chair of physics at Brookhaven National Laboratory

http://en.wikipedia.org/wiki/Higgs_boson
http://upload.wikimedia.org/wikipedia/commons/0/00/Standard_Model_of_Elementary_Particles.svg
8. CERN: THE EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH
• Established in 1954
• Birthplace of the World Wide Web (1989)
maps.google.com
9. LARGE HADRON COLLIDER (LHC)
• 27 km (17 mi) in circumference
• 175 meters (574 ft) beneath ground
• Built from 1998 to 2008
• Over 10,000 scientists and engineers
• Over 100 countries
• Seven particle detectors
https://www.llnl.gov/news/llnl-set-host-international-lattice-physics-conference
http://en.wikipedia.org/wiki/Large_Hadron_Collider
10. ATLAS
• 46 meters long
• 25 meters in diameter
• Weighs about 7,000 tonnes
• Contains some 3,000 km of cable
• Involves roughly 3,000 physicists from over 175 institutions in 38 countries
http://en.wikipedia.org/wiki/Large_Hadron_Collider
http://higgsml.lal.in2p3.fr/documentation/
13. CHALLENGES IN DETECTION OF HIGGS BOSON
• The Higgs Boson cannot be measured directly (it decays immediately into lighter particles)
• Other particles can decay into the same set of lighter particles
• PRODUCTION and DECAY of the Higgs Boson depend on its mass, which was not predicted by theory (we now know it is close to 125 GeV)

Seeing a circular shadow does not mean the real object is a sphere.
https://www2.physics.ox.ac.uk/sites/default/files/2012-03-27/sinead_farrington_pdf_17376.pdf
14. CURRENT DETECTION MECHANISM
• Raw data collected from the LHC
• Hundreds of millions of proton-proton collisions (events) per second
• 400 events of interest are selected per second
  – Signal events (i.e., Higgs Boson)
  – Background events (i.e., other particles)
• Events in an ad hoc selection region (in certain channels) exceeding background noise
The selection criteria need improvement in significance and robustness.
15. SIMPLIFICATIONS FOR COMPETITION
• Simulated data
• Fixed mass (125 GeV)
• Simplified decay channel (next slide)
• Simplified background events (three representative types only)
  – Decay of the Z boson (91.2 GeV) into tau-tau
  – Decay of a pair of top quarks into a lepton and a hadronic tau
  – "Decay" of the W boson into a lepton and a hadronic tau, due to imperfections in the particle identification procedure
• Simplified objective function (significance score)
16. SIMPLIFIED DECAY CHANNEL
• Decay via the tau-tau channel only
• One tau decays into a lepton and two neutrinos
• The other tau decays into a hadronic tau and a neutrino
• (Note: neutrinos cannot be detected)
hadronic tau: a bunch of hadrons
18. SIMPLIFIED DECAY CHANNEL
• Decay via the tau-tau channel only
• One tau decays into a lepton and two neutrinos
• The other tau decays into a hadronic tau and a neutrino
• (Note: neutrinos cannot be detected)
Jets / MET: vectorized momenta are given
hadronic tau: a bunch of hadrons
23. HOW TO HANDLE MISSING VALUES
• Assign a value
  – Generate a random value
  – Fit a value (mean, median, nearest neighbor, etc.)
  – Fix a value (domain knowledge)
• Remove the record
• Leave as is
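As a concrete illustration of the "fit a value" option, here is a minimal pandas sketch. It assumes the challenge's convention of coding undefined entries as -999.0; the column name PRI_jet_leading_pt is from the dataset, but the values here are made up.

```python
# Median imputation ("fit a value") for a column where missing entries
# are coded as -999.0, as in the challenge dataset.
import numpy as np
import pandas as pd

df = pd.DataFrame({"PRI_jet_leading_pt": [80.2, -999.0, 45.1, -999.0, 120.5]})

col = df["PRI_jet_leading_pt"].replace(-999.0, np.nan)  # "leave as is" -> NaN
median_filled = col.fillna(col.median())                # "fit a value" (median)
```

Which option is best depends on the model: tree-based methods can often handle a sentinel value like -999.0 directly, while distance-based methods usually need imputation.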
25. HISTOGRAM — PRI_jet_leading_pt
[Histograms of raw counts, the log transformation, and the inverse transformation of PRI_jet_leading_pt]
• Density is more meaningful in the range of x
• No fuzzy jump at the edge
29. INTERACTIVE VISUALIZATION — R SHINY
Use a reasonable number of bins to display the underlying distribution.
http://chencheng.shinyapps.io/demo_higgs
30. INTERACTIVE VISUALIZATION — R SHINY
Use a reasonable transformation to display the underlying distribution.
http://chencheng.shinyapps.io/demo_higgs
42. FEATURE ENHANCEMENT — CURVE FITTING
Enhance a variable based on its correlation with another variable.
[Histograms of BKG vs. SGN counts for DER_pt_h & DER_deltar_tau_lep]
46. DATA DRILL DOWN
• Select variable(s): one variable for a histogram, two variables for a scatter plot
http://chencheng.shinyapps.io/demo_higgs
47. DATA DRILL DOWN
• Dynamically select a subset of data — PRI_jet_num = 2
http://chencheng.shinyapps.io/demo_higgs
48. DATA DRILL DOWN
• Patterns in the subset data — PRI_jet_leading_eta & PRI_jet_subleading_eta
http://chencheng.shinyapps.io/demo_higgs
49. DATA DRILL DOWN
• Dynamically select a subset of data — PRI_jet_num = 3
http://chencheng.shinyapps.io/demo_higgs
50. DATA DRILL DOWN
• Patterns in the subset data — PRI_jet_leading_eta & PRI_jet_subleading_eta
http://chencheng.shinyapps.io/demo_higgs
51. DATA DRILL DOWN
• Patterns in the subset data — PRI_jet_leading_eta & PRI_jet_subleading_eta
  PRI_jet_num = 2 vs. PRI_jet_num = 3
Interactive data visualization techniques are helpful.
http://chencheng.shinyapps.io/demo_higgs
55. INSPIRATION FROM ANIMATION
• Distance ratio between MET-Lep and Tau-Lep: d(MET, Lep) / d(Tau, Lep)
Inspiration from meaningful visualization can be helpful.
[Histogram of dist_ratio_met_lep_tau counts, BKG vs. SGN]
56. INSPIRATION FROM ANIMATION
• Distance ratio between MET-Lep and Tau-Lep: d(MET, Lep) / d(Tau, Lep)
Adjust visualization for better efficiency.
[Histograms of dist_ratio_met_lep_tau counts, BKG vs. SGN]
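The engineered feature above can be sketched as follows. The deck only defines dist_ratio_met_lep_tau = d(MET, Lep) / d(Tau, Lep); taking d to be the angular separation in (eta, phi) is an assumption here (and since MET carries no eta, only the azimuthal difference is used for the MET-lepton distance). The column names are from the dataset; the values are made up.

```python
# Distance-ratio feature: d(MET, Lep) / d(Tau, Lep), with d taken as
# (eta, phi) separation — an assumed metric, not stated in the deck.
import numpy as np
import pandas as pd

def delta_phi(p1, p2):
    """Wrap the azimuthal angle difference into [-pi, pi]."""
    d = p1 - p2
    return (d + np.pi) % (2 * np.pi) - np.pi

df = pd.DataFrame({
    "PRI_met_phi": [0.3, -1.2],
    "PRI_lep_phi": [1.1,  2.0],
    "PRI_lep_eta": [0.5, -0.4],
    "PRI_tau_phi": [-0.7, 0.9],
    "PRI_tau_eta": [1.4,  0.2],
})

d_met_lep = np.abs(delta_phi(df["PRI_met_phi"], df["PRI_lep_phi"]))
d_tau_lep = np.hypot(df["PRI_tau_eta"] - df["PRI_lep_eta"],
                     delta_phi(df["PRI_tau_phi"], df["PRI_lep_phi"]))
df["dist_ratio_met_lep_tau"] = d_met_lep / d_tau_lep
```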
59. MODELS
• Gradient boosting tree
• Neural network
• Bayesian network
• Support vector machine
• Generalized additive model
61. GRADIENT BOOSTING TREE
• Decision tree – build many shallow trees
• Boosting – build trees based on residuals
• Bagging – each tree uses a subset of the data
• Ensembling – combine the trees
63. DECISION TREE
• Regression tree
[Scatter plot of y vs. x, x in 0–10, y in −1 to 1]
y
64. • Regression
tree
DECISION
TREE
64
1.0
0.5
0.0
−0.5
−1.0
0.0 2.5 5.0 7.5 10.0
x
y
Depth
=
1
|
x< 6.614
x>=6.614
0.19
n=100
−0.08
n=64
0.66
n=36
Regression Tree with Node Depth = 1
65. DECISION TREE
• Regression tree, depth = 2
[Tree: root 0.19 (n=100) splits at x < 6.614; the left node −0.08 (n=64) splits at x = 3.049 into −0.53 (n=40) and 0.67 (n=24); the right node 0.66 (n=36) splits at x = 8.953 into 0.086 (n=7) and 0.8 (n=29). Scatter plot shows the fitted step function]
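The depth-1 vs. depth-2 trees above can be reproduced with scikit-learn; the noisy sine data here is simulated, standing in for the deck's scatter plots.

```python
# Regression trees of increasing depth on noisy sine data. A tree of
# depth d predicts at most 2**d distinct values (one per leaf), which is
# why the fit is a step function.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 100)

pred1 = DecisionTreeRegressor(max_depth=1).fit(X, y).predict(X)  # 2 leaves
pred2 = DecisionTreeRegressor(max_depth=2).fit(X, y).predict(X)  # up to 4 leaves
```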
68. DECISION TREE
X0 = X; Y0 = Y;
latest_model = train_tree(X, Y);   # base model
for ii = 1:NUM_ITER
    Index_train = random(1:NUM_REC, FRAC_TRAIN * NUM_REC)
    X = X0[Index_train]; Y = Y0[Index_train];
    v_resid = Y - wts * latest_model(X);
    tree(ii) = train_tree(X, v_resid, wts);
    latest_model += LEARNING_RATE * tree(ii)
69. GRADIENT BOOSTING TREE (V. 1)
X0 = X; Y0 = Y;
latest_model = train_tree(X, Y);
for ii = 1:NUM_ITER
    Index_train = random(1:NUM_REC, FRAC_TRAIN * NUM_REC)
    X = X0[Index_train]; Y = Y0[Index_train];
    v_resid = Y - latest_model(X);            # get the residuals
    tree_add = train_tree(X, v_resid);        # fit a tree for the residuals
    latest_model += LEARNING_RATE * tree_add  # additive model
70. (STOCHASTIC) GRADIENT BOOSTING TREE
X0 = X; Y0 = Y;                                           # store input
latest_model = train_tree(X, Y);
for ii = 1:NUM_ITER
    Index_train = random(1:NUM_REC, FRAC_TRAIN * NUM_REC) # get sampled index
    X = X0[Index_train]; Y = Y0[Index_train];             # sampled records as input
    v_resid = Y - latest_model(X);
    tree_add = train_tree(X, v_resid);
    latest_model += LEARNING_RATE * tree_add
71. (STOCHASTIC) GRADIENT BOOSTING TREE WITH WEIGHT
X0 = X; Y0 = Y;
latest_model = train_tree(X, Y, wts);
for ii = 1:NUM_ITER
    Index_train = random(1:NUM_REC, FRAC_TRAIN * NUM_REC)
    X = X0[Index_train]; Y = Y0[Index_train];
    v_resid = Y - wts * latest_model(X);
    tree_add = train_tree(X, v_resid, wts);
    latest_model += LEARNING_RATE * tree_add
72. (GENERAL) GRADIENT BOOSTING
X0 = X; Y0 = Y;
latest_model = train_base_model(X, Y, wts);
for ii = 1:NUM_ITER
    Index_train = random(1:NUM_REC, FRAC_TRAIN * NUM_REC)
    X = X0[Index_train]; Y = Y0[Index_train];
    v_pseudo_resid = get_pseudo_residual(X, Y, wts, latest_model, LOSS_FUNCTION_TYPE);
    model_add_base = train_base_model(X, v_pseudo_resid, wts);
    alpha = linear_search(cost_function, model_add_base, X, Y, wts);
    latest_model += LEARNING_RATE * (alpha * model_add_base)
[Stochastic Gradient Boosting], Jerome H. Friedman, 1999
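The stochastic boosting loop above can be made runnable in a few lines. This sketch uses squared-error loss (so the pseudo-residual is simply Y minus the current prediction) and scikit-learn trees as the base learner; the data and constants are illustrative, not the deck's.

```python
# Minimal stochastic gradient boosting with regression trees.
# Constant names (NUM_ITER, FRAC_TRAIN, LEARNING_RATE) mirror the pseudocode.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
NUM_REC, NUM_ITER, FRAC_TRAIN, LEARNING_RATE = 200, 50, 0.7, 0.1

X0 = rng.uniform(0, 10, size=(NUM_REC, 1))
Y0 = np.sin(X0[:, 0]) + rng.normal(0, 0.1, NUM_REC)

base = np.mean(Y0)   # base model: a constant fit
trees = []

def predict(X):
    pred = np.full(len(X), base)
    for t in trees:
        pred += LEARNING_RATE * t.predict(X)
    return pred

for ii in range(NUM_ITER):
    idx = rng.choice(NUM_REC, int(FRAC_TRAIN * NUM_REC), replace=False)  # sampled index
    X, Y = X0[idx], Y0[idx]                      # sampled records as input
    v_resid = Y - predict(X)                     # pseudo-residual for L2 loss
    trees.append(DecisionTreeRegressor(max_depth=3).fit(X, v_resid))

mse = np.mean((predict(X0) - Y0) ** 2)
```

Each tree fits what the ensemble so far gets wrong, and the small learning rate keeps any single tree from dominating the additive model.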
76. APPLY MODEL ON TEST DATA

EventId   Score  RankOrder  Class
1         0.98   501        s
2         0.42   259,579    b
3         0.46   264,125    b
...
449,998   0.86   31,154     s
449,999   0.12   489,251    b
550,000   0.79   110,154    b
78. GRADIENT BOOSTING PARAMETERS
• Number of iterations
• Minimum observations for each node
• Fraction of bagging (0.5 ~ 0.8)
• Learning rate (< 0.1)
• Depth of tree (4 ~ 8)
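One way the knobs above map onto scikit-learn's GradientBoostingClassifier, as a sketch (the parameter names are scikit-learn's; the specific values are only illustrative picks within the slide's suggested ranges):

```python
# Gradient boosting parameters from the slide, expressed in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)

clf = GradientBoostingClassifier(
    n_estimators=200,      # number of iterations
    min_samples_leaf=20,   # minimum observations for each node
    subsample=0.7,         # fraction of bagging (0.5 ~ 0.8)
    learning_rate=0.05,    # learning rate (< 0.1)
    max_depth=5,           # depth of tree (4 ~ 8)
    random_state=0,
).fit(X, y)

acc = clf.score(X, y)
```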
80. CROSS VALIDATION
• Split training data
  – 70% for training
  – 30% for cross validation
• Train model (70%)
• Measure performance (30%)
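The 70/30 split above, sketched with scikit-learn's train_test_split on toy data:

```python
# Hold out 30% of the training data to measure model performance.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)
y = (X[:, 0] > 50).astype(int)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)
```

The held-out 30% is never used for fitting, so the score measured on it is an honest estimate of generalization performance.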
81. PERFORMANCE BASED ON AMS
Trade-off between:
• the ratio of signal/background events, and
• the number of records in the selection region

EventId   Score  RankOrder  Class  Truth
1         0.98   501        S      S
2         0.42   259,579    B
3         0.46   264,125    B
...
449,998   0.86   31,154     S      B
449,999   0.12   489,251    B
550,000   0.79   110,154    B

Selection region: s = sum(S), b = sum(B)
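For reference, the competition scored submissions with the approximate median significance (AMS), where s and b are the weighted sums of signal and background events inside the selection region and b_reg = 10 is the regularization constant from the challenge documentation:

```python
# AMS = sqrt(2 * ((s + b + b_reg) * ln(1 + s / (b + b_reg)) - s))
import math

def ams(s, b, b_reg=10.0):
    """Approximate median significance of a selection region."""
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))
```

The formula captures the trade-off on the slide: adding records grows s but also b, and AMS only improves when the added signal outweighs the added background.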
90. HEAT MAP OF AMS ON B-S PLANE
[Heat map of the AMS score over the (b, s) plane, with marked points A, B, C]
Inspiration from the Lagrangian method: weight signal and background events by the partial derivatives of the AMS function.
91. AMS CURVES ON B-S PLANE
[Level curves of AMS on the (b, s) plane with points A, B, C; arrows indicate the partial derivatives of AMS against s and against b]
Inspiration from the Lagrangian method: weight signal and background events by the partial derivatives of the AMS function.
The ratio of the derivatives gives the relative weight.
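The partial derivatives referred to above have a closed form. Differentiating AMS(s, b) = sqrt(2((s + b + b_reg) ln(1 + s/(b + b_reg)) − s)) gives dAMS/ds = ln(1 + r)/AMS and dAMS/db = (ln(1 + r) − r)/AMS with r = s/(b + b_reg); this derivation is mine, not spelled out in the deck, so it is checked against a finite difference below.

```python
# Closed-form gradient of the AMS score, for weighting events by their
# marginal effect on the objective (the Lagrangian-style idea above).
import math

def ams(s, b, br=10.0):
    return math.sqrt(2.0 * ((s + b + br) * math.log(1.0 + s / (b + br)) - s))

def ams_grad(s, b, br=10.0):
    """Return (dAMS/ds, dAMS/db)."""
    r = s / (b + br)
    a = ams(s, b, br)
    return math.log(1.0 + r) / a, (math.log(1.0 + r) - r) / a
```

The s-derivative is positive and the b-derivative negative, so their ratio gives the relative value of one extra signal event versus one avoided background event.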
100. OTHER TOPICS
• Version control (Git, SourceTree)
  – Effectively implement many different ideas
• File organization
  – Efficiently pull out the file needed
• Effective code (R, Python)
  – It matters greatly when dealing with big data
101. Thank you for your participation! Any questions?
goDCI.com