1. 10/24/2017
1
c o n f i d e n t i a l
twitter: @mich8elwu
linkedin.com/in/MichaelWuPhD
the black art of
machine learning
Michael Wu, PhD (@mich8elwu)
chief scientist @ lithium tech
2017.10.31
2017.09.28
THE POWER OF BIG DATA + DATA SCIENCE
• data → info → insight
buying calcium, zinc, magnesium, cotton balls, and switching to unscented lotions + soaps is a predictor of pregnancy
• decision → action
coupons for moms, timed to specific stages of pregnancy
• result
↗ revenue: $44B (2002) → $67B (2010)
big data + analytics: “btw, did you know your daughter is pregnant?”
THE POWER OF BIG DATA + DATA SCIENCE
• data → info → insight
filling out a loan application in only capital or only lowercase letters is predictive of loan default
• decision → action
augment the traditional underwriting regression model w/ thousands of variables & 10+ models
• result
↘ loan default rate by 40%
↗ market share by 25%
DATA ≠ INFORMATION ≠ INSIGHT
• data has a huge amount of statistical redundancy
duplication
spatial + temporal correlation
collinearity (causality)
• much of the info we extract from the data is not insightful
• insights must be
interpretable
relevant
novel (not already known)
big data that’s not statistically redundant = information
information that’s not already known = insight
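A minimal sketch of the "statistical redundancy" point, assuming a tiny made-up table: exact duplicate rows and a perfectly collinear column add data volume but no information. The column names (`temp_c`, `temp_f`, `visits`), the values, and the 0.95 threshold are all hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def drop_duplicate_rows(rows):
    """Remove exact duplicates while keeping the original order."""
    seen, unique = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def redundant_column_pairs(rows, names, threshold=0.95):
    """Column pairs whose |correlation| is at or above the (assumed) threshold."""
    cols = list(zip(*rows))
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(pearson(cols[i], cols[j])) >= threshold:
                pairs.append((names[i], names[j]))
    return pairs

# Hypothetical data: temp_f is just temp_c in other units; one row is repeated.
names = ["temp_c", "temp_f", "visits"]
rows = [
    (10.0, 50.0, 3.0),
    (20.0, 68.0, 1.0),
    (20.0, 68.0, 1.0),  # exact duplicate
    (30.0, 86.0, 4.0),
    (40.0, 104.0, 2.0),
]

unique_rows = drop_duplicate_rows(rows)
print(len(rows), "rows ->", len(unique_rows), "unique rows")
print("redundant column pairs:", redundant_column_pairs(unique_rows, names))
```

What survives both filters is closer to "information" in the slide's sense; whether any of it is an insight still depends on relevance and novelty, which no such filter can decide.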
NOT ALL DATA/INFORMATION ARE RELEVANT
• relevant data: signal vs. noise
• relevance is context specific
who: one man’s signal is another man’s noise
[diagram: data, information, insight; regions marked “relevant to me” and “relevant to you”; the rest is noise]
NOT ALL DATA/INFORMATION ARE RELEVANT
• relevant data: signal vs. noise
• relevance is context specific
who: one man’s signal is another man’s noise
when?
where?
what’s relevant is determined by the problem you are trying to solve or the question you are trying to answer
context is usually specified in the problem/question
[diagram: data, information, insight; region marked “relevant to me when I am traveling in Istanbul today”; the rest is noise]
big data is
very noisy
WHY IS BIG DATA SO NOISY?
• how did people use data before big data?
problem/question (Q) → data collection → data
data is collected specifically to address the problem/question
data is almost always relevant
WHAT HAPPENS W/ BIG DATA TECHNOLOGIES?
• enables data capture/storage before we have a question
data collection → problem/question (Q)
data is collected irrespective of any specific problem / question / purpose
must find the “relevant data” whenever we get a problem/question
most big data will be irrelevant (only a tiny % of it will be relevant)
DATA ≠ INFORMATION ≠ INSIGHT
• for all data (any data):
data ≥ information ≥ insight
• for big data:
data >> information >> insight
• “a single grain of rice can tip the scale”
• “1 bit of insightful info. may be the difference between victory and defeat”
WHERE DO YOU LOOK IN YOUR BIG DATA TO FIND INSIGHTS?
• look beyond what’s relevant
look at what you thought were the irrelevant data/info
• don’t look too far beyond your relevance boundary
it’s costly and wasteful
hard to establish causality
• you might not find anything, but when you do, it will be insightful
example: zest finance
[diagram: data, information, insight; relevant signal vs. noise]
DO BIZ REALLY WANT BIG DATA?
big data → information → insight
big data tech.: hadoop, hive, hbase, pig, noSQL, impala, spark, storm …
business needs: insight
huge gap
THE BIG DATA GAP: FROM DATA TO INSIGHTS
big data → information → insight
?
FROM DATA TO INSIGHTS
big data → information → insight
the data scientist is currently the only way companies know how to fill this gap
so what do data
scientists do?
DATA SCIENCE INPUT + OUTPUT
[diagram: input → output]
STEP 1: GET THE DATA
data scientist is ~50% data janitor
• normalization: type, range, unit, format, foreign key ref …
• exception handling: spam, missing data, incomplete data …
• dedupe, metadata tagging …
• POS tagging
• entity detection
• sampling + sample selection
• special handling for rich media
• …
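A minimal sketch of the "data janitor" work listed above, covering only three of the items: format/type normalization, exception handling for missing data, and dedupe. The record layout (`id`, `name`, `income`) and the cleaning rules are assumptions made for this example, not something the talk specifies.

```python
def clean_records(raw_records):
    """Normalize, filter, and dedupe hypothetical applicant records."""
    cleaned, seen_ids = [], set()
    for rec in raw_records:
        # Exception handling: skip records with a missing id or income.
        if not rec.get("id") or rec.get("income") in (None, ""):
            continue
        # Normalization (format): ids become stripped strings.
        rec_id = str(rec["id"]).strip()
        # Dedupe: keep only the first record seen for each id.
        if rec_id in seen_ids:
            continue
        seen_ids.add(rec_id)
        # Normalization (type): "$52,000" and 52000 both become 52000.0.
        income = float(str(rec["income"]).replace("$", "").replace(",", ""))
        # Normalization (format): collapse whitespace, unify capitalization.
        name = " ".join(rec.get("name", "").split()).title()
        cleaned.append({"id": rec_id, "name": name, "income": income})
    return cleaned

# Made-up raw input with typical problems.
raw = [
    {"id": " 17 ", "name": "ada  lovelace", "income": "$52,000"},
    {"id": "17", "name": "Ada Lovelace", "income": 52000},      # duplicate id
    {"id": "18", "name": "alan turing", "income": None},        # missing income
    {"id": 19, "name": "GRACE HOPPER", "income": "61000.50"},
]

for record in clean_records(raw):
    print(record)
```

Note that unifying capitalization here throws away information; whether that is acceptable depends on the question being asked of the data.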
c o n f i d e n t i a l
twitter: @mich8elwu
linkedin.com/in/MichaelWuPhD
RAW DATA USUALLY DON’T PERFORM WELL
raw data
text, image, sound, video
directly measured data, etc.
“Hello, how are you?”
072 101 108 108 111 044 032 104 111 119 032 097 114 101 032 121 111 117 063
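The number sequence above is the sentence as a machine stores it: one character code per character. A short sketch that reproduces it, assuming plain ASCII text and three-digit zero padding (the helper names are made up for this example):

```python
def to_codes(text):
    """Character codes (Unicode code points, equal to ASCII for this text)."""
    return [ord(ch) for ch in text]

def format_codes(codes):
    """Zero-pad each code to three digits, separated by spaces."""
    return " ".join(f"{code:03d}" for code in codes)

sentence = "Hello, how are you?"
print(format_codes(to_codes(sentence)))
```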
can a machine tell a bird from a plane? how?
[plot: probability of bird (0) / plane (1) vs. any pixel’s color/intensity: ~50% / ~50%]
the info in a pixel is not discriminating enough for this task
[scatter plot: bird (0) and plane (1) examples; axes: any pixel’s color/intensity vs. another pixel’s color/intensity]
any pixel can be part of a bird or a plane.
raw data are not only noisy
and “dirty,” they are bad
features!
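A toy sketch of why a raw pixel can be a bad feature while a derived quantity is a good one. This is not the bird/plane data from the talk: it assumes a made-up 5-pixel circular "image" with exactly two bright pixels, labeled 1 when the bright pixels touch and 0 otherwise. All names here are hypothetical.

```python
from itertools import combinations

N_PIXELS = 5  # tiny circular "image"; purely illustrative

def make_dataset():
    """Every strip with exactly two bright pixels; label 1 if they touch."""
    data = []
    for i, j in combinations(range(N_PIXELS), 2):
        image = [1 if k in (i, j) else 0 for k in range(N_PIXELS)]
        # Pixels 0 and N_PIXELS-1 are neighbors because the strip is circular.
        label = 1 if (j - i) in (1, N_PIXELS - 1) else 0
        data.append((image, label))
    return data

def p_label_given_pixel(data, pixel, value):
    """Fraction of label-1 examples among images where this pixel == value."""
    labels = [label for image, label in data if image[pixel] == value]
    return sum(labels) / len(labels)

def touching_feature(image):
    """Derived feature: 1 if any two neighboring pixels are both bright."""
    return int(any(image[k] and image[(k + 1) % N_PIXELS]
                   for k in range(N_PIXELS)))

data = make_dataset()
for pixel in range(N_PIXELS):
    print(f"pixel {pixel}: P(label=1 | bright) = {p_label_given_pixel(data, pixel, 1)}, "
          f"P(label=1 | dark) = {p_label_given_pixel(data, pixel, 0)}")
print("derived feature matches label on every image:",
      all(touching_feature(image) == label for image, label in data))
```

In this toy, every single pixel leaves the label at exactly 50/50, yet a feature derived from pairs of pixels separates the classes completely. The information was in the raw data all along; it just was not explicit.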
FEATURES AND FEATURE ENGINEERING
raw data: text, image, sound, video, directly measured data, etc.
online application: x = name, age, ID info, loan amount, income, spending + payment habit …
features: any information you derive from the raw data and make explicit to the learning algorithm
feature engineering: the extraction of implicit (or externally derived) information in (from) the raw data
[plot: normalized default rate vs. any raw data (names, age, loan amount, income): ~10% / ~10%]
[plot: normalized default rate vs. any feature]
example features:
• loan / income
• debt / income
• avg. monthly spent / income
• frequency of late (or early) payment, income (or spent) volatility (stdev), married or not, have kids or not …
• use of proper capitalization in the application
• # saves before submitting, avg. time between saving (or opening) the application, date, time, day of week when filling the application …
• hair color, eye color, height, weight …
• where did they fill out the application, sunny or rainy when filling the application …
the info doesn’t even have to be in the raw data, they just have to be derivable
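A minimal sketch of feature engineering on a loan application, using three of the example features named above (loan/income, debt/income, proper capitalization). The field names (`name_as_typed`, `loan_amount`, `debt`, `income`) and the capitalization rule are assumptions for illustration; the talk does not say how any lender actually computes these.

```python
def engineer_features(application):
    """Derive explicit features from a hypothetical raw application record."""
    income = application["income"]
    typed = application["name_as_typed"]
    return {
        # Ratios make the relationship between raw fields explicit.
        "loan_to_income": application["loan_amount"] / income,
        "debt_to_income": application["debt"] / income,
        # Assumed rule: text typed in all caps or all lowercase is "improper".
        "proper_capitalization": not (typed.isupper() or typed.islower()),
    }

# Made-up applications.
all_caps = {"name_as_typed": "JOHN SMITH", "loan_amount": 20000,
            "debt": 5000, "income": 50000}
mixed_case = {"name_as_typed": "Jane Doe", "loan_amount": 10000,
              "debt": 20000, "income": 40000}

print(engineer_features(all_caps))
print(engineer_features(mixed_case))
```

None of these three values appears as a field in the raw record, but each is derivable from it, which is all the learning algorithm needs.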
COMING UP WITH BETTER FEATURES
raw data → feature engineering → features → statistics → model
raw data: text, image, sound, video, directly measured data, etc.
features: any information you derive from the raw data and make explicit to the learning algorithm
model: obtained by optimizing some objective function (error, likelihood, etc.) + model validation
birds: have beak, have eyes, have feet, have feathers …
planes: have stabilizers, have engines, have windows …
[plot: occurrence probability of beak vs. occurrence probability of stabilizer]
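A minimal sketch of "model obtained by optimizing some objective function + model validation", assuming the two engineered features from the plot (occurrence probability of beak, occurrence probability of stabilizer). The feature values are made up, and logistic regression trained by gradient descent on log-loss is just one possible choice; the talk does not name a specific model.

```python
from math import exp, log

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def predict_proba(weights, bias, x):
    """Probability that x is a plane (label 1)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def log_loss(weights, bias, xs, ys):
    """Objective function: mean negative log-likelihood."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict_proba(weights, bias, x)
        total -= y * log(p) + (1 - y) * log(1 - p)
    return total / len(xs)

def fit(xs, ys, lr=0.5, steps=200):
    """Batch gradient descent on the log-loss, starting from zero weights."""
    weights, bias = [0.0] * len(xs[0]), 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(weights, bias, x) - y
            for k, xi in enumerate(x):
                grad_w[k] += err * xi
            grad_b += err
        weights = [w - lr * g / len(xs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(xs)
    return weights, bias

# Made-up features: (P(beak), P(stabilizer)); label 0 = bird, 1 = plane.
train_x = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.1), (0.1, 0.9), (0.2, 0.8), (0.1, 0.7)]
train_y = [0, 0, 0, 1, 1, 1]
held_out_x = [(0.85, 0.15), (0.15, 0.85)]  # validation examples
held_out_y = [0, 1]

weights, bias = fit(train_x, train_y)
print("training loss:", log_loss(weights, bias, train_x, train_y))
print("held-out predictions:",
      [int(predict_proba(weights, bias, x) >= 0.5) for x in held_out_x])
```

With features this discriminating, the modeling step is almost trivial, which is the slide's point: most of the difficulty sits in coming up with the features.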
raw data → feature engineering → features → statistics + machine learning → model
data science is
~25% handcrafting
… of features
“Coming up with features is difficult,
time-consuming, requires expert
knowledge. Applied machine learning
is basically feature engineering.”
—Andrew Ng
hand crafted features are:
- domain specific,
- task specific,
- not generalizable
CAN WE LEARN “GOOD FEATURES” DIRECTLY FROM THE DATA?
raw data → feature engineering → features → statistics → model
raw data: text, image, sound, video, directly measured data, etc.
features: any information you derive from the raw data and make explicit to the learning algorithm
model: obtained by optimizing some objective function (error, likelihood, etc.) + model validation
birds: have beak, have eyes, have bird feet, have feathers …
planes: have stabilizers, have engines, have windows …
pixels → edges → shapes
DEEP LEARNING
raw data → feature engineering → features → statistics → model
raw data: text, image, sound, video, directly measured data, etc.
features: any information you derive from the raw data and make explicit to the learning algorithm
model: obtained by optimizing some objective function (error, likelihood, etc.) + model validation
traditional machine learning: features handcrafted by experts; works for most (80%) of the problems in business
deep learning (deep neural network): features automatically learned from the data with different levels of abstraction
combination of pixels → edges
combination of edges → object parts
combination of parts → the object
[figure: input and the features at layer 1, layer 2, layer 3 for faces, cars, elephants, chairs, and faces + cars + airplanes + motorbikes]
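A toy sketch of the "levels of abstraction" idea: pixels combine into edges, edges into parts, parts into an object. The important caveat is that every layer here is hand-written for illustration, whereas a deep neural network would learn such layers from data. The 1-D "image", the bar "part", and the double-bar "object" are all made up.

```python
def detect_edges(pixels):
    """Layer 1: differences of neighboring pixels (+1 rising, -1 falling)."""
    return [b - a for a, b in zip(pixels, pixels[1:])]

def detect_bars(edges):
    """Layer 2: a rising edge followed by a falling edge forms a bar.

    Returns (start_pixel, width) for each bar found.
    """
    bars, start = [], None
    for i, edge in enumerate(edges):
        if edge > 0:
            start = i + 1
        elif edge < 0 and start is not None:
            bars.append((start, i + 1 - start))
            start = None
    return bars

def is_double_bar(bars):
    """Layer 3: the toy 'object' is two bars of equal width."""
    return len(bars) == 2 and bars[0][1] == bars[1][1]

image = [0, 1, 1, 0, 0, 1, 1, 0]  # hypothetical 1-D image
edges = detect_edges(image)
bars = detect_bars(edges)
print("edges:", edges)
print("bars:", bars)
print("is double bar:", is_double_bar(bars))
```

Each layer only looks at the output of the layer below it, so no layer ever has to reason about raw pixels and the final object at the same time.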
google brain:
16,000 CPUs
1,000,000,000+ connections
10,000,000 training images from youtube
extraordinarily generalizable:
makes machines behave & think more like humans, but requires lots of data to train
success stories:
- computer vision: image labeling, search …
- audio signal processing: speaker ID, speech recognition (speech → text) …
- text processing: machine translation, etc.
interesting problems in the industry:
- sentiment analysis
- actionability & intention prediction
- fraud, spam detection …
WHAT DO DATA SCIENTISTS DO?
raw data → feature engineering → features → statistics → model → communication
raw data: text, image, sound, video, directly measured data, etc.
features: any information you derive from the raw data and make explicit to the learning algorithm
model: obtained by optimizing some objective function (error, likelihood, etc.) + model validation
communication: data visualization, storytelling, translation of data to business insights, decisions, and action
skills involved: computer science, math + statistics, domain expertise
the work: plumbing, cleaning, janitoring, handcrafting
[Venn diagram: domain expertise, math + statistics, computer science; data science at the intersection]
thank you, q&a,
+ follow me
twitter: @mich8elwu
linkedin.com/in/MichaelWuPhD
want to dig deeper?
sos: http://pages.lithium.com/science-of-social
sos2: http://www.lithium.com/library/science-of-social-2