2. About Me
• Former member of the Search team at @WalmartLabs
• Former Head of the Metrics & Measurements team
  • I also led the Human Evaluation team
• About the Metrics and Measurements team
  • A team of engineers, analysts and scientists in charge of providing accurate and exhaustive measurements
  • We also had an auditing role towards adjacent teams
• What do we measure?
  • Engineering metrics related to model and data quality
  • Business metrics (revenue, etc.)
  • More exotic customer-centric metrics (customer value, customer satisfaction, model impact, etc.)
• Currently Head of Data Science at Atlassian
  • In charge of the Search & Smarts team
6. Outline
❑ Humans & Big Data
  • The role of human beings in the era of Big Data
  • Why do we need to tag data?
  • How to get tagged data?
❑ The Era of Crowdsourcing
  • What is Crowdsourcing?
  • Use cases and details about Crowdsourcing
  • Traditional crowds vs. curated crowds
❑ The Human-in-the-Loop Paradigm
  • Definition and details about Human-in-the-Loop ML
  • Introduction to Active Learning
9. Humans & Big Data: The Role of Human Beings in the Era of Machine Learning
10. The Era of Very Big Data
❑ VOLUME
  • More data was created from 2013 to 2015 than in the entire previous history of the human race
  • By 2020, accumulated data will reach 44 trillion gigabytes
❑ VELOCITY
  • By 2020, ~1.7 MB of new data / second / human being
  • 1.2 trillion search queries on Google per year
❑ VARIETY
  • 31 million messages and 2.8 million videos per minute on Facebook
  • Up to 300 hours of video / minute are uploaded to YouTube
  • In 2015, 1 trillion photos were taken; billions were shared online
[Image: a data center at Google]
14. Supervised vs. Unsupervised Machine Learning
Supervised ML requires tagged data
• Classification: a problem where the output variable is a category
  examples: SVM, random forests, Bayesian classifiers
• Regression: a problem where the output variable is a real value
  examples: linear regression, random forests
Typical applications: image recognition, speech recognition
Unsupervised ML doesn't require tagged data
• Clustering: discovery of inherent groupings in the data
  examples: k-means, hierarchical clustering
• Association rules: discovery of rules describing the data
  example: the Apriori algorithm
Typical applications: feature learning, autoencoders
The Case of Deep Learning: both supervised and unsupervised applications
NB: Deep Learning algorithms are data-greedy…
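To make the split concrete, here is a minimal sketch contrasting the two regimes (scikit-learn with its built-in Iris toy dataset, chosen purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)  # features X, labels y

# Supervised: the classifier is trained on (X, y) pairs -- it needs tagged data.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print("predicted class:", clf.predict(X[:1]))

# Unsupervised: k-means only ever sees X and discovers groupings on its own.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
km.fit(X)  # no labels involved
print("cluster assignments:", km.labels_[:5])
```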
17. Tagged Data
• Gathering quality tagged training data is a common bottleneck in ML
  • Expensive
  • Quality control is hard and requires a second human pass
  • Hardly scalable → heavy use of sampling strategies
• How do companies doing Machine Learning get tagged data?
  • Implicit tagging: customer engagement
  • Explicit tagging: manual labor
• A few strategies to get tagged data for cheap/free:
  • Games (Google Quick, Draw! — https://quickdraw.withgoogle.com/)
  • Incentivization (extra lives or bonuses in games)
20. The Wisdom from the Crowd
Why human input matters: the use case of image colorization
[Image: a colorization model, alongside a tagged training data set for image recognition (watermelon, grapes, bananas, pineapple, orange)]
→ Colorization is straightforward to humans because they can 'tap' into their general knowledge
"Bananas are generally …" — this kind of 'general' knowledge is:
• obvious for human beings
• tedious for machines
24. What is Crowdsourcing?
Crowdsourcing: the process of getting labor or funding, usually online, from a crowd of people
➢ Crowdsourcing = 'crowd' + 'outsourcing'
➢ The act of taking a function once performed by employees and outsourcing it to an undefined (generally large) network of people in the form of an open call
History of Crowdsourcing
• The term was first used in 2005 by the editors at Wired
• The official definition was published in the Wired article "The Rise of Crowdsourcing", June 2006
• It describes how businesses were using the Internet to "outsource work to the crowd"
What Crowdsourcing helps with:
• Scale → peer-production (for jobs to be performed collaboratively)
• Reach → connecting with a large network of potential laborers (if tasks are undertaken by sole individuals)
28. The Nature of Crowdsourcing
Microtasks
• Data generation: user-generated content such as reviews, pictures, translations, etc.
• Data validation: validation of translations, etc.
• Data tagging: image tagging, product categorization, etc.
• Data curation: curation of news feeds, etc.
Macrotasks
• Solution development: algorithm improvement, etc.
• Crowd contests: design competitions, algorithmic competitions, etc.
Funding
32. Some Cool Crowdsourcing Applications
Mapping
• Photo Sphere
• Google Maps crowdsources info for wheelchair-accessible places
Traffic
• Google Traffic
• Waze: a traffic reporting app
Translation
• Google Translate
Epidemiology
• Flu tracking applications
36. Companies Based on Crowdsourcing
• Quora is a question-and-answer site where questions are asked, answered, edited and organized by its community of users.
• Waze is a community-based traffic and navigation app where drivers share real-time traffic and road info.
• Kaggle is a platform for predictive modelling competitions in which companies post data and data miners compete to produce the best models.
• Stack Overflow is a platform for users to ask and answer questions, and to vote questions and answers up or down and edit them.
• Flickr is an image and video hosting website that is widely used by bloggers to host images that they embed in social media.
38. The Challenges of Crowdsourcing
Reliability
• Retail: absence of emotional involvement (judges are not actually spending money on items)
• Waze: locals were sending fake information to limit traffic in their area
Relevance of knowledge
• Retail: judges might not have appropriate knowledge of the items they are evaluating
Subjectivity
• Search: relevance scores vary depending on profile and personal preferences
Speed & cost
• Human evaluations take time, and can only be performed sporadically and on samples
• Not practical for measurement purposes
42. Crowdsourcing vs. Curated Crowds
Traditional Crowdsourcing Model
+ Speed: many hands generate light work
+ Lower cost: typically a few pennies per task
- No quality control
- Lack of control: little to no incentive to deliver on time
- High maintenance: clear instructions needed; automated understanding checks
- Lower reliability: high overlap required
- Lack of confidentiality: anyone can see your tasks
Curated Crowd
+ Quality control: judges are subject to quality metrics, and removed if they don't deliver the required quality
+ Better quality: very little overlap needed
+ Expertise: judges become experts at the required task
+ Constraints on the crowd: judges are less likely to drop out
- More expensive: typically the primary source of income for judges
- Consistency required: frequent tasks are needed to keep skills sharp
43. Crowdsourcing Applications in e-Commerce
Catalog Curation
• Product description curation
• Product tagging & categorization
• Product deduplication
• Taxonomy testing
Search Relevance Evaluation
• Relevance scores (query-item pair scores)
• Engine comparison (ranking-to-ranking)
Review Moderation
• Removal/flagging of obscene reviews
Mystery Shopping
• Analysis and discovery of new trends
• Evaluation of new products
• Competitive analysis
[Image: the example of Product Tagging]
48. Use Case: Evaluation of Search Engine Relevance
Side-by-Side Engine Comparison (Ranking A vs. Ranking B)
• Judge 1: prefers ranking A
• Judge 2: prefers ranking A
• Judge 3: prefers ranking B
→ Human evaluation makes it possible to measure the intangible with little risk
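A minimal sketch of aggregating such side-by-side judgments into a winner by simple majority vote (the judgment data is hypothetical):

```python
from collections import Counter

# Each judge's side-by-side preference, as on the slide above.
judgments = ["A", "A", "B"]  # Judge 1, Judge 2, Judge 3

votes = Counter(judgments)
winner, count = votes.most_common(1)[0]
print(f"Ranking {winner} preferred by {count} of {len(judgments)} judges")
# -> Ranking A preferred by 2 of 3 judges
```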
49. Use Case: Evaluation of Search Engine Relevance
Query-Item Relevance Scoring for Measurement of Ranking Quality
[Image: two rankings with per-item relevance scores, e.g. 5/5, 5/5, 5/5, 4/5, 3/5, 2/5 vs. 5/5 across the board]
Discounted cumulative gain, where $rel_i$ is the graded relevance of the item at position $i$ and $REL_p$ is the list of items ideally ordered by relevance up to position $p$:

$$DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)} \qquad IDCG_p = \sum_{i=1}^{|REL_p|} \frac{2^{rel_i} - 1}{\log_2(i+1)} \qquad nDCG_p = \frac{DCG_p}{IDCG_p}$$
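A minimal sketch of nDCG in Python; note the slide's $IDCG$ uses the exponential gain, while the sketch below uses the linear gain $rel_i$ for both the ranking and its ideal reordering, which is the other standard variant:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: rel_i / log2(i + 1), positions starting at 1."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """Normalized DCG: DCG of the ranking divided by DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg([5, 5, 5, 4, 3, 2]))  # 1.0 -- already ideally ordered
print(ndcg([2, 3, 4, 5, 5, 5]))  # < 1.0 -- best items ranked too low
```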
51. The Dream of Automation
The 4 Industrial Revolutions
• FIRST REVOLUTION – 1784: mechanical production, railroads, steam power
• SECOND REVOLUTION – 1870: mass production, electrical power, assembly lines
• THIRD REVOLUTION – 1969: automated production, electronics, computers
• FOURTH REVOLUTION – ongoing: artificial intelligence, big data
→ Automation is not a new idea
Automation: the use of various control systems for operating equipment such as machinery and processes with minimal or reduced human intervention
Why?
• Automate boring/repetitive tasks
• Perform tasks at scale
• Perform tasks with enhanced precision
• Deliver consistent products
• Use machines where they outperform humans
55. When Full Automation can't be Achieved… Human-in-the-Loop
Human-in-the-Loop (HITL) is defined as a model or a system that requires human interaction
The idea of using human beings to enhance the machine is not new
• We have been doing Human-in-the-Loop all along…
• Example: autopilot technology for planes
Human intervention/presence is useful:
• To handle corner cases (outlier management)
• To "keep an eye" on the system (sanity check)
• To correct unwanted behavior (refinement)
• To validate appropriate behavior (validation)
58. Human-in-the-Loop Paradigm
Pareto Principle
aka the 80/20 rule, the law of the vital few, or the principle of factor sparsity: states that, for many events, roughly 80% of the effects come from 20% of the causes
ML version of the Pareto Principle:
• Evidence suggests that some of the most accurate ML systems to date need:
  • 80% computer/AI-driven work
  • 19% human input
  • 1% unknown randomness to balance things out
• The combination of machine and human intervention achieves maximum machine accuracy
How can human knowledge be incorporated into ML models?
A. Helping label the original dataset that will be fed into an ML model
B. Helping correct inaccurate predictions that arise as the system goes live
63. Human-in-the-Loop Use Case #1
An example of the HITL approach: face recognition
[Image: face recognition suggesting names — Mary, Roberto, Victoria, Laura, Sebastian, Cecelia]
Accuracy
• Facebook's DeepFace software reaches 97.25% accuracy
HITL as a feedback loop
• When the confidence is below a certain threshold, the system:
  • suggests a label
  • asks the uploader to validate/approve or correct the suggestion
• The new data is used to improve the accuracy of the algorithm
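A minimal sketch of this feedback loop (the names and the 0.9 threshold are hypothetical; `model` stands for any classifier that exposes a confidence alongside its prediction):

```python
CONFIDENCE_THRESHOLD = 0.9  # hypothetical cut-off for auto-accepting a prediction

def tag_photo(model, photo, ask_uploader, training_set):
    """Route a prediction: auto-accept if confident, otherwise ask a human."""
    label, confidence = model.predict_with_confidence(photo)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label  # confident enough: no human needed
    # Low confidence: suggest the label and let the uploader validate or correct it.
    validated = ask_uploader(photo, suggested=label)
    # The human-validated pair becomes new tagged data for the next retraining.
    training_set.append((photo, validated))
    return validated
```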
66. Human-in-the-Loop Use Case #2
An example of the HITL approach: autonomous vehicles
Teaching the machine
• Driving systems were trained using a human to oversee the process
Accuracy considerations
• The autopilot system is now over 99% accurate
• However, 99% accuracy means that people can die 1% of the time (!!)
• Though we have seen huge advances in the accuracy of pure machine-driven systems, they tend to fall short of acceptable accuracy rates
Corner cases
• Fun fact: Volvo's self-driving cars fail in Australia because of kangaroos ("Volvo's driverless cars 'confused' by kangaroos")
• Reaching 100% is hard because of corner cases
• A HITL approach helps get the accuracy to ~100%
69. The Success of Human-in-the-Loop: The Example of Chess
The Human vs. the Machine
• In 1997, chess champion Garry Kasparov is beaten by the IBM supercomputer Deep Blue
[Image: Garry Kasparov]
Freestyle or "Advanced" Chess
• Advanced: a human chess master works with a computer to find the best possible move
• Freestyle: a team can be made of any combination of human beings + computers
• In 2005, Steven Cramton, Zackary Stephen and their 3 computers win a Freestyle Chess tournament
Why it works
• Computers are great at reading tough tactical situations
• But humans are better at understanding long-term strategy
• Humans use computers to limit "blunders", while using their intuition to force the opponent into board states that confuse the computer(s)
74. Active Learning
Active Learning: a special case of semi-supervised ML in which a learning algorithm can interactively query the user (oracle) to obtain the desired outputs at new data points, maximizing validity and relevance
General Strategy
If D is the entire data set, at each iteration i, D is broken up into three subsets:
1. D(K,i): data points where the label is known
2. D(U,i): data points where the label is unknown
3. D(Q,i): data points for which the label is queried (sometimes, even when the label is known)
Benefits
• Query labels only when necessary (lower cost)
Next Generation Algorithms
• Proactive learning:
  • relaxes the assumption that the oracle is always right
  • casts the problem as an optimization problem with a budget constraint
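To make the D(K)/D(U)/D(Q) iteration concrete, here is a minimal sketch using uncertainty sampling; `model` is any scikit-learn-style classifier, and `oracle` is a hypothetical callable standing in for the human judge:

```python
import numpy as np

def active_learning_loop(model, X_known, y_known, X_unknown, oracle, n_iters=100):
    """At each iteration, query the oracle on the least-confident unknown point,
    move it from D_U to D_K, and refit the model."""
    for _ in range(min(n_iters, len(X_unknown))):
        model.fit(X_known, y_known)
        probs = model.predict_proba(X_unknown)
        # D_Q: the single point the model is least sure about (uncertainty sampling).
        query_idx = int(np.argmin(probs.max(axis=1)))
        label = oracle(X_unknown[query_idx])  # ask the human for the true label
        X_known = np.vstack([X_known, X_unknown[query_idx:query_idx + 1]])
        y_known = np.append(y_known, label)
        X_unknown = np.delete(X_unknown, query_idx, axis=0)
    return model.fit(X_known, y_known)
```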
79. Active Learning: How does it Work?
Machine Learning needs:
• Logic (algorithm)
• Data
• Optimization
• Feedback ← Human-in-the-Loop
Active Learning = a Machine Learning algorithm using an "oracle" to reduce mistakes/uncertainty
Query Strategy: labels are queried for
• Data points for which model uncertainty is high (uncertainty sampling)
• Data points for which the different models of an ensemble method disagree the most (query by committee)
• Data points causing the most changes on the model (expected model change)
• Data points causing overall variance to be high (variance reduction)
[Diagram: the active learning algorithm selects/removes a single example from the unlabeled data; the human oracle provides the correct label; the labeled example is added to the labeled data, which updates the classifier]
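For illustration, here are minimal sketches of the first two query scores (higher score = more worth querying); each row of `probs` is assumed to be one model's predicted class distribution for one data point:

```python
import numpy as np

def least_confidence(probs):
    """Uncertainty sampling score: 1 - probability of the most likely class."""
    return 1.0 - probs.max(axis=1)

def vote_entropy(committee_preds, n_classes):
    """Query-by-committee score: entropy of the committee's label votes per point."""
    scores = []
    for votes in committee_preds.T:          # one column of votes per data point
        freq = np.bincount(votes, minlength=n_classes) / len(votes)
        nonzero = freq[freq > 0]
        scores.append(float(-(nonzero * np.log(nonzero)).sum()))
    return np.array(scores)

probs = np.array([[0.9, 0.1], [0.55, 0.45]])    # hypothetical class probabilities
print(least_confidence(probs))                  # [0.1 0.45] -> query the 2nd point
committee = np.array([[0, 1], [0, 1], [0, 0]])  # 3 models x 2 data points
print(vote_entropy(committee, n_classes=2))     # [0. 0.64] -> query the 2nd point
```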
82. Active Learning: How does it Work?
[Diagram: Machine Learning classifier → is the confidence level high? YES → output; NO → annotation by a human oracle (Human-in-the-Loop Active Learning)]
By adding a human feedback loop, we allow the system to:
• actively learn
• correct itself where it got it wrong
• improve the algorithm over iterations
84. Active Learning at Walmart e-Commerce: 3 Use Cases using Active Learning in the context of Search/Retail
85. Active Learning at Walmart e-Commerce
❑ Machine Learning Lifecycle Management (Programming by Feedback)
• Automatic monitoring of input and output values for the ML algorithm
• An algorithm detects failings and outliers in real time and suggests an action
• A human validates the action, creating tagged data for full automation
❑ Diagnosis of Catalog Data Issues (Reinforcement Learning)
• The algorithm uncovers demoted items and suggests the most likely reason for the demotion
• An engineer manually confirms/corrects the suggestion, generating training data for full automation
❑ Refinement of the Query Tagging Algorithm (Optimization)
• The human evaluation team manually measures the accuracy of the query tagging model
• Mistagged queries are used to discover patterns specific to problematic queries, which are reported to engineers
• The sample is enriched with problematic queries (so the evaluation team can diagnose problems with the algorithms)
• Example: "red t-shirt Size M" → red = color, t-shirt = product type, M = size (a toy tagger sketch follows below)
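A purely illustrative, lexicon-based sketch of this kind of query tagging (the lexicon and helper are hypothetical toys, not Walmart's production tagger); tokens the lexicon cannot tag are exactly the ones worth routing to human judges:

```python
# Hypothetical lexicon -- a toy baseline for query tagging.
LEXICON = {
    "red": "color", "blue": "color",
    "t-shirt": "product_type", "jeans": "product_type",
    "s": "size", "m": "size", "l": "size",
}

def tag_query(query):
    """Tag each token; UNKNOWN tokens are candidates for human review."""
    tags = []
    for token in query.lower().split():
        if token == "size":
            continue  # in "Size M", the value token "M" carries the tag
        tags.append((token, LEXICON.get(token, "UNKNOWN")))
    return tags

print(tag_query("red t-shirt Size M"))
# [('red', 'color'), ('t-shirt', 'product_type'), ('m', 'size')]
```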
88. Conclusion and Takeaways
• Why do humans and machines complement each other?
  • Human beings are memory-constrained
  • Computers are knowledge-constrained
• Tagged data is more important than ever
  • But getting quality data is challenging given the volume of data
  • Crowdsourcing offers more flexibility to tag data at scale
• The Human-in-the-Loop paradigm
  • Improves the accuracy of machine learning algorithms (classifiers)
  • Many examples of successful endeavors using "Augmented Intelligence"
  • Active Learning is a booming area of ML/AI