Natural Intelligence: the Human Factor in A.I.
Big Data Expo 2017
Utrecht, Netherlands
About Me
• Former member of the Search team at @WalmartLabs
  • Former Head of the Metrics & Measurements team
  • I also led the Human Evaluation team
• About the Metrics and Measurements team
  • A team of engineers, analysts and scientists in charge of providing accurate and exhaustive measurements
  • We also had an auditing role towards adjacent teams
• What do we measure?
  • Engineering metrics related to model and data quality
  • Business metrics (revenue, etc.)
  • More exotic customer-centric metrics (customer value, customer satisfaction, model impact, etc.)
• Currently Head of Data Science at Atlassian
  • In charge of the Search & Smarts team
q Humans  &  Big  Data
• The	
  role	
  of	
  human	
  beings	
  in	
  the	
  era	
  of	
  Big	
  Data
• Why	
  do	
  we	
  need	
  to	
  tag	
  data?
• How	
  to	
  get	
  tagged	
  data?
q The  Era  of  Crowdsourcing
• What	
  is	
  Crowdsourcing?
• Use	
  cases	
  and	
  details	
  about	
  Crowdsourcing
• Traditional	
  crowds	
  vs.	
  curated	
  crowds
q The  Human-­‐in-­‐the-­‐Loop  Paradigm
• Definition	
  and	
  details	
  about	
  Human-­‐In-­‐The-­‐Loop	
  ML
• Introduction	
  to	
  Active	
  Learning
Outline
q Humans  &  Big  Data
• The	
  role	
  of	
  human	
  beings	
  in	
  the	
  era	
  of	
  Big	
  Data
• Why	
  do	
  we	
  need	
  to	
  tag	
  data?
• How	
  to	
  get	
  tagged	
  data?
q The  Era  of  Crowdsourcing
• What	
  is	
  Crowdsourcing?
• Use	
  cases	
  and	
  details	
  about	
  Crowdsourcing
• Traditional	
  crowds	
  vs.	
  curated	
  crowds
q The  Human-­‐in-­‐the-­‐Loop  Paradigm
• Definition	
  and	
  details	
  about	
  Human-­‐In-­‐The-­‐Loop	
  ML
• Introduction	
  to	
  Active	
  Learning
Outline
q Humans  &  Big  Data
• The	
  role	
  of	
  human	
  beings	
  in	
  the	
  era	
  of	
  Big	
  Data
• Why	
  do	
  we	
  need	
  to	
  tag	
  data?
• How	
  to	
  get	
  tagged	
  data?
q The  Era  of  Crowdsourcing
• What	
  is	
  Crowdsourcing?
• Use	
  cases	
  and	
  details	
  about	
  Crowdsourcing
• Traditional	
  crowds	
  vs.	
  curated	
  crowds
q The  Human-­‐in-­‐the-­‐Loop  Paradigm
• Definition	
  and	
  details	
  about	
  Human-­‐In-­‐The-­‐Loop	
  ML
• Introduction	
  to	
  Active	
  Learning
Outline
Humans & Big Data: The Role of Human Beings in the Era of Machine Learning
The Era of Very Big Data
• VOLUME
  • More data was created from 2013 to 2015 than in the entire previous history of the human race
  • By 2020, accumulated data will reach 44 trillion gigabytes
• VELOCITY
  • By 2020, ~1.7 MB of new data per second per human being
  • 1.2 trillion search queries on Google per year
• VARIETY
  • 31 million messages and 2.8 million videos per minute on Facebook
  • Up to 300 hours of video per minute are uploaded to YouTube
  • In 2015, 1 trillion photos taken; billions shared online
(Image: data center at Google)
Supervised vs. Unsupervised Machine Learning
Supervised ML: requires tagged data
• Classification: problem where the output variable is a category
  examples: SVM, random forest, Bayesian classifiers
• Regression: problem where the output variable is a real value
  examples: linear regression, random forest
Unsupervised ML: doesn't require tagged data
• Clustering: discovery of inherent groupings in the data
  examples: k-means, hierarchical clustering
• Association rules: discovery of rules describing the data
  example: Apriori algorithm
The Case of Deep Learning: both supervised and unsupervised applications
• Supervised: image recognition, speech recognition
• Unsupervised: feature learning, autoencoders
NB: Deep Learning algorithms are data-greedy…
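To make the supervised/unsupervised distinction concrete, here is a minimal sketch in plain Python. It is not taken from the talk: the toy price data, the tags, and the function names (`fit_centroids`, `classify`, `kmeans_1d`) are illustrative assumptions. The supervised part needs the tags to learn one centroid per category; the unsupervised part (k-means with k = 2) only sees the raw values.

```python
from statistics import mean

# Hypothetical toy data: item prices with human-provided tags (illustrative only).
prices = [1.0, 1.2, 0.8, 9.5, 10.0, 10.5]
tags = ["cheap", "cheap", "cheap", "premium", "premium", "premium"]


def fit_centroids(values, labels):
    """Supervised: use the tags to compute one centroid per category."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: mean(members) for label, members in groups.items()}


def classify(centroids, value):
    """Predict the category whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def kmeans_1d(values, k=2, iterations=10):
    """Unsupervised: k-means on 1-D data; no tags are involved."""
    if k < 2:
        raise ValueError("this sketch assumes k >= 2")
    # Deterministic start: spread the initial centroids over the sorted values.
    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda c: abs(centroids[c] - value))
            clusters[nearest].append(value)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


centroids_by_tag = fit_centroids(prices, tags)
prediction = classify(centroids_by_tag, 8.7)           # uses what the tags taught us
cluster_centroids, clusters = kmeans_1d(prices, k=2)   # groups found without tags
```

The clusters found by k-means carry no names: a human still has to look at them and decide that one group means "cheap" and the other "premium".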
Tagged Data
• Gathering quality tagged training data is a common bottleneck in ML
  • Expensive
  • Quality control is hard and requires a second human pass
  • Hardly scalable → heavy use of sampling strategies
• How do companies doing Machine Learning get tagged data?
  • Implicit tagging: customer engagement
  • Explicit tagging: manual labor
• A few strategies to get tagged data for cheap/free:
  • Games (Google Quick Draw)
  • Incentivization (extra lives or bonuses in games)
https://quickdraw.withgoogle.com/
The Wisdom from the Crowd
Why human input matters: the use case of image colorization
→ Colorization is straightforward to humans because they can 'tap' into their general knowledge
(Figure: image recognition and colorization models, with a tagged training data set of watermelon, grapes, bananas, pineapple and orange.)
'General' knowledge, e.g. “Bananas are generally […]”:
• obvious for human beings
• tedious for machines
Crowdsourcing: Human Wisdom at Scale
What is Crowdsourcing?
Crowdsourcing: the process of getting labor or funding, usually online, from a crowd of people
• Crowdsourcing = 'crowd' + 'outsourcing'
• Act of taking a function once performed by employees and outsourcing it to an undefined (generally large) network of people in the form of an open call
History of Crowdsourcing
• Term was first used in 2005 by the editors at Wired
• Official definition published in the Wired article “The Rise of Crowdsourcing”, June 2006
• Describes how businesses were using the Internet to “outsource work to the crowd”
What Crowdsourcing helps with:
• Scale → peer-production (for jobs to be performed collaboratively)
• Reach → connect with a large network of potential laborers (if task undertaken by sole individuals)
The Nature of Crowdsourcing
Microtasks
• Data generation: user-generated content such as reviews, pictures, translations, etc.
• Data validation: validation of translation, etc.
• Data tagging: image tagging, product categorization, etc.
• Data curation: curation of news feeds, etc.
Macrotasks
• Solution development: algorithm improvement, etc.
• Crowd contest: design competition, algorithmic competition, etc.
Funding
Some Cool Crowdsourcing Applications
Mapping
• Photo Sphere
• Google Maps crowdsources info for wheelchair-accessible places
Traffic
• Google Traffic
• Waze: traffic reporting app
Epidemiology
• Flu tracking applications
Translation
• Google Translate
Companies Based on Crowdsourcing
Quora is a question-and-answer site where questions are asked, answered, edited and organized by its community of users.
Waze is a community-based traffic and navigation app where drivers share real-time traffic and road info.
Kaggle is a platform for predictive modelling competitions in which companies post data and data miners compete to produce the best models.
Stack Overflow is a platform for users to ask and answer questions, to vote questions and answers up or down, and to edit them.
Flickr is an image and video hosting website that is widely used by bloggers to host images that they embed in social media.
The Challenges of Crowdsourcing
Reliability
• Retail: absence of emotional involvement (judges are not actually spending money on items)
• Waze: locals were sending fake information to limit traffic in their area
Relevance of knowledge
• Retail: judges might not have appropriate knowledge of the items they are evaluating
Subjectivity
• Search: relevance score varies depending on profile and personal preferences
Speed & cost
• Human evaluations take time and can only be performed sporadically and on samples
• Not practical for measurement purposes
Crowdsourcing vs. Curated Crowds
Traditional Crowdsourcing Model
+ Speed:
  • many hands make light work
+ Lower cost:
  • typically a few pennies per task
- No quality control
- Lack of control:
  • little to no incentive to deliver on time
- High maintenance:
  • clear instructions needed
  • automated understanding checks
- Lower reliability:
  • high overlap required
- Lack of confidentiality:
  • anyone can see your tasks
Curated Crowd
+ Quality control:
  • judges are subject to quality metrics
  • removed if they don't deliver the required quality
+ Better quality:
  • very little overlap needed
+ Expertise:
  • judges become experts at the required task
+ Constraints on crowd:
  • judges less likely to drop out
- More expensive:
  • typically the primary source of income for judges
- Consistency required:
  • need frequent tasks to keep skills sharp
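The quality-control idea for a curated crowd (score the judges, remove those below the bar) could be sketched as follows. This is an illustration under stated assumptions, not the process used by the speaker's team: the gold-standard tasks, the judge names, and the 0.8 accuracy threshold are all hypothetical.

```python
# Hypothetical gold-standard answers for a few audit tasks.
gold = {"task1": "shoes", "task2": "toys", "task3": "books", "task4": "toys"}

# Hypothetical answers submitted by each judge on those audit tasks.
answers = {
    "judge_a": {"task1": "shoes", "task2": "toys", "task3": "books", "task4": "toys"},
    "judge_b": {"task1": "shoes", "task2": "books", "task3": "books", "task4": "shoes"},
}


def judge_accuracy(judge_answers, gold_answers):
    """Share of gold tasks answered correctly (a missing answer counts as wrong)."""
    correct = sum(judge_answers.get(task) == label for task, label in gold_answers.items())
    return correct / len(gold_answers)


def curate(all_answers, gold_answers, min_accuracy=0.8):
    """Score every judge and keep only those meeting the accuracy threshold."""
    scores = {judge: judge_accuracy(a, gold_answers) for judge, a in all_answers.items()}
    kept = sorted(judge for judge, score in scores.items() if score >= min_accuracy)
    return scores, kept


scores, kept_judges = curate(answers, gold)
```

In this toy run `judge_a` answers all four audit tasks correctly and stays in the crowd, while `judge_b` gets two of four and is removed.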
Crowdsourcing Applications in e-Commerce
Catalog Curation
• Product Description Curation
• Product Tagging & Categorization
• Product Deduplication
• Taxonomy Testing
Search Relevance Evaluation
• Relevance score (query-item pair scores)
• Engine comparison (ranking-to-ranking)
Review Moderation
• Removal/flagging of obscene reviews
Mystery Shopping
• Analysis and discovery of new trends
• Evaluation of new products
• Competitive analysis
The example of Product Tagging
Use Case: Evaluation of Search Engine Relevance
→ Human evaluation makes it possible to measure the intangible with little risk
Side-by-Side Engine Comparison (Ranking A vs. Ranking B)
• Judge 1: prefers ranking A
• Judge 2: prefers ranking A
• Judge 3: prefers ranking B
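One simple way to turn such side-by-side judgments into a verdict is a majority vote. The sketch below is an assumption about how the votes might be aggregated (the talk does not say); the function name `aggregate_preferences` is hypothetical.

```python
from collections import Counter


def aggregate_preferences(votes):
    """Return (winner, share of votes for the leading option); winner is None on a tie."""
    if not votes:
        raise ValueError("at least one vote is required")
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    share = top_count / len(votes)
    # A tie between the two leading options yields no winner.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, share
    return top_label, share


# The three judgments listed above: A, A, B.
winner, share = aggregate_preferences(["A", "A", "B"])
```

With only three judges the result is fragile: one changed vote flips the outcome, which is why open crowds typically need a higher overlap per task.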
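Query-Item Relevance Scoring for Measurement of Ranking Quality
(Figure: per-item relevance scores for ranked results, ranging from 5/5 down to 2/5.)
Discounted cumulative gain:
$$DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)}$$
$$nDCG_p = \frac{DCG_p}{IDCG_p}$$
$$IDCG_p = \sum_{i=1}^{|REL|} \frac{2^{rel_i} - 1}{\log_2(i+1)}$$
where $rel_i$ is the graded relevance of the item at position $i$.
(For $nDCG_p$ to equal 1 on an ideal ranking, $DCG_p$ and $IDCG_p$ must use the same gain: either $rel_i$ or $2^{rel_i} - 1$ in both.)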
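The formulas above can be sketched in a few lines of Python. Assumptions, none of which come from the talk: the same gain is used in both DCG and IDCG (exponential by default, linear optionally), the ideal ranking is obtained by sorting the judged list itself, and the two example rankings are made-up judge scores.

```python
import math


def dcg(relevances, exponential_gain=True):
    """Discounted cumulative gain of relevance grades given in ranked order."""
    total = 0.0
    for position, rel in enumerate(relevances, start=1):
        gain = 2 ** rel - 1 if exponential_gain else rel
        total += gain / math.log2(position + 1)
    return total


def ndcg(relevances, exponential_gain=True):
    """DCG divided by the DCG of the same grades sorted from best to worst."""
    ideal = dcg(sorted(relevances, reverse=True), exponential_gain)
    if ideal == 0:
        return 0.0  # no relevant item at all: define nDCG as 0
    return dcg(relevances, exponential_gain) / ideal


# Hypothetical judge scores (0-5 scale) for two rankings of the same six items.
ranking_a = [5, 5, 5, 4, 3, 2]  # best items first
ranking_b = [2, 3, 4, 5, 5, 5]  # same items, worst first
score_a = ndcg(ranking_a)
score_b = ndcg(ranking_b)
```

Because `ranking_a` is already sorted from best to worst, its nDCG is 1; `ranking_b` contains the same items but scores lower, since the discount penalizes relevant items that appear late.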
Human-in-the-Loop: When Human Beings still Outperform the Machine
Fact: the brain has an estimated 38 petaflops of processing power (1 petaflop = a thousand trillion operations per second)…
The Dream of Automation
Automation: the use of various control systems for operating equipment such as machinery and processes with minimal or reduced human intervention.
The 4 Industrial Revolutions
• FIRST REVOLUTION (1784): mechanical production, railroad, steam power
• SECOND REVOLUTION (1870): mass production, electrical power, assembly lines
• THIRD REVOLUTION (1969): automated production, electronics, computers
• FOURTH REVOLUTION (ongoing): artificial intelligence, big data
→ Automation is not a new idea
Why?
• Automate boring/repetitive tasks
• Perform tasks at scale
• Perform tasks with enhanced precision
• Deliver consistent products
• Use machines where they outperform humans
When Full Automation can’t be Achieved…
Human-in-the-Loop
Human-in-the-loop (HITL) is defined as a model or a system that requires human interaction
The idea of using human beings to enhance the machine is not new
We have been doing Human-in-the-Loop all along…
• Example: autopilot technology for planes
Human intervention/presence is useful:
• To handle corner cases (outlier management)
• To “keep an eye” on the system (sanity check)
• To correct unwanted behavior (refinement)
• To validate appropriate behavior (validation)
Human-in-the-Loop Paradigm
Pareto Principle
• aka the 80/20 rule, the law of the vital few, or the principle of factor sparsity
• states that, for many events, roughly 80% of the effects come from 20% of the causes
ML version of the Pareto Principle:
• Evidence suggests that some of the most accurate ML systems to date need:
  • 80% computer AI-driven
  • 19% human input
  • 1% unknown randomness to balance things out
• The combination of machine and human intervention achieves maximum machine accuracy
How can human knowledge be incorporated into ML models?
A. Helping label the original dataset that will be fed into an ML model
B. Helping correct inaccurate predictions that arise as the system goes live
Human-In-The-Loop Use Case #1
An example of the HITL approach: face recognition
[Figure: photo with faces labeled Mary, Roberto, Victoria, Laura, Sebastian, Cecelia]
Accuracy
• Facebook's DeepFace software reaches 97.25% accuracy
HITL as a feedback loop
• When the confidence is below a certain threshold, it:
  • suggests a label
  • asks the uploader to validate/approve or correct the suggestion
• The new data is used to improve the accuracy of the algorithm
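The feedback loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FeedbackLoop` class, the 0.9 threshold, the face ids and the `simulated_uploader` function are all hypothetical and say nothing about how DeepFace is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLoop:
    """Routes low-confidence predictions to a human and stores the answers."""
    threshold: float = 0.9  # hypothetical cut-off, not a value from the talk
    new_training_data: list = field(default_factory=list)

    def handle(self, face_id, suggested_label, confidence, ask_human):
        """Return the final label for one detected face."""
        if confidence >= self.threshold:
            # Confident enough: accept the machine label without asking.
            return suggested_label
        # Below the threshold: show the suggestion and let the uploader
        # approve it (return it unchanged) or correct it.
        final_label = ask_human(face_id, suggested_label)
        # Keep the human-verified pair so the model can be retrained later.
        self.new_training_data.append((face_id, final_label))
        return final_label


def simulated_uploader(face_id, suggestion):
    """Stand-in for the person who uploaded the photo (hypothetical)."""
    corrections = {"face_2": "Roberto"}
    return corrections.get(face_id, suggestion)


loop = FeedbackLoop(threshold=0.9)
labels = [
    loop.handle("face_1", "Mary", 0.98, simulated_uploader),      # accepted as is
    loop.handle("face_2", "Robert", 0.55, simulated_uploader),    # corrected
    loop.handle("face_3", "Victoria", 0.70, simulated_uploader),  # approved
]
print(labels)
print(loop.new_training_data)
```

Only the two low-confidence faces reach the human, and both answers (one correction, one approval) end up in `new_training_data`, which is the "new data" the slide refers to.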
Human-In-The-Loop Use Case #2
An example of the HITL approach: autonomous vehicles
Teaching the machine
• Driving systems were trained using a human to oversee the process
Accuracy considerations
• Autopilot system is now over 99% accurate
• However, a 99% accuracy means that people can die 1% of the time (!!)
• Though we have seen huge advances in accuracy of pure machine-driven systems, they tend to fall short of acceptable accuracy rates
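To make the "99% is not enough" point concrete, here is a small back-of-the-envelope sketch. It assumes that accuracy is measured per decision and that decisions are independent; both are simplifying assumptions made for illustration, and the decision counts are hypothetical rather than figures from the talk.

```python
def expected_errors(accuracy: float, n_decisions: int) -> float:
    """Average number of wrong decisions out of n_decisions."""
    return (1.0 - accuracy) * n_decisions


def prob_at_least_one_error(accuracy: float, n_decisions: int) -> float:
    """Chance of one or more errors, assuming independent decisions."""
    return 1.0 - accuracy ** n_decisions


# Hypothetical workload: 1,000 decisions at 99% per-decision accuracy.
print(expected_errors(0.99, 1000))
# Chance that a run of 100 decisions contains at least one mistake.
print(prob_at_least_one_error(0.99, 100))
```

Under these assumptions, a system that is right 99% of the time still makes about ten mistakes per thousand decisions, and a run of 100 decisions is more likely than not to contain at least one error.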
Corner cases
• Fun fact: Volvo’s self-driving cars fail in Australia because of kangaroos
• Reaching 100% is hard because of corner cases
• A HITL approach helps get the accuracy to ~100%
[Figure: news headline, "Volvo's driverless cars 'confused' by kangaroos"]
The Success of Human-In-The-Loop
The Example of Chess
The Human vs. the Machine
• In 1997, Chess Master Garry Kasparov is beaten by IBM supercomputer Deep Blue
Freestyle or “Advanced” Chess
• Advanced: A human chess master works with a computer to find the best possible move
• Freestyle: A team can be made of any combination of human beings + computers
• In 2005, Steven Cramton, Zackary Stephen and their 3 computers win a Freestyle Chess Tournament
Why it works
• Computers are great at reading tough tactical situations
• But humans are better at understanding long-term strategy
• Human players use computers to limit “blunders” while using their own intuition to force the opponent into board states that confuse the computer(s)
[Figure: Garry Kasparov]
Active  Learning:
The  Best  of  Both  Worlds
Active Learning
A special case of semi-supervised ML in which a learning algorithm can interactively query the user (oracle) to obtain the desired outputs at new data points, maximizing validity and relevance
General Strategy
If $D$ is the entire data set, at each iteration $i$, $D$ is broken up into three subsets:
1. $D_{K,i}$: data points where the label is known
2. $D_{U,i}$: data points where the label is unknown
3. $D_{Q,i}$: data points for which the label is queried (sometimes, even when the label is known)
Benefits
• Query labels only when necessary (lower cost)
Next Generation Algorithms
• Proactive learning:
  • relaxes the assumption that the oracle is always right
  • casts the problem as an optimization problem with a budget constraint
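The three-way split in the general strategy can be illustrated with plain Python sets. This is a minimal sketch, not the speaker's implementation: `split_iteration`, the uncertainty scores and the `budget` parameter are hypothetical names, and for simplicity the queried set is drawn only from the unlabeled points (the slide notes that known points are sometimes queried too).

```python
def split_iteration(all_ids, labels, uncertainty, budget):
    """Return (known, unknown, queried) id sets for one iteration.

    all_ids     -- every data point id in D
    labels      -- dict of id -> label for points labeled so far
    uncertainty -- dict of id -> model uncertainty for unlabeled points
    budget      -- how many labels we can afford to ask for this round
    """
    known = {x for x in all_ids if x in labels}    # label is known
    unknown = set(all_ids) - known                 # label is unknown
    # Ask the oracle about the `budget` most uncertain unlabeled points.
    ranked = sorted(unknown, key=lambda x: uncertainty[x], reverse=True)
    queried = set(ranked[:budget])                 # label is queried
    return known, unknown, queried


all_ids = ["a", "b", "c", "d", "e"]
labels = {"a": 1, "b": 0}
uncertainty = {"c": 0.9, "d": 0.2, "e": 0.6}  # made-up scores

known, unknown, queried = split_iteration(all_ids, labels, uncertainty, budget=2)
print(sorted(known), sorted(unknown), sorted(queried))
```

Once the oracle has answered, the queried points move into `labels`, so the known set grows and the unknown set shrinks on the next iteration. The `budget` argument is a crude stand-in for the budget constraint mentioned under proactive learning.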
Active Learning: How does it Work?
Machine Learning needs
• Logic (algorithm)
• Data
• Optimization
• Feedback ← Human-in-the-Loop
Active Learning = a Machine Learning algorithm using an “oracle” to reduce mistakes/uncertainty
Query Strategy - Labels are queried for:
• Data points for which model uncertainty is high (uncertainty sampling)
• Data points for which the different models of an ensemble method disagree the most (query by committee)
• Data points causing the most changes to the model (expected model change)
• Data points causing overall variance to be high (variance reduction)
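Two of the query strategies listed above are simple enough to sketch directly. The scoring functions below (least-confidence for uncertainty sampling, vote entropy for query by committee) are common textbook choices picked here as an assumption; the talk does not say which formulas were used. The data point ids, probabilities and votes are made up. Expected model change and variance reduction are not shown.

```python
import math
from collections import Counter


def least_confidence(probs):
    """Uncertainty sampling score: 1 minus the top class probability."""
    return 1.0 - max(probs)


def vote_entropy(votes):
    """Query-by-committee score: entropy of the committee's label votes."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def pick_query(scores):
    """Return the id of the data point with the highest score."""
    return max(scores, key=scores.get)


# Uncertainty sampling: one model, predicted class probabilities per point.
model_probs = {"x1": [0.95, 0.05], "x2": [0.55, 0.45], "x3": [0.80, 0.20]}
uncertainty = {k: least_confidence(p) for k, p in model_probs.items()}

# Query by committee: three models vote on each point.
committee_votes = {
    "x1": ["cat", "cat", "cat"],
    "x2": ["cat", "dog", "cat"],
    "x3": ["cat", "dog", "bird"],
}
disagreement = {k: vote_entropy(v) for k, v in committee_votes.items()}

print(pick_query(uncertainty))   # point the single model is least sure about
print(pick_query(disagreement))  # point the committee disagrees on most
```

The two strategies can pick different points: here the single model is least sure about `x2`, while the committee is most divided on `x3`.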
[Figure: active learning loop. The active learning algorithm selects/removes a single example from the unlabeled data; the oracle (human) provides the correct label; the labeled example is added to the labeled data; the classifier is updated.]
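The loop in the diagram (select an example, ask the oracle, add the labeled example, update the classifier) can be written out as a toy program. Everything here is an assumption made for illustration: the "classifier" is just a one-dimensional threshold, the selection rule is "closest to the current threshold", and `human_oracle` is a stand-in function with a hidden cut-off of 6.2. The seed set must contain at least one example of each class.

```python
def fit_threshold(labeled):
    """Toy classifier: midpoint between the largest 0 and the smallest 1."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    return (max(negatives) + min(positives)) / 2.0


def active_learning_loop(unlabeled, labeled, oracle, n_queries):
    """Run a pool-based active learning loop for n_queries rounds."""
    unlabeled = list(unlabeled)
    labeled = list(labeled)
    threshold = fit_threshold(labeled)
    for _ in range(n_queries):
        if not unlabeled:
            break
        # Select/remove a single example: the one nearest the boundary.
        example = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(example)
        # The oracle (human) provides the correct label.
        label = oracle(example)
        # Add the labeled example to the labeled data.
        labeled.append((example, label))
        # Update the classifier.
        threshold = fit_threshold(labeled)
    return threshold, labeled, unlabeled


def human_oracle(x):
    """Stand-in for a human annotator (hypothetical true cut-off: 6.2)."""
    return 1 if x >= 6.2 else 0


pool = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
seed = [(0.0, 0), (10.0, 1)]
threshold, labeled, remaining = active_learning_loop(pool, seed, human_oracle, n_queries=3)
print(threshold, remaining)
```

With only three questions to the oracle the threshold moves from 5.0 to 6.5, close to the hidden cut-off, while six of the nine pool points never need a label. That is the "query labels only when necessary" benefit in miniature.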
Active Learning: How does it Work?
[Figure: flowchart. Machine Learning Classifier → Confidence level high? → YES: Output; NO: Annotation by Human Oracle (Human-in-the-Loop Active Learning)]
By adding a human feedback loop, we allow the system to:
• actively learn
• correct itself where it got it wrong
• improve the algorithm over iterations
3 Use Cases using Active Learning in the context of Search/Retail
Active Learning at Walmart e-Commerce
❑ Machine Learning Lifecycle Management (Programming by Feedback)
  • Automatic monitoring of input and output values for the ML algorithm
  • An algorithm detects failings and outliers in real time and suggests an action
  • A human validates the action, creating tagged data for full automation
❑ Diagnosis of Catalog Data Issues (Reinforcement Learning)
  • Algorithm uncovers demoted items and suggests the most likely reason for the demotion
  • Engineer manually confirms/corrects the suggestion, generating training data for full automation
❑ Refinement of Query Tagging Algorithm (Optimization)
  • Human evaluation team manually measures accuracy of the query tagging model
  • Mistagged queries are used to discover patterns specific to problematic queries, which are reported to engineers
  • Sample is enriched with problematic queries (evaluation team can diagnose problems with algorithms)
[Figure: query tagging example. The query “red t-shirt Size M” is tagged as color (“red”), product type (“t-shirt”) and size (“Size M”)]
Conclusion and Takeaways
• Why do humans and machines complement each other?
  • Human beings are memory-constrained
  • Computers are knowledge-constrained
• Tagged data is more important than ever
  • But getting quality data is challenging given the volume of data
  • Crowdsourcing offers more flexibility to tag data at scale
• Human-in-the-Loop paradigm
  • Improves accuracy of machine learning algorithms (classifiers)
  • Many examples of successful endeavors using “Augmented Intelligence”
  • Active Learning is a booming area of ML/AI
Thank  You!

More Related Content

What's hot

Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph DatabasesMax De Marzi
 
Social media analytics powered by data science
Social media analytics powered by data scienceSocial media analytics powered by data science
Social media analytics powered by data scienceNavin Manaswi
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Big Data Analytics for Banking, a Point of View
Big Data Analytics for Banking, a Point of ViewBig Data Analytics for Banking, a Point of View
Big Data Analytics for Banking, a Point of ViewPietro Leo
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research reportJULIO GONZALEZ SANZ
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data VisualizationRaffael Marty
 
Big Query Basics
Big Query BasicsBig Query Basics
Big Query BasicsIdo Green
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALASaikiran Panjala
 
1.4 data warehouse
1.4 data warehouse1.4 data warehouse
1.4 data warehouseKrish_ver2
 
Netflix Recommender System : Big Data Case Study
Netflix Recommender System : Big Data Case StudyNetflix Recommender System : Big Data Case Study
Netflix Recommender System : Big Data Case StudyKetan Patil
 
Data ware house architecture
Data ware house architectureData ware house architecture
Data ware house architectureDeepak Chaurasia
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big datahktripathy
 
Application of predictive analytics
Application of predictive analyticsApplication of predictive analytics
Application of predictive analyticsPrasad Narasimhan
 

What's hot (20)

Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
Social media analytics powered by data science
Social media analytics powered by data scienceSocial media analytics powered by data science
Social media analytics powered by data science
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Big data architecture
Big data architectureBig data architecture
Big data architecture
 
Big Data Analytics for Banking, a Point of View
Big Data Analytics for Banking, a Point of ViewBig Data Analytics for Banking, a Point of View
Big Data Analytics for Banking, a Point of View
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research report
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data Visualization
 
What is big data?
What is big data?What is big data?
What is big data?
 
Big Query Basics
Big Query BasicsBig Query Basics
Big Query Basics
 
Big data in telecom
Big data in telecomBig data in telecom
Big data in telecom
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
 
1.4 data warehouse
1.4 data warehouse1.4 data warehouse
1.4 data warehouse
 
Netflix Recommender System : Big Data Case Study
Netflix Recommender System : Big Data Case StudyNetflix Recommender System : Big Data Case Study
Netflix Recommender System : Big Data Case Study
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
 
Data ware house architecture
Data ware house architectureData ware house architecture
Data ware house architecture
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
Big data and analytics
Big data and analyticsBig data and analytics
Big data and analytics
 
Application of predictive analytics
Application of predictive analyticsApplication of predictive analytics
Application of predictive analytics
 

Viewers also liked

Lightweight Taxonomy Approaches - Taxonomy Bootcamp 2015
Lightweight Taxonomy Approaches - Taxonomy Bootcamp 2015Lightweight Taxonomy Approaches - Taxonomy Bootcamp 2015
Lightweight Taxonomy Approaches - Taxonomy Bootcamp 2015Jessica DuVerneay
 
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloud
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloudA1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloud
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloudDr. Wilfred Lin (Ph.D.)
 
Rio Cloud Computing Meetup 25/01/2017 - Lançamentos do AWS re:Invent 2016
Rio Cloud Computing Meetup 25/01/2017 - Lançamentos do AWS re:Invent 2016Rio Cloud Computing Meetup 25/01/2017 - Lançamentos do AWS re:Invent 2016
Rio Cloud Computing Meetup 25/01/2017 - Lançamentos do AWS re:Invent 2016Filipe Barretto
 
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...Splunk
 
High Availability Architecture for Legacy Stuff - a 10.000 feet overview
High Availability Architecture for Legacy Stuff - a 10.000 feet overviewHigh Availability Architecture for Legacy Stuff - a 10.000 feet overview
High Availability Architecture for Legacy Stuff - a 10.000 feet overviewMarco Amado
 
Software Engineering College 6 -timeseries data
Software Engineering College 6 -timeseries dataSoftware Engineering College 6 -timeseries data
Software Engineering College 6 -timeseries dataJurjen Helmus
 
Bim based process mining master thesis presentation
Bim based process mining master thesis presentation Bim based process mining master thesis presentation
Bim based process mining master thesis presentation Stijn van Schaijk
 
Understanding Camouflage
Understanding CamouflageUnderstanding Camouflage
Understanding CamouflageEmily Kissner
 
Data science unit introduction
Data science unit introductionData science unit introduction
Data science unit introductionGregg Barrett
 
5733 a deep dive into IBM Watson Foundation for CSP (WFC)
5733   a deep dive into IBM Watson Foundation for CSP (WFC)5733   a deep dive into IBM Watson Foundation for CSP (WFC)
5733 a deep dive into IBM Watson Foundation for CSP (WFC)Arvind Sathi
 
AI = SE , giip system manage automation with A.I
AI = SE , giip system manage automation with A.IAI = SE , giip system manage automation with A.I
AI = SE , giip system manage automation with A.ILowy Shin
 
Plan de transport 2014: le Brabant Flamand
Plan de transport 2014: le Brabant FlamandPlan de transport 2014: le Brabant Flamand
Plan de transport 2014: le Brabant FlamandSNCB
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big dataRaul Chong
 
First day of school for sixth grade
First day of school for sixth gradeFirst day of school for sixth grade
First day of school for sixth gradeEmily Kissner
 
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Lucas Jellema
 
15 oefeningen schakelen van weerstanden
15 oefeningen schakelen van weerstanden15 oefeningen schakelen van weerstanden
15 oefeningen schakelen van weerstandenFreddy Van Eynde
 
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHarmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHBaseCon
 
Channel partners: Get ready for future trends in client solutions
Channel partners: Get ready for future trends in client solutionsChannel partners: Get ready for future trends in client solutions
Channel partners: Get ready for future trends in client solutionsDell World
 

Viewers also liked (20)

Lightweight Taxonomy Approaches - Taxonomy Bootcamp 2015
Lightweight Taxonomy Approaches - Taxonomy Bootcamp 2015Lightweight Taxonomy Approaches - Taxonomy Bootcamp 2015
Lightweight Taxonomy Approaches - Taxonomy Bootcamp 2015
 
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloud
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloudA1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloud
A1 keynote oracle_infrastructure_as_a_service_move_any_workload_to_the_cloud
 
Rio Cloud Computing Meetup 25/01/2017 - Lançamentos do AWS re:Invent 2016
Rio Cloud Computing Meetup 25/01/2017 - Lançamentos do AWS re:Invent 2016Rio Cloud Computing Meetup 25/01/2017 - Lançamentos do AWS re:Invent 2016
Rio Cloud Computing Meetup 25/01/2017 - Lançamentos do AWS re:Invent 2016
 
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
SplunkLive! Nutanix Session - Turnkey and scalable infrastructure for Splunk ...
 
Walmart Big Data Expo

  • 1. Natural Intelligence: the Human Factor in A.I. Big Data Expo 2017, Utrecht, Netherlands
  • 2. About Me
      • Former member of the Search team at @WalmartLabs
          • Former Head of Metrics & Measurements team
          • I also led the Human Evaluation team
      • About the Metrics and Measurements team
          • A team of engineers, analysts and scientists in charge of providing accurate and exhaustive measurements
          • We also had an auditing role towards adjacent teams
      • What do we measure?
          • Engineering metrics related to model and data quality
          • Business metrics (revenue, etc.)
          • More exotic customer-centric metrics (customer value, customer satisfaction, model impact, etc.)
      • Currently Head of Data Science at Atlassian
          • In charge of the Search & Smarts team
  • 6. Outline
      • Humans & Big Data
          • The role of human beings in the era of Big Data
          • Why do we need to tag data?
          • How to get tagged data?
      • The Era of Crowdsourcing
          • What is Crowdsourcing?
          • Use cases and details about Crowdsourcing
          • Traditional crowds vs. curated crowds
      • The Human-in-the-Loop Paradigm
          • Definition and details about Human-in-the-Loop ML
          • Introduction to Active Learning
  • 9. Humans  &  Big  Data: The  Role  of  Human  Beings  in  the  Era  of   Machine  Learning
  • 10. The Era of Very Big Data
      • VOLUME
          • More data was created from 2013 to 2015 than in the entire previous history of the human race
          • By 2020, accumulated data will reach 44 trillion gigabytes
      • VELOCITY
          • By 2020, ~1.7 MB of new data per second per human being
          • 1.2 trillion search queries on Google per year
      • VARIETY
          • 31 million messages and 2.8 million videos per minute on Facebook
          • Up to 300 hours of video are uploaded to YouTube per minute
          • In 2015, 1 trillion photos taken; billions shared online
      • [Image: data center at Google]
  • 16. Supervised vs. Unsupervised Machine Learning
      • Supervised ML requires tagged data
          • Classification: problem where the output variable is a category (examples: SVM, random forest, Bayesian classifiers)
          • Regression: problem where the output variable is a real value (examples: linear regression, random forest)
      • Unsupervised ML doesn’t require tagged data
          • Clustering: discovery of inherent groupings in the data (examples: k-means, hierarchical clustering)
          • Association rules: discovery of rules describing the data (example: Apriori algorithm)
      • The Case of Deep Learning: both supervised and unsupervised applications
          • Supervised: image recognition, speech recognition
          • Unsupervised: feature learning, autoencoders
          • NB: Deep Learning algorithms are data-greedy…
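To make the "requires tagged data" distinction concrete, here is a minimal sketch in plain Python. The data, the tag names, and the choice of a nearest-centroid classifier are assumptions made for illustration; they are not taken from the talk. The supervised function needs the human-provided tags, while the clustering function (a 1-D k-means with k = 2) only sees the raw values.

```python
from statistics import mean

# Hypothetical toy data: item weights in kg, with human-provided tags.
weights = [0.1, 0.2, 0.15, 5.0, 5.5, 4.8]
tags = ["light", "light", "light", "heavy", "heavy", "heavy"]


def fit_nearest_centroid(xs, ys):
    """Supervised: learn one centroid per tagged class."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label) for label in set(ys)}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(xs, iterations=10):
    """Unsupervised: 1-D k-means with k=2; no tags are used."""
    c0, c1 = min(xs), max(xs)  # simple deterministic initialisation
    for _ in range(iterations):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = mean(g0), mean(g1)
    return g0, g1


centroids = fit_nearest_centroid(weights, tags)
print(predict(centroids, 0.3))  # a named category, learned from the tags
print(two_means(weights))       # two groups, but no names for them
```

The supervised model can answer with a category name because humans supplied the names; the clustering step can only say which points belong together.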
  • 19. Tagged Data
      • Gathering quality tagged training data is a common bottleneck in ML
          • Expensive
          • Quality control is hard and requires a second human pass
          • Hardly scalable → heavy use of sampling strategies
      • How do companies doing Machine Learning get tagged data?
          • Implicit tagging: customer engagement
          • Explicit tagging: manual labor
      • A few strategies to get tagged data for cheap/free:
          • Games (Google Quick Draw, https://quickdraw.withgoogle.com/)
          • Incentivization (extra lives or bonuses in games)
  • 21. The Wisdom from the Crowd
      • Why human input matters: the use case of image colorization
      • [Diagram: Colorization Model]
      • → Colorization is straightforward to humans because they can ‘tap’ into their general knowledge
  • 22. The Wisdom from the Crowd
      • Why human input matters: the use case of image colorization
      • [Diagram: a tagged training data set (watermelon, grapes, bananas, pineapple, orange) feeds image recognition; ‘general’ knowledge (“Bananas are generally [color]”) feeds colorization]
      • ‘General’ knowledge is:
          • obvious for human beings
          • tedious for machines
  • 26. What is Crowdsourcing?
      • Crowdsourcing: the process of getting labor or funding, usually online, from a crowd of people
          • Crowdsourcing = 'crowd' + 'outsourcing'
          • The act of taking a function once performed by employees and outsourcing it to an undefined (generally large) network of people in the form of an open call
      • History of Crowdsourcing
          • Term was first used in 2005 by the editors at Wired
          • Official definition published in the Wired article “The Rise of Crowdsourcing”, June 2006
          • Describes how businesses were using the Internet to “outsource work to the crowd”
      • What Crowdsourcing helps with:
          • Scale → peer production (for jobs to be performed collaboratively)
          • Reach → connecting with a large network of potential laborers (if tasks are undertaken by sole individuals)
  • 28. The Nature of Crowdsourcing
      • Microtasks
          • Data generation: user-generated content such as reviews, pictures, translations, etc.
          • Data validation: validation of translation, etc.
          • Data tagging: image tagging, product categorization, etc.
          • Data curation: curation of news feeds, etc.
      • Macrotasks
          • Solution development: algorithm improvement, etc.
          • Crowd contest: design competition, algorithmic competition, etc.
      • Funding
  • 35. Some Cool Crowdsourcing Applications
      • Mapping
          • Photo Sphere
          • Google Maps crowdsources info for wheelchair-accessible places
      • Traffic
          • Google Traffic
          • Waze: traffic reporting app
      • Epidemiology
          • Flu tracking applications
      • Translation
          • Google Translate
  • 36. Companies Based on Crowdsourcing
      • Quora is a question-and-answer site where questions are asked, answered, edited and organized by its community of users.
      • Waze is a community-based traffic and navigation app where drivers share real-time traffic and road info.
      • Kaggle is a platform for predictive modelling competitions in which companies post data and data miners compete to produce the best models.
      • Stack Overflow is a platform for users to ask and answer questions, to vote questions and answers up or down, and to edit them.
      • Flickr is an image and video hosting website that is widely used by bloggers to host images that they embed in social media.
  • 38. The Challenges of Crowdsourcing
      • Reliability
          • Retail: absence of emotional involvement (judges are not actually spending money on items)
          • Waze: locals were sending fake information to limit traffic in their area
      • Relevance of knowledge
          • Retail: judges might not have appropriate knowledge of the items they are evaluating
      • Subjectivity
          • Search: relevance score varies depending on the judge's profile and personal preferences
      • Speed & cost
          • Human evaluations take time and can only be performed sporadically and on samples
          • Not practical for measurement purposes
  • 42. Crowdsourcing vs. Curated Crowds
      • Traditional Crowdsourcing Model
          + Speed: many hands make light work
          + Lower cost: typically a few pennies per task
          - No quality control
          - Lack of control: little to no incentive to deliver on time
          - High maintenance: clear instructions and automated understanding checks are needed
          - Lower reliability: high overlap required
          - Lack of confidentiality: anyone can see your tasks
      • Curated Crowd
          + Quality control: judges are subject to quality metrics and are removed if they don’t deliver the required quality
          + Better quality: very little overlap needed
          + Expertise: judges become experts at the required task
          + Constraints on crowd: judges are less likely to drop out
          - More expensive: typically the primary source of income for judges
          - Consistency required: judges need frequent tasks to keep their skills sharp
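The quality-control idea behind a curated crowd (score each judge, remove those below the required quality) can be sketched as follows. This is an illustration only: the use of "gold" tasks with known answers, the 0.8 threshold, and all names are assumptions, not details given in the talk.

```python
def judge_accuracy(answers, gold):
    """Share of gold-standard tasks a judge answered correctly."""
    scored = [task for task in answers if task in gold]
    if not scored:
        return None  # the judge has not seen any gold task yet
    return sum(answers[t] == gold[t] for t in scored) / len(scored)


def curate(crowd, gold, min_accuracy=0.8):
    """Keep judges at or above the threshold; judges without a score are kept for now."""
    kept = {}
    for judge, answers in crowd.items():
        accuracy = judge_accuracy(answers, gold)
        if accuracy is None or accuracy >= min_accuracy:
            kept[judge] = answers
    return kept


# Hypothetical product-tagging tasks with known answers.
gold = {"t1": "shoes", "t2": "toys", "t3": "food"}
crowd = {
    "ann": {"t1": "shoes", "t2": "toys", "t3": "food", "t4": "toys"},
    "bob": {"t1": "shoes", "t2": "food", "t3": "toys"},
}
print(sorted(curate(crowd, gold)))
```

In practice the threshold and the share of gold tasks mixed into the work queue are tuning decisions that trade cost against reliability.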
  • 44. Crowdsourcing Applications in e-Commerce
      • Catalog Curation
          • Product Description Curation
          • Product Tagging & Categorization
          • Product Deduplication
          • Taxonomy Testing
      • Search Relevance Evaluation
          • Relevance score (query-item pair scores)
          • Engine comparison (ranking-to-ranking)
      • Review Moderation
          • Removal/flagging of obscene reviews
      • Mystery Shopping
          • Analysis and discovery of new trends
          • Evaluation of new products
          • Competitive analysis
      • The example of Product Tagging
  • 48. Use Case: Evaluation of Search Engine Relevance
      • Side-by-Side Engine Comparison (Ranking A vs. Ranking B)
          • Judge 1: prefers ranking A
          • Judge 2: prefers ranking A
          • Judge 3: prefers ranking B
      • → Human evaluation makes it possible to measure the intangible with little risk
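One simple way to turn side-by-side judgments into a decision is a majority vote over the judges. The sketch below is an assumption about how such votes could be aggregated, not a description of the team's actual pipeline; the function name is hypothetical.

```python
from collections import Counter


def preferred_ranking(votes):
    """Return (winner, share of votes for the top option); winner is None on a tie."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes).most_common()
    top_share = counts[0][1] / len(votes)
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None, top_share
    return counts[0][0], top_share


votes = ["A", "A", "B"]  # the three judges from the slide
print(preferred_ranking(votes))
```

With only three judges the share of votes is a weak signal; a larger overlap, or repeating the comparison over many queries, gives a more stable estimate.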
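  • 49. Use Case: Evaluation of Search Engine Relevance
      • Query-Item Relevance Scoring for Measurement of Ranking Quality
      • [Figure: ranked items, each with a graded relevance score out of 5]
      • Discounted cumulative gain: $DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)}$, where $rel_i$ is the graded relevance of the item at position $i$
      • Ideal DCG: $IDCG_p = \sum_{i=1}^{|REL_p|} \frac{rel_i}{\log_2(i+1)}$, computed over $REL_p$, the items sorted by decreasing relevance up to position $p$
      • Normalized DCG: $nDCG_p = \frac{DCG_p}{IDCG_p}$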
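The ranking-quality metric on this slide can be computed in a few lines. The sketch below uses the linear gain $rel_i / \log_2(i+1)$ for both DCG and the ideal DCG, and it assumes the ideal ordering is built from the judged items only. The example scores are illustrative, not data from the talk.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain: sum of rel_i / log2(i + 1)."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the same scores in ideal (descending) order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


scores = [3, 2, 3, 0, 1, 2]  # illustrative judge scores, top result first
print(round(ndcg(scores), 3))
```

A ranking that already lists items by decreasing relevance scores exactly 1.0, which makes nDCG comparable across queries with different numbers of relevant items.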
  • 50. Human-in-the-Loop: When Human Beings still Outperform the Machine
      • Fact: the brain has 38 petaflops (thousand trillion operations per second) of processing power…
  • 54. The Dream of Automation
      • Automation: the use of various control systems for operating equipment such as machinery and processes with minimal or reduced human intervention
      • The 4 Industrial Revolutions
          • FIRST REVOLUTION – 1784: mechanical production, railroad, steam power
          • SECOND REVOLUTION – 1870: mass production, electrical power, assembly lines
          • THIRD REVOLUTION – 1969: automated production, electronics, computers
          • FOURTH REVOLUTION – ongoing: artificial intelligence, big data
      • → Automation is not a new idea
      • Why?
          • Automate boring/repetitive tasks
          • Perform tasks at scale
          • Perform tasks with enhanced precision
          • Deliver consistent products
          • Use machines where they outperform humans
  • 56. When Full Automation can’t be Achieved…
      • Human-in-the-loop (HITL) is defined as a model or a system that requires human interaction
      • The idea of using human beings to enhance the machine is not new
      • We have been doing Human-in-the-Loop all along…
          • Example: autopilot technology for planes
      • Human intervention/presence is useful:
          • To handle corner cases (outlier management)
          • To “keep an eye” on the system (sanity check)
          • To correct unwanted behavior (refinement)
          • To validate appropriate behavior (validation)
  • 59. Human-in-the-Loop Paradigm
      • Pareto Principle: aka the 80/20 rule, the law of the vital few, or the principle of factor sparsity; states that, for many events, roughly 80% of the effects come from 20% of the causes
      • ML version of the Pareto Principle:
          • Evidence suggests that some of the most accurate ML systems to date need:
              • 80% computer AI-driven
              • 19% human input
              • 1% unknown randomness to balance things out
          • The combination of machine and human intervention achieves maximum machine accuracy
      • How can human knowledge be incorporated into ML models?
          A. Helping label the original dataset that will be fed into an ML model
          B. Helping correct inaccurate predictions that arise as the system goes live
  • 63. Human-In-The-Loop Use Case #1
      • An example of HITL approach: face recognition
      • [Image: faces labeled Mary, Roberto, Victoria, Laura, Sebastian, Cecelia]
      • Accuracy
          • Facebook's DeepFace software reaches 97.25% accuracy
      • HITL as a feedback loop
          • When the confidence is below a certain threshold, the system:
              • suggests a label
              • asks the uploader to validate/approve or correct the suggestion
          • The new data is used to improve the accuracy of the algorithm
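The feedback loop on this slide can be sketched as a confidence gate: confident predictions pass through, uncertain ones are shown to a human, and the human's answer is stored as new training data. The class name, the 0.9 threshold, and the `ask_uploader` callback are hypothetical; this is not Facebook's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLoop:
    threshold: float = 0.9  # hypothetical confidence cut-off
    new_training_data: list = field(default_factory=list)

    def handle(self, photo_id, suggested_label, confidence, ask_uploader):
        """Auto-accept confident predictions; otherwise ask the uploader."""
        if confidence >= self.threshold:
            return suggested_label
        # Low confidence: show the suggestion and let the human approve or correct it.
        confirmed = ask_uploader(photo_id, suggested_label)
        self.new_training_data.append((photo_id, confirmed))
        return confirmed


loop = FeedbackLoop()
print(loop.handle("photo-1", "Mary", 0.97, lambda pid, label: label))
print(loop.handle("photo-2", "Laura", 0.55, lambda pid, label: "Victoria"))
print(loop.new_training_data)
```

Only the low-confidence photo reaches the human, so the cost of human input scales with the model's uncertainty rather than with the volume of uploads.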
  • 68. Human-In-The-Loop Use Case #2
      • An example of HITL approach: autonomous vehicles
      • Teaching the machine
          • Driving systems were trained using a human to oversee the process
      • Accuracy considerations
          • Autopilot system is now over 99% accurate
          • However, 99% accuracy means that people can die 1% of the time (!!)
          • Though we have seen huge advances in the accuracy of pure machine-driven systems, they tend to fall short of acceptable accuracy rates
      • Corner cases
          • Fun fact: Volvo’s self-driving cars fail in Australia because of kangaroos
          • Reaching 100% is hard because of corner cases
          • A HITL approach helps get the accuracy to ~100%
      • [Image: news headline “Volvo's driverless cars 'confused' by kangaroos”]
  • 71. The Success of Human-In-The-Loop: The Example of Chess
      • The Human vs. the Machine
          • In 1997, chess master Garry Kasparov was beaten by IBM supercomputer Deep Blue
      • Freestyle or “Advanced” Chess
          • Advanced: a human chess master works with a computer to find the best possible move
          • Freestyle: a team can be made of any combination of human beings + computers
          • In 2005, Steven Cramton, Zackary Stephen and their 3 computers won a Freestyle Chess Tournament
      • Why it works
          • Computers are great at reading tough tactical situations
          • But humans are better at understanding long-term strategy
          • Humans use computers to limit “blunders” while using their own intuition to force the opponent into board states that confuse the opponent's computer(s)
      • [Image: Garry Kasparov]
  • 73. Active  Learning: The  Best  of  Both  Worlds
  • 75. Active Learning
      • Active Learning: a special case of semi-supervised ML in which a learning algorithm can interactively query the user (oracle) to obtain the desired outputs at new data points, maximizing validity and relevance
      • General Strategy
          • If $D$ is the entire data set, at each iteration $i$, $D$ is broken up into three subsets:
              1. $D_{K,i}$: data points where the label is known
              2. $D_{U,i}$: data points where the label is unknown
              3. $D_{Q,i}$: data points for which the label is queried (sometimes, even when the label is known)
      • Benefits
          • Query labels only when necessary (lower cost)
      • Next Generation Algorithms
          • Proactive learning:
              • relaxes the assumption that the oracle is always right
              • casts the problem as an optimization problem with a budget constraint
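The bookkeeping behind the three subsets can be sketched as one function per iteration. This is a simplified illustration: it assumes the queried set is always drawn from the unknown set (the slide notes that known labels are sometimes re-queried), and the `choose` and `oracle` callbacks are hypothetical placeholders for a query strategy and a human annotator.

```python
def active_learning_round(known, unknown, choose, oracle, batch_size=1):
    """One iteration i: pick D_Q from D_U, ask the oracle, move the points into D_K."""
    queried = choose(unknown, batch_size)  # D_{Q,i}
    new_known = dict(known)                # D_{K,i+1}
    for x in queried:
        new_known[x] = oracle(x)           # the label becomes known
    new_unknown = [x for x in unknown if x not in queried]  # D_{U,i+1}
    return new_known, new_unknown, queried


# Toy run: the "oracle" labels integers as odd or even.
known = {1: "odd"}
unknown = [2, 3, 4]
take_first = lambda pool, n: pool[:n]
parity = lambda x: "even" if x % 2 == 0 else "odd"

known, unknown, queried = active_learning_round(known, unknown, take_first, parity)
print(known, unknown, queried)
```

Each round costs `batch_size` oracle calls, which is where a budget constraint of the kind mentioned for proactive learning would apply.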
  • 78. Active Learning: How Does It Work?
  • 79. Active Learning: How Does It Work? Machine learning needs: logic (algorithm); data; optimization; feedback ← human-in-the-loop. Active learning = a machine learning algorithm using an "oracle" to reduce mistakes/uncertainty. Query strategy: labels are queried for: data points for which model uncertainty is high (uncertainty sampling); data points for which the different models of an ensemble method disagree the most (query by committee); data points causing the most change to the model (expected model change); data points causing overall variance to be high (variance reduction).
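Two of the query strategies listed here, uncertainty sampling and query by committee, reduce to scoring each candidate and querying the highest-scoring one. The sketch below is illustrative only: the function names, the example probabilities, and the committee votes are assumptions, and expected model change and variance reduction are not covered.

```python
import math
from collections import Counter


def least_confidence(probs):
    """Uncertainty sampling score: 1 minus the top class probability."""
    return 1.0 - max(probs)


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def vote_entropy(votes):
    """Query-by-committee score: entropy of the labels voted by committee members."""
    counts = Counter(votes)
    total = len(votes)
    return entropy([count / total for count in counts.values()])


def pick_query(candidates, score):
    """Return the id of the candidate with the highest score."""
    return max(candidates, key=lambda item_id: score(candidates[item_id]))


# Hypothetical class probabilities predicted by a single model.
predicted = {"a": [0.9, 0.1], "b": [0.55, 0.45], "c": [0.7, 0.3]}
# Hypothetical labels voted by a three-member committee.
committee_votes = {"a": ["x", "x", "x"], "b": ["x", "y", "x"], "c": ["x", "y", "z"]}

print(pick_query(predicted, least_confidence))
print(pick_query(predicted, entropy))
print(pick_query(committee_votes, vote_entropy))
```

Least confidence and entropy agree on binary problems like this one; they can rank candidates differently once there are more than two classes.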
  • 80. Active Learning: How Does It Work? [Diagram labels: Unlabeled Data; Active Learning Algorithm; select/remove single example; Oracle (Human); provide correct label; add labeled example; Labeled Data; Classifier; update.]
  • 82. Active Learning: How Does It Work? [Flowchart: Machine Learning Classifier → confidence level high? YES → output; NO → annotation by human oracle (human-in-the-loop active learning).] By adding a human feedback loop, we allow the system to: actively learn; correct itself where it got it wrong; improve the algorithm over iterations.
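The confidence check in this flowchart can be sketched as a simple routing step. The 0.8 threshold, the function names, and the sample predictions are assumptions made for illustration; the slide does not say how "high" confidence is defined.

```python
def route(predictions, threshold=0.8):
    """Split predictions into auto-accepted outputs and items sent to a human.

    `predictions` is a list of (item_id, label, confidence) tuples.
    The threshold value is an assumption, not taken from the talk.
    """
    accepted, needs_review = [], []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            accepted.append((item_id, label))   # YES branch: output as-is
        else:
            needs_review.append(item_id)        # NO branch: ask the human oracle
    return accepted, needs_review


def annotate(needs_review, human_oracle):
    """Collect human labels, which can later be added to the training data."""
    return [(item_id, human_oracle(item_id)) for item_id in needs_review]


# Hypothetical classifier outputs and human answers.
predictions = [("q1", "shoes", 0.95), ("q2", "toys", 0.40), ("q3", "books", 0.80)]
human_answers = {"q2": "games"}

accepted, needs_review = route(predictions)
new_training_examples = annotate(needs_review, human_answers.get)
print(accepted)
print(new_training_examples)
```

Retraining the classifier on `new_training_examples` is what closes the feedback loop and lets the system improve over iterations.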
  • 84. Active Learning at Walmart e-Commerce: 3 use cases using active learning in the context of search/retail.
  • 85. Active Learning at Walmart e-Commerce: 3 use cases using active learning in the context of search/retail. (1) Machine learning lifecycle management (programming by feedback): automatic monitoring of input and output values for the ML algorithm; an algorithm detects failures and outliers in real time and suggests an action; a human validates the action, creating tagged data for full automation. (2) Diagnosis of catalog data issues (reinforcement learning): an algorithm uncovers demoted items and suggests the most likely reason for the demotion; an engineer manually confirms or corrects the suggestion, generating training data for full automation. (3) Refinement of the query tagging algorithm (optimization): the human evaluation team manually measures the accuracy of the query tagging model; mistagged queries are used to discover patterns specific to problematic queries, which are reported to engineers; the sample is enriched with problematic queries (the evaluation team can diagnose problems with algorithms).
  • 87. Active Learning at Walmart e-Commerce: 3 use cases using active learning in the context of search/retail. Query tagging example: "red t-shirt Size M" → color ("red"), product type ("t-shirt"), size ("Size M").
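The query tagging use case (human-measured accuracy, then collecting mistagged queries to look for patterns) can be sketched as follows. Everything here is hypothetical apart from the "red t-shirt Size M" example: the function name, the other queries, and the model's tags are invented for illustration, and the sketch only checks attributes present in the human reference.

```python
from collections import Counter


def evaluate_tagging(model_tags, human_tags):
    """Compare model tags with human reference tags, query by query.

    Returns the query-level accuracy, the mistagged queries with the
    attributes that were wrong, and a count of errors per attribute.
    """
    mistagged = {}
    attribute_errors = Counter()
    for query, reference in human_tags.items():
        predicted = model_tags.get(query, {})
        wrong = [attr for attr, value in reference.items() if predicted.get(attr) != value]
        if wrong:
            mistagged[query] = wrong
            attribute_errors.update(wrong)
    accuracy = 1.0 - len(mistagged) / len(human_tags)
    return accuracy, mistagged, attribute_errors


# Reference tags from the (hypothetical) human evaluation sample.
human_tags = {
    "red t-shirt Size M": {"color": "red", "product type": "t-shirt", "size": "Size M"},
    "blue jeans": {"color": "blue", "product type": "jeans"},
    "orange juice": {"product type": "juice"},
    "black sofa": {"color": "black", "product type": "sofa"},
}
# Hypothetical model output: "orange" is wrongly treated as a color.
model_tags = {
    "red t-shirt Size M": {"color": "red", "product type": "t-shirt", "size": "Size M"},
    "blue jeans": {"color": "blue", "product type": "jeans"},
    "orange juice": {"color": "orange", "product type": "juice juice"},
    "black sofa": {"color": "black", "product type": "sofa"},
}

accuracy, mistagged, attribute_errors = evaluate_tagging(model_tags, human_tags)
print(accuracy)
print(mistagged)
print(attribute_errors.most_common())
```

The `mistagged` queries are the ones that would be reported to engineers and used to enrich the next evaluation sample.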
  • 88. Conclusion and Takeaways. Why do humans and machines complement each other? Human beings are memory-constrained; computers are knowledge-constrained. Tagged data is more important than ever, but getting quality data is challenging given the volume of data; crowdsourcing offers more flexibility to tag data at scale. Human-in-the-loop paradigm: improves the accuracy of machine learning algorithms (classifiers); there are many examples of successful endeavors using "augmented intelligence". Active learning is a booming area of ML/AI.