Getting Value Out of Chat Data
WHAT TO DO WHEN YOUR DATA IS NOISY, SPARSE, AND SHORT
0
Introduction
 Contact: daniel@talla.com
1
Talla
 NLP for internal business use cases
 Smart knowledge management
 Hiring!
2
What is “chat data”?
USER2: USER3 do you have new new cal on your Talla account already? Looks like it’s not available for me yet. Would be nice if we could also get inbox support enabled since it’s so much better than gmail. cc USER1
USER3: USER2 I realized that after I typed this that I was using my personal gmail when I updated to the new changes. I looked on Talla and I didn’t see the same option to update to new calendar yet.
USER4: USER2 I just enabled Inbox for our domain
USER4: new calendar is set to letting google decide when to roll it out, but it looks like we can also enable it as an option now
USER4: I've now set that to be available as well. These may take some time to show up
USER1: USER2 its been enabled for awhile.
USER1: (inbox)
USER1: and the new calendar is enabled, soon as google decides you are allowed to have it.
USER2: Thanks USER1 USER4
3
Things similar to chat data
 Sequential interactions
 Forum posts
 Some email
 IT ticketing system interactions
 Short text
 Associated with a user
 Possibly directed at another user
 Highly context dependent
4
Problems with chat
Increasing number of data sources
In theory, chat contains lots of valuable information
In practice, the data is unlabeled
“Water, water, everywhere, but not a drop to drink.”
5
Goal: Issue detection and matching
 People get help through chat platforms
 Extract that data and automate the process
 USER1’s interaction should help USER3!
USER1: Hi, does anyone know if we have patriot’s day off?
USER2: Yeah USER1, we do.
USER1: Thanks!
…
USER3: Hey, do we get patriot’s day off?
6
Automating knowledge delivery
 Find issues or questions that people have
 Match new issues to pre-existing ones
 Serve the appropriate response or answer
 Extracting answers is very hard
 Focus on matching and search
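A minimal sketch of the matching step, assuming a simple keyword-overlap (Jaccard) score; the slides do not say which matching method was used. The stopword list and the names `keywords`, `jaccard`, and `best_match` are hypothetical and only for illustration.

```python
import re

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "hi", "hey", "do", "does", "did", "we", "i",
             "you", "if", "anyone", "know", "have", "get", "is", "are", "to", "of"}


def keywords(text):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(new_question, past_questions):
    """Return the past question with the highest keyword overlap, and its score."""
    new_kw = keywords(new_question)
    scored = [(q, jaccard(new_kw, keywords(q))) for q in past_questions]
    return max(scored, key=lambda pair: pair[1])


past = [
    "Hi, does anyone know if we have patriot's day off?",
    "how do i record a vidyo meeting?",
]
match, score = best_match("Hey, do we get patriot's day off?", past)
```

Once a new question is matched to an earlier one, the response that followed the earlier question can be served, which avoids having to extract an answer from free text.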
7
Overview
 Jumpstart ML: Active Learning
 Topic modeling
 Dimensionality Reduction and Representations
8
Find questions and analyze
 Use patterns to find questions
Has ‘?’ token
Has a question word
 Not too hard
 Good start for finding past issues
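The two patterns above can be sketched as follows. The question-word list is an assumption, and so is the choice to check only the first token for a question word; the slide does not specify either.

```python
import re

# Hypothetical question-word list; extend it for your own data.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which",
                  "can", "could", "does", "do", "is", "are", "will"}


def looks_like_question(message):
    """True if the message has a '?' token or starts with a question word."""
    tokens = re.findall(r"[a-z']+|\?", message.lower())
    if "?" in tokens:
        return True
    return bool(tokens) and tokens[0] in QUESTION_WORDS


chat = [
    "Hi, does anyone know if we have patriot's day off?",
    "Yeah USER1, we do.",
    "Thanks!",
    "how do i record a vidyo meeting",
]
questions = [m for m in chat if looks_like_question(m)]
```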
9
Problems with extracted questions
 Most questions need context to understand. e.g.:
“What is it?”
“Can I use her personal email?”
 Intent varies:
Want information
Do this thing for me
Huh?
10
Only some questions make sense out of context
 “Who is she?” “What is that?” “Will that fix my computer?”
 Anaphora: it, that
 Pronouns: he, she, etc.
 “What day is it?”, “Where am I?”
 Answer depends on time, person asking
 Requires more involved data model
11
Questions have different intents
 “Performative” – Please help me? ex:
 hi can you please help me reset my 2 factor authentication
on salesforce?
 “Informational” – What is it?
 what's the pl code?
 “Navigational” – How do I do this?
 how do i record a vidyo meeting?
12
Can we write special case rules?
 Borderline cases
 is there a way to find out the size of an hbase table? – User asks “Is there (a way…)” when they actually want directions
 can anyone tell me where i find the out of stock request report? – User asks someone to give them information
 Many variants
 Alternative is to label data and use supervised learning
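To illustrate why special-case rules are brittle, here is a hypothetical rule set (not the speaker's) applied to the example questions from these slides. The function name and the phrase lists are assumptions.

```python
def rule_based_intent(question):
    """Guess an intent label from surface phrases; returns 'unknown' if no rule fires."""
    q = question.lower().strip()
    if q.startswith(("how do i", "how can i", "how to")):
        return "navigational"
    if "can you" in q or "please" in q:
        return "performative"
    if q.startswith(("what", "who", "when", "where", "which")):
        return "informational"
    return "unknown"


examples = [
    "hi can you please help me reset my 2 factor authentication on salesforce?",
    "what's the pl code?",
    "how do i record a vidyo meeting?",
    "is there a way to find out the size of an hbase table?",
    "can anyone tell me where i find the out of stock request report?",
]
labels = [rule_based_intent(q) for q in examples]
```

The three textbook examples are caught, but both borderline cases fall through to `unknown`. Each new variant needs another rule, which is the motivation for labeling data and training a classifier instead.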
13
We want to label data, but…
Managing crowdworkers:
Expensive
Time consuming
Can’t be used unless data is safely anonymized
Will the model work afterwards?
14
Active Learning makes labeling more efficient
 More value for your time
 Can use with crowd workers or without
 Good for chat:
Models train fast
Quick to annotate
 Supervised learning with little labeled data
[Figure: cycle diagram with stages Annotate, Train/Predict, Get data]
15
How it works (roughly)
 Annotate 𝐷0 ⊂ 𝐷
 Train your model on 𝐷0
 Predict labels on remaining data (𝐷 − 𝐷0)
 Choose more data, 𝐷1 ⊂ 𝐷 − 𝐷0
 Choice of 𝐷1 is based on label predictions
 Repeat
 ???
 Profit!
[Figure: cycle diagram with stages Annotate, Train/Predict, Get data]
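The loop above can be sketched in plain Python. This is a sketch under assumptions: the slides only say that the choice of the next batch is based on label predictions, so uncertainty sampling (pick the items whose predicted probability is closest to 0.5) is one common option, not necessarily the one used. `TinyNaiveBayes` is a toy stand-in model, and `annotate` is a hypothetical callback standing in for a human labeler.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens, keeping '?' as its own token."""
    return re.findall(r"[a-z']+|\?", text.lower())


class TinyNaiveBayes:
    """Binary multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def prob_positive(self, text):
        tokens = tokenize(text)
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label in (0, 1):
            total = sum(self.word_counts[label].values())
            # Smoothed class prior plus smoothed word likelihoods, in log space.
            score = math.log((self.class_counts[label] + 1) / (n_docs + 2))
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total + len(self.vocab) + 1))
            scores[label] = score
        # Convert the two log scores into P(label = 1).
        return 1.0 / (1.0 + math.exp(scores[0] - scores[1]))


def active_learning(pool, annotate, seed_size=2, batch_size=1, rounds=3):
    """Annotate a seed set, then repeatedly label the most uncertain items."""
    labeled = {i: annotate(pool[i]) for i in range(seed_size)}  # D0
    model = TinyNaiveBayes()
    for _ in range(rounds):
        model.fit([pool[i] for i in labeled], list(labeled.values()))
        remaining = [i for i in range(len(pool)) if i not in labeled]  # D - D0
        if not remaining:
            break
        # Uncertainty sampling: predictions closest to 0.5 come first.
        remaining.sort(key=lambda i: abs(model.prob_positive(pool[i]) - 0.5))
        for i in remaining[:batch_size]:  # D1
            labeled[i] = annotate(pool[i])
    # Final fit so the model has seen every label collected.
    model.fit([pool[i] for i in labeled], list(labeled.values()))
    return model, labeled


pool = [
    "how do i record a vidyo meeting?",
    "thanks, that worked",
    "what's the pl code?",
    "ok sure",
    "can anyone reset my password?",
    "great",
]


def oracle(text):
    """Hypothetical annotator: labels a message 1 if it contains '?'."""
    return int("?" in text)


model, labeled = active_learning(pool, oracle, seed_size=2, batch_size=1, rounds=3)
```

In practice the `annotate` callback would prompt a person, and the loop would stop when the labeling budget runs out or held-out accuracy stops improving.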
16
Where we are
 Jumpstart ML: Active Learning
 Topic modeling
 Dimensionality Reduction and Representations
17
More to data than questions or intent
 What do people talk about?
 What kind of issues are common?
 Are there clear lines defining topics?
 Finding problem areas
 Strategic thinking about what to tackle
18
Know Your Data
Read some of it (if you can)
Learn the context
Cluster and overview
19
Clustering or modeling chat topics
 LDA, LSA, NMF, others
 Human supervision necessary for interpretation (boo!)
 Messages are short, so chat is hard to model
 Larger documents have broader topic distributions
 We expect messages to be about fewer topics
20
Using LDA with Chat
α = 0.5 | α = 0.1 | α = 0.05 | α = 0.03
know; does; link | database; jermaine; running | file; area; bank | free; jermaine; database
did; try; work | online; palace; sorry | mean; try; screen | user; hi; email
send; test; agent | try; user; free | did; ok; want | client; server; user
look; able; mean | user; client; error | error; server; user | ok; did; update
online; help; screen | mean; app; does | whats; agent; end | mean; user; file
hi; palace; property | shall; working; process | client; property; user | online; user; change
email; error; just | emails; kelly; time | online; user; update | mandy; wrong; chance
user; issue; want | did; ok; property | palace; live; test | owner; end; invoice
client; need; check | ticket; whats; right | run; right; check | want; error; agent
owner; report; password | check; chloe; duncan | emails; know; link | live; palace; try
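A small simulation of why lowering α suits short messages, assuming α here is LDA's symmetric Dirichlet prior on per-document topic proportions (the usual meaning, though the slide does not define it). It draws topic mixtures directly from the prior and counts how many topics receive a noticeable share; it does not fit LDA. The function names, the topic count of 10, and the 0.05 threshold are assumptions for illustration.

```python
import random


def sample_dirichlet(alpha, num_topics, rng):
    """Draw from a symmetric Dirichlet by normalizing independent Gamma draws."""
    while True:
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_topics)]
        total = sum(draws)
        if total > 0.0:  # guard against all draws underflowing to zero
            return [d / total for d in draws]


def mean_active_topics(alpha, num_topics=10, num_docs=2000, threshold=0.05, seed=0):
    """Average number of topics per document with proportion above the threshold."""
    rng = random.Random(seed)
    active = 0
    for _ in range(num_docs):
        theta = sample_dirichlet(alpha, num_topics, rng)
        active += sum(1 for p in theta if p > threshold)
    return active / num_docs


for alpha in (0.5, 0.1, 0.05, 0.03):
    print(alpha, round(mean_active_topics(alpha), 2))
```

Smaller α concentrates each document on fewer topics, which matches the expectation that a single chat message is about one or two things.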
21
Where we are
 Jumpstart ML: Active Learning
 Topic modeling
 Dimensionality Reduction and Representations
22
Why do dimensionality reduction?
 We want to improve our supervised learning techniques
 Chat data is even more sparse than many NL datasets
 Good representations can help search and similarity models
 Off the shelf representations are good
 Off the shelf + custom representations are better
23
Setting up methods for learning
 Word2vec, NMF, even LDA
 Most methods equivalent*
 Chat has no clear document barriers
 Methods assume either continuous context or separate documents
 Using messages as contexts → too sparse
24
Choosing a context
 Representations are influenced by context choice
 Figure out your goal
 Choose context where words are associated in a way helpful for your goal
 For our purposes: Words should be similar if they occur together in issues people have
25
Using a time-based context window
 Window before each question
 Problem statement and questions should be related
USER2: Can I email this form, or do I have to print it out?
USER1: You need to drop the form off in person
USER2: OK, sure.
USER1: Great.
USER2: Where can I get access to the printers?
…
26
Keywords are extracted from recent history
USER2: Can I email this form, or do I have to print it out?
USER1: You need to drop the form off in person
USER2: OK, sure.
USER1: Great.
USER2: Where can I get access to the printers?
…
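A minimal sketch of building one context per question from the messages that precede it in time. The timestamps, the 300-second window, the stopword list, and the `'?'` test for questions are all assumptions; the slides do not give the window length or the keyword extraction method.

```python
import re

# Hypothetical stopword list for keyword extraction.
STOPWORDS = {"a", "an", "the", "i", "you", "it", "to", "do", "or", "can", "this",
             "in", "off", "out", "have", "need", "ok", "sure", "great", "where", "get"}


def extract_keywords(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def question_contexts(messages, window_seconds=300):
    """messages: list of (timestamp_seconds, user, text), sorted by time.

    For each question, collect keywords from earlier messages that fall
    inside the time window ending at the question.
    """
    contexts = []
    for idx, (ts, _user, text) in enumerate(messages):
        if "?" not in text:
            continue
        history = [m_text for m_ts, _, m_text in messages[:idx]
                   if 0 <= ts - m_ts <= window_seconds]
        history_keywords = [kw for m in history for kw in extract_keywords(m)]
        contexts.append({"question": text, "history_keywords": history_keywords})
    return contexts


# The example conversation, with made-up timestamps in seconds.
messages = [
    (0, "USER2", "Can I email this form, or do I have to print it out?"),
    (30, "USER1", "You need to drop the form off in person"),
    (45, "USER2", "OK, sure."),
    (50, "USER1", "Great."),
    (120, "USER2", "Where can I get access to the printers?"),
]
contexts = question_contexts(messages)
```

Each returned context can then be treated as one "document" when training a representation, so that words from the problem statement and from the question that follows end up associated.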
27
Similarity from resulting representations
 ‘printer’
 ['printer', 'choice', 'fuji', 'xerox', 'settings', 'sequence', 'default', 'rollover', 'driver', 'takes', 'smaller', 'main']
 ‘issue’
 ['issue', 'resolved', 'helping', 'experiencing', 'companies', 'related', 'assuming', 'reported', 'double', 'site', 'saw', 'causing', 'understand', 'sorted', 'logging', 'heard']
 ‘ssh’
 ['ssh', 'config', 'dhcp', 'ping', 'reconnect', 'jpg', 'webconsole', 'coats', 'lab', 'browsers', 'instances', 'bypass']
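Nearest-neighbor lists like these come from comparing word vectors. As a rough sketch, the code below uses raw co-occurrence counts within each context and cosine similarity, a much simpler stand-in for the word2vec or NMF representations named earlier; the toy contexts and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(contexts):
    """Map each word to counts of the other words sharing a context with it."""
    vectors = defaultdict(Counter)
    for words in contexts:
        unique = set(words)
        for w in unique:
            for other in unique:
                if other != w:
                    vectors[w][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, topn=3):
    """Return the topn words whose vectors are closest to the given word's."""
    if word not in vectors:
        return []
    target = vectors[word]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: -pair[1])[:topn]


# Toy contexts, e.g. keyword lists taken from windows before questions.
toy_contexts = [
    ["printer", "driver", "xerox"],
    ["printer", "driver", "settings"],
    ["ssh", "config", "ping"],
    ["ssh", "config", "reconnect"],
]
vectors = cooccurrence_vectors(toy_contexts)
neighbors = most_similar("printer", vectors)
```

With real chat data the vectors would be far sparser, which is where a learned low-dimensional representation helps.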
28
Final Thoughts...
 Tip of the iceberg
 Understand how people interact
 What information can we extract?
 Can we escape our corpus?
29
Thank you everyone!
 thanks
['heaps', 'great', 'perfect', 'fantastic']
30

Daniel Shank, Data Scientist, Talla at MLconf SF 2017

Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 


Daniel Shank, Data Scientist, Talla at MLconf SF 2017

  • 1. Getting Value Out of Chat Data
      WHAT TO DO WHEN YOUR DATA IS NOISY, SPARSE, AND SHORT
  • 3. Talla
      • NLP for internal business use cases
      • Smart knowledge management
      • Hiring!
  • 4. What is “Chat data?”
      USER2: USER3 do you have new new cal on your Talla account already? Looks like it’s not available for me yet. Would be nice if we could also get inbox support enabled since it’s so much better than gmail. cc USER1
      USER3: USER2 I realized that after I typed this that I was using my personal gmail when I updated to the new changes. I looked on Talla and I didn’t see the same option to update to new calendar yet.
      USER4: USER2 I just enabled Inbox for our domain
      USER4: new calendar is set to letting google decide when to roll it out, but it looks like we can also enable it as an option now
      USER4: I've now set that to be available as well. These may take some time to show up
      USER1: USER2 its been enabled for awhile.
      USER1: (inbox)
      USER1: and the new calendar is enabled, soon as google decides you are allowed to have it.
      USER2: Thanks USER1 USER4
  • 5. Things similar to chat data
      • Sequential interactions
      • Forum posts
      • Some email
      • IT ticketing system interactions
      • Short text
      • Associated with a user
      • Possibly directed at another user
      • Highly context dependent
  • 6. Problems with chat
      • Increasing number of data sources
      • In theory contains lots of valuable information
      • In practice data is unlabeled
      • “Water, water, everywhere, but not a drop to drink.”
  • 7. Goal: Issue detection and matching
      • People get help through chat platforms
      • Extract that data and automate the process
      • USER1’s interaction should help USER3!
      USER1: Hi, does anyone know if we have patriot’s day off?
      USER2: Yeah USER1, we do.
      USER1: Thanks!
      …
      USER3: Hey, do we get patriot’s day off?
  • 8. Automating knowledge delivery
      • Find issues or questions that people have
      • Match new issues to pre-existing ones
      • Serve the appropriate response or answer
      • Extracting answers is very hard
      • Focus on matching and search
  • 9. Overview
      • Jumpstart ML: Active Learning
      • Topic modeling
      • Dimensionality Reduction and Representations
  • 10. Find questions and analyze
      • Use patterns to find questions
          • Has ‘?’ token
          • Has a question word
      • Not too hard
      • Good start for finding past issues
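
The two patterns on slide 10 can be sketched in a few lines of Python. This is only an illustration: the question-word list, the tokenizer, and the choice to test only the first token for a question word are assumptions, not details from the talk.

```python
import re

# Hypothetical question-word list; the slides do not specify one.
QUESTION_WORDS = {
    "who", "what", "when", "where", "why", "how", "which",
    "can", "could", "do", "does", "did", "is", "are", "will", "would", "should",
}

# Words (letters, digits, apostrophes) or a standalone '?' token.
TOKEN_RE = re.compile(r"[a-z0-9']+|\?")


def tokenize(text):
    """Lowercase the message and split it into word and '?' tokens."""
    return TOKEN_RE.findall(text.lower().replace("’", "'"))


def is_question(text):
    """Flag a message that has a '?' token or starts with a question word."""
    tokens = tokenize(text)
    if "?" in tokens:
        return True
    # Assumption: only the first token is checked for a question word.
    return bool(tokens) and tokens[0] in QUESTION_WORDS


messages = [
    "Hi, does anyone know if we have patriot’s day off?",
    "Yeah USER1, we do.",
    "Thanks!",
    "how do i record a vidyo meeting",
]
questions = [m for m in messages if is_question(m)]
```

Checking only the first token keeps false positives down (for example "we do." is not flagged), at the cost of missing questions that open with a greeting and carry no question mark.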
  • 11. Problems with extracted questions
      • Most questions need context to understand, e.g.:
          • “What is it?”
          • “Can I use her personal email?”
      • Intent varies:
          • Want information
          • Do this thing for me
          • Huh?
  • 12. Only some questions make sense out of context
      • “Who is she?” “What is that?” “Will that fix my computer?”
      • Anaphora: it, that
      • Pronouns: he, she, etc.
      • “What day is it?”, “Where am I?”
      • Answer depends on time, person asking
      • Requires more involved data model
  • 13. Questions have different intents
      • “Performative” – Please help me? ex:
          • hi can you please help me reset my 2 factor authentication on salesforce?
      • “Informational” – What is it?
          • what's the pl code?
      • “Navigational” – How do I do this?
          • how do i record a vidyo meeting?
  • 14. Can we write special case rules?
      • Borderline cases
          • is there a way to find out the size of an hbase table? – User asks “Is there (a way…)” to get directions
          • can anyone tell me where i find the out of stock request report? – User asks someone to give them information
      • Many variants
      • Alternative is to label data and use supervised learning
  • 15. We want to label data, but…
      • Managing crowdworkers:
      • Expensive
      • Time consuming
      • Can’t be used unless data is safely anonymous
      • Will the model work afterwards?
  • 16. Active Learning makes labeling more efficient
      • More value for your time
      • Can use with crowd workers or without
      • Good for chat:
          • Models train fast
          • Quick to annotate
      • Supervised learning with little labeled data
      • [Figure: cycle of Annotate, Train/Predict, Get data]
  • 17. How it works (roughly)
      • Annotate an initial subset $D_0 \subset D$
      • Train your model on $D_0$
      • Predict labels on the remaining data, $D \setminus D_0$
      • Choose more data to annotate, $D_1 \subset D \setminus D_0$
      • Choice of $D_1$ is based on label predictions
      • Repeat
      • ???
      • Profit!
      • [Figure: cycle of Annotate, Train/Predict, Get data]
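
The loop on slide 17 can be sketched as pool-based active learning. Everything specific here is an assumption for illustration: the tiny Naive Bayes model, the `oracle` callback standing in for a human annotator, and uncertainty sampling (picking the items whose predicted probability is closest to 0.5) as the rule for choosing the next batch. The slides only say the choice is based on label predictions.

```python
import math
from collections import Counter


def train_nb(labeled):
    """Fit a two-class bag-of-words Naive Bayes model on (tokens, label) pairs."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for tokens, label in labeled:
        class_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def prob_positive(model, tokens):
    """Return P(label = 1 | tokens) using add-one smoothing."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in (0, 1):
        score = math.log((class_counts[label] + 1) / (total + 2))
        # +1 in the denominator reserves mass for unseen tokens.
        denom = sum(word_counts[label].values()) + len(vocab) + 1
        for tok in tokens:
            score += math.log((word_counts[label][tok] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    e0 = math.exp(log_scores[0] - top)
    e1 = math.exp(log_scores[1] - top)
    return e1 / (e0 + e1)


def active_learning(pool, oracle, seed_size=2, batch_size=2, rounds=3):
    """Annotate a seed set, then repeatedly label the most uncertain items."""
    labeled = [(x, oracle(x)) for x in pool[:seed_size]]  # annotate D_0
    unlabeled = list(pool[seed_size:])                    # D \ D_0
    for _ in range(rounds):
        if not unlabeled:
            break
        model = train_nb(labeled)                         # train on labeled data
        # Predict on the rest; probabilities near 0.5 are the most uncertain.
        unlabeled.sort(key=lambda x: abs(prob_positive(model, x) - 0.5))
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        labeled.extend((x, oracle(x)) for x in batch)     # annotate D_1, D_2, ...
    return train_nb(labeled), labeled
```

In practice `oracle` would be an annotation interface rather than a function, and the model would be whatever classifier trains quickly on your chat data.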
  • 18. Where we are
      • Jumpstart ML: Active Learning
      • Topic modeling
      • Dimensionality Reduction and Representations
  • 19. More to data than questions or intent
      • What do people talk about?
      • What kind of issues are common?
      • Are there clear lines defining topics?
      • Finding problem areas
      • Strategic thinking about what to tackle
  • 20. Know Your Data
      • Read some of it (if you can)
      • Learn the context
      • Cluster and overview
  • 21. Clustering or modeling chat topics
      • LDA, LSA, NMF, others
      • Human supervision necessary for interpretation (boo!)
      • Messages short, so chat is hard
      • Larger documents have broader topic distributions
      • We expect messages to be about fewer topics
  • 22. Using LDA with Chat
      • Table columns: α = 0.5, α = 0.1, α = 0.05, α = 0.03
      • Topic keywords, in slide order: know; does; link | database; jermaine; running | file; area; bank | free; jermaine; database | did; try; work | online; palace; sorry | mean; try; screen | user; hi; email | send; test; agent | try; user; free | did; ok; want | client; server; user | look; able; mean | user; client; error | error; server; user | ok; did; update | online; help; screen | mean; app; does | whats; agent; end | mean; user; file | hi; palace; property | shall; working; process | client; property; user | online; user; change | email; error; just | emails; kelly; time | online; user; update | mandy; wrong; chance | user; issue; want | did; ok; property | palace; live; test | owner; end; invoice | client; need; check | ticket; whats; right | run; right; check | want; error; agent | owner; report; password | check; chloe; duncan | emails; know; link | live; palace; try
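
One way to see why slide 22 sweeps α downward: in LDA, α is the parameter of the Dirichlet prior over each document's topic mixture, and a smaller α puts most of a document's mass on fewer topics, which matches the expectation on slide 21 that short messages are about fewer topics. The sketch below only samples from that prior; it does not fit LDA. The number of topics, the sample size, and the seed are arbitrary assumptions.

```python
import random


def sample_dirichlet(alpha, num_topics, rng):
    """Draw one topic mixture from a symmetric Dirichlet(alpha) prior."""
    while True:
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_topics)]
        total = sum(draws)
        if total > 0:  # guard against all draws underflowing to 0.0
            return [d / total for d in draws]


def mean_dominant_share(alpha, num_topics=10, num_docs=2000, seed=0):
    """Average share of the single largest topic across sampled mixtures."""
    rng = random.Random(seed)
    shares = (max(sample_dirichlet(alpha, num_topics, rng)) for _ in range(num_docs))
    return sum(shares) / num_docs


# The alpha values from the slide; smaller alpha gives more peaked mixtures.
for alpha in (0.5, 0.1, 0.05, 0.03):
    print(f"alpha={alpha}: dominant topic share ~ {mean_dominant_share(alpha):.2f}")
```

With ten topics, the dominant topic's average share rises toward 1 as α shrinks, so each simulated message is effectively assigned to one topic.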
  • 23. Where we are
      • Jumpstart ML: Active Learning
      • Topic modeling
      • Dimensionality Reduction and Representations
  • 24. Why do dimensionality reduction?
      • We want to improve our supervised learning techniques
      • Chat data is even more sparse than many NL datasets
      • Good representations can help search and similarity models
      • Off the shelf representations are good
      • Off the shelf + custom representations are better
  • 25. Setting up methods for learning
      • Word2vec, NMF, even LDA
      • Most methods equivalent*
      • Chat has no clear document barriers
      • Methods assume either continuous context or separate documents
      • Using messages as contexts → too sparse
  • 26. Choosing a context
      • Representations are influenced by context choice
      • Figure out your goal
      • Choose context where words are associated in a way helpful for your goal
      • For our purposes: Words should be similar if they occur together in issues people have
  • 27. Using a time-based context window
      • Window before each question
      • Problem statement and questions should be related
      USER2: Can I email this form, or do I have to print it out?
      USER1: You need to drop the form off in person
      USER2: OK, sure.
      USER1: Great.
      USER2: Where can I get access to the printers?
      …
  • 28. Keywords are extracted from recent history
      USER2: Can I email this form, or do I have to print it out?
      USER1: You need to drop the form off in person
      USER2: OK, sure.
      USER1: Great.
      USER2: Where can I get access to the printers?
      …
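
Slides 27 and 28 can be sketched as a function that, for each question, gathers keywords from the messages that fall inside a time window before it. The timestamps, the five-minute window, the stopword list, and the decision to include the question's own keywords are all assumptions made for illustration; the slides show no timestamps and do not say how keywords were chosen.

```python
import re
from datetime import datetime, timedelta

# Hypothetical stopword list, just large enough for the example below.
STOPWORDS = {
    "a", "the", "i", "you", "it", "this", "that", "to", "or", "do", "have",
    "can", "in", "off", "out", "ok", "sure", "great", "where", "get", "need",
}


def keywords(text):
    """Lowercase, tokenize, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9']+", text.lower()) if t not in STOPWORDS]


def question_contexts(messages, window=timedelta(minutes=5)):
    """Build one keyword context per question.

    messages: list of (timestamp, user, text), sorted by timestamp.
    Returns a list of (question_text, context_keywords).
    """
    contexts = []
    for i, (ts, _user, text) in enumerate(messages):
        if "?" not in text:
            continue
        # Earlier messages that fall inside the window before the question.
        recent = [m for m in messages[:i] if ts - m[0] <= window]
        words = [w for _, _, t in recent for w in keywords(t)]
        words += keywords(text)  # assumption: the question itself is included
        contexts.append((text, words))
    return contexts


start = datetime(2017, 1, 1, 9, 0, 0)  # made-up timestamps
chat = [
    (start, "USER2", "Can I email this form, or do I have to print it out?"),
    (start + timedelta(seconds=30), "USER1", "You need to drop the form off in person"),
    (start + timedelta(seconds=40), "USER2", "OK, sure."),
    (start + timedelta(seconds=45), "USER1", "Great."),
    (start + timedelta(seconds=60), "USER2", "Where can I get access to the printers?"),
]
contexts = question_contexts(chat)
```

With this window the printer question's context also contains "form" and "print" from the earlier exchange, which is the association the slides are after; shrinking the window removes it.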
  • 29. Similarity from resulting representations
      • ‘printer’ → ['printer', 'choice', 'fuji', 'xerox', 'settings', 'sequence', 'default', 'rollover', 'driver', 'takes', 'smaller', 'main']
      • ‘issue’ → ['issue', 'resolved', 'helping', 'experiencing', 'companies', 'related', 'assuming', 'reported', 'double', 'site', 'saw', 'causing', 'understand', 'sorted', 'logging', 'heard']
      • ‘ssh’ → ['ssh', 'config', 'dhcp', 'ping', 'reconnect', 'jpg', 'webconsole', 'coats', 'lab', 'browsers', 'instances', 'bypass']
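
Neighbor lists like those on slide 29 come from comparing word representations learned over the chosen contexts. The sketch below swaps in a much simpler technique than the word2vec or NMF models the talk mentions: raw co-occurrence counts per context, compared with cosine similarity. The function names and toy contexts are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(contexts):
    """Map each word to counts of the other words sharing a context with it."""
    vectors = defaultdict(Counter)
    for words in contexts:
        unique = set(words)
        for word in unique:
            for other in unique:
                if other != word:
                    vectors[word][other] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, topn=3):
    """Rank the other words by cosine similarity to `word`."""
    target = vectors.get(word, Counter())
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:topn]


# Toy contexts standing in for keyword windows around questions.
toy_contexts = [
    ["printer", "driver", "settings"],
    ["printer", "driver", "xerox"],
    ["ssh", "config", "ping"],
    ["ssh", "config", "reconnect"],
]
vectors = cooccurrence_vectors(toy_contexts)
neighbors = most_similar("printer", vectors)
```

On real chat data the count vectors would be large and sparse, which is where the dimensionality reduction discussed on slides 24 and 25 comes in.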
  • 30. Final Thoughts...
      • Tip of the iceberg
      • Understand how people interact
      • What information can we extract?
      • Can we escape our corpus?
  • 31. Thank you everyone!
      • thanks → ['heaps', 'great', 'perfect', 'fantastic']

Editor's Notes

  1. Hi, I’m Daniel Shank,
  2. Joke: If I coworker
  3. Joke: If I coworker
  4. Note that this is a clustering of messages, so seeing things like “ok” is not as bad as you might initially think
  5. Levy/Goldberg reference?
  6. Levy/Goldberg reference?
  7. Levy/Goldberg reference?