Getting Value Out of Chat Data:
Chat-based interfaces are increasingly common, whether customers are interacting with companies or employees are communicating with each other within an organization. Given the large number of chat logs being captured, along with recent advances in natural language processing, there is a desire to leverage this data for both insight generation and machine learning applications. Unfortunately, chat data is user-generated, meaning it is often noisy and difficult to normalize. It also consists mostly of short, heavily context-dependent texts, which makes it difficult to apply methods such as topic modeling and information extraction.
Despite these challenges, it is still possible to extract useful information from these data sources. In this talk, I will provide an overview of techniques and practices for working with chat-based user interaction data, with a focus on machine-augmented data annotation and unsupervised learning methods.
Bio: Daniel Shank is a Senior Data Scientist at Talla, a company developing a platform for intelligent information discovery and delivery. His focus is on developing machine learning techniques to handle various business automation tasks, such as scheduling, polls, and expert identification, as well as on NLP work. Before joining Talla as the company’s first employee in 2015, Daniel worked with TechStars Boston and did consulting work for ThriveHive, a marketing company in Boston focused on small businesses. He studied economics at the University of Chicago.