Dr. David Talby
DEEP LEARNING FOR NATURAL LANGUAGE UNDERSTANDING
CONTENTS
• NLP & THE PROMISE OF DEEP LEARNING
• IN ACTION: NAMED ENTITY RECOGNITION
• GOING TO PRODUCTION
AI VS. DOCTORS
Deep Learning
Computer Vision
Access to Care
Diagnostic Accuracy
NLP IN HEALTHCARE
Deep Learning
NLP
Efficiency
Accuracy
Radiology Diagnostic
Mental Health
Safety Events
Inpatient
Pre-Auth
Key Opinion Leaders
Research
Meta Analysis
Clinical Coding
Financial
Anti-Fraud
Adverse Events
Drug Development
Recruit for Trials
Natural Language Understanding is an AI-Complete problem.
ED Triage Notes
• states started last night, upper abd, took alka seltzer approx 0500, no relief. nausea no vomiting
• Since yeatreday 10/10 "constant Tylenol 1 hr ago. +nausea. diaphoretic. Mid abd radiates to back
• Generalized abd radiating to lower x 3 days accompanied by dark stools. Now with bloody stool this am. Denies dizzy, sob, fatigue. Visiting from Japan on business.”
Features
Type of Pain
Intensity of Pain
Body part or region
Symptoms
Onset of symptoms
Attempted home remedy
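
As a rough illustration of the target output, the feature list above can be read as a record to fill in for each note. The sketch below is an assumption, not something from the slides: the class name, the field names, and the extra `negated_symptoms` field are hypothetical, and the values for the first note were filled in by hand rather than by a model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TriageFeatures:
    """Structured view of one free-text ED triage note (hypothetical schema)."""
    pain_type: Optional[str] = None
    pain_intensity: Optional[str] = None
    body_part: Optional[str] = None
    symptoms: List[str] = field(default_factory=list)
    # Not on the slide: added here because "no vomiting" is a negated finding.
    negated_symptoms: List[str] = field(default_factory=list)
    onset: Optional[str] = None
    home_remedy: Optional[str] = None


# First note from the slide, annotated by hand for illustration.
note_1 = TriageFeatures(
    body_part="upper abd",
    symptoms=["nausea"],
    negated_symptoms=["vomiting"],
    onset="last night",
    home_remedy="alka seltzer",
)
print(note_1)
```

Fields the note never mentions, such as pain intensity, stay `None`, which is itself useful information for a downstream consumer.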
HUMAN LANGUAGE IS CONTEXTUAL
HUMAN LANGUAGE IS NUANCED
THE PROMISE OF DEEP LEARNING
Domain | Get by with rules, search, RegEx, attribute extraction | Welcome to the world of NLP, ML and DL
Social media | Does this social media post contain an offensive word? | Is this social media post offensive?
Legal | Find patents with the terms ‘car’ and ‘battery’, or synonyms | Who is patenting next-gen electrical car batteries?
Support | Find products mentioned in customer emails or phone calls | What is this customer complaining about?
Finance | Extract the fee structure from a mutual fund prospectus | Are UK pensions allowed to invest in this fund?
Healthcare | Extract the patient’s blood pressure reading from a note | Does this patient have high blood pressure?
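
The healthcare pair can be made concrete with a small sketch of the "get by with RegEx" side. The pattern, the function name, and the sample notes are assumptions made for illustration. The code only extracts a reading that is written down; it does not attempt the harder question of whether the patient has high blood pressure.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: "BP" or "blood pressure", a few non-digit characters,
# then a systolic/diastolic pair such as 150/95.
BP_PATTERN = re.compile(
    r"\b(?:bp|blood pressure)\b\D{0,15}?(\d{2,3})\s*/\s*(\d{2,3})",
    re.IGNORECASE,
)


def extract_blood_pressure(note: str) -> Optional[Tuple[int, int]]:
    """Return (systolic, diastolic) if the note contains a reading, else None."""
    match = BP_PATTERN.search(note)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(extract_blood_pressure("BP 150/95 on arrival"))
print(extract_blood_pressure("hx of high blood pressure, off meds"))
```

The second note mentions the condition without giving a reading, so the rule finds nothing. Answering the question in the right-hand column for that note takes language understanding rather than pattern matching.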
NAMED ENTITY RECOGNITION
From Sutton & McCallum’s An Introduction to Conditional Random Fields.
FROM CRF TO DEEP LEARNING (AND BACK)
From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning
Starting Point: “Classic” machine learning approach
• CoNLL-2003 shared task dataset
• CRF++ implementation
• Feature engineering:
  • The token itself
  • Its bigram & trigram
  • Their prefix & suffix
  • Its part of speech
  • Its chunk type
  • Does it start with a capital?
  • Is it uppercase?
  • Is it a digit?
  • Surrounding context words
81.15% F-score
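
The feature list above can be sketched as a per-token feature function of the kind a CRF toolkit consumes. This is a minimal sketch under assumptions, not the configuration used for the reported score: the function name and feature keys are hypothetical, and the bigram and trigram prefix and suffix items are read here as 2- and 3-character prefixes and suffixes of the token.

```python
from typing import Dict, List, Tuple

# (word, part-of-speech tag, chunk tag); the tags below are illustrative.
Token = Tuple[str, str, str]


def token_features(sentence: List[Token], i: int) -> Dict[str, object]:
    """Hand-engineered features for the token at position i."""
    word, pos, chunk = sentence[i]
    features: Dict[str, object] = {
        "token": word.lower(),
        "prefix2": word[:2],
        "prefix3": word[:3],
        "suffix2": word[-2:],
        "suffix3": word[-3:],
        "pos": pos,
        "chunk": chunk,
        "starts_with_capital": word[:1].isupper(),
        "is_uppercase": word.isupper(),
        "is_digit": word.isdigit(),
    }
    # Surrounding context words, with markers at the sentence boundaries.
    if i > 0:
        features["prev_token"] = sentence[i - 1][0].lower()
    else:
        features["sentence_start"] = True
    if i < len(sentence) - 1:
        features["next_token"] = sentence[i + 1][0].lower()
    else:
        features["sentence_end"] = True
    return features


sentence = [("EU", "NNP", "B-NP"), ("rejects", "VBZ", "B-VP"),
            ("German", "JJ", "B-NP"), ("call", "NN", "I-NP")]
print(token_features(sentence, 2))
```

Every entry in the dictionary is a decision someone had to make by hand, which is the cost the later slides try to remove.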
CRF + WORD EMBEDDINGS
From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning
Replacing curated dictionaries with embeddings to model semantic similarity
84.9% F-score
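
To illustrate what "model semantic similarity" means in practice, here is a minimal sketch of cosine similarity between word vectors. The three-dimensional vectors are made up for the example; real embeddings have hundreds of dimensions and are learned from text.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors, invented for illustration only.
embeddings: Dict[str, List[float]] = {
    "london": [0.9, 0.1, 0.0],
    "paris": [0.8, 0.2, 0.1],
    "tuesday": [0.0, 0.1, 0.9],
}

print(cosine(embeddings["london"], embeddings["paris"]))
print(cosine(embeddings["london"], embeddings["tuesday"]))
```

A curated dictionary has to list every city by name. With embeddings, a city the model never saw labeled can still land close to the cities it did see.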
FORGET CRF. LET’S USE AN LSTM NETWORK
From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning
An LSTM is a type of RNN, well suited for sequential data with long-term dependencies
LSTM F-score: 64.9%
biLSTM F-score: 76.1%
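
One way to see why the bidirectional variant scores higher is that each token gets a summary of both what came before it and what comes after it. The sketch below is a deliberate simplification: it uses a plain tanh recurrence with a scalar state and fixed, made-up weights instead of a trained LSTM, and only shows how the two directions are combined.

```python
import math
from typing import List, Tuple


def run_rnn(inputs: List[float], w_in: float = 0.5, w_rec: float = 0.8) -> List[float]:
    """Plain tanh recurrence (not an LSTM); returns one state per input."""
    state = 0.0
    states = []
    for x in inputs:
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states


def run_birnn(inputs: List[float]) -> List[Tuple[float, float]]:
    """Pair each position's left-to-right state with its right-to-left state."""
    forward = run_rnn(inputs)
    backward = run_rnn(inputs[::-1])[::-1]
    return list(zip(forward, backward))


# The only non-zero signal is at the last position.
for position, (fwd, bwd) in enumerate(run_birnn([0.0, 0.0, 1.0])):
    print(position, round(fwd, 3), round(bwd, 3))
```

At position 0 the forward state is still zero because nothing informative has been read yet, while the backward state already carries a trace of the signal at the end of the sequence.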
TRANSFER LEARNING: USE PRETRAINED EMBEDDINGS
From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning
Reuse embeddings trained on Wikipedia instead of training them on CoNLL, which has only 200,000 words
85.9% F-score
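
Reusing pretrained embeddings usually comes down to loading a vector file and looking up the task vocabulary in it. The sketch below assumes the common text layout used by word2vec and fastText `.vec` files (a header line with the word count and dimension, then one word and its values per line). The function names and the tiny in-memory "file" are hypothetical, and mapping unknown words to a zero vector is just one simple choice.

```python
from typing import Dict, Iterable, List


def load_vectors(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse text-format embeddings: 'count dim' header, then 'word v1 v2 ...'."""
    iterator = iter(lines)
    _count, dim = map(int, next(iterator).split())
    vectors: Dict[str, List[float]] = {}
    for line in iterator:
        parts = line.rstrip().split(" ")
        values = [float(x) for x in parts[1:]]
        if len(values) == dim:  # skip malformed lines
            vectors[parts[0]] = values
    return vectors


def build_embedding_matrix(vocab: List[str], vectors: Dict[str, List[float]],
                           dim: int) -> List[List[float]]:
    """One row per vocabulary word; words without a pretrained vector get zeros."""
    return [vectors.get(word, [0.0] * dim) for word in vocab]


# Stand-in for a real pretrained file.
toy_file = ["2 3", "london 0.9 0.1 0.0", "paris 0.8 0.2 0.1"]
pretrained = load_vectors(toy_file)
print(build_embedding_matrix(["paris", "zurich"], pretrained, dim=3))
```

The resulting matrix would initialize the embedding layer of the tagger, so the model starts from what was learned on a much larger corpus than the labeled data.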
ADD CHARACTER-BASED MODEL: BI-LSTM OR CNN
From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning
In addition to token-based models, add a character-based biLSTM or CNN to learn and model word prefixes and suffixes
89.3% F-score
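
To make the character-level idea concrete, the sketch below lists the character windows that a width-3 convolution would slide over for one word. It is only the input side of such a model: a real character CNN would embed each character, apply learned filters to every window, and max-pool the results. The function name and the padding symbol are assumptions.

```python
from typing import List


def char_windows(word: str, width: int = 3, pad: str = "#") -> List[str]:
    """All width-sized character windows of a word, padded at both ends."""
    padded = pad * (width - 1) + word + pad * (width - 1)
    return [padded[i:i + width] for i in range(len(padded) - width + 1)]


print(char_windows("Smith"))
```

The first windows cover the start of the word, including its capital letter, and the last ones cover its ending. This is how the network can pick up prefix and suffix cues without anyone listing them as features.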
LET’S GET OVER 90% - BRING BACK THE CRF!
From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning
Predicting each label independently, without taking into account the labels predicted for the surrounding words, leaves some accuracy on the table
90.3% F-score
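
The effect of a CRF output layer can be sketched with Viterbi decoding over per-token label scores plus label-to-label transition scores. Everything numeric below is made up for illustration: the emission scores stand in for what a network would output for a two-token name, and the single transition penalty encodes that an I-PER label should not follow O.

```python
from typing import Dict, List, Tuple

LABELS = ["O", "B-PER", "I-PER"]


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Highest-scoring label sequence under emission plus transition scores."""
    best = [dict(emissions[0])]
    backpointers: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for label in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + transitions.get((p, label), 0.0))
            scores[label] = (best[-1][prev] + transitions.get((prev, label), 0.0)
                             + emissions[t][label])
            pointers[label] = prev
        best.append(scores)
        backpointers.append(pointers)
    path = [max(LABELS, key=lambda label: best[-1][label])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical scores for the tokens "John" and "Smith".
emissions = [
    {"O": 1.0, "B-PER": 0.9, "I-PER": 0.0},
    {"O": 0.2, "B-PER": 0.1, "I-PER": 1.0},
]
transitions = {("O", "I-PER"): -10.0}  # I-PER directly after O is heavily penalized

independent = [max(LABELS, key=lambda label: scores[label]) for scores in emissions]
print(independent)
print(viterbi(emissions, transitions))
```

Picking the best label per token gives O followed by I-PER, which is not a valid entity. Decoding the sequence as a whole gives B-PER followed by I-PER, because the small loss on the first token is outweighed by avoiding the penalized transition.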
In deep learning, architecture engineering is the new feature engineering.
Stephen Merity
Data Curation
Data Science
Data Engineering
Data Operations
Moving from research to production?
• Business Case
• All four roles on the team
Get the data
• “Inception v3 was trained on 1.28 million images”
• “used over 120,000 retinal images to train a neural network to detect diabetic retinopathy”
Get expert labels
• “In the study, the algorithm went head-to-head against 21 board-certified dermatologists”
• “All images were graded by 3 to 7 different ophthalmologists, from a panel of 54 US-licensed senior residents & ophthalmologists”
Get pretrained datasets & embeddings
• Facebook open sourced pre-trained word vectors for 294 languages, trained on Wikipedia using fastText
• UMLS has over 1 million biomedical concepts and 5 million concept names, from over 100 controlled vocabularies
Read up on state-of-the-art, domain-specific research
• “How to Train Good Word Embeddings for Biomedical NLP”. Chiu et al., In Proceedings of BioNLP’16, August 2016.
• “Entity Recognition from Clinical Texts via Recurrent Neural Network”. Liu et al., BMC Medical Informatics & Decision Making, July 2017.
Are your ML/DL/NLP libraries research or industrial grade?
Data Sources API
Spark Core API (RDDs, Project Tungsten)
Spark SQL API (DataFrame, Catalyst Optimizer)
Spark ML API (Pipeline, Transformer, Estimator)
Part of Speech Tagger
Named Entity Recognition
Sentiment Analysis
Spell Checker
Tokenizer
Stemmer
Lemmatizer
Entity Extraction
Topic Modeling
Word2Vec
TF-IDF
String distance calculation
N-grams calculation
Stop word removal
Train/Test & Cross-Validate
Ensembles
High Performance Natural Language Understanding at Scale
DeepLearning4j Spark-NLP
From: Post by Ben Lorica
david@pacific.ai
@davidtalby
in/davidtalby
THANK YOU!

Deep learning for natural language understanding

  • 1. Dr. David Talby DEEP LEARNING FOR NATURAL LANGUAGE UNDERSTANDING
  • 2. CONTENTS  NLP & THE PROMISE OF DEEP LEARNING  IN ACTION: NAMED ENTITY RECOGNITION  GOING TO PRODUCTION
  • 3. AI VS. DOCTORS Deep Learning Computer Vision Access to Care Diagnostic Accuracy
  • 4. NLP IN HEALTHCARE Deep Learning NLP Efficiency Accuracy Radiology Diagnostic Mental Health Safety Events Inpatient Pre- Auth Key Opinion Leaders Research Meta Analysis Clinical Coding Financial Anti- Fraud Adverse Events Drug Development Recruit for Trials
  • 5. Natural Language Understanding is an AI-Complete problem.
  • 6. ED Triage Notes states started last night, upper abd, took alka seltzer approx 0500, no relief. nausea no vomiting Since yeatreday 10/10 "constant Tylenol 1 hr ago. +nausea. diaphoretic. Mid abd radiates to back Generalized abd radiating to lower x 3 days accompanied by dark stools. Now with bloody stool this am. Denies dizzy, sob, fatigue. Visiting from Japan on business.” Features Type of Pain Intensity of Pain Body part of region Symptoms Onset of symptoms Attempted home remedy HUMAN LANGUAGE IS CONTEXTUAL
  • 8. THE PROMISE OF DEEP LEARNING Get by with rules, search, RegEx, attribute extraction Welcome to the world of NLP, ML and DL Social media Does this social media post contain an offensive word? Is this social media post offensive? Legal Find patents with the terms ‘car’ and battery’, or synonyms Who is patenting next-gen electrical car batteries? Support Find products mentioned in customer emails or phone calls What is this customer complaining about? Finance Extract the fee structure from a mutual fund prospectus Are UK pensions allowed to invest in this fund? Healthcare Extract the patient’s blood pressure reading from a note Does this patient have high blood pressure?
  • 9. CONTENTS: NLP & THE PROMISE OF DEEP LEARNING; IN ACTION: NAMED ENTITY RECOGNITION; GOING TO PRODUCTION
  • 10. NAMED ENTITY RECOGNITION From Sutton & McCallum’s An Introduction to Conditional Random Fields.
  • 11. FROM CRF TO DEEP LEARNING (AND BACK) From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning. Starting point: “classic” machine learning approach, 81.15% F-score. CoNLL-2003 shared task dataset; CRF++ implementation. Feature engineering: the token itself; its bigram & trigram; their prefix & suffix; its part of speech; its chunk type; does it start with a capital?; is it uppercase?; is it a digit?; surrounding context words.
  • 12. CRF + WORD EMBEDDINGS From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning Replacing curated dictionaries with embeddings to model semantic similarity 84.9% F-score
  • 13. FORGET CRF. LET’S USE AN LSTM NETWORK From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning An LSTM is a type of RNN, well suited for sequential data with long-term dependencies 64.9% LSTM F-score 76.1% biLSTM F-score
  • 14. TRANSFER LEARNING: USE PRETRAINED EMBEDDINGS From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning 85.9% F-score Reuse the embeddings trained on Wikipedia, instead of on CoNLL which only has 200,000 words
  • 15. ADD CHARACTER BASED MODEL: BI-LSTM OR CNN From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning 89.3% F-score In addition to token based models, add a character-based biLSTM or CNN to learn and model word prefixes and suffixes
  • 16. LET’S GET OVER 90% - BRING BACK THE CRF! From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning 90.3% F-score Because predicting all labels independently of each other, not taking into account the labels predicted for the surrounding words, leaves some accuracy on the table
  • 17. In deep learning, architecture engineering is the new feature engineering. Stephen Merity
  • 18. CONTENTS: NLP & THE PROMISE OF DEEP LEARNING; IN ACTION: NAMED ENTITY RECOGNITION; GOING TO PRODUCTION
  • 19. Data Curation Data Science Data Engineering Data Operations Moving from research to production? Business Case. All four roles on the team.
  • 20. Data Curation Data Science Data Engineering Data Operations Get the data Get expert labels Get pretrained datasets & embeddings “Inception v3 was trained on 1.28 million images” “In the study, the algorithm went head-to-head against 21 board-certified dermatologists” Facebook open sourced pre-trained word vectors for 294 languages, trained on Wikipedia using fastText “used over 120,000 retinal images to train a neural network to detect diabetic retinopathy” “All images were graded by 3 to 7 different ophthalmologists, from a panel of 54 US-licensed senior residents & ophthalmologists” UMLS has over 1 million biomedical concepts and 5 million concept names, from over 100 controlled vocabularies
  • 21. Data Curation Data Science Data Engineering Data Operations Read up on state of the art, domain specific research “How to Train Good Word Embeddings for Biomedical NLP”. Chiu et al., In Proceedings of BioNLP’16, August 2016. “Entity Recognition from Clinical Texts via Recurrent Neural Network”. Liu et al., BMC Medical Informatics & Decision Making, July 2017. Are your ML/DL/NLP libraries research or industrial grade?
  • 22. Data Sources API Spark Core API (RDDs, Project Tungsten) Spark SQL API (DataFrame, Catalyst Optimizer) Spark ML API (Pipeline, Transformer, Estimator) Part of Speech Tagger Named Entity Recognition Sentiment Analysis Spell Checker Tokenizer Stemmer Lemmatizer Entity Extraction Topic Modeling Word2Vec TF-IDF String distance calculation N-grams calculation Stop word removal Train/Test & Cross-Validate Ensembles High Performance Natural Language Understanding at Scale Data Curation Data Science Data Engineering Data Operations DeepLearning4j Spark-NLP
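
Slide 16 argues that predicting each label independently, without regard to the labels of the surrounding words, leaves accuracy on the table. A minimal sketch of that idea in plain Python follows. It is not the code behind the slides: the tag set, the per-token scores and the transition scores are all made-up illustrative numbers, and the `viterbi` helper is a hypothetical name. It only shows how a CRF-style decoder, which scores whole label sequences, can overrule a per-token argmax.

```python
# Toy illustration of joint (CRF-style) decoding vs. independent per-token labels.
# All scores below are invented for illustration; a real model would learn them.

TAGS = ["O", "B-PER", "I-PER"]

# Per-token scores for the sentence "Monica Geller smiled" (assumed values).
EMISSIONS = [
    {"O": 1.0, "B-PER": 0.9, "I-PER": 0.1},  # "Monica"
    {"O": 0.2, "B-PER": 0.3, "I-PER": 1.0},  # "Geller"
    {"O": 1.5, "B-PER": 0.1, "I-PER": 0.1},  # "smiled"
]

# Transition scores between adjacent labels; missing pairs default to 0.0.
# "I-PER" directly after "O" is heavily penalized, since it is not a valid span.
TRANSITIONS = {("O", "I-PER"): -10.0, ("B-PER", "I-PER"): 0.5}


def independent_labels(emissions, tags):
    """Pick the best tag for each token on its own, ignoring its neighbours."""
    return [max(tags, key=lambda tag: scores[tag]) for scores in emissions]


def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence under emission + transition scores."""
    # best[t][tag] is the best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag] is the previous tag on that best path
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the full path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


independent = independent_labels(EMISSIONS, TAGS)
joint = viterbi(EMISSIONS, TRANSITIONS, TAGS)
```

With these toy numbers the independent labels open a person span with "I-PER" right after "O", while the joint decoding starts the span on "Monica" with "B-PER". This is the kind of correction the CRF layer on top of the biLSTM is meant to provide.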

Editor's Notes

  1. There is not one “language” – every vertical and communication channel has its own jargon that includes vocabulary, grammar, assumptions and semantics. For example – in these ED triage notes, none of the sentences is in valid English, and the words “patient” and “pain” do not appear.
  2. Another challenge is that a lot of what we say is not in the text itself – it’s about the relationship, occasion, social norms, feeling to be communicated. Language can be viewed as a compression problem – can you summarize a 2-hour event into a few sentences? How was the movie? What did the doctor say?
  3. Challenges in NER: Going beyond dictionaries and lists. For example, “Chandler” is obviously not the city of Chandler, AZ and “Central Perk” is obviously a place even if you’ve never heard of it (since it is the location of a meeting). There can be many kinds of entities that a given problem will need to extract: companies, people, genes, diseases, financial terms, etc.
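
To make the "beyond dictionaries" point concrete, here is a rough sketch of the kind of hand-built token features listed on slide 11 (the token itself, prefix and suffix, capitalization, digits, surrounding context words). It is an assumption-laden illustration rather than the CRF++ templates used in the cited experiment: the function names, the feature keys, the 3-character affix length and the `<BOS>`/`<EOS>` markers are all hypothetical, and part of speech and chunk type are left out because they would need an external tagger.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i] from its surface form and its neighbours."""
    word = tokens[i]
    return {
        "token": word.lower(),
        "prefix3": word[:3],            # assumed affix length of 3 characters
        "suffix3": word[-3:],
        "is_capitalized": word[:1].isupper(),
        "is_uppercase": word.isupper(),
        "is_digit": word.isdigit(),
        # Surrounding context words; sentence edges get placeholder markers.
        "prev_token": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_token": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """Feature dicts for every token in a sentence, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


example = ["Chandler", "met", "Monica", "at", "Central", "Perk"]
features = sentence_features(example)
```

The features for "Perk" record that it is capitalized and preceded by "central", and "Central" in turn records the preceding "at". A sequence model can use that context to label the span as a place even when "Central Perk" appears in no dictionary.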