This talk covers three key aspects of applying deep learning to natural language understanding. First, we'll review current use cases for NLP, discuss what makes language understanding a particularly hard problem, and explain how deep learning promises to help. Second, we'll walk through an example of building a named entity recognizer, showing the common interplay between LSTMs, CNNs, transfer learning and CRFs in today's state-of-the-art systems. Third, we'll cover best practices for taking such systems from prototype to production. This talk is intended for practicing data scientists and R&D leaders who need to use the latest advances in the field in systems they're currently building.
2. CONTENTS
NLP & THE PROMISE OF DEEP LEARNING
IN ACTION: NAMED ENTITY RECOGNITION
GOING TO PRODUCTION
3. AI VS. DOCTORS
Deep Learning Computer Vision
Access to Care
Diagnostic Accuracy
4. NLP IN HEALTHCARE
Deep Learning NLP
Efficiency
Accuracy
Radiology Diagnostic
Mental Health
Safety Events
Inpatient
Pre-Auth
Key Opinion Leaders
Research
Meta Analysis
Clinical Coding
Financial
Anti-Fraud
Adverse Events
Drug Development
Recruit for Trials
6. ED Triage Notes
states started last night, upper abd, took alka seltzer approx
0500, no relief. nausea no vomiting
Since yeatreday 10/10 "constant Tylenol 1 hr ago. +nausea.
diaphoretic. Mid abd radiates to back
Generalized abd radiating to lower x 3 days accompanied
by dark stools. Now with bloody stool this am. Denies dizzy,
sob, fatigue. Visiting from Japan on business.”
Features
Type of Pain
Intensity of Pain
Body part or region
Symptoms
Onset of symptoms
Attempted home remedy
HUMAN LANGUAGE IS CONTEXTUAL
8. THE PROMISE OF DEEP LEARNING
Get by with rules, search, RegEx, attribute extraction | Welcome to the world of NLP, ML and DL
Social media: Does this social media post contain an offensive word? | Is this social media post offensive?
Legal: Find patents with the terms ‘car’ and ‘battery’, or synonyms | Who is patenting next-gen electrical car batteries?
Support: Find products mentioned in customer emails or phone calls | What is this customer complaining about?
Finance: Extract the fee structure from a mutual fund prospectus | Are UK pensions allowed to invest in this fund?
Healthcare: Extract the patient’s blood pressure reading from a note | Does this patient have high blood pressure?
9. CONTENTS
NLP & THE PROMISE OF DEEP LEARNING
IN ACTION: NAMED ENTITY RECOGNITION
GOING TO PRODUCTION
11. FROM CRF TO DEEP LEARNING (AND BACK)
From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning
• CoNLL-2003 shared task dataset
• CRF++ Implementation
• Feature engineering:
  • The token itself
  • Its bigram & trigram
  • Their prefix & suffix
  • Its part of speech
  • Its chunk type
  • Does it start with a capital?
  • Is it uppercase?
  • Is it a digit?
  • Surrounding context words
Starting Point: “Classic” machine learning approach
81.15%
F-score
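The feature list on this slide can be sketched as a function that turns one token into a feature dictionary, the usual input shape for CRF toolkits. This is a minimal illustration, not the feature template used in the cited experiment: the function name, the feature names, the `<BOS>`/`<EOS>` boundary markers, the reading of "bigram & trigram prefix & suffix" as 2- and 3-character affixes, and the sample tags are all assumptions.

```python
def token_features(sentence, i):
    """Build a CRF-style feature dict for the token at position i.

    `sentence` is a list of (token, pos_tag, chunk_tag) triples,
    mirroring the columns of a CoNLL-style file.
    """
    token, pos, chunk = sentence[i]
    features = {
        "token": token.lower(),
        # 2- and 3-character prefixes and suffixes (assumed interpretation)
        "prefix2": token[:2],
        "prefix3": token[:3],
        "suffix2": token[-2:],
        "suffix3": token[-3:],
        "pos": pos,
        "chunk": chunk,
        # Word-shape features
        "starts_with_capital": token[:1].isupper(),
        "is_uppercase": token.isupper(),
        "is_digit": token.isdigit(),
    }
    # Surrounding context words, with markers at the sentence boundaries
    features["prev_token"] = sentence[i - 1][0].lower() if i > 0 else "<BOS>"
    features["next_token"] = (
        sentence[i + 1][0].lower() if i < len(sentence) - 1 else "<EOS>"
    )
    return features


# Illustrative sentence fragment; the tags are hand-written for this example
sentence = [
    ("EU", "NNP", "B-NP"),
    ("rejects", "VBZ", "B-VP"),
    ("German", "JJ", "B-NP"),
    ("call", "NN", "I-NP"),
]
german = token_features(sentence, 2)
first = token_features(sentence, 0)
```

Every feature here is hand-designed, which is the cost the later slides try to remove by letting the network learn its own representations.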
12. CRF + WORD EMBEDDINGS
From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning
Replacing curated dictionaries with embeddings to model semantic similarity
84.9%
F-score
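To make "embeddings model semantic similarity" concrete: words with related meanings get vectors that point in similar directions, which is commonly measured with cosine similarity. The sketch below uses invented 3-dimensional toy vectors purely for illustration; real embeddings have hundreds of dimensions and are learned from text. The helper names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings):
    """Return the other word whose vector is closest to `word`'s vector."""
    others = (w for w in embeddings if w != word)
    return max(
        others,
        key=lambda w: cosine_similarity(embeddings[word], embeddings[w]),
    )


# Toy vectors, invented for this example only
toy_embeddings = {
    "paris": [0.9, 0.1, 0.0],
    "london": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

nearest = most_similar("paris", toy_embeddings)
```

A dictionary of city names only matches entries it already contains; a similarity measure like this one lets a model treat an unseen word as "city-like" when its vector sits near known cities.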
13. FORGET CRF. LET’S USE AN LSTM NETWORK
From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning
An LSTM is a type of RNN, well suited for sequential data with long-term dependencies
64.9%
LSTM F-score
76.1%
biLSTM F-score
14. TRANSFER LEARNING: USE PRETRAINED EMBEDDINGS
From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning
85.9%
F-score
Reuse embeddings trained on Wikipedia,
instead of training them on CoNLL, which has only 200,000 words
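A rough sketch of what reusing pretrained embeddings involves: parse a vector file, then build the model's embedding matrix from it, falling back to small random vectors for words the file does not cover. The code assumes the common plain-text layout (a "count dimension" header line, then one word and its values per line); the function names, the fallback range and the in-memory sample are assumptions for illustration.

```python
import io
import random


def load_text_vectors(handle):
    """Parse plain-text word vectors: a 'count dim' header, then 'word v1 ... vd'."""
    _count, dim = map(int, handle.readline().split())
    vectors = {}
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim


def build_embedding_matrix(vocabulary, vectors, dim, seed=0):
    """One row per vocabulary word: pretrained if available, else small random."""
    rng = random.Random(seed)
    matrix = []
    for word in vocabulary:
        if word in vectors:
            matrix.append(vectors[word])
        else:
            # Out-of-vocabulary word: initialise randomly (assumed range)
            matrix.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return matrix


# Tiny in-memory stand-in for a real pretrained vector file
sample = io.StringIO("2 3\nlondon 0.1 0.2 0.3\nparis 0.4 0.5 0.6\n")
vectors, dim = load_text_vectors(sample)
matrix = build_embedding_matrix(["paris", "zyzzyva"], vectors, dim)
```

The resulting matrix would typically initialise the network's embedding layer, so the tagger starts from word knowledge learned on a far larger corpus than the NER training set.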
15. ADD CHARACTER BASED MODEL: BI-LSTM OR CNN
From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning
89.3%
F-score
In addition to token based models, add a character-based biLSTM or CNN
to learn and model word prefixes and suffixes
16. LET’S GET OVER 90% - BRING BACK THE CRF!
From Yves Peirsman’s Named Entity Recognition and the Road to Deep Learning
90.3%
F-score
Because predicting all labels independently of each other, not taking into account the
labels predicted for the surrounding words, leaves some accuracy on the table
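The point about independent predictions can be illustrated with Viterbi decoding, the inference step of a linear-chain CRF: each label is scored together with the label before it, so an implausible sequence is penalised as a whole. This is a minimal sketch, not the model from the talk. The emission and transition scores are hand-set toy values (a trained CRF learns them), and start and end transitions are omitted for brevity.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions: one dict per token mapping label -> score (e.g. from a biLSTM).
    transitions: dict mapping (previous_label, label) -> score;
                 pairs that are not listed score 0.
    """
    # best[i][label] = best score of any path ending in `label` at token i
    best = [{label: emissions[0][label] for label in labels}]
    backpointers = []
    for scores in emissions[1:]:
        current, pointers = {}, {}
        for label in labels:
            prev = max(
                labels,
                key=lambda p: best[-1][p] + transitions.get((p, label), 0.0),
            )
            current[label] = (
                best[-1][prev] + transitions.get((prev, label), 0.0) + scores[label]
            )
            pointers[label] = prev
        best.append(current)
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label
    path = [max(labels, key=lambda l: best[-1][l])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


labels = ["O", "B-PER", "I-PER"]
# Toy per-token scores for a two-token name
emissions = [
    {"O": 1.0, "B-PER": 0.9, "I-PER": 0.1},
    {"O": 0.2, "B-PER": 0.3, "I-PER": 1.5},
]
# Toy transition scores: "I-PER" should follow "B-PER", not "O"
transitions = {("O", "I-PER"): -5.0, ("B-PER", "I-PER"): 1.0}

independent = [max(labels, key=scores.get) for scores in emissions]
decoded = viterbi(emissions, transitions, labels)
```

With these toy scores, picking each label on its own gives `["O", "I-PER"]`, an invalid sequence under the BIO scheme, while joint decoding gives `["B-PER", "I-PER"]`.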
17. In deep learning, architecture engineering
is the new feature engineering.
Stephen Merity
18. CONTENTS
NLP & THE PROMISE OF DEEP LEARNING
IN ACTION: NAMED ENTITY RECOGNITION
GOING TO PRODUCTION
20. Data Curation | Data Science | Data Engineering | Data Operations
Get the data
“Inception v3 was trained on 1.28 million images”
“used over 120,000 retinal images to train a neural network to detect diabetic retinopathy”
Get expert labels
“In the study, the algorithm went head-to-head against 21 board-certified dermatologists”
“All images were graded by 3 to 7 different ophthalmologists, from a panel of 54 US-licensed senior residents & ophthalmologists”
Get pretrained datasets & embeddings
Facebook open sourced pre-trained word vectors for 294 languages, trained on Wikipedia using fastText
UMLS has over 1 million biomedical concepts and 5 million concept names, from over 100 controlled vocabularies
21. Data Curation | Data Science | Data Engineering | Data Operations
Read up on state of the art, domain specific research
“How to Train Good Word Embeddings for Biomedical NLP”. Chiu et al., In Proceedings of BioNLP’16, August 2016.
“Entity Recognition from Clinical Texts via Recurrent Neural Network”. Liu et al., BMC Medical Informatics & Decision Making, July 2017.
Are your ML/DL/NLP libraries research or industrial grade?
22. Data Sources API
Spark Core API (RDDs, Project Tungsten)
Spark SQL API (DataFrame, Catalyst Optimizer)
Spark ML API (Pipeline, Transformer, Estimator)
Part of Speech Tagger
Named Entity Recognition
Sentiment Analysis
Spell Checker
Tokenizer
Stemmer
Lemmatizer
Entity Extraction
Topic Modeling
Word2Vec
TF-IDF
String distance calculation
N-grams calculation
Stop word removal
Train/Test & Cross-Validate
Ensembles
High Performance Natural Language Understanding at Scale
Data Curation | Data Science | Data Engineering | Data Operations
DeepLearning4j Spark-NLP
There is not one “language” – every vertical and communication channel has its own jargon that includes vocabulary, grammar, assumptions and semantics.
For example – in these ED triage notes, none of the sentences is in valid English, and the words “patient” and “pain” do not appear.
Another challenge is that a lot of what we say is not in the text itself: it is about the relationship, the occasion, social norms, and the feeling to be communicated.
Language can be viewed as a compression problem – can you summarize a 2-hour event into a few sentences? How was the movie? What did the doctor say?
Challenges in NER: going beyond dictionaries and lists. For example, “Chandler” is obviously not the city of Chandler, AZ, and “Central Perk” is obviously a place even if you’ve never heard of it (since it is the location of a meeting).
There can be many kinds of entities that a given problem will need to extract: companies, people, genes, diseases, financial terms, etc.