Health care special interest: i2b2
1. Extracting information from clinical notes
H. Yang, I. Spasic, F. Sarafraz,
John A. Keane, Goran Nenadic
School of Computer Science
University of Manchester
2. Motivation & aim
Electronic clinical notes
electronic medical/health records
hospital discharge summaries
Extract information on
individual patients and their diseases
clinical practice
treatments, drugs used, etc.
Aim: support data analytics
e.g. monitoring quality
Huge interest locally and internationally
3. Clinical notes
Highly condensed text
sometimes without proper sentences
hospital discharge summaries are more structured
list of medications, symptoms, etc.
Terminological variability
orthographic, acronyms, local conventions
Various sections
previous history, social/family background
5. NLP challenges in clinical data
A series of international challenges in information extraction from clinical narratives
organisers: Informatics for Integrating Biology & the Bedside (i2b2)
3 shared tasks so far
- De-identification of medical records and identification of smokers from their clinical records (2007)
- Identification of obesity & related diseases in patients from hospital discharge documents (2008)
- Extraction of medications and related information from patients’ discharge documents (2009)
2010 challenge
concept extraction, assertion classification, relation classification
6. i2b2 2008
Extract status of diseases in patients
obesity, diabetes mellitus, hypercholesterolemia, hypertriglyceridemia, hypertension, heart failure (16 diseases in total)
status: yes, no, unmentioned, questionable
on textual and “intuitive” levels
28 teams worldwide
UoM (University of Manchester) ranked 1st in the textual and 7th in the intuitive task
Our methodology
Term-based exact and approximate matching
Context-based pattern- and rule-based matching
Machine learning approach
Yang, H., Spasic, I., Keane, J., Nenadic, G.: A Text Mining Approach to the Prediction of a Disease Status from Clinical Discharge Summaries, JAMIA 16(4):596-600
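Term-based exact and approximate matching can be illustrated with a small sketch. It assumes a tiny hypothetical lexicon (`LEXICON`, mapping surface variants and abbreviations to a canonical disease) and uses Python's `difflib` similarity ratio as a stand-in for approximate matching; the function name `match_terms` and the 0.85 cutoff are illustrative choices, not the method used in the cited paper.

```python
import difflib
import re

# Hypothetical mini-lexicon: surface variant or abbreviation -> canonical disease.
LEXICON = {
    "chf": "heart failure",
    "congestive heart failure": "heart failure",
    "heart failure": "heart failure",
    "htn": "hypertension",
    "hypertension": "hypertension",
    "dm": "diabetes mellitus",
    "diabetes mellitus": "diabetes mellitus",
    "obesity": "obesity",
}
MAX_LEN = max(len(term.split()) for term in LEXICON)

def match_terms(sentence, cutoff=0.85):
    """Return canonical disease names found by exact or approximate matching."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    found = set()
    # Check every token n-gram up to the longest lexicon entry.
    for n in range(1, MAX_LEN + 1):
        for i in range(len(tokens) - n + 1):
            ngram = " ".join(tokens[i:i + n])
            if ngram in LEXICON:
                found.add(LEXICON[ngram])          # exact match
            elif len(ngram) > 4:
                # Approximate match only for longer strings, so short
                # abbreviations such as "dm" cannot match by accident.
                close = difflib.get_close_matches(ngram, LEXICON, n=1, cutoff=cutoff)
                if close:
                    found.add(LEXICON[close[0]])
    return found

print(match_terms("History of congestive heart failre and HTN."))
```

The misspelled "failre" is still mapped to heart failure through the approximate branch, while "HTN" is caught by the exact abbreviation entry.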
7. Methodology
Linguistic pre-processing
- section splitting, sentence splitting, chunking, POS tagging, parsing
Information extraction (rules, machine learning)
- textual evidence extraction, section filtering, morphological clues (e.g. drug/disease name affixes)
Medical resources
•Disease names
•Drug names
•Body parts
•Symptoms
•Abbreviations
•Synonyms
Constructing results
- template filling, filtering negative results, relations and heuristics:
  Organ : Symptom, Symptom : Disease, Disease : Drug, Drug : Mode of application
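Part of the pre-processing and filtering stage can be sketched in a few lines. This assumes that section headers are upper-case phrases ending in a colon at the start of a line, uses a naive punctuation-based sentence splitter, and treats dropping the family-history section as an example of section filtering; all three are assumptions, and `split_sections`, `split_sentences` and `filter_sections` are hypothetical names. Chunking, POS tagging and parsing would need a full NLP toolkit and are left out.

```python
import re

# Assumed header convention: "HISTORY OF PRESENT ILLNESS:" at the start of a line.
HEADER = re.compile(r"^([A-Z][A-Z /&]+):", re.MULTILINE)

def split_sections(text):
    """Split a note into {section_name: body} using upper-case headers."""
    sections = {}
    matches = list(HEADER.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1).strip()] = text[m.end():end].strip()
    return sections

def split_sentences(body):
    """Naive splitter: break after . ! ? followed by whitespace, or at line breaks."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", body)
    return [p.strip() for p in parts if p.strip()]

def filter_sections(sections, skip=("FAMILY HISTORY",)):
    """Drop sections whose mentions are assumed not to describe the patient."""
    return {name: body for name, body in sections.items() if name not in skip}

note = ("HISTORY OF PRESENT ILLNESS: Pt with CHF. Denies chest pain.\n"
        "FAMILY HISTORY: Mother had diabetes.\n"
        "MEDICATIONS:\nlasix 40 mg po daily")
for name, body in filter_sections(split_sections(note)).items():
    print(name, split_sentences(body))
```

Filtering by section keeps "Mother had diabetes" from being counted as evidence about the patient, which a sentence-level matcher alone would miss.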
8. Rule-based IE
Disease status patterns
- context-based patterns
[N] negative for CHF
[Q] question of asthma
[U] no known diagnosis of CAD
[U] we should consider further asthma studies as an outpatient
- semantics-based patterns
[N] normal coronaries, a thin black man
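The context-based patterns above can be sketched as ordered regular-expression cues around a disease term. The cue list below is built only from the slide's examples, and the defaults (no mention gives U, a mention with no cue gives Y) are assumptions; `RULES` and `disease_status` are hypothetical names. Semantics-based cases such as "normal coronaries" implying no CAD need domain knowledge and are not covered.

```python
import re

# Cue templates modelled on the slide's examples; {d} is the escaped disease term.
# The first matching rule wins, so the more specific U cues are listed first.
RULES = [
    ("U", r"\bno known diagnosis of {d}"),
    ("U", r"\bconsider (?:\w+ ){{0,3}}{d}"),
    ("N", r"\bnegative for {d}"),
    ("Q", r"\bquestion of {d}"),
]

def disease_status(sentence, disease):
    """Assign Y/N/Q/U to one disease term in one sentence."""
    s = sentence.lower()
    d = re.escape(disease.lower())
    if not re.search(r"\b" + d + r"\b", s):
        return "U"                      # disease not mentioned in this sentence
    for label, template in RULES:
        if re.search(template.format(d=d), s):
            return label
    return "Y"                          # mentioned with no negating or hedging cue

print(disease_status("Negative for CHF.", "CHF"))
print(disease_status("We should consider further asthma studies as an outpatient.", "asthma"))
```

Rule order matters here: checking the U cues first keeps "no known diagnosis of CAD" from being read as a plain mention.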
Clinical resources used in sentence extraction
clinical inference rules, e.g. weight > 90 kg, LDL > 160 mg/dl, HDL < 35 mg/dl
medications e.g., ‘anti-depressant’
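A minimal sketch of how such clinical inference rules could be applied to free text. The thresholds come from the slide; the regular expressions (which assume phrasing like "weight 102 kg" or "LDL 175 mg/dl"), the `INFERENCE_RULES` table and the `fired_rules` function are hypothetical. The sketch only reports which rules fire and does not map them to a disease.

```python
import re

# (rule name, value pattern, condition); thresholds are those listed on the slide.
INFERENCE_RULES = [
    ("weight>90kg", r"\bweight[:\s]+(\d+(?:\.\d+)?)\s*kg", lambda v: v > 90),
    ("LDL>160mg/dl", r"\bldl[:\s]+(\d+(?:\.\d+)?)", lambda v: v > 160),
    ("HDL<35mg/dl", r"\bhdl[:\s]+(\d+(?:\.\d+)?)", lambda v: v < 35),
]

def fired_rules(text):
    """Return the names of the rules whose numeric condition holds in the text."""
    s = text.lower()
    fired = []
    for name, pattern, condition in INFERENCE_RULES:
        for m in re.finditer(pattern, s):
            if condition(float(m.group(1))):
                fired.append(name)
                break                   # one firing per rule is enough
    return fired

print(fired_rules("Weight: 102 kg. LDL 175 mg/dl, HDL 40 mg/dl."))
```

Values in other units (e.g. pounds) are silently ignored by these patterns, so a real system would need unit normalisation first.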
9. Textual Annotation Results
Performance on Disease Status (Ranked 1st)
Micro-average: Accuracy (0.9723)
Macro-average: P (0.8482), R (0.7737), F-score (0.8052)
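Status  #Eval  #Corr  #Gold  Precision  Recall  F-score
Y        2267   2132   2192     0.9404  0.9726   0.9562
N          56     40     65     0.7142  0.6153   0.6611
Q          12      9     17     0.7500  0.5294   0.6206
U        5709   5640   5770     0.9879  0.9774   0.9826
(#Eval = labels assigned by the system, #Corr = correctly assigned labels, #Gold = gold-standard labels)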
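How the summary figures relate to the per-class counts can be checked with a short script. The counts are copied from the textual table above and the intuitive table that follows; the function names `prf` and `summarise` are illustrative. Precision is #Corr/#Eval, recall is #Corr/#Gold, macro-averages are plain means over the classes, and micro-averaged accuracy is total correct over total gold labels.

```python
# Per-class counts (#Eval, #Corr, #Gold) copied from the two result tables.
TEXTUAL = {"Y": (2267, 2132, 2192), "N": (56, 40, 65),
           "Q": (12, 9, 17), "U": (5709, 5640, 5770)}
INTUITIVE = {"Y": (2160, 2068, 2285), "N": (5236, 5014, 5100), "Q": (3, 0, 14)}

def prf(n_eval, n_corr, n_gold):
    """Precision, recall and F-score for one class (0.0 when undefined)."""
    p = n_corr / n_eval if n_eval else 0.0
    r = n_corr / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def summarise(table):
    """Return macro-averaged (P, R, F) and micro-averaged accuracy."""
    scores = [prf(*counts) for counts in table.values()]
    macro = tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))
    correct = sum(c for _, c, _ in table.values())
    gold = sum(g for _, _, g in table.values())
    return macro, correct / gold

for name, table in (("textual", TEXTUAL), ("intuitive", INTUITIVE)):
    (p, r, f), acc = summarise(table)
    print(f"{name}: accuracy={acc:.4f} P={p:.4f} R={r:.4f} F={f:.4f}")
```

This reproduces the reported averages. In both tables the #Eval and #Gold columns have the same total (8044 and 7399), so micro-averaged precision, recall and accuracy coincide; the macro-averages are pulled down by the small N and Q classes, most sharply by the all-zero Q row in the intuitive task.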
10. Intuitive Annotation Results
Performance on Disease Status (Ranked 7th)
Micro-average: Accuracy (0.9572)
Macro-average: P (0.6383), R (0.6294), F-score (0.6336)
Status  #Eval  #Corr  #Gold  Precision  Recall  F-score
Y        2160   2068   2285     0.9574  0.9050   0.9304
N        5236   5014   5100     0.9576  0.9831   0.9702
Q           3      0     14          0       0        0
11. i2b2 2009
Extract mentions of medication and related information
drugs the patient takes
dose, mode of application, frequency, duration, etc. (for each mention)
19 teams worldwide
UoM ranked 3rd
Our approach was based on combining
extensive dictionaries
morphological and derivational patterns
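The combination of dictionaries and patterns can be sketched as follows: drug names are found by dictionary lookup, and the attribute fields are filled from the text between one drug mention and the next. The four-drug dictionary, the attribute regexes and `extract_medications` are hypothetical stand-ins for the much larger resources described above, and the reason field is not attempted.

```python
import re

# Hypothetical stand-ins for the extensive drug dictionary and attribute patterns.
DRUGS = {"lasix", "aspirin", "metformin", "lisinopril"}
DRUG_RE = re.compile(r"\b(" + "|".join(sorted(DRUGS)) + r")\b")
PATTERNS = {
    "dosage": r"\b\d+(?:\.\d+)?\s*(?:mg|mcg|g|units?)\b",
    "mode": r"\b(?:po|iv|im|sc|by mouth|intravenously)\b",
    "frequency": r"\b(?:daily|b\.?i\.?d\.?|t\.?i\.?d\.?|q\.?\s?\d+h|every \d+ hours)",
    "duration": r"\bfor \d+ (?:days?|weeks?|months?)\b",
}

def extract_medications(text):
    """Return one dict per drug mention, filled from the text that follows it."""
    s = text.lower()
    mentions = list(DRUG_RE.finditer(s))
    results = []
    for i, m in enumerate(mentions):
        # Attributes are assumed to follow the drug name, up to the next drug.
        end = mentions[i + 1].start() if i + 1 < len(mentions) else len(s)
        window = s[m.end():end]
        entry = {"medication": m.group(1)}
        for field, pattern in PATTERNS.items():
            found = re.search(pattern, window)
            entry[field] = found.group(0).strip() if found else None
        results.append(entry)
    return results

for entry in extract_medications("Lasix 40 mg po daily, aspirin 81 mg by mouth for 30 days."):
    print(entry)
```

Fields such as duration are often absent or phrased far from the drug name, which is one reason a purely local window like this one struggles with them.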
12. Evaluation (F-measure)
Medication 83.59%
Dosage 82.67%
Frequency 83.49%
Mode 85.33%
Duration 51.00%
Reason 38.81%
All fields 78.47%
Spasić I, Sarafraz F, Keane JA, Nenadic G: “Medication Information Extraction with Linguistic Pattern Matching and Semantic Rules”, JAMIA (to appear)
13. Summary
NLP and text mining techniques are useful for extracting clinical data
- disease status extraction: 95-97% accuracy
- medication information extraction: 78.47% F-measure over all fields
Construction of reliable and sufficient resources
- clinical terms and abbreviations (e.g., disease synonyms, symptoms, drugs)
- context patterns related to diseases, medication, etc.
Domain knowledge required
construction of domain- and task-specific resources
complex clinical facts and conditions for inference
more comprehensive knowledge representation needed