Kavita Ganesan & Michael Subotin 
Presented at: 2014 Conference on IEEE Big Data
All sorts of note types!
• Admit notes
◦ documenting why patient is being admitted
◦ baseline status, etc.
• Progress notes
◦ progress during course of hospitalization
• Discharge notes
◦ conclusion of a hospital stay or series of treatments
• Others
◦ Operative notes
◦ Procedure notes
◦ Delivery notes
◦ Emergency Department notes, etc.
PRIMARY CARE PHYSICIAN: 
Dr. XXXXX XXXXXXXX. 
CHIEF COMPLAINT: 
Injured right little toe. 
HISTORY OF PRESENT ILLNESS: 
This is a 63-year-old male with a past medical history of multiple 
myeloma who presents today after hitting his fifth toe of the right foot 
on a wood panel yesterday…… 
Review of Systems: 
CONSTITUTIONAL: No fever, chills, or weight loss. 
RESPIRATORY: No cough, shortness of breath, or wheezing. 
CARDIOVASCULAR: No chest pain, chest pressure, or palpitations. 
............... 
PAST MEDICAL HISTORY 
Multiple myeloma, peripheral neuropathy, hypertension.. 
PAST SURGICAL HISTORY:- 
Stem cell transplant. 
SOCIAL HISTORY 
The patient formerly smoked tobacco; however, quit within the last 10 
years. 
FAMILY HISTORY: 
Hypertension. 
ALLERGIES: 
ASPIRIN. 
……… 
Callouts on the slide:
• Purpose of visit
• Patient’s current condition in narrative form
• Ongoing issues, issues in the past
• Information on allergies
This is how most notes look: 
• some longer, some shorter 
• different set of headers, etc 
[Figure: several copies of the sample note stacked on top of each other, each with PRIMARY CARE PHYSICIAN, CHIEF COMPLAINT, HISTORY OF PRESENT ILLNESS and Review of Systems sections]
• Very unstructured
◦ formatting cues are inconsistent
◦ varies across physicians, notes, hospitals
• Hard to analyze specific sections
◦ E.g. analyze allergies across a patient population
◦ Need to segment notes to extract all allergy info.
◦ Information collected varies from note type to note type
▪ E.g. info in progress notes vs. admit notes
◦ Contents & formatting can vary from hospital to hospital
▪ Even within the same organization – e.g. Kaiser
◦ Contents & formatting vary between physicians
▪ Different styles, speed of typing, etc.
• If you are looking at a single note type from a single hospital, then maybe
• Not suitable as a general segmentation approach
• Can easily break:
◦ on unseen note types and minor format variations
◦ Example:
▪ regex based on all caps
▪ regex based on seen headers only
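To make the brittleness concrete, here is a minimal sketch (not from the talk) of an all-caps header rule applied to headers taken from the sample note. The pattern and the helper name `find_headers` are illustrative assumptions, not the rules any particular system uses.

```python
import re

# Hypothetical rule: a header is an all-caps phrase that ends with a colon.
ALL_CAPS_HEADER = re.compile(r"^[A-Z][A-Z ]+:$")

def find_headers(lines):
    """Return the lines that the all-caps rule accepts as section headers."""
    return [line for line in lines if ALL_CAPS_HEADER.match(line.strip())]

note_lines = [
    "CHIEF COMPLAINT:",
    "Injured right little toe.",
    "Review of Systems:",       # mixed case
    "PAST MEDICAL HISTORY",     # no colon
    "PAST SURGICAL HISTORY:-",  # stray trailing character
    "ALLERGIES:",
]

found = find_headers(note_lines)
print(found)
```

Only two of the five headers are found. Each miss could be patched with another pattern, but every new format variation needs yet another patch.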
• Several works have explored supervised methods for segmenting clinical notes
[Cho et al. 2003, Tepper et al. 2012, Apostolova et al. 2009]
• Problem: methods not general!
◦ Cho et al. 2003: one model for each type of note
▪ 20 note types → 20 models!
▪ Not practical to maintain each model
◦ Tepper et al. 2012: model had low adaptability to unseen documents
▪ features used, training data used, etc.
• General segmentation approach for clinical texts
• Requirements:
◦ Single model/approach for most note types
◦ Discount extreme non-standard formatting, e.g. tabular format
• Segment:
◦ Header
◦ Top-level sections
◦ Footer
PRIMARY CARE PHYSICIAN: 
Dr. XXXXX XXXXXXXX. 
CHIEF COMPLAINT: 
Injured right little toe. 
HISTORY OF PRESENT ILLNESS: 
This is a 63-year-old male with a past medical history of multiple 
myeloma who presents today after hitting his fifth toe of the right foot 
on a wood panel yesterday…… 
Review of Systems: 
CONSTITUTIONAL: No fever, chills, or weight loss. 
RESPIRATORY: No cough, shortness of breath, or wheezing. 
CARDIOVASCULAR: No chest pain, chest pressure, or palpitations. 
............... 
PAST MEDICAL HISTORY 
Multiple myeloma, peripheral neuropathy, hypertension.. 
PAST SURGICAL HISTORY:- 
Stem cell transplant. 
SOCIAL HISTORY 
The patient formerly smoked tobacco; however, quit within the last 10 
years. 
FAMILY HISTORY: 
Hypertension. 
ALLERGIES: 
ASPIRIN. 
……… 
Segment labels on the slide: Header, followed by Top-level section (×7)
• Supervised approach using L1-regularized logistic regression with a constraint combination approach
• Idea: scan each line in a clinical document and label it as:
◦ BeginHeader
◦ ContHeader
◦ BeginSection
◦ ContSection
◦ Footer
• Labels are predicted with a certain confidence
• But there is a problem with using line-wise predictions as is:
◦ Label sequences may not make sense
◦ E.g. there may be a BeginHeader after a BeginSection → incorrect
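The labeling scheme can be sketched as follows: a segmented note is flattened into one label per line, which is the form a line-wise classifier would be trained on. The `(segment_type, lines)` representation and the function name `line_labels` are assumptions made for this illustration, as is treating the PRIMARY CARE PHYSICIAN lines as the document header.

```python
def line_labels(segments):
    """Flatten (segment_type, lines) pairs into one label per line."""
    labels = []
    for seg_type, lines in segments:
        for i, _ in enumerate(lines):
            if seg_type == "Footer":
                # The slide lists a single Footer label (no Begin/Cont split).
                labels.append("Footer")
            else:
                prefix = "Begin" if i == 0 else "Cont"
                labels.append(prefix + seg_type)
    return labels

segments = [
    ("Header", ["PRIMARY CARE PHYSICIAN:", "Dr. XXXXX XXXXXXXX."]),
    ("Section", ["CHIEF COMPLAINT:", "Injured right little toe."]),
    ("Section", ["ALLERGIES:", "ASPIRIN."]),
]

labels = line_labels(segments)
print(labels)
```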
• Post-processing: enforce sequence combination rules:
◦ First line of document: BeginHeader or BeginSection
◦ BeginHeader cannot come right after BeginHeader or ContHeader
◦ ContHeader must come after BeginHeader or ContHeader
◦ ContSection must come after BeginSection or ContSection
◦ Footer cannot come right after BeginHeader or ContHeader
• Rules applied after all lines in the document are labeled
◦ Applied to consecutive label pairs
◦ Computed efficiently: Viterbi algorithm
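A minimal sketch of how such pairwise rules can be combined with per-line confidences using Viterbi-style dynamic programming. It encodes only the five rules listed above. The function names, the dictionary input format, and the example confidences are assumptions for illustration, not the authors' implementation. For brevity it stores whole paths instead of back-pointers.

```python
import math

LABELS = ["BeginHeader", "ContHeader", "BeginSection", "ContSection", "Footer"]
HEADER = {"BeginHeader", "ContHeader"}

def allowed(prev, cur):
    """Pairwise rules from the slide; prev is None for the first line."""
    if prev is None:
        return cur in ("BeginHeader", "BeginSection")
    if cur == "BeginHeader":
        return prev not in HEADER
    if cur == "ContHeader":
        return prev in HEADER
    if cur == "ContSection":
        return prev in ("BeginSection", "ContSection")
    if cur == "Footer":
        return prev not in HEADER
    return True  # BeginSection may follow any label

def constrained_decode(line_probs):
    """Return the most probable label sequence that obeys `allowed`.

    line_probs: one dict per line mapping label -> classifier confidence.
    """
    if not line_probs:
        return []

    def score(probs, label):
        p = probs.get(label, 0.0)
        return math.log(p) if p > 0 else -math.inf

    # best[label] = (log-score of the best valid path ending in label, path)
    best = {l: (score(line_probs[0], l), [l])
            for l in LABELS if allowed(None, l)}
    for probs in line_probs[1:]:
        nxt = {}
        for cur in LABELS:
            cands = [(s + score(probs, cur), path + [cur])
                     for prev, (s, path) in best.items() if allowed(prev, cur)]
            if cands:
                nxt[cur] = max(cands, key=lambda c: c[0])
        best = nxt
    return max(best.values(), key=lambda c: c[0])[1]

# Made-up confidences: the line-wise argmax for line 3 is BeginHeader,
# which is not allowed right after ContHeader.
line_probs = [
    {"BeginHeader": 0.6, "BeginSection": 0.3, "ContSection": 0.1},
    {"ContHeader": 0.7, "ContSection": 0.3},
    {"BeginHeader": 0.5, "BeginSection": 0.4, "ContSection": 0.1},
    {"ContSection": 0.8, "ContHeader": 0.2},
]
decoded = constrained_decode(line_probs)
print(decoded)
```

Taking the top label on each line independently would give BeginHeader, ContHeader, BeginHeader, ContSection, which breaks two rules. The constrained search instead picks the best sequence among the valid ones.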
Inpatient Outpatient 
• Notes from 12 different enterprises 
• Some large enterprises 
• All sorts of note types 
• Some noisy sectioning, some clean 
• 100 radiology notes 
• Fairly clean sections 
• One hospital 
• All sorts of note types 
• Fairly well sectioned 
• 35,000 notes in total
• 2000 randomly sampled notes (inpatient)
• 100 radiology notes 
• Fairly clean sections
• Emphasis on training data
• Variation in training data
◦ Use different note types for training
◦ Intuition: helps the model generalize well
• Sample training data:
◦ Instead of using all training data from 2100 notes
◦ Generated subsets of training data of varying size and cross-validated on test sets
◦ Intuition: allows us to pick the best model
• Best model used fewer than 700 notes (out of 2100)
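The sampling strategy above can be sketched as a small search loop: draw random subsets of several sizes, score each, and keep the best. The function `pick_best_subset`, its parameters, and the toy scoring function are all hypothetical. In practice `train_and_score` would train a segmentation model on the subset and return its cross-validated accuracy.

```python
import random

def pick_best_subset(notes, sizes, train_and_score, trials=3, seed=0):
    """Try random training subsets of several sizes and keep the best one.

    train_and_score is a caller-supplied (hypothetical) function that trains
    a model on the subset and returns its cross-validated accuracy.
    """
    rng = random.Random(seed)
    best_score, best_subset = float("-inf"), []
    for size in sizes:
        for _ in range(trials):
            subset = rng.sample(notes, min(size, len(notes)))
            score = train_and_score(subset)
            if score > best_score:
                best_score, best_subset = score, subset
    return best_score, best_subset

# Toy stand-in for training: rewards note-type variety, mildly penalizes size.
notes = [(i, ["hp", "ds", "pn", "op"][i % 4]) for i in range(40)]

def toy_score(subset):
    return len({note_type for _, note_type in subset}) - 0.01 * len(subset)

best_score, best_subset = pick_best_subset(notes, [4, 8, 16], toy_score)
print(len(best_subset), round(best_score, 2))
```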
• 5 test sets
◦ 4 of 5 test sets are from hospitals not in the training set
▪ true estimate of accuracy
◦ Covers both inpatient and outpatient notes
◦ Covers different note types
◦ ~12,500 test notes
• Primary evaluation metric: line-wise accuracy
◦ percentage of correctly predicted line labels
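Line-wise accuracy as defined above is simple to compute. A minimal sketch follows; the function name and the choice to return 0.0 for an empty document are assumptions.

```python
def linewise_accuracy(gold, predicted):
    """Percentage of lines whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have one label per line")
    if not gold:
        return 0.0  # assumption: an empty document scores 0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)

gold = ["BeginHeader", "ContHeader", "BeginSection", "ContSection"]
pred = ["BeginHeader", "ContHeader", "BeginSection", "BeginSection"]
print(linewise_accuracy(gold, pred))  # 3 of 4 lines correct
```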
Train set                    3-fold cross-validation accuracy   Unseen test accuracy
Inp1HospB (300 - limited)    96.70%                             67.00%
Inp3HospD (300 - varied)     96.58%                             88.23%

1st model (Inp1HospB): limited variety (hp + discharge)
2nd model (Inp3HospD): variety (11 types - hp, ds, pn…)
• 3-fold cross-validation accuracy: high in both
• Model with variety: higher accuracy on unseen test set
Important to have variety in training notes when building a general segmentation model
Accuracy consistently > 90% across enterprises

Client/Data          In/Outpatient   # Test Docs   Accuracy
1. Inp1HospB         In              300           92.58%
2. Inp2HospC         In              1000          93.29%
3. Inp3HospD         In              300           95.81%
4. Rad1MixedHosps    Out             9000          92.45%
5. Rad2HospA         Out             1902          93.67%
Average                                            93.56%

• Average accuracy: 93.56%
• Covers inpatient/outpatient
Single model, but performs well across enterprises
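As a quick check of the arithmetic, the reported average matches the unweighted mean of the five per-set accuracies, and the document counts add up to the roughly 12,500 test notes mentioned earlier. The variable names are illustrative.

```python
# (number of test docs, accuracy %) per test set, copied from the table above
results = {
    "Inp1HospB": (300, 92.58),
    "Inp2HospC": (1000, 93.29),
    "Inp3HospD": (300, 95.81),
    "Rad1MixedHosps": (9000, 92.45),
    "Rad2HospA": (1902, 93.67),
}

macro_avg = sum(acc for _, acc in results.values()) / len(results)
total_docs = sum(n for n, _ in results.values())

print(f"unweighted mean: {macro_avg:.2f}%")
print(f"total test docs: {total_docs}")
```

Because the mean is unweighted, the 9000-document radiology set counts no more than a 300-document inpatient set.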
Accuracy Breakdown for Inp2HospC

Document Type               Accuracy
1. History and Physical     95.70%
2. Physician Clinicals      93.10%
3. Discharge Summary        94.00%
4. Consult Note             94.60%
5. Short Stay Summary       94.60%
6. Operative Note           92.20%
7. Progress Note            87.80%
8. Cardiac Cath Report      85.40%
9. Procedure Note           83.60%

Rows 1-6: performs very well (> 90%)
Rows 7-9: reasonable (> 80%)
• Model performs well across note types
• Lowest performance: procedure notes (low recall on segmenting “technique” sections)
# Notes vs. Accuracy
[Plot: accuracy (y-axis, 86.00% to 94.00%) vs. # training notes (x-axis, 0 to 2000)]
• Avg. accuracy peaks @500 notes on all test sets
• No benefit with more notes
No need for big data for a general model.
We need good data from all that big data!
• Unigrams of each line (LineUnigram)
• Relative position of line in document (PosInDoc)
◦ Top, Middle, Bottom
• Known header features (KnownHeader)
◦ Find potential headers using a repository of seen headers
◦ Seen headers can have a canonical type
E.g. Past Medical History, Previous Med History → “PAST_MEDICAL_HISTORY”
◦ If potential headers are found, we include features:
▪ Canonical type
▪ Unigrams & char n-grams of potential header
▪ Caps/colon info – mixed case, all caps, lowercase
▪ Length of potential header
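A rough sketch of how the three feature groups could be extracted for one line. Everything concrete here is an assumption made for illustration: the tiny `KNOWN_HEADERS` repository, the thirds-based position buckets, the "text before the first colon" header heuristic, and the feature names. Char n-grams are omitted to keep it short.

```python
import re

# Hypothetical repository of seen headers mapped to canonical types.
KNOWN_HEADERS = {
    "past medical history": "PAST_MEDICAL_HISTORY",
    "previous med history": "PAST_MEDICAL_HISTORY",
    "allergies": "ALLERGIES",
}

def position_bucket(index, n_lines):
    """PosInDoc: split the document into thirds (an assumed bucketing)."""
    third = n_lines / 3
    if index < third:
        return "Top"
    if index < 2 * third:
        return "Middle"
    return "Bottom"

def case_style(text):
    if text.isupper():
        return "all_caps"
    if text.islower():
        return "lowercase"
    return "mixed_case"

def line_features(line, index, n_lines):
    feats = {}
    # LineUnigram: one binary feature per lowercased token.
    for tok in re.findall(r"[a-z0-9]+", line.lower()):
        feats["unigram=" + tok] = 1
    feats["pos=" + position_bucket(index, n_lines)] = 1
    # KnownHeader: look up the text before the first colon in the repository.
    candidate = line.split(":", 1)[0].strip()
    canonical = KNOWN_HEADERS.get(candidate.lower())
    if canonical:
        feats["hdr_type=" + canonical] = 1
        feats["hdr_case=" + case_style(candidate)] = 1
        feats["hdr_colon"] = int(":" in line)
        feats["hdr_len"] = len(candidate.split())
        for tok in candidate.lower().split():
            feats["hdr_unigram=" + tok] = 1
    return feats

header_feats = line_features("PAST MEDICAL HISTORY:", 6, 10)
body_feats = line_features("Injured right little toe.", 1, 10)
print(sorted(header_feats))
```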
Feature Set                          Avg. Accuracy   Improvement
LineUnigram                          85.55%
LineUnigram+PosInDoc                 88.62%          +3.46%
LineUnigram+PosInDoc+KnownHeader     93.10%          +4.81%
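The Improvement column is not a plain difference in percentage points (88.62 minus 85.55 is 3.07). The reported figures are reproduced when each gain is taken relative to the new accuracy, (new - old) / new. This is an observation from the numbers, not something the slides state.

```python
# Average accuracies copied from the table above.
rows = [
    ("LineUnigram", 85.55),
    ("LineUnigram+PosInDoc", 88.62),
    ("LineUnigram+PosInDoc+KnownHeader", 93.10),
]

gains = []
for (_, old), (name, new) in zip(rows, rows[1:]):
    gain = 100 * (new - old) / new  # gain relative to the new accuracy
    gains.append(gain)
    print(f"{name}: +{gain:.2f}%")
```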
• Explored:
◦ Supervised approach to building a very general segmentation model for clinical texts
• Evaluation showed:
◦ Model works well on notes across enterprises
◦ Model works across note types
• Key to effectiveness:
◦ Variation in training data – all sorts of note types
◦ Training data selection strategy – sample and cross-validate
◦ Feature set – not explored in existing works
Contact: 
Kavita Ganesan 
ganesan.kavita@gmail.com 
www.kavita-ganesan.com 
www.text-analytics101.com

More Related Content

Viewers also liked

Opinion Driven Decision Support System
Opinion Driven Decision Support SystemOpinion Driven Decision Support System
Opinion Driven Decision Support SystemKavita Ganesan
 
Micropinion Generation
Micropinion GenerationMicropinion Generation
Micropinion GenerationKavita Ganesan
 
Opinion Mining Tutorial (Sentiment Analysis)
Opinion Mining Tutorial (Sentiment Analysis)Opinion Mining Tutorial (Sentiment Analysis)
Opinion Mining Tutorial (Sentiment Analysis)Kavita Ganesan
 
Opinosis Presentation @ Coling 2010: Opinosis - A Graph Based Approach to Abs...
Opinosis Presentation @ Coling 2010: Opinosis - A Graph Based Approach to Abs...Opinosis Presentation @ Coling 2010: Opinosis - A Graph Based Approach to Abs...
Opinosis Presentation @ Coling 2010: Opinosis - A Graph Based Approach to Abs...Kavita Ganesan
 
Introduction to Java Strings, By Kavita Ganesan
Introduction to Java Strings, By Kavita GanesanIntroduction to Java Strings, By Kavita Ganesan
Introduction to Java Strings, By Kavita GanesanKavita Ganesan
 
Francais orthographe
Francais orthographeFrancais orthographe
Francais orthographezouhaer
 
Power guineu 1[1]
Power guineu 1[1]Power guineu 1[1]
Power guineu 1[1]43705656K
 
What do We Know about Drag Kings?
What do We Know about Drag Kings?What do We Know about Drag Kings?
What do We Know about Drag Kings?Teila123
 
Financial terms
Financial terms Financial terms
Financial terms Tanu Bansal
 
28th Social Work Day at the United Nations 2011
28th Social Work Day at the  United Nations 201128th Social Work Day at the  United Nations 2011
28th Social Work Day at the United Nations 2011IFSW
 
User eXitus - Nenechte sve navstevniky odchazet BarCamp 2011 Ostrava
User eXitus - Nenechte sve navstevniky odchazet BarCamp 2011 OstravaUser eXitus - Nenechte sve navstevniky odchazet BarCamp 2011 Ostrava
User eXitus - Nenechte sve navstevniky odchazet BarCamp 2011 Ostravajirikomar
 
Carlos lenin estrada
Carlos lenin estradaCarlos lenin estrada
Carlos lenin estradacarloslenin19
 
Real Estate Impacts of Alternative Energy Technology
Real Estate Impacts of Alternative Energy TechnologyReal Estate Impacts of Alternative Energy Technology
Real Estate Impacts of Alternative Energy TechnologyZeroNet-Energy-Solutions
 
Salem Area Market Statistics Q1 2011
Salem Area Market Statistics Q1 2011Salem Area Market Statistics Q1 2011
Salem Area Market Statistics Q1 2011Christopher Polak
 
Victoriamolinatp1 110601071455-phpapp01
Victoriamolinatp1 110601071455-phpapp01Victoriamolinatp1 110601071455-phpapp01
Victoriamolinatp1 110601071455-phpapp01Pilii Ise Gelsi
 
Prsentation eng 101
Prsentation  eng 101Prsentation  eng 101
Prsentation eng 101sopno100
 
What is your earliest memory
What is your earliest memoryWhat is your earliest memory
What is your earliest memorymarco_fro19
 

Viewers also liked (20)

Opinion Driven Decision Support System
Opinion Driven Decision Support SystemOpinion Driven Decision Support System
Opinion Driven Decision Support System
 
Micropinion Generation
Micropinion GenerationMicropinion Generation
Micropinion Generation
 
Opinion Mining Tutorial (Sentiment Analysis)
Opinion Mining Tutorial (Sentiment Analysis)Opinion Mining Tutorial (Sentiment Analysis)
Opinion Mining Tutorial (Sentiment Analysis)
 
Opinosis Presentation @ Coling 2010: Opinosis - A Graph Based Approach to Abs...
Opinosis Presentation @ Coling 2010: Opinosis - A Graph Based Approach to Abs...Opinosis Presentation @ Coling 2010: Opinosis - A Graph Based Approach to Abs...
Opinosis Presentation @ Coling 2010: Opinosis - A Graph Based Approach to Abs...
 
Introduction to Java Strings, By Kavita Ganesan
Introduction to Java Strings, By Kavita GanesanIntroduction to Java Strings, By Kavita Ganesan
Introduction to Java Strings, By Kavita Ganesan
 
Francais orthographe
Francais orthographeFrancais orthographe
Francais orthographe
 
Power guineu 1[1]
Power guineu 1[1]Power guineu 1[1]
Power guineu 1[1]
 
Slide
SlideSlide
Slide
 
What do We Know about Drag Kings?
What do We Know about Drag Kings?What do We Know about Drag Kings?
What do We Know about Drag Kings?
 
Financial terms
Financial terms Financial terms
Financial terms
 
La moral kantiana( què he de fer
La moral kantiana( què he de ferLa moral kantiana( què he de fer
La moral kantiana( què he de fer
 
28th Social Work Day at the United Nations 2011
28th Social Work Day at the  United Nations 201128th Social Work Day at the  United Nations 2011
28th Social Work Day at the United Nations 2011
 
User eXitus - Nenechte sve navstevniky odchazet BarCamp 2011 Ostrava
User eXitus - Nenechte sve navstevniky odchazet BarCamp 2011 OstravaUser eXitus - Nenechte sve navstevniky odchazet BarCamp 2011 Ostrava
User eXitus - Nenechte sve navstevniky odchazet BarCamp 2011 Ostrava
 
Carlos lenin estrada
Carlos lenin estradaCarlos lenin estrada
Carlos lenin estrada
 
Real Estate Impacts of Alternative Energy Technology
Real Estate Impacts of Alternative Energy TechnologyReal Estate Impacts of Alternative Energy Technology
Real Estate Impacts of Alternative Energy Technology
 
UI Prototype
UI PrototypeUI Prototype
UI Prototype
 
Salem Area Market Statistics Q1 2011
Salem Area Market Statistics Q1 2011Salem Area Market Statistics Q1 2011
Salem Area Market Statistics Q1 2011
 
Victoriamolinatp1 110601071455-phpapp01
Victoriamolinatp1 110601071455-phpapp01Victoriamolinatp1 110601071455-phpapp01
Victoriamolinatp1 110601071455-phpapp01
 
Prsentation eng 101
Prsentation  eng 101Prsentation  eng 101
Prsentation eng 101
 
What is your earliest memory
What is your earliest memoryWhat is your earliest memory
What is your earliest memory
 

Similar to Kavita Ganesan & Michael Subotin Present Segmentation of Unstructured Clinical Notes

Shock-case-study-8.21.20.pptx
Shock-case-study-8.21.20.pptxShock-case-study-8.21.20.pptx
Shock-case-study-8.21.20.pptxrishitagarg8
 
Phtls prep-packet-2-day
Phtls prep-packet-2-dayPhtls prep-packet-2-day
Phtls prep-packet-2-daynuno marques
 
Documentation 101 - BMH/Tele
Documentation 101 - BMH/TeleDocumentation 101 - BMH/Tele
Documentation 101 - BMH/TeleTeleClinEd
 
Patient selection and functional outcomes by Dr Ashutosh Hardikar
Patient selection and functional outcomes by Dr Ashutosh HardikarPatient selection and functional outcomes by Dr Ashutosh Hardikar
Patient selection and functional outcomes by Dr Ashutosh HardikarCICM 2019 Annual Scientific Meeting
 
BCC4: Michael Parr on ICU - Surviving Trauma Guidelines
BCC4: Michael Parr on ICU - Surviving Trauma GuidelinesBCC4: Michael Parr on ICU - Surviving Trauma Guidelines
BCC4: Michael Parr on ICU - Surviving Trauma GuidelinesSMACC Conference
 
Professor Richard Beale @ MRF's Meningitis & Septicaemia in Children & Adults...
Professor Richard Beale @ MRF's Meningitis & Septicaemia in Children & Adults...Professor Richard Beale @ MRF's Meningitis & Septicaemia in Children & Adults...
Professor Richard Beale @ MRF's Meningitis & Septicaemia in Children & Adults...Meningitis Research Foundation
 
minimallyinvasivecardiacsurgery-130110015719-phpapp02 (1) (1).pptx
minimallyinvasivecardiacsurgery-130110015719-phpapp02 (1) (1).pptxminimallyinvasivecardiacsurgery-130110015719-phpapp02 (1) (1).pptx
minimallyinvasivecardiacsurgery-130110015719-phpapp02 (1) (1).pptxMubasshirBabar
 
Nrs 410 topic 1 mandatory discussion question
Nrs 410 topic 1 mandatory discussion questionNrs 410 topic 1 mandatory discussion question
Nrs 410 topic 1 mandatory discussion questionagathachristie189
 
Clinical materials for medicine III
Clinical materials for medicine IIIClinical materials for medicine III
Clinical materials for medicine IIIDr Ajith Karawita
 
STEMI Training
STEMI TrainingSTEMI Training
STEMI Trainingcm6157
 
GCSC Stroke Symposium 2022-COMBINED
GCSC Stroke Symposium 2022-COMBINEDGCSC Stroke Symposium 2022-COMBINED
GCSC Stroke Symposium 2022-COMBINEDHollandAdhaus
 
Prof. Todor (Ted) A. Popov - 6th Clinical Research Conference
Prof. Todor (Ted) A. Popov - 6th Clinical Research ConferenceProf. Todor (Ted) A. Popov - 6th Clinical Research Conference
Prof. Todor (Ted) A. Popov - 6th Clinical Research ConferenceStarttech Ventures
 
Remote Ischaemic Conditioning: A Paper Review & Uses in Paramedic Practice
Remote Ischaemic Conditioning: A Paper Review & Uses in Paramedic PracticeRemote Ischaemic Conditioning: A Paper Review & Uses in Paramedic Practice
Remote Ischaemic Conditioning: A Paper Review & Uses in Paramedic Practicebgander23
 
Lessons from CRITICOEN.pptx
Lessons from CRITICOEN.pptxLessons from CRITICOEN.pptx
Lessons from CRITICOEN.pptxPradeep Pande
 
Spontaneous pneumothorax: Are we treating the patient or the xray?
Spontaneous pneumothorax: Are we treating the patient or the xray?Spontaneous pneumothorax: Are we treating the patient or the xray?
Spontaneous pneumothorax: Are we treating the patient or the xray?kellyam18
 

Similar to Kavita Ganesan & Michael Subotin Present Segmentation of Unstructured Clinical Notes (20)

Shock-case-study-8.21.20.pptx
Shock-case-study-8.21.20.pptxShock-case-study-8.21.20.pptx
Shock-case-study-8.21.20.pptx
 
6 minute walk test
6 minute walk test6 minute walk test
6 minute walk test
 
Phtls prep-packet-2-day
Phtls prep-packet-2-dayPhtls prep-packet-2-day
Phtls prep-packet-2-day
 
Documentation 101 - BMH/Tele
Documentation 101 - BMH/TeleDocumentation 101 - BMH/Tele
Documentation 101 - BMH/Tele
 
Surgery revision
Surgery revisionSurgery revision
Surgery revision
 
Patient selection and functional outcomes by Dr Ashutosh Hardikar
Patient selection and functional outcomes by Dr Ashutosh HardikarPatient selection and functional outcomes by Dr Ashutosh Hardikar
Patient selection and functional outcomes by Dr Ashutosh Hardikar
 
BCC4: Michael Parr on ICU - Surviving Trauma Guidelines
BCC4: Michael Parr on ICU - Surviving Trauma GuidelinesBCC4: Michael Parr on ICU - Surviving Trauma Guidelines
BCC4: Michael Parr on ICU - Surviving Trauma Guidelines
 
Professor Richard Beale @ MRF's Meningitis & Septicaemia in Children & Adults...
Professor Richard Beale @ MRF's Meningitis & Septicaemia in Children & Adults...Professor Richard Beale @ MRF's Meningitis & Septicaemia in Children & Adults...
Professor Richard Beale @ MRF's Meningitis & Septicaemia in Children & Adults...
 
minimallyinvasivecardiacsurgery-130110015719-phpapp02 (1) (1).pptx
minimallyinvasivecardiacsurgery-130110015719-phpapp02 (1) (1).pptxminimallyinvasivecardiacsurgery-130110015719-phpapp02 (1) (1).pptx
minimallyinvasivecardiacsurgery-130110015719-phpapp02 (1) (1).pptx
 
Nrs 410 topic 1 mandatory discussion question
Nrs 410 topic 1 mandatory discussion questionNrs 410 topic 1 mandatory discussion question
Nrs 410 topic 1 mandatory discussion question
 
Clinical materials for medicine III
Clinical materials for medicine IIIClinical materials for medicine III
Clinical materials for medicine III
 
STEMI Training
STEMI TrainingSTEMI Training
STEMI Training
 
GCSC Stroke Symposium 2022-COMBINED
GCSC Stroke Symposium 2022-COMBINEDGCSC Stroke Symposium 2022-COMBINED
GCSC Stroke Symposium 2022-COMBINED
 
Covid 19 (1)
Covid 19 (1)Covid 19 (1)
Covid 19 (1)
 
Covid 19 (1)
Covid 19 (1)Covid 19 (1)
Covid 19 (1)
 
Covid 19 (1)
Covid 19 (1)Covid 19 (1)
Covid 19 (1)
 
Prof. Todor (Ted) A. Popov - 6th Clinical Research Conference
Prof. Todor (Ted) A. Popov - 6th Clinical Research ConferenceProf. Todor (Ted) A. Popov - 6th Clinical Research Conference
Prof. Todor (Ted) A. Popov - 6th Clinical Research Conference
 
Remote Ischaemic Conditioning: A Paper Review & Uses in Paramedic Practice
Remote Ischaemic Conditioning: A Paper Review & Uses in Paramedic PracticeRemote Ischaemic Conditioning: A Paper Review & Uses in Paramedic Practice
Remote Ischaemic Conditioning: A Paper Review & Uses in Paramedic Practice
 
Lessons from CRITICOEN.pptx
Lessons from CRITICOEN.pptxLessons from CRITICOEN.pptx
Lessons from CRITICOEN.pptx
 
Spontaneous pneumothorax: Are we treating the patient or the xray?
Spontaneous pneumothorax: Are we treating the patient or the xray?Spontaneous pneumothorax: Are we treating the patient or the xray?
Spontaneous pneumothorax: Are we treating the patient or the xray?
 

More from Kavita Ganesan

Comparison between cbow, skip gram and skip-gram with subword information (1)
Comparison between cbow, skip gram and skip-gram with subword information (1)Comparison between cbow, skip gram and skip-gram with subword information (1)
Comparison between cbow, skip gram and skip-gram with subword information (1)Kavita Ganesan
 
Comparison between cbow, skip gram and skip-gram with subword information
Comparison between cbow, skip gram and skip-gram with subword informationComparison between cbow, skip gram and skip-gram with subword information
Comparison between cbow, skip gram and skip-gram with subword informationKavita Ganesan
 
Statistical Methods for Integration and Analysis of Online Opinionated Text...
Statistical Methods for Integration and Analysis of Online Opinionated Text...Statistical Methods for Integration and Analysis of Online Opinionated Text...
Statistical Methods for Integration and Analysis of Online Opinionated Text...Kavita Ganesan
 
In situ evaluation of entity retrieval and opinion summarization
In situ evaluation of entity retrieval and opinion summarizationIn situ evaluation of entity retrieval and opinion summarization
In situ evaluation of entity retrieval and opinion summarizationKavita Ganesan
 
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit Kavita Ganesan
 
Very Small Tutorial on Terrier 3.0 Retrieval Toolkit
Very Small Tutorial on Terrier 3.0 Retrieval ToolkitVery Small Tutorial on Terrier 3.0 Retrieval Toolkit
Very Small Tutorial on Terrier 3.0 Retrieval ToolkitKavita Ganesan
 
Opinion-Based Entity Ranking
Opinion-Based Entity RankingOpinion-Based Entity Ranking
Opinion-Based Entity RankingKavita Ganesan
 

More from Kavita Ganesan (7)

Comparison between cbow, skip gram and skip-gram with subword information (1)
Comparison between cbow, skip gram and skip-gram with subword information (1)Comparison between cbow, skip gram and skip-gram with subword information (1)
Comparison between cbow, skip gram and skip-gram with subword information (1)
 
Comparison between cbow, skip gram and skip-gram with subword information
Comparison between cbow, skip gram and skip-gram with subword informationComparison between cbow, skip gram and skip-gram with subword information
Comparison between cbow, skip gram and skip-gram with subword information
 
Statistical Methods for Integration and Analysis of Online Opinionated Text...
Statistical Methods for Integration and Analysis of Online Opinionated Text...Statistical Methods for Integration and Analysis of Online Opinionated Text...
Statistical Methods for Integration and Analysis of Online Opinionated Text...
 
In situ evaluation of entity retrieval and opinion summarization
In situ evaluation of entity retrieval and opinion summarizationIn situ evaluation of entity retrieval and opinion summarization
In situ evaluation of entity retrieval and opinion summarization
 
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit
Enabling Opinion-Driven Decision Making - Sentiment Analysis Innovation Summit
 
Very Small Tutorial on Terrier 3.0 Retrieval Toolkit
Very Small Tutorial on Terrier 3.0 Retrieval ToolkitVery Small Tutorial on Terrier 3.0 Retrieval Toolkit
Very Small Tutorial on Terrier 3.0 Retrieval Toolkit
 
Opinion-Based Entity Ranking
Opinion-Based Entity RankingOpinion-Based Entity Ranking
Opinion-Based Entity Ranking
 

Recently uploaded

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 


Kavita Ganesan & Michael Subotin Present Segmentation of Unstructured Clinical Notes

  • 1. Kavita Ganesan & Michael Subotin Presented at: 2014 Conference on IEEE Big Data
  • 2. All sorts of note types!
      - Admit notes
          ◦ documenting why the patient is being admitted
          ◦ baseline status, etc.
      - Progress notes
          ◦ progress during the course of hospitalization
      - Discharge notes
          ◦ conclusion of a hospital stay or series of treatments
      - Others
          ◦ Operative notes
          ◦ Procedure notes
          ◦ Delivery notes
          ◦ Emergency Department notes, etc.
  • 3. PRIMARY CARE PHYSICIAN: Dr. XXXXX XXXXXXXX.
      CHIEF COMPLAINT: Injured right little toe.
      HISTORY OF PRESENT ILLNESS: This is a 63-year-old male with a past medical history of multiple myeloma who presents today after hitting his fifth toe of the right foot on a wood panel yesterday……
      Review of Systems:
      CONSTITUTIONAL: No fever, chills, or weight loss.
      RESPIRATORY: No cough, shortness of breath, or wheezing.
      CARDIOVASCULAR: No chest pain, chest pressure, or palpitations.
      ...............
      PAST MEDICAL HISTORY
      Multiple myeloma, peripheral neuropathy, hypertension..
      PAST SURGICAL HISTORY:-
      Stem cell transplant.
      SOCIAL HISTORY
      The patient formerly smoked tobacco; however, quit within the last 10 years.
      FAMILY HISTORY:
      Hypertension.
      ALLERGIES:
      ASPIRIN.
      ………
      Callouts:
      - Purpose of visit
      - Patient’s current condition in narrative form
      - Ongoing issues, issues in the past
      - Information on allergies
  • 4. PRIMARY CARE PHYSICIAN: Dr. XXXXX XXXXXXXX.
      CHIEF COMPLAINT: Injured right little toe.
      HISTORY OF PRESENT ILLNESS: This is a 63-year-old male with a past medical history of multiple myeloma who presents today after hitting his fifth toe of the right foot on a wood panel yesterday……
      Review of Systems:
      CONSTITUTIONAL: No fever, chills, or weight loss.
      RESPIRATORY: No cough, shortness of breath, or wheezing.
      CARDIOVASCULAR: No chest pain, chest pressure, or palpitations.
      ...............
      PAST MEDICAL HISTORY
      Multiple myeloma, peripheral neuropathy, hypertension..
      PAST SURGICAL HISTORY:-
      Stem cell transplant.
      SOCIAL HISTORY
      The patient formerly smoked tobacco; however, quit within the last 10 years.
      FAMILY HISTORY:
      Hypertension.
      ALLERGIES:
      ASPIRIN.
      ………
      Callouts:
      - Purpose of visit
      - Patient’s current condition in narrative form
      - Ongoing issues, issues in the past
      - Information on allergies
      This is how most notes look:
      - some longer, some shorter
      - different set of headers, etc.
  • 5. PRIMARY CARE PHYSICIAN: Dr. XXXXX XXXXXXXX.
      CHIEF COMPLAINT: Injured right little toe.
      HISTORY OF PRESENT ILLNESS: This is a 63-year-old male with a past medical history of…
      Review of Systems:
      CONSTITUTIONAL: No fever, chills, or weight loss.
      CARDIOVASCULAR: No chest pain, chest pressure, or palpitations.
      ...............
      ………
      - Very unstructured
          ◦ formatting cues are inconsistent
          ◦ varies across physicians, notes, hospitals
      - Hard to analyze specific sections
          ◦ E.g. analyze allergies in a patient population
          ◦ Need to segment notes to extract all allergy info.
  • 6.
      - Information collected varies from one note type to another
          ◦ E.g. info in progress notes vs. admit notes
      - Contents & formatting can vary from hospital to hospital
          ◦ Even within the same organization, e.g. Kaiser
      - Contents & formatting vary between physicians
          ◦ Different styles, speed of typing, etc.
  • 7.
      - If you are looking at a single note type, from a single hospital, then maybe
      - Not suitable as a general segmentation approach
      - Can easily break:
          ◦ on unseen note types and minor format variations
          ◦ Example:
              - regex based on all caps
              - regex based on seen headers only
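To make the "regex based on all caps" failure concrete, here is a minimal Python sketch. The pattern `ALL_CAPS_HEADER` and the helper `looks_like_header` are hypothetical and not taken from the talk; the sample lines come from the note shown on the earlier slides.

```python
import re

# Hypothetical rule: a header is an all-caps phrase at the start of a line,
# optionally followed by a colon.
ALL_CAPS_HEADER = re.compile(r"^[A-Z][A-Z ]+:?")

def looks_like_header(line):
    """Return True when the all-caps rule fires on this line."""
    return ALL_CAPS_HEADER.match(line) is not None

sample_lines = [
    "CHIEF COMPLAINT: Injured right little toe.",     # header, detected
    "Review of Systems:",                             # header in mixed case, missed
    "PAST MEDICAL HISTORY",                           # header without colon, detected
    "PAST SURGICAL HISTORY:- Stem cell transplant.",  # header with odd punctuation, detected
    "ASPIRIN.",                                       # section content, wrongly detected
]

flags = [looks_like_header(line) for line in sample_lines]
```

The rule misses the mixed-case header and fires on all-caps content, which are the kinds of minor format variation the slide warns about.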
  • 8.
      - Several works have explored supervised methods for segmenting clinical notes [Cho et al. 2003, Tepper et al. 2012, Apostolova et al. 2009]
      - Problem: methods not general!
          ◦ Cho et al. 2003: one model for each type of note
              - 20 note types → 20 models!
              - Not practical: each model has to be maintained
          ◦ Tepper et al. 2012: model had low adaptability to unseen documents
              - features used, training data used, etc.
  • 9.
      - General segmentation approach for clinical texts
      - Requirements:
          ◦ Single model/approach for most note types
          ◦ Discount extreme non-standard formatting, e.g. tabular format
      - Segment:
          ◦ Header
          ◦ Top-level sections
          ◦ Footer
  • 10. PRIMARY CARE PHYSICIAN: Dr. XXXXX XXXXXXXX.
      CHIEF COMPLAINT: Injured right little toe.
      HISTORY OF PRESENT ILLNESS: This is a 63-year-old male with a past medical history of multiple myeloma who presents today after hitting his fifth toe of the right foot on a wood panel yesterday……
      Review of Systems:
      CONSTITUTIONAL: No fever, chills, or weight loss.
      RESPIRATORY: No cough, shortness of breath, or wheezing.
      CARDIOVASCULAR: No chest pain, chest pressure, or palpitations.
      ...............
      PAST MEDICAL HISTORY
      Multiple myeloma, peripheral neuropathy, hypertension..
      PAST SURGICAL HISTORY:-
      Stem cell transplant.
      SOCIAL HISTORY
      The patient formerly smoked tobacco; however, quit within the last 10 years.
      FAMILY HISTORY:
      Hypertension.
      ALLERGIES:
      ASPIRIN.
      ………
      Labels on the slide: Header, followed by Top-level section (7 times)
  • 11.
      - Supervised approach using L1-regularized logistic regression with a constraint combination approach
      - Idea: scan each line in a clinical document and label it as:
          ◦ BeginHeader
          ◦ ContHeader
          ◦ BeginSection
          ◦ ContSection
          ◦ Footer
      - Labels are predicted with a certain confidence
      - But there is a problem with using line-wise predictions as is:
          ◦ Label sequences may not make sense
          ◦ E.g. there may be a BeginHeader after a BeginSection → incorrect
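The problem with taking line-wise predictions as is can be sketched in Python. The confidence values below are invented for illustration and `label_lines_independently` is a hypothetical helper; the talk's classifier is a logistic regression model, which is not reproduced here.

```python
LABELS = ["BeginHeader", "ContHeader", "BeginSection", "ContSection", "Footer"]

# Hypothetical per-line confidences, as a classifier might emit them
# (one dict per line of the note; the numbers are made up).
line_scores = [
    {"BeginHeader": 0.70, "ContHeader": 0.05, "BeginSection": 0.20, "ContSection": 0.03, "Footer": 0.02},
    {"BeginHeader": 0.05, "ContHeader": 0.10, "BeginSection": 0.60, "ContSection": 0.20, "Footer": 0.05},
    {"BeginHeader": 0.45, "ContHeader": 0.00, "BeginSection": 0.15, "ContSection": 0.40, "Footer": 0.00},
]

def label_lines_independently(scores):
    """Pick the most confident label for each line, ignoring its neighbours."""
    return [max(line, key=line.get) for line in scores]

predicted = label_lines_independently(line_scores)
# The third line is labelled BeginHeader right after a BeginSection,
# which is the kind of sequence the slide calls incorrect.
```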
  • 12.
      - Post-processing: enforce sequence combination rules:
          ◦ First line of document: BeginHeader or BeginSection
          ◦ BeginHeader cannot come right after BeginHeader or ContHeader
          ◦ ContHeader must come after BeginHeader or ContHeader
          ◦ ContSection must come after BeginSection or ContSection
          ◦ Footer cannot come right after BeginHeader or ContHeader
      - Rules applied after all lines in document labeled
          ◦ Applied to consecutive label pairs
          ◦ Computed efficiently: Viterbi algorithm
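A minimal Python sketch of constrained decoding with the Viterbi algorithm follows. It encodes only the pairwise rules listed on this slide; the function names `allowed` and `viterbi` and the probabilities are assumptions made for illustration, not the authors' implementation.

```python
import math

LABELS = ["BeginHeader", "ContHeader", "BeginSection", "ContSection", "Footer"]
FIRST_LINE = {"BeginHeader", "BeginSection"}

def allowed(prev, cur):
    """Pairwise rules as listed on the slide."""
    if cur == "BeginHeader":
        return prev not in ("BeginHeader", "ContHeader")
    if cur == "ContHeader":
        return prev in ("BeginHeader", "ContHeader")
    if cur == "ContSection":
        return prev in ("BeginSection", "ContSection")
    if cur == "Footer":
        return prev not in ("BeginHeader", "ContHeader")
    return True  # BeginSection: no restriction is listed

def log_prob(p):
    return math.log(p) if p > 0 else -math.inf

def viterbi(line_probs):
    """line_probs: one {label: probability} dict per line.
    Returns the highest-scoring label sequence that satisfies the rules."""
    if not line_probs:
        return []
    # table[i][label] = (best log score of a valid path ending here, previous label)
    first = line_probs[0]
    table = [{lab: (log_prob(first.get(lab, 0.0)) if lab in FIRST_LINE else -math.inf, None)
              for lab in LABELS}]
    for probs in line_probs[1:]:
        row = {}
        for cur in LABELS:
            score, prev = max(
                ((table[-1][p][0], p) for p in LABELS if allowed(p, cur)),
                key=lambda pair: pair[0],
            )
            row[cur] = (score + log_prob(probs.get(cur, 0.0)), prev)
        table.append(row)
    # Backtrack from the best final label.
    label = max(LABELS, key=lambda lab: table[-1][lab][0])
    path = [label]
    for i in range(len(table) - 1, 0, -1):
        label = table[i][label][1]
        path.append(label)
    return path[::-1]

# Made-up confidences: the per-line argmax would give
# BeginSection, ContHeader, ContSection, which breaks two rules.
probs = [
    {"BeginHeader": 0.3, "BeginSection": 0.6, "ContSection": 0.1},
    {"ContHeader": 0.5, "ContSection": 0.4, "BeginSection": 0.1},
    {"ContSection": 0.7, "BeginSection": 0.2, "Footer": 0.1},
]
argmax_labels = [max(p, key=p.get) for p in probs]
decoded = viterbi(probs)
```

Because the rules only look at consecutive label pairs, the search over whole sequences stays linear in the number of lines.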
  • 13. Inpatient / Outpatient
      - Notes from 12 different enterprises
      - Some large enterprises
      - All sorts of note types
      - Some noisy sectioning, some clean
      - 100 radiology notes
      - Fairly clean sections
      - One hospital
      - All sorts of note types
      - Fairly well sectioned
      - 35,000 notes in total
      - 2000 randomly sampled notes (inpatient)
      - 100 radiology notes
      - Fairly clean sections
  • 14.
      - Emphasis on training data
      - Variation in training data
          ◦ Use different note types for training
          ◦ Intuition: helps the model generalize well
      - Sample training data:
          ◦ Instead of using all training data from the 2100 notes
          ◦ Generate subsets of training data of varying size and cross-validate on test sets
          ◦ Intuition: allows us to pick the best model
      - Best model used only < 700 notes (out of 2100)
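The sample-and-validate idea can be sketched as follows. Everything here is hypothetical: `select_training_subset`, the toy note tuples and `toy_score` only stand in for training a model on the subset and measuring its validation accuracy, which the slides do not spell out.

```python
import random

def select_training_subset(train_notes, sizes, score_fn, trials_per_size=3, seed=0):
    """Draw random subsets of several sizes and keep the best-scoring one.

    score_fn is a stand-in for: train a model on the subset, then return
    its validation accuracy.
    """
    rng = random.Random(seed)
    best_score, best_subset = None, None
    for size in sizes:
        for _ in range(trials_per_size):
            subset = rng.sample(train_notes, size)
            score = score_fn(subset)
            if best_score is None or score > best_score:
                best_score, best_subset = score, subset
    return best_score, best_subset

# Toy data: (note_type, note_id) pairs.
notes = [(note_type, i) for note_type in ("hp", "ds", "pn") for i in range(20)]

def toy_score(subset):
    # Toy proxy only: rewards note-type variety, slightly penalises size.
    return len({note_type for note_type, _ in subset}) - 0.001 * len(subset)

best_score, best_subset = select_training_subset(notes, [5, 10, 20], toy_score)
```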
  • 15.
      - 5 test sets
          ◦ 4 of the 5 test sets come from hospitals not in the train set → true estimate of accuracy
          ◦ Covers both inpatient and outpatient notes
          ◦ Covers different note types
          ◦ ~12,500 test notes
      - Primary evaluation metric: line-wise accuracy
          ◦ percentage of correctly predicted line labels
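Line-wise accuracy as defined on this slide is simple to compute. The sketch below is a straightforward reading of that definition; the function name and the example labels are illustrative.

```python
def linewise_accuracy(gold_labels, predicted_labels):
    """Percentage of lines whose predicted label equals the gold label."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("gold and predicted must have one label per line")
    if not gold_labels:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    return 100.0 * correct / len(gold_labels)

gold = ["BeginHeader", "BeginSection", "ContSection", "Footer"]
pred = ["BeginHeader", "BeginSection", "BeginSection", "Footer"]
accuracy = linewise_accuracy(gold, pred)  # 3 of 4 lines correct
```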
  • 16.
      Train set                    3-fold cross-validation   Unseen test accuracy
      Inp1HospB (300 - limited)    96.70%                    67.00%
      Inp3HospD (300 - varied)     96.58%                    88.23%
      - 1st model: limited variety (hp + discharge)
      - 2nd model: variety (11 types - hp, ds, pn…)
      - Model with variety: higher accuracy on unseen test set
      - 3-fold cross-validation accuracy: high in both
      - Important to have variety in training notes when building a general segmentation model
  • 17. Accuracy consistently > 90% across enterprises
      Client/Data          In/Outpatient   # Test Docs   Accuracy
      1. Inp1HospB         In              300           92.58%
      2. Inp2HospC         In              1000          93.29%
      3. Inp3HospD         In              300           95.81%
      4. Rad1MixedHosps    Out             9000          92.45%
      5. Rad2HospA         Out             1902          93.67%
      Average                                            93.56%
      - Average accuracy: 93.56%
      - Covers inpatient/outpatient
      - Single model, but performs well across enterprises
  • 18. Accuracy Breakdown for Inp2HospC
      Document Type              Accuracy
      1. History and Physical    95.70%
      2. Physician Clinicals     93.10%
      3. Discharge Summary       94.00%
      4. Consult Note            94.60%
      5. Short Stay Summary      94.60%
      6. Operative Note          92.20%
      7. Progress Note           87.80%
      8. Cardiac Cath Report     85.40%
      9. Procedure Note          83.60%
      - Performs very well: > 90%
      - Reasonable: > 80%
      - Model performs well across note types
      - Lowest performance: procedure notes, with low recall on segmenting “technique” sections
  • 19. # Notes vs. Accuracy
      [Chart: accuracy (y-axis, 86.00% to 94.00%) against # training notes (x-axis, 0 to 2000)]
      - Avg. accuracy peaks @500 notes on all test sets
      - No benefit with more notes
      - No need for big data for a general model. We need good data from all that big data!
  • 20.
      - Unigrams of each line (LineUnigram)
      - Relative position of line in document (PosInDoc)
          ◦ Top, Middle, Bottom
      - Known Header features (KnownHeader)
          ◦ Find potential headers using repository of seen headers
          ◦ Seen headers can have a canonical type, e.g. Past Medical History, Previous Med History → “PAST_MEDICAL_HISTORY”
          ◦ If potential headers found, we include features:
              - Canonical type
              - Unigram & char n-gram of potential header
              - Caps/colon info: mixed case, all caps, lowercase
              - Length of potential header
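The three feature groups can be sketched as a per-line feature extractor. This is an illustration under several assumptions that are not in the talk: the tiny `KNOWN_HEADERS` repository, the equal-thirds split for Top/Middle/Bottom, matching headers only at the start of a line, character trigrams as the n-gram size, and all feature names.

```python
import re

# Hypothetical repository of seen headers mapped to canonical types.
KNOWN_HEADERS = {
    "past medical history": "PAST_MEDICAL_HISTORY",
    "previous med history": "PAST_MEDICAL_HISTORY",
    "allergies": "ALLERGIES",
}

def position_bucket(index, total):
    """PosInDoc: Top, Middle or Bottom (equal thirds is an assumption)."""
    fraction = index / max(total - 1, 1)
    if fraction < 1 / 3:
        return "Top"
    if fraction < 2 / 3:
        return "Middle"
    return "Bottom"

def case_style(text):
    letters = [c for c in text if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        return "all_caps"
    if letters and all(c.islower() for c in letters):
        return "lowercase"
    return "mixed_case"

def line_features(line, index, total):
    """Sparse feature dict for one line of a note."""
    feats = {}
    # LineUnigram
    for token in re.findall(r"[a-z0-9]+", line.lower()):
        feats["uni=" + token] = 1
    # PosInDoc
    feats["pos=" + position_bucket(index, total)] = 1
    # KnownHeader (simplified: header must start the line)
    lowered = line.lower()
    for header, canonical in KNOWN_HEADERS.items():
        if lowered.startswith(header):
            surface = line[:len(header)]
            feats["hdr_type=" + canonical] = 1
            feats["hdr_case=" + case_style(surface)] = 1
            feats["hdr_colon"] = int(line[len(header):].lstrip().startswith(":"))
            feats["hdr_len"] = len(header.split())
            for word in header.split():
                feats["hdr_uni=" + word] = 1
            for i in range(len(header) - 2):
                feats["hdr_c3=" + header[i:i + 3]] = 1
            break
    return feats

header_feats = line_features("PAST MEDICAL HISTORY: Multiple myeloma", 4, 10)
variant_feats = line_features("Previous Med History", 4, 10)
body_feats = line_features("No fever, chills, or weight loss.", 0, 10)
```

Both header spellings map to the same canonical type, which lets a model generalize across header variants it has seen in the repository.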
  • 21.
      Feature Set                         Avg. Accuracy   Improvement
      LineUnigram                         85.55%
      LineUnigram+PosInDoc                88.62%          +3.46%
      LineUnigram+PosInDoc+KnownHeader    93.10%          +4.81%
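The slide does not say how the Improvement column was computed. As a quick arithmetic check, the reported figures are consistent with the accuracy gain divided by the new accuracy; this is one possible reading, not something the talk confirms. The helper name `gain_relative_to_new` is hypothetical.

```python
def gain_relative_to_new(old_accuracy, new_accuracy):
    """Accuracy gain as a percentage of the new accuracy (assumed formula)."""
    return 100.0 * (new_accuracy - old_accuracy) / new_accuracy

pos_in_doc_gain = round(gain_relative_to_new(85.55, 88.62), 2)    # matches +3.46%
known_header_gain = round(gain_relative_to_new(88.62, 93.10), 2)  # matches +4.81%

# For comparison, the absolute gains in percentage points:
pos_in_doc_points = round(88.62 - 85.55, 2)
known_header_points = round(93.10 - 88.62, 2)
```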
  • 22.
      - Explored:
          ◦ Supervised approach to building a very general segmentation model for clinical texts
      - Evaluation showed:
          ◦ Model works well on notes across enterprises
          ◦ Model works across note types
      - Key to effectiveness:
          ◦ Variation in training data: all sorts of note types
          ◦ Training data selection strategy: sample and cross-validate
          ◦ Feature set: not explored in existing works
  • 23. Contact: Kavita Ganesan ganesan.kavita@gmail.com www.kavita-ganesan.com www.text-analytics101.com