Healthcare NLP - Four Essentials to Make the Most of Unstructured Data
© 2018 Health Catalyst
Proprietary. Feel free to share but we would appreciate a Health Catalyst citation.
Four Essentials for Natural Language Processing
The healthcare industry has recently seen a
sharp increase in interest in natural language
processing (NLP).
The unstructured clinical record contains a
wealth of insight into patients that isn’t available
in the structured record.
Additionally, advances in data science
and AI have introduced new techniques
for analyzing text, broadening and
deepening understanding of the patient.
Any organization seeking to leverage
its data to improve outcomes, reduce
cost, and further medical research
needs to consider the wealth of insight
stored in text and how it will create
value from that data using NLP.
The first step in using NLP can be the most
difficult, and many organizations never meet
the initial challenge of making the data
available for analysis.
NLP requires that data engineers transform
unstructured text into a usable format (see
need-to-know aspect #2 below) and place it in a
location where the NLP technology can make
use of it.
This NLP prerequisite can be a complex
process, involving larger data sets and
different technologies than many data
engineers are familiar with.
This presentation outlines four need-to-know
ways to meet and overcome the challenges
of making unstructured text available for
advanced NLP analysis.
It’s focused on the challenges and skillsets
required to build a solid foundation for text
analytics.
Understanding Free Text Is the Foundation
for Healthcare NLP
In my role leading NLP efforts for a
healthcare analytics vendor, I recently
worked on a patient safety surveillance
tool that helps health systems monitor for
potential adverse events.
Examples include the administration of Narcan
to reverse the effects of a painkiller in a
patient who doesn’t respond well to it, and
hospital-acquired pressure ulcers.
While the administration of Narcan is
commonly documented in structured data,
pressure ulcers are often found in
unstructured nursing notes.
To get the necessary data to improve
patient safety, we needed to leverage the
free text of nursing notes.
We found that five of the 33 adverse
events were primarily documented in
unstructured text.
To access and leverage the text data in
the patient safety tool, we needed NLP.
To use the rich information that unstructured
text holds, however, we needed more than
the right NLP tools.
Four Need-to-Know Aspects of Working
with Unstructured Text
To effectively build a data pipeline for text,
and navigate unfamiliar challenges, data
engineers must understand four key points:
1. Text Is Bigger and More Complex
2. Text Comes from Different Data Sources
3. Text Is Stored in Multiple Areas
4. Text User Documentation Patterns Matter
1: Text Is Bigger and More Complex
An average EMR record (such as a medication,
allergy, or diagnosis) runs between 50 and
150 bytes, or 50 to 150 MB per million records.
On the other hand, the average clinical note
record is approximately 150 times as large.
With large health systems storing hundreds of
millions of note records, this scale introduces
data transfer and storage complexities that
many data engineers won’t have previously
confronted.
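To make the scale concrete, here is a rough back-of-the-envelope sketch based on the figures above. The note count of 300 million is an assumption chosen only to stand in for "hundreds of millions"; the function name is illustrative. At 150 times the size of a structured record, a note works out to roughly 7.5 to 22.5 KB, so 300 million notes would come to about 2.25 to 6.75 TB of raw text before any indexing or replication.

```python
# Back-of-the-envelope storage estimate using the figures quoted above.
STRUCTURED_BYTES = (50, 150)   # bytes per structured EMR record (from the text)
NOTE_MULTIPLIER = 150          # a note record is ~150x as large (from the text)
NOTE_COUNT = 300_000_000       # ASSUMPTION: stands in for "hundreds of millions"


def estimate_note_storage_gb(note_count,
                             structured_bytes=STRUCTURED_BYTES,
                             multiplier=NOTE_MULTIPLIER):
    """Return (low, high) raw storage estimates in decimal gigabytes."""
    low, high = (b * multiplier * note_count / 1e9 for b in structured_bytes)
    return low, high


low_gb, high_gb = estimate_note_storage_gb(NOTE_COUNT)
print(f"{NOTE_COUNT:,} notes: {low_gb:,.0f} to {high_gb:,.0f} GB of raw text")
```

The estimate ignores compression, markup overhead, and search-index size, all of which can move the real number substantially in either direction.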
2: Text Comes from Different Data Sources
Experienced data professionals know well that data sources vary widely.
The data model for one vendor is different from another (e.g., from one EMR
to another). With text, the stakes are even higher. A typical data pipeline for
structured data (Figure 1) from an EMR is less complex than an unstructured
data pipeline.
Figure 1: A typical structured data pipeline
Structured data typically involves working with
just SQL and supporting tools (e.g., SSIS or
Informatica).
On the other hand, working with unstructured
text (Figure 2) involves a variety of tools
outside the typical data engineer’s skillset,
including programming languages such as C#
or Python and search engines such as
Elasticsearch and Solr.
On top of this, the transformations required for
text vary significantly based on how it’s stored
in the source.
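As a minimal sketch of what "transformations vary by storage" can look like in practice, the Python below normalizes two storage layouts into plain text. Both layouts are assumptions for illustration only (one source storing a note as an HTML fragment, another storing it as numbered line rows); the presentation does not name the actual formats, and the function names are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def from_html(raw):
    """ASSUMED layout 1: the whole note is stored as an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def from_line_rows(rows):
    """ASSUMED layout 2: one note split across (line_number, text) rows."""
    ordered = (text for _, text in sorted(rows))
    return " ".join(" ".join(ordered).split())


# One normalizer per storage layout; new sources add an entry here.
NORMALIZERS = {"html": from_html, "line_rows": from_line_rows}


def normalize(storage_format, payload):
    """Dispatch to the normalizer registered for this storage layout."""
    return NORMALIZERS[storage_format](payload)


print(normalize("html", "<p>Patient <b>fell</b> at 02:00.</p>"))
print(normalize("line_rows", [(2, "in bathroom."), (1, "Patient found on floor")]))
```

Keeping one small function per layout behind a common `normalize` call means the downstream NLP step sees the same plain-text shape regardless of which source the note came from.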
Figure 2: An analytics vendor’s unstructured text pipeline for three EMR vendors
3: Text Is Stored in Multiple Areas
It’s easy to think of text as a monolith—that all
the text in a system lives in one place.
Where text is stored, however, depends on the
type of text and the system in use.
For example, clinical notes, radiology
reports, and pathology reports may exist
in two or three different sets of tables,
depending on the source system.
Location will also vary based on the specific
implementation of that system.
With one vendor’s system, radiology reports
may be in the same table as clinical notes
or in the same tables as results, depending
on the workflow decisions behind the
configuration of the organization’s EMR.
One EMR vendor stores shorter text results
as a separate table from notes and reports,
while another will put results from the
tasking/messaging engine in another table.
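One way to cope with text living in several tables is to keep a per-source mapping from document type to table and columns, then pull everything into one common shape. The sketch below does this against an in-memory SQLite database; every table name, column name, and row is hypothetical and chosen only for illustration, not taken from any EMR vendor's data model.

```python
import sqlite3

# HYPOTHETICAL mapping for one source system:
# document type -> (table, id column, text column).
TEXT_SOURCES = {
    "clinical_note": ("notes", "note_id", "note_text"),
    "radiology_report": ("results", "result_id", "result_text"),
}


def build_union_query(sources):
    """Build one query that returns every text type in a common shape.

    Identifiers come from the trusted mapping above, never from user input.
    """
    selects = [
        f"SELECT '{doc_type}' AS doc_type, {id_col} AS source_id, "
        f"{text_col} AS body FROM {table}"
        for doc_type, (table, id_col, text_col) in sources.items()
    ]
    return " UNION ALL ".join(selects)


# Tiny in-memory stand-in for a source system, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (note_id INTEGER, note_text TEXT)")
conn.execute("CREATE TABLE results (result_id INTEGER, result_text TEXT)")
conn.execute("INSERT INTO notes VALUES (1, 'Progress note text')")
conn.execute("INSERT INTO results VALUES (7, 'Chest X-ray report text')")

rows = conn.execute(build_union_query(TEXT_SOURCES)).fetchall()
for doc_type, source_id, body in sorted(rows):
    print(doc_type, source_id, body)
```

Because the location of each document type can differ between implementations of the same vendor's system, the mapping would likely need to be maintained per organization rather than per vendor.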
4: Text User Documentation Patterns Matter
Understanding how users document data
matters. For example, during a recent project to
identify adverse events for patients, we searched
for documentation of in-hospital falls.
The patient safety expert I was working with, a
nurse, had always seen patient falls documented
in nursing progress notes, but we found very
few mentions of any falls in those notes.
After discussions with the health information
management group and nurses at the health
system, we learned that it used a structured-only
documentation methodology for nursing.
The best source for documentation of in-hospital
falls was the physician progress notes.
This insight made a small difference in how
our data scientist searched for falls data, but
it made a significant difference in the results.
Filtering which notes went into the NLP
algorithm improved accuracy, particularly
the sensitivity of the algorithm.
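The filtering step can be sketched in a few lines of Python. This is not the project's actual code: the `Note` fields, the note-type labels, and the sample notes are all invented for illustration. The sketch also includes the standard definition of sensitivity (true positives divided by true positives plus false negatives) so the metric mentioned above is unambiguous.

```python
from dataclasses import dataclass


@dataclass
class Note:
    note_id: int
    note_type: str   # HYPOTHETICAL label, e.g. "physician_progress"
    text: str


def select_notes(notes, allowed_types):
    """Keep only note types known to carry the documentation of interest."""
    allowed = {t.lower() for t in allowed_types}
    return [n for n in notes if n.note_type.lower() in allowed]


def sensitivity(true_positives, false_negatives):
    """Sensitivity (recall) = TP / (TP + FN); 0.0 when there are no positives."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


# Invented sample notes, for illustration only.
notes = [
    Note(1, "nursing_progress", "Vitals stable overnight."),
    Note(2, "physician_progress", "Patient fell while walking to bathroom."),
    Note(3, "discharge_summary", "Discharged home in good condition."),
]

# At this health system, falls were documented in physician progress notes.
candidates = select_notes(notes, ["physician_progress"])
print([n.note_id for n in candidates])
```

The list of allowed note types is the piece that depends on each organization's documentation patterns, so it is worth confirming with clinicians and the health information management group before it is fixed in a pipeline.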
Understanding the Nuances of Text Makes
Successful NLP Possible
Working with text data is different from working with structured data.
Keep in mind this article’s four lessons:
1. Unstructured text records are significantly
larger than structured records.
2. Data engineers often need to preprocess
text before running NLP, which often
requires tools outside normal data pipelines.
3. Text may be stored in different areas of
source systems or EMRs.
4. Each organization may document text differently.
Data engineers who want to meet the challenges
of text and unlock its rich information will benefit
from starting with a focused project, rather than
taking on too many text tasks at once (a bottom-up
rather than a top-down approach).
I recommend starting with a great use case that
aligns with organizational goals.
Using the patient safety scenario from
earlier, if an organization is focused on
improving patient safety, it may find that safety
events are documented in unstructured text,
limiting its ability to identify patient harm.
Starting by pulling text for one type of safety
event (e.g., deep vein thromboses) can help
data engineers form a process.
They can then replicate this process for other
use cases and start pulling the text data and
using NLP tools to reduce patient harm and
transform healthcare more broadly.
More about this topic
Link to original article for a more in-depth discussion.
Healthcare NLP: Four Essentials to Make the Most of Unstructured Data
How Healthcare Text Analytics and Machine Learning Work Together to Improve Patient Outcomes
Mike Dow, Technical Director; Levi Thatcher, VP, Data Science
Text Analytics in Healthcare—Two Promising Frameworks that Meet Its Unique Demands
Mike Dow, Technical Director
Regenstrief Institute and Health Catalyst Team to Reveal Hidden Meaning in Clinical Data for Better
Patient Care – Health Catalyst News
The Top Three Recommendations for Successfully Deploying Predictive Analytics in Healthcare
Eric Just, Senior VP of Product Development
Three Approaches to Predictive Analytics in Healthcare
Health Catalyst Insight
Mike learned of the value of data early in his career. While working at a major EMR vendor in
2001, he led a project to help identify patients who were affected by drug recalls. He continued
his work in various roles at Allscripts, including reporting, data exchange and systems
architecture. From 2006 to 2015, Mike led the technology group at Galen Healthcare Solutions.
While the company and his team grew by 50% annually during this time, they became known for
excellence, earning awards like Best in KLAS for Technical Services and a Best Place to Work by Modern
Healthcare. Mike joined Health Catalyst in 2015 to help with strategic client implementations. He has since
joined the product development team to lead Health Catalyst’s text analytics initiative, making information
previously locked in text notes available to Health Catalyst’s apps and data architects.
Mike Dow