Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
- 1. ©2015 MFMER | slide-1
Big Data Platform Processes Daily
Healthcare Data for Clinic Use
at Mayo Clinic
Dequan Chen, Ph.D.
Mayo Clinic Big Data Core Team
chen.dequan@mayo.edu; 507-250-7400
San Jose Convention Center
June 9, 2015
- 2. ©2015 MFMER | slide-2
Outline
• Healthcare Data Challenge at Mayo Clinic
HL7 Data for Integrated Enterprise Usage
Data Volumes, Types, and Processing Velocity
Challenges of Existing RDBMS/Web Systems
• Big Data Implementation for Enterprise-Level
Clinical and Non-Clinical Usage
• Big Data Implementation in Support of
Colorectal Surgery Applications
• On-going & Future Direction
• Conclusion
- 3. ©2015 MFMER | slide-3
Healthcare Data Challenge at Mayo Clinic
- 4. ©2015 MFMER | slide-4
Healthcare Data Challenge at Mayo Clinic
HL7 Data for Integrated Enterprise Usage
• World’s largest integrated not-for-profit
healthcare system – > 70 hospitals and clinics
Enterprise Core Value: The Needs of the
Patient Come First
• Mayo Clinic in Rochester, Minn., ranked the
top hospital in the nation for 2014-2015 by U.S.
News & World Report
• Provides care annually for > 1 million patients
(1,317,900 in 2014) from all 50 states and
> 150 countries
- 5. ©2015 MFMER | slide-5
Healthcare Data Challenge at Mayo Clinic
HL7 Data for Integrated Enterprise Usage
• Generates large amounts of EHR data:
Structured
Semi-Structured
Unstructured
• HL7 messages – a mix of semi-structured and unstructured
EHR data, used for:
Enterprise-level clinical usage (diagnosis, treatment,
prevention, or clinical reporting)
Enterprise-level non-clinical usage (research,
business intelligence, or health information exchange)
- 6. ©2015 MFMER | slide-6
Healthcare Data Challenge at Mayo Clinic
HL7 Data for Integrated Enterprise Usage
• HL7 Message Example: (Source: http://www.priorityhealth.com)
MSH|^~\&|XXXX|C|PRIORITYHEALTH|PRIORITYHEALTH|20080511103530||ORU^R01|Q335939501T337311002|P|2.3|||
PID|1||94000000000^^^Priority Health||LASTNAME^FIRSTNAME^INIT||19460101|M|||||
PD1|1|||1234567890^PCPLAST^PCPFIRST^M^^^^^NPI|
OBR|1||185L29839X64489JLPF~X64489^ACC_NUM|JLPF^Lipid Panel - C||||||||||||1694^DOCLAST^DOCFIRST^^MD||||||20080511103529|||
OBX|1|NM|JHDL^HDL Cholesterol (CAD)|1|62|CD:289^mg/dL|>40^>40|""||""|F|||20080511103500|||^^^""|
OBX|2|NM|JTRIG^Triglyceride (CAD)|1|72|CD:289^mg/dL|35-150^35^150|""||""|F|||20080511103500|||^^^""|
OBX|3|NM|JVLDL^VLDL-C (calc - CAD)|1|14|CD:289^mg/dL||""||""|F|||20080511103500|||^^^""|
OBX|4|NM|JLDL^LDL-C (calc - CAD)|1|134|CD:289^mg/dL|0-100^0^100|H||""|F|||20080511103500|||^^^""|
OBX|5|NM|JCHO^Cholesterol (CAD)|1|210|CD:289^mg/dL|90-200^90^200|H||""|F|||20080511103500|||^^^""|
…
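HL7 v2 messages like the one above are delimited text: one segment per line (MSH, PID, OBR, OBX, …), fields separated by `|`, components by `^`. The minimal Python sketch below, an illustration rather than Mayo Clinic's parser, turns the OBX result segments into JSON-ready records of the kind an Elasticsearch index could hold. It assumes the default delimiters and ignores escape sequences, repetitions (`~`), and the MSH field-numbering quirk; the output field names (`code`, `value`, `unit`, …) are hypothetical.

```python
import json


def parse_segments(message: str) -> list[list[str]]:
    """Split an HL7 v2 message into segments, and each segment into fields."""
    # HL7 v2 ends segments with a carriage return; newlines are accepted too.
    text = message.replace("\r\n", "\r").replace("\n", "\r")
    return [seg.split("|") for seg in text.split("\r") if seg.strip()]


def component(field: str, index: int) -> str:
    """Return the index-th ^-separated component of a field, or '' if absent."""
    parts = field.split("^")
    return parts[index] if index < len(parts) else ""


def obx_to_dict(fields: list[str]) -> dict:
    """Map one OBX segment to a flat record; fields[n] holds OBX-n."""
    fields = fields + [""] * (9 - len(fields))  # pad short segments
    return {
        "set_id": fields[1],
        "value_type": fields[2],              # OBX-2, e.g. NM = numeric
        "code": component(fields[3], 0),      # OBX-3 observation identifier
        "name": component(fields[3], 1),
        "value": fields[5],                   # OBX-5 observation value
        "unit": component(fields[6], 1),      # OBX-6 units (text component)
        "reference_range": component(fields[7], 0),  # OBX-7
        "abnormal_flag": fields[8],           # OBX-8, e.g. H = high
    }


def hl7_to_json(message: str) -> str:
    """Serialize all OBX results of a message as one JSON document."""
    results = [obx_to_dict(f) for f in parse_segments(message) if f[0] == "OBX"]
    return json.dumps({"results": results})
```

For the LDL-C segment above, this yields a record whose `value` is "134", `unit` is "mg/dL", and `abnormal_flag` is "H".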
- 7. ©2015 MFMER | slide-7
Healthcare Data Challenge at Mayo Clinic
Data Volumes
• Daily enterprise-wide volume of real-time HL7
message data (msgs/day)
• Large volume of historical HL7 message data at
Mayo Clinic (~1.83 billion messages prior to
12-31-2014)
- 8. ©2015 MFMER | slide-8
Healthcare Data Challenge at Mayo Clinic
Data Types and Processing Velocity
• 60+ document types of HL7 messages
Each document type is generated by a separate
healthcare source system
Ex: Clinical Notes (cnote), Surgical Notes (opnote),
Radiology, Pathology, Health_Quest, ECG/EKG…
• Need for fast processing (storing, analyzing,
retrieving) of all types of HL7 data
Real-time and/or historical data
Within seconds - ER, ICU, and Surgery
Within minutes - Internal or Preventive Medicine
- 9. ©2015 MFMER | slide-9
Healthcare Data Challenge at Mayo Clinic
Challenges of Existing RDBMS/Web Systems
• For enterprise-level clinical and non-clinical
usage, the existing RDBMS-based systems
cannot provide:
All Real-time HL7 Messages – synchronously
stored, analyzed and retrieved
All Real-time and/or Historical HL7 Messages –
quickly analyzed and retrieved
Fast Free-Text Search on any medical terms
Easy, lower-cost scalability (scale-up & scale-out)
- 10. ©2015 MFMER | slide-10
Big Data Implementation for Enterprise-
Level Clinical and Non-Clinical Usage
- 11. ©2015 MFMER | slide-11
Mayo Clinic Big Data Platform
MC BigData Appliance (V1.0)
• Started implementation in Jan 2014
• Purchased from Teradata
• SUSE Linux Enterprise Server 11 (SLES 11)
• Integration and Production Hadoop clusters
• Each Hadoop cluster:
2 edge nodes, 2 master nodes, 6 data nodes
Hadoop Stack – TDH 1.3.2: a Teradata-certified,
modified HDP 1.3.2
HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, HCatalog,
WebHCat, ZooKeeper, Ganglia, Nagios, Oozie, Hue, and Ambari
- 12. ©2015 MFMER | slide-12
Mayo Clinic Big Data Platform
MC BigData Appliance (V1.0)
• Each Hadoop cluster (cont’d):
Built-in PostgreSQL Database Instance
Apache Storm (version 0.9.1)
ElasticSearch (ES) (version 1.0.0) Cluster
Instances of an In-House Storm Topology –
MayoTopology (one instance per HL7 message
document type)
o 1 spout
o 2+ bolts
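The MayoTopology layout described above (one spout feeding two or more bolts, one topology instance per HL7 document type) can be pictured with a toy, single-process Python model. This is not the Apache Storm API and not the actual MayoTopology code: the spout and bolt functions below are hypothetical stand-ins, whereas real Storm runs spouts and bolts as parallel tasks across the cluster.

```python
from typing import Callable, Iterable, Iterator

Bolt = Callable[[dict], dict]


def hl7_spout(messages: Iterable[str]) -> Iterator[dict]:
    """Source of the stream: wrap each raw HL7 message in a tuple with an ID."""
    for msg_id, raw in enumerate(messages):
        yield {"msg_id": msg_id, "raw": raw}


def parse_bolt(tup: dict) -> dict:
    """First bolt: record the segment names (MSH, PID, OBX, ...)."""
    segments = [s for s in tup["raw"].split("\r") if s]
    return {**tup, "segments": [s.split("|")[0] for s in segments]}


def make_persist_bolt(store: list) -> Bolt:
    """Second bolt: stand-in for the HDFS/Elasticsearch writers (appends to a list)."""
    def persist(tup: dict) -> dict:
        store.append(tup)
        return tup
    return persist


def run_topology(spout: Iterator[dict], bolts: list[Bolt]) -> int:
    """Push every tuple through the bolts in order; return how many were processed."""
    count = 0
    for tup in spout:
        for bolt in bolts:
            tup = bolt(tup)
        count += 1
    return count
```

Running one such pipeline per document type (for example `cnote` and `opnote`) mirrors the one-instance-per-document-type arrangement on the slide.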
- 13. ©2015 MFMER | slide-13
Data Flow Architecture
Persisting Mayo Clinic Healthcare Data into HDFS and the
ElasticSearch Index inside MC BigData Appliance (V1.0)
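Writing into Elasticsearch is commonly done in batches through its `_bulk` endpoint, whose body is newline-delimited JSON: an action line followed by the document itself, ending with a newline. The Python sketch below assumes that approach; the index name, mapping type, document ID, and the HDFS directory layout (one directory per document type and day) are all hypothetical, since the slide does not show them.

```python
import json


def bulk_body(docs: list[dict], index: str, doc_type: str) -> str:
    """Build an NDJSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc in docs:
        # Action line; ES 1.x documents also carry a mapping type (_type).
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["msg_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))  # source line: the document itself
    return "\n".join(lines) + "\n"     # _bulk requires a trailing newline


def hdfs_path(doc_type: str, hl7_timestamp: str) -> str:
    """Hypothetical HDFS layout: one directory per document type and day.

    hl7_timestamp uses the HL7 YYYYMMDDHHMMSS form, e.g. "20080511103530".
    """
    ts = hl7_timestamp
    return f"/data/hl7/{doc_type}/{ts[:4]}/{ts[4:6]}/{ts[6:8]}"
```

Batching many documents per `_bulk` request keeps per-request overhead low, which matters at tens of millions of messages per day.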
- 14. ©2015 MFMER | slide-14
Testing (Measurement) Architecture
Measurement of Data Persisting Capacity of MC BigData
Appliance (V1.0)
- 15. ©2015 MFMER | slide-15
HL7 Data Processing Capability
HDFS and ES Index Data Persisting Capacity of MC
BigData Appliance (V1.0)
• Daily HL7-Persisting Capacity
• 62 ± 4 million HL7 messages/day
• No statistically significant change when more
types of HL7 messages were included
• ~20-50x more capacity than the current daily maximum
volume of all internal HL7 messages
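A daily capacity is easier to compare with streaming workloads when expressed as a sustained rate. The Python helper below converts messages/day to messages/second and, reading "N x more capacity" as capacity ≈ N x current peak volume (an assumption; the slide does not define the ratio), back-computes the current daily volume a given multiple implies.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(msgs_per_day: float) -> float:
    """Sustained rate needed to move msgs_per_day in 24 hours."""
    return msgs_per_day / SECONDS_PER_DAY


def implied_current_volume(capacity_per_day: float, multiple: float) -> float:
    """Assuming capacity = multiple x current peak, return the current peak."""
    return capacity_per_day / multiple


# 62 +/- 4 million messages/day, as measured on MC BigData Appliance (V1.0)
for label, daily in [("low", 58e6), ("mean", 62e6), ("high", 66e6)]:
    print(f"{label}: {per_second(daily):.1f} msgs/s")
```

This prints about 671.3, 717.6, and 763.9 messages/second; under the stated reading, the 20-50x multiple implies a current daily peak of roughly 1.24 to 3.1 million messages. The same helper applied to the CRS capacity of 535k messages/day gives about 6.2 messages/second.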
- 16. ©2015 MFMER | slide-16
HL7 Data Processing Capability
Ultra-Fast Free-Text Search Capability of the ES Index on MC
BigData Appliance (V1.0)
• Data on ES index:
JSON documents derived from HL7 v2 messages
• Data set size vs. query speed (querying “Pain”)
• ES index: ~20,000-30,000x faster than NSE
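Free-text search on an Elasticsearch index is fast largely because Lucene builds an inverted index at indexing time: each term points to the documents that contain it, so a query for a term such as "pain" becomes a lookup rather than a scan of every message. The toy Python version below shows only that core idea (no scoring, stemming, or sharding); the tokenizer and document IDs are assumptions. In Elasticsearch itself, the equivalent request would be a `match` query on the indexed text field.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric terms; a crude stand-in for an analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: term -> IDs of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], term: str) -> set[str]:
    """Single-term lookup: one dictionary access instead of a full scan."""
    return set(index.get(term.lower(), ()))
```

Building the index costs one pass over the data up front; every later query then avoids rereading the documents, which is the trade-off that makes search time grow far more slowly than data set size.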
- 17. ©2015 MFMER | slide-17
Production Processing HL7 Data
Reliability of MayoTopology Instances on the BDProd Cluster
of MC BigData Appliance (V1.0 & V2.0)
• Production began in May 2014
• Expanded with 3 additional edge nodes and
4 additional data nodes
• Upgraded from TDH 1.3.2 to TDH 2.1, and
from ES 1.0.0 to … to ES 1.5.2
• Identified and fixed critical issues on the
appliance
• No data loss to date
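The slide does not say how data loss was avoided. One standard mechanism Storm offers is guaranteed message processing: a spout emits each tuple with a message ID, keeps it pending, forgets it on `ack`, and re-emits it on `fail`. The toy Python class below models that at-least-once pattern; its method names echo Storm's `ISpout` (`nextTuple`, `ack`, `fail`), but it is not the Storm API, and whether MayoTopology relies on this mechanism is an assumption.

```python
from collections import deque
from typing import Callable, Optional


class ReplayingSpout:
    """At-least-once toy spout: a message stays pending until acked."""

    def __init__(self, messages: list[tuple[str, str]]):
        self.queue = deque(messages)       # (msg_id, payload) waiting to be emitted
        self.pending: dict[str, str] = {}  # emitted but not yet acked

    def next_tuple(self) -> Optional[tuple[str, str]]:
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id: str) -> None:
        # Fully processed downstream: safe to forget.
        self.pending.pop(msg_id, None)

    def fail(self, msg_id: str) -> None:
        # Failed (or timed out): re-queue so it is emitted again.
        self.queue.append((msg_id, self.pending.pop(msg_id)))


def drain(spout: ReplayingSpout, process: Callable[[str, str], bool]) -> list[str]:
    """Emit until empty; process returns True on success. Returns acked IDs in order."""
    acked = []
    while (tup := spout.next_tuple()) is not None:
        msg_id, payload = tup
        if process(msg_id, payload):
            spout.ack(msg_id)
            acked.append(msg_id)
        else:
            spout.fail(msg_id)
    return acked
```

A message that always fails would be replayed forever in this sketch, so a production design also needs a retry limit or dead-letter handling; at-least-once delivery also means downstream writes should tolerate duplicates.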
- 18. ©2015 MFMER | slide-18
Big Data Implementation in Support of
Colorectal Surgery Applications
- 19. ©2015 MFMER | slide-19
Goals for Support of Colorectal Surgery
Applications
• Optimize an existing NLP (Natural Language
Processing) Pipeline
• Move from thousands to tens of thousands of HL7
documents processed daily
• Replace the existing free-text search facility used
by applications supported by Clinical Web Services
• Move from minutes to milliseconds per search
• Simplify overall architecture, increase data
volume/velocity, and reduce costs
- 20. ©2015 MFMER | slide-20
Solution Architecture
In Support of Colorectal Surgery Applications
[Figure: solution architecture diagram. Source feeds on the Enterprise Messaging Queues (ESB): Radiology, Surgical, ECG/EKG, Pathology, Clinical Notes, Insurance Claims (4-6 JMS queues, 250-300k HL7 msgs/day). Components shown: Flume; Storm (ID, transform, and parse); HDFS; Elasticsearch (indexing & free-text search); HL7 msgs for annotation, sent to the annotation facility; UIMA annotators; output parser; RDBMS (persist); rules engine; services; Clinical Web Services (REST API, SQL); NLP discovery (MR, Hive, Pig, other); Colorectal Surgical Applications. The legend distinguishes existing components from new components.]
- 21. ©2015 MFMER | slide-21
Solution Implementation
[Figure: implementation diagram (Josh Pankratz, Apr 24, 2014). Enterprise Messaging Queues: ClinNotes, Surgery, Radiology, Rch Results, Insurance. Big Data input queues: ClinNotes, OpNotes, RadiolRpt, ECG Rpt. Storm on the Big Data Platform: 1. Parse/transform HL7; 2. Persist JSON to Elasticsearch; 3. Persist HL7 to HDFS; 4. Route HL7 to NLP queues (NLP input queues: ClinNotes, OpNotes, RadiolRpt, ECG Rpt). External NLP annotators: parse HL7; set up UIMA resources based on message type; run UIMA pipeline; output 1:n annotation results (NLP evidence): A1. CRS_BLEED, A2. CRS_ILEUS, A3. NEURO_BLEED, An. <expandable>, …. Also shown: NLP output queues; RDBMS structured data store holding NLP evidence (annotations) and structured data (from source systems); CRS Point of Care Tool user interfaces; Elasticsearch; SQL.]
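Step 4 of the Storm processing (routing HL7 messages to the NLP input queues) amounts to a lookup from document type to queue. In the Python sketch below, the queue names come from the slide and the `cnote`/`opnote` keys from the document-type examples earlier; the other keys, and the rule of skipping document types with no NLP queue, are hypothetical.

```python
from typing import Iterable, Optional

# Queue names from the slide; the document-type keys are partly hypothetical.
NLP_QUEUES = {
    "cnote": "ClinNotes",
    "opnote": "OpNotes",
    "radiology": "RadiolRpt",  # hypothetical key
    "ecg": "ECG Rpt",          # hypothetical key
}


def route(doc_type: str) -> Optional[str]:
    """Return the NLP input queue for a document type, or None if not routed."""
    return NLP_QUEUES.get(doc_type.lower())


def route_batch(messages: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (doc_type, message) pairs by target queue, skipping unrouted types."""
    routed: dict[str, list[str]] = {}
    for doc_type, msg in messages:
        queue = route(doc_type)
        if queue is not None:
            routed.setdefault(queue, []).append(msg)
    return routed
```

Keeping the mapping as data rather than code means a new annotated document type (for example a future `An.` annotation set) needs only a new table entry.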
- 23. ©2015 MFMER | slide-23
CRS HL7 Data Processing Capability
MC BigData Appliance (Hadoop-ES) - NLP Annotation (DS) -
AmalgaRDB/Web for Colorectal Surgery (CRS)
• Daily HL7-Persisting Capacity
• 535k ± 31k messages/day
• ~8-25x more capacity than the current daily maximum
volume of CRS HL7 messages
- 24. ©2015 MFMER | slide-24
Production Processing CRS HL7 Data
Reliability of MC BigData Appliance (Hadoop-ES) -
NLP Annotation (DS) - AmalgaRDB/Web for Colorectal
Surgery (CRS)
• Production began in July 2014
• No data loss to date
- 26. ©2015 MFMER | slide-26
On-going & Future Direction
• Move the current NLP annotation pipeline from the
DataStage production server environment to the
MC BigData Appliance Hadoop cluster for
CRS applications
• Storm topology
• Dedicated edge nodes
Faster & more reliable NLP annotation
Higher capacity for HL7 message processing
- 27. ©2015 MFMER | slide-27
On-going & Future Direction
• Build a unified data architecture – Unified Data
Platform (UDP, an enterprise-integrated
system) over the next few years:
Enhance the Big Data platform
Utilize existing RDBMS-based replication and data
warehouse environment
Create a variety of data endpoints (cubes, data
services, advanced visualizations) for enterprise
usage
Integrate with non-Hadoop components for
advanced Big Data analytics – R, Revolution R, …
- 29. ©2015 MFMER | slide-29
Conclusion
Take-Home Messages
• The implemented BigData platform, coupled with
DataStage (NLP) & RDBMS, exceeds current
Mayo Clinic patient-care needs:
• Reliably handles ~20-50x more capacity than the
current daily volume of all HL7 messages
• Provides ultra-fast free-text search on
medical terms
• Reliably handles ~8-25x more capacity than the current
daily volume of HL7 messages for Colorectal
Surgery applications
• Significantly outperforms RDBMS-only
systems
- 30. ©2015 MFMER | slide-30
Conclusion
Take-Home Messages
• Big Data is a core component of the Mayo Clinic
UDP, bringing the power of Big Data
technology to the enterprise level:
Large data storage capability
Structured, semi-structured, and unstructured data
Fast data exchange with RDBMS-based systems
A variety of data-oriented Hadoop components –
HDFS, Pig, Hive, HBase, Spark, …
In-situ non-Hadoop data-processing components –
R, Revolution R, …
- 32. ©2015 MFMER | slide-32
Reference Links
• Mayo Clinic: http://www.mayoclinic.org/
• HL7: http://www.hl7.org/
• Hadoop Stack: http://hortonworks.com
• Apache Storm: https://storm.apache.org/
• ElasticSearch: https://www.elastic.co/
• Teradata: http://www.teradata.com/