Big Data in Healthcare: Overview
1. BIG DATA IN HEALTHCARE
MSc. Information Systems
2014 / 2015
Presented by:
Younes Hamdaoui (Std_ID 21266804)
Supervised by:
Dr Samia Oussena
Dr Wei Jie
2. INTRODUCTION
Clinics and health organisations have a growing number of reasons to become data-driven.
Patients expect quality and safety in treatment and fast clinical results,
and doctors need electronic systems and analytical tools to make precise and
rapid judgements.
The volume of data generated in the healthcare sector is growing rapidly.
“U.S. health care data alone reached 150 exabytes in 2011”. (Institute for
Health Technology Transformation. Transforming Health Care Through Big Data. (2013))
3. PLAN
INTRODUCTION
WHY BIG DATA IN HEALTHCARE (HC) ?
• HISTORY OF DATA USAGE IN HC
• BIG DATA IN HC TODAY
BIG DATA APPLICATION IN HEALTH SECTOR
• COLLECTING DATA
• PROCESSING AND ANALYSING DATA
FUTURE OF BIG DATA IN HC
CONCLUSION
4. WHY BIG DATA IN HEALTHCARE ?
HISTORY OF DATA USAGE IN HC (1)
Paper-based forms are still used in many developing countries (knocking on doors
to collect information). The collected data then has to be moved to a computerized
system;
Computerizing the data takes time and can reduce its quality;
Most decisions in global health are based on old data;
The HC sector lacked important data that could help improve this area. Questions such as
"How many people are affected by diseases and disasters?", "How many children were
born or died last week?" and "Under what circumstances?" were impossible to answer
rapidly.
5. HISTORY OF DATA USAGE IN HC (2)
Today, health organisations use more developed tools to handle the huge amount
of data created daily, to track their data and to use it for analysis:
• Microsoft Excel monitoring spreadsheets
• Electronic Health Records (EHR)
• Relational Database Management Systems (RDBMS)
• Business Intelligence (BI) tools (SAP Business Objects, SQL Server
Integration/Analysis/Reporting Services, IBM Cognos)
However, these tools are complex, slow and very expensive, and offer no real-time analysis.
“80% of the development effort in a traditional big data project goes into
data integration and only 20% goes toward data analysis.” (Big Data
Analytics. Extract, Transform, and Load Big Data with Apache Hadoop.
(n.d.) INTEL White Paper.)
6. BIG DATA IN HEALTHCARE TODAY
Real cases :
Propeller Health: uses data from sensors on asthma inhalers and
from mobile applications to help identify patients at risk of an asthma
attack before it occurs. It also collects weather and air quality information
to classify risk levels and risk factors by area.
NextBio: uses patients' personal information together with molecular and genomic
data to help make personalised medical decisions.
(http://profitable-practice.softwareadvice.com/what-is-big-data-in-healthcare-0813/)
7. Other Examples:
Comparative Effectiveness Research: analysing the clinical and financial efficiency of
interventions to enhance clinical care services in terms of quality and
performance.
Clinical Operation Intelligence: finding misuse in clinical operations in order to
improve them.
Public Health Analysis: analysing the health data sets of populations to determine the
overall effectiveness of medications.
(http://big-project.eu/blog/potential-big-data-applications-healthcare-sector)
8. THE ETL PROCESS
The extract, transform and load process is a crucial component for populating
data systems.
An ETL process retrieves data from various systems, then refines and prepares it
for later investigation with analytic and reporting tools.
(Songini, M.L. (2004). ETL.)
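The three ETL stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV content, the `blood_pressure` table and the function names are hypothetical and do not come from the presentation, and an in-memory SQLite database stands in for the target data system.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (values invented for illustration).
RAW_CSV = """patient_id,systolic,diastolic
1,120,80
2,,85
3,145,95
"""

def extract(text):
    """Extract: read raw rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop incomplete rows and convert the fields to integers."""
    clean = []
    for row in rows:
        if not all(row.values()):
            continue  # skip rows with a missing measurement
        clean.append(
            (int(row["patient_id"]), int(row["systolic"]), int(row["diastolic"]))
        )
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blood_pressure "
        "(patient_id INTEGER, systolic INTEGER, diastolic INTEGER)"
    )
    conn.executemany("INSERT INTO blood_pressure VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(text, conn):
    """Run the three stages in order and return the number of loaded rows."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT COUNT(*) FROM blood_pressure").fetchone()[0]
```

Calling `run_etl(RAW_CSV, sqlite3.connect(":memory:"))` loads only the complete rows; the row with the missing systolic value is discarded during the transform step.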
9. BIG DATA APPLICATION IN HEALTH SECTOR
COLLECTING DATA
Importing a single table:
% sqoop import \
    --connect jdbc:mysql://localhost/UWLHealth \
    --table BloodPressure -m 1
[Diagram: a Flume agent moves events from Source to Channel to Sink, producing processed logs]
• Source: specifies the path to the locations of the log files
• Channel: holding area where event flows are defined through interceptors and
channel selectors
• Sink: processes events only through the channel and writes data to HDFS
or HBase
Hadoop Image Processing Interface
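The Source, Channel and Sink roles on this slide can be illustrated with a small plain-Python sketch. It does not use or reproduce the Flume API: the `Channel` class, the `source` and `sink` functions and the list used as a stand-in for HDFS or HBase are all hypothetical names chosen for illustration.

```python
from collections import deque

class Channel:
    """Holding area that buffers events between the source and the sink."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        # Return the oldest buffered event, or None when the channel is empty.
        return self._events.popleft() if self._events else None

def source(log_lines, channel):
    """Source: turn each non-empty log line into an event on the channel."""
    for line in log_lines:
        line = line.strip()
        if line:
            channel.put({"body": line})

def sink(channel, store):
    """Sink: read events only from the channel and append them to the store.

    The store is a plain list standing in for HDFS or HBase.
    """
    while (event := channel.take()) is not None:
        store.append(event["body"])
    return store
```

The sink never reads the log lines directly; everything it writes has passed through the channel, which is the decoupling the diagram describes.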
10. PROCESSING AND ANALYSING DATA
Refining the extracted data using Hive Query Language:
-- Derive the rate difference and two classification columns from UWL_Health
CREATE TABLE UWL_BloodPressure AS
SELECT *,
  normalerate - stdrate AS rate_diff,
  IF((normalerate - stdrate) > 20, 'LOW',
    IF((normalerate - stdrate) < -20, 'HIGH', 'NORMAL')) AS BPressure,
  IF((normalerate - stdrate) > 20, 'NOTOK',
    IF((normalerate - stdrate) < -20, 'NOTOK', 'OK')) AS BPressure_variation
FROM UWL_Health;
“The Apache Hive data warehouse software facilitates querying and managing large
datasets residing in distributed storage. Hive provides a mechanism to project
structure onto this data and query the data using a SQL-like language called
HiveQL.” (https://hive.apache.org/)
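The nested IF logic of the HiveQL query above can be read more easily as an ordinary function. The following Python sketch mirrors the same thresholds and labels for a single row; the function name `classify` and the constant `THRESHOLD` are hypothetical, while the column names are taken from the query.

```python
THRESHOLD = 20  # same cut-off as in the HiveQL query

def classify(normalerate, stdrate):
    """Return the three derived columns of UWL_BloodPressure for one row."""
    rate_diff = normalerate - stdrate
    if rate_diff > THRESHOLD:
        bpressure = "LOW"
    elif rate_diff < -THRESHOLD:
        bpressure = "HIGH"
    else:
        bpressure = "NORMAL"
    # In the query, every reading outside the +/-20 band is flagged NOTOK.
    variation = "OK" if bpressure == "NORMAL" else "NOTOK"
    return {
        "rate_diff": rate_diff,
        "BPressure": bpressure,
        "BPressure_variation": variation,
    }
```

Because both comparisons are strict, a difference of exactly 20 or -20 is still classified as NORMAL, as it is in the query.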
ODBC DRIVER
Visualization of the UWL_BloodPressure table
Analysis
11. FUTURE OF BIG DATA IN HC
[Diagram: SENSORS, HEALTH DATA, Real-time visualization]
• Real-time interaction with patients
• Real-time tracking of health
• Real-time detection of anomalies
• Real-time recommendations
• Automatic urgency detection
• Doctors will diagnose patients the minute they step into hospitals
• Real-time prescriptions, advice and response to emergencies
• Disease surveillance
13. References
Institute for Health Technology Transformation. (n.d.). Transforming Health Care Through Big Data.
Available at: http://ihealthtran.com/big-data-in-healthcare [Accessed 5 Mar. 2015].
Big Data Analytics. Extract, Transform, and Load Big Data with Apache Hadoop. (n.d.)
INTEL White Paper.
Profitable-practice.softwareadvice.com. What is “Big Data” in Healthcare, and Who’s
Already Doing It?. Available at:
http://profitable-practice.softwareadvice.com/what-is-big-data-in-healthcare-0813/ [Accessed 11 Mar. 2015].
Big-project.eu. The Potential of Big Data Applications for the Healthcare Sector | BIG -
Big Data Public Private Forum. Available at:
http://big-project.eu/blog/potential-big-data-applications-healthcare-sector [Accessed 11 Mar. 2015].
Songini, M.L. 2004, "ETL", Computerworld, [Online], vol. 38, no. 5, pp. 23.
Reach1to1 Technologies, (2015). Hive, Pig and Sqoop. [online] Available at:
http://reach1to1.com/technology/hive-pig-sqoop/ [Accessed 11 Mar. 2015].
Hortonworks, (2015). Apache Flume. [online] Available at:
http://hortonworks.com/hadoop/flume/ [Accessed 11 Mar. 2015].
Wikipedia, (2015). Apache Hive. [online] Available at:
http://en.wikipedia.org/wiki/Apache_Hive [Accessed 11 Mar. 2015].
Wikipedia, (2015). Pig (programming tool). [online] Available at:
http://en.wikipedia.org/wiki/Pig_%28programming_tool%29 [Accessed 11 Mar. 2015].
Editor's Notes
“Five exabytes of data would contain all the words ever spoken by human beings on earth.” (Institute for Health Technology Transformation. Transforming Health Care Through Big Data. (2013))
Propeller Health : company focused on asthma management
Clinical Operation Intelligence: “e.g. analysing medical procedures to find performance opportunities, such as improved clinical processes, fine-tuning and adaptation of clinical guidelines.”
Public Health Analysis: “e.g. using nation-wide disease registries, databases covering secondary data related to patients with specific diagnosis or procedure.”
(http://big-project.eu/blog/potential-big-data-applications-healthcare-sector)
The ETL process in Big Data tools is more efficient than in traditional ETL tools (BI tools):
BI ETL: expensive, difficult to implement, time-consuming
BIG DATA ETL: faster, handles huge amounts of data, parallel processing, cheaper
“Sqoop is a tool designed to transfer data between Hadoop and relational database management system (RDBMS). We can use Sqoop to import data from a RDBMS such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop by running MapReduce job, and then export the data back into an RDBMS.” (http://reach1to1.com/technology/hive-pig-sqoop/)
“Apache™ Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into the Hadoop Distributed File System (HDFS).” (http://hortonworks.com/hadoop/flume/)
“Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query and analysis” (http://en.wikipedia.org/wiki/Apache_Hive)
“Pig is a high-level platform for creating MapReduce programs used with Hadoop. The language for this platform is called Pig Latin. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for RDBMS.” ( )
ODBC : Open Database Connectivity.