Knowing me, knowing you, knowing your disease:
A new paradigm in healthcare privacy-preserving data sharing and
big data analytics
Omiros Metaxas
ATHENA Research Center & University of Athens
Research Areas
Database and Information Systems
Human-Computer Interaction
Scientific Systems
Personalization & Social Networks
Electronic Infrastructures
Applications
• Query Optimization
• Cloud Query Processing
• Heterogeneous Systems
• Data mining / analytics
• Data curation
• Database User Interfaces
• Complex Data Visualization
• Scientific Experiment Management
• Scientific Databases
• Workflow Management
• Distributed Systems
• Cultural Heritage
• Life Sciences
• Physical Sciences
• User Modeling
• User Profiling
• Adaptivity
• Digital Libraries
• Data Repositories
• Interoperability
• Open Access Policies
• Cloud Data Services
BioMed
Oceans
Space & Earth
Culture Environment
OA Policies
Data Proc
From big data to new medical practice
1. Big Data Management
• Manage heterogeneous, federated biomedical data sources & models
• Data provenance & on-line transformation (ETL)
• “Sanitization” (Anonymisation)
• Semi-automatic data profiling & curation
• Decentralization: Use Blockchain to manage access to sensitive Data
2. Big Data Analytics & Model-Based Analysis
• Address High Dimensionality & heterogeneity
• Scaling through Distributed processing
• Twofold Similarity Analysis (patients like mine & patients like me)
• KDD, statistical simulation & DSS based on BIG - routine - DATA
• Biomedical & Imaging Model-Based Analysis
• Privacy preserving algorithms & mechanisms
3. Clinicians, Researchers & Patients Support
• Scientific workflow support
• Collaboration, data sharing and 2nd opinion support
• Personalized, Unified Access to internal & external well-organized data, information, models & knowledge
• DSS Tools & Applications for every role
4. Medical Practice Reengineering
• Ethics & Privacy
• Transform daily routine’s data into useful information & knowledge
• Promote Model-Guided Personalized Medicine utilizing similarity analysis, simulation models and DSS tools
• Tools & Models validation based on clinicians’ feedback
5. Create & Support an Ecosystem
• Organize communities (clinicians, researchers & patients)
• Save, organize and diffuse information & knowledge
• Promote health (self care, awareness – patients like me, similarity)
• Market Place for everything and everyone (data, models, services and applications)
Big Data
• Volume (high)
• Velocity (high)
• Variety (great)
• Veracity (lack of)
• Value (hard to extract)
Big Data Analytics
• Capture (multi source)
• Aggregate (distributed storage)
• Process (distributed processing)
Privacy by Design & Privacy by Default
• Privacy preserving data publication & sharing
• Privacy preserving complex data flow execution
• Secure Data Access
• Privacy & Security data profiling
Quality Assurance, Quality of Service, Compliance & Dissemination
• LYN [WP11]: Coordination & Management
• LYN [WP10]: Dissemination and Exploitation
• CNR [WP9]: Penetration & Re-Identification Challenge
• NCTM [WP2]: Regulatory and Compliance Study
• HES-SO [WP1]: Requirements Analysis
• LYN [WP7]: Platform-driven Assessment
WHY: Application Layer (WEB & Apps)
• SIEMENS, ATHENA, HES-SO [WP2, WP8]: Data Exploration, Analytics & Cohort Builder based on advanced Similarity & Semantic Search
• HWC, DigiMe [WP3]: Personal Data Account (PDA) & Dynamic consent management
HOW: Privacy by design middleware Layer
• ATHENA, GNUBILA [WP5]: Privacy preserving distributed data processing
• GNUBILA, ATHENA, HWC [WP6, WP3]: Blockchain Integration & Smart contracts management
• ATHENA, GNUBILA [WP5, WP6]: API (for SaaS applications)
• DigiMe, HWC [WP3]: Personal Data acquisition and management
WHAT: Federated Data Management & Data Harmonisation Layer
• HES-SO, ATHENA [WP4]: Semantic Modeling and data integration
• HES-SO, GNUBILA [WP4]: Persistent Identifiers Cataloguing (PID)
WHERE: Private Data Sources
• Hospitals: Electronic Medical Records
• Personal Data Subjects: social media accounts, clinical data repositories, personal drives, wearable devices
ATHENA [WP5]: Data Profiling & curation (quality, privacy & analysis)
• Data collection / origin
◦ Pseudonymised (de-identified) clinical (routine) data
◦ Personal data including machine-generated data from Internet of Things (IoT)
◦ Derived data related to the usage and the processing of the data
• Data storage & preservation
◦ Federated data management for clinical data
• ETL, pre-processing and pseudonymisation flow
◦ DIGI.me Personal Data Account (PDA) application
▪ retrieves personal data to an encrypted local library, which the users can then add to a personal cloud
• Data Modelling, Harmonisation, Cataloguing and Integration
◦ Global dynamic Subjective-Objective-Assessment-Plan (SOAP) model
◦ Use biomedical taxonomies and ontologies such as LOINC, SNOMED CT, ICD-10-CM, CPT, MeSH
◦ Persistent Identifiers (PIDs)
• Secure data access, sharing and processing in line with GDPR legislation
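The pseudonymisation step listed above can be illustrated with a keyed hash over the patient identifier. This is only a minimal sketch under stated assumptions: the slides do not say how pseudonyms are produced, and the key, the record fields and the helper names `pseudonymise` and `strip_identifiers` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret that never leaves the data controller (e.g. the hospital).
SECRET_KEY = b"replace-with-a-key-kept-inside-the-hospital"


def pseudonymise(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Map a patient identifier to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def strip_identifiers(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Drop direct identifiers and attach the pseudonym instead."""
    cleaned = {k: v for k, v in record.items() if k not in ("patient_id", "name")}
    cleaned["pseudonym"] = pseudonymise(record["patient_id"], key)
    return cleaned


# Invented example record; the field names are assumptions.
record = {"patient_id": "HOSP-000123", "name": "Jane Doe", "diagnosis": "JIA"}
shared = strip_identifiers(record)
```

Because the hash is keyed, the same patient always maps to the same pseudonym inside one hospital, while a party without the key cannot recompute it from a guessed identifier.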
Data Collection and Management
Hospitals
OPBG - Vatican
UCL/GOSH – London
DH – Berlin
IGG – Genova
KU - Leuven
CHUV – Lausanne
…
Data access & Privacy preservation
• Security / privacy breaches:
◦ avoid a single point of failure (i.e., data warehouse, TTP): decentralize data (transactions, patient data) and control using federation and blockchain
◦ offer multiple levels of privacy preservation
• Ownership: Users should control their data, easily join or leave
• Transparency: Users should audit the usage of their data
• Privacy is important
MDPSeC CDP
Blockchain as an access-control manager (Blockchain integration @ MHMD)
[Figure: a PI sends a new cohort request through a Smart Contract; patients and their (anonymous) medical data are referenced by PIDs under the Digital Object Architecture (DOA); each patient gives consent.]
(1) PI initiates a Data Access request
(2) Re-identification & consent request; consent request for (Anonymous) Medical Data
(3a,b) Sharing of (Anonymous) EHRs: privacy preserving data publishing
(3c) Execute a privacy preserving computation (bio-medical model): privacy preserving distributed complex data flow execution
Transaction
• Actors (WHO): Data controllers, Data processors, Data subjects
• Data (WHAT)
• Functions (WHY)
• Methods (HOW)
• Output (WHAT)
• a decentralized personal data management platform focused on privacy
• combine blockchain and off-blockchain storage
• users own, control and monitor their data and data usage
• utilize blockchain & smart contracts as an automated access-control manager
• does not require trust in a third party
• pointers to de-identified data → suitable for random queries
• support full data processing through PPDM
• Smart Contract
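The idea of a ledger acting as an automated access-control manager can be sketched with a toy, single-process hash chain. This is not a real blockchain or smart-contract platform and not the MHMD implementation: the `ConsentLedger` class, its method names and the identifiers used below are assumptions made for illustration. Only consent decisions and access attempts go on the chain; the data itself stays off-chain.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _digest(body: dict) -> str:
    """Deterministic SHA-256 over the JSON form of an entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class ConsentLedger:
    """Toy append-only ledger: every entry stores the hash of the previous one."""
    chain: list = field(default_factory=list)

    def _append(self, **payload) -> None:
        prev_hash = self.chain[-1]["hash"] if self.chain else "0" * 64
        body = {"prev": prev_hash, **payload}
        self.chain.append({**body, "hash": _digest(body)})

    def set_consent(self, subject: str, requester: str, granted: bool) -> None:
        """The data subject grants or revokes consent for one requester."""
        self._append(kind="consent", subject=subject, requester=requester, granted=granted)

    def request_access(self, subject: str, requester: str) -> bool:
        """Contract rule: the latest consent entry decides; every attempt is logged."""
        granted = False
        for entry in self.chain:
            if (entry["kind"] == "consent" and entry["subject"] == subject
                    and entry["requester"] == requester):
                granted = entry["granted"]
        self._append(kind="access", subject=subject, requester=requester, granted=granted)
        return granted

    def verify(self) -> bool:
        """Recompute the hash chain so that tampering with any entry is detected."""
        prev_hash = "0" * 64
        for entry in self.chain:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


ledger = ConsentLedger()
ledger.set_consent("pid:0001", "cohort-study-A", True)
first = ledger.request_access("pid:0001", "cohort-study-A")
ledger.set_consent("pid:0001", "cohort-study-A", False)   # the subject leaves
second = ledger.request_access("pid:0001", "cohort-study-A")
stranger = ledger.request_access("pid:0001", "unknown-app")
```

The logged access entries are what lets a user audit how their data was requested, which is the transparency property the slides ask for.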
Blockchain integration
WHO: subjects & controllers; processors & requesters
WHAT & WHY (Data, Functions, Output): EHR data; Personal data access; Publishing (external parties); Mining (within MHMD); Models; Predictions
HOW: DMP & (privacy) profiling; PPDM: MPC, DP, Encryption (on pseudonymised data); Publishing: Anonymization & Watermarking; Blockchain & Smart contracts (control & trace data usage)
Three main use cases:
• Personal Data Access
◦ Patient accessing his/her EHR
• Data publishing
◦ Research vs other purposes
◦ Anonymization requirements
◦ Watermarking
• Privacy Preserving Data Mining (within platform)
◦ Move data (authorized applications get and process the data i.e., MDP / Cardioproof)
◦ Move computation to data: secure multiparty computation (SMC, DP) on federated data / distrustful parties (MHMD, HBP)
◦ Other encryption techniques (homomorphic)
Encryption and privacy preserving policies
• static data publishing: “Sanitization” (Anonymization)
• secure multi party computation: Only overall aggregated data are transferred between nodes
• interactive anonymization: Differential Privacy & Crowd-Blending privacy
• encryption: Fully/Partially Homomorphic Encryption (FHE/PHE)
• decentralization: Use Blockchain to Protect Personal Data
Encryption and privacy preserving policies
• Privacy & Sensitivity Data Profiling:
◦ Define privacy profiles per data type & usage scenario
• Trade-offs among efficiency, accuracy & privacy
• Define a formal methodology to describe “privacy budget” in terms of expected accuracy
• Automate privacy preserving method selection based on privacy & sensitivity profile and efficiency / accuracy trade-offs
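One concrete way to relate a privacy budget to expected accuracy is the Laplace mechanism of differential privacy: a counting query has sensitivity 1, the noise scale is sensitivity / ε, and the expected absolute error equals that scale. The sketch below assumes this mechanism for illustration only; the slides do not state which mechanism or formal methodology is used, and all function names are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy (sensitivity 1)."""
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def expected_abs_error(epsilon: float, sensitivity: float = 1.0) -> float:
    """Expected |noise| of the Laplace mechanism: sensitivity / epsilon."""
    return sensitivity / epsilon


def epsilon_for_error(target_error: float, sensitivity: float = 1.0) -> float:
    """Budget needed so that the expected absolute error equals target_error."""
    return sensitivity / target_error


rng = random.Random(42)
noisy = private_count(100, epsilon=0.5, rng=rng)   # one noisy release
budget = epsilon_for_error(2.0)                    # budget for an error of about 2
```

A smaller ε gives stronger privacy and a proportionally larger expected error, which is the trade-off an automated method selector would have to weigh.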
Encryption and privacy preserving policies
Efficiency
Secure Data publishing
• Different dangers
◦ Identity leakage
◦ Attribute leakage
◦ Participation leakage
• Different transformations
◦ Generalization
◦ Suppression
◦ Perturbation
◦ Partitioning
◦ Noise addition
• “Sanitization” (Anonymisation) hiding individual information (ensuring k-anonymity) but preserving aggregated (sufficient) statistics
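Generalization and suppression, two of the transformations listed above, can be combined to reach k-anonymity as in the following minimal sketch. The quasi-identifiers (age, ZIP code), the generalisation rules and the records are invented for illustration; this is not the algorithm of any particular anonymization tool.

```python
from collections import Counter


def generalise(record: dict) -> tuple:
    """Generalise quasi-identifiers: age to a 10-year band, ZIP to a 3-digit prefix."""
    low = (record["age"] // 10) * 10
    return (f"{low}-{low + 9}", record["zip"][:3] + "**")


def anonymise(records: list, k: int) -> list:
    """Release generalised records; suppress groups with fewer than k members."""
    group_sizes = Counter(generalise(r) for r in records)
    released = []
    for r in records:
        key = generalise(r)
        if group_sizes[key] >= k:
            released.append({"age": key[0], "zip": key[1], "diagnosis": r["diagnosis"]})
    return released


def is_k_anonymous(released: list, k: int) -> bool:
    """Every combination of quasi-identifier values must occur at least k times."""
    counts = Counter((r["age"], r["zip"]) for r in released)
    return all(c >= k for c in counts.values())


# Invented records.
records = [
    {"age": 34, "zip": "11521", "diagnosis": "JIA"},
    {"age": 37, "zip": "11528", "diagnosis": "CHD"},
    {"age": 36, "zip": "11527", "diagnosis": "JIA"},
    {"age": 62, "zip": "10434", "diagnosis": "CHD"},
]
released = anonymise(records, k=2)
```

Aggregates such as the number of JIA cases per age band survive the transformation, while the single 60-69 record is suppressed. Note that k-anonymity addresses identity leakage only; attribute leakage needs further conditions on the sensitive values within each group.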
Secure Data publishing
• Amnesia anonymization tool
◦ It offers several versions of k-anonymity
◦ It allows the user to select and customize possible solutions
◦ It offers graphical tools that allow the user to analyze the anonymized dataset
◦ It is scalable and uses all available CPU cores in the anonymization process
• Watermarking techniques
• The setting: Data is horizontally distributed at different sites on a Private Data Network (PDN) of mutually distrustful parties
• The aim: Compute the data mining algorithm on the data so that nothing but the output is learned
◦ Use secure computation using SMPC, encryption, DP etc
◦ Assume semi-honest types of adversaries that follow the protocol
▪ Makes sense where the participating parties really trust each other (e.g., hospitals)
• Training (learning) vs Reasoning: different requirements and privacy related issues
◦ training: needs access to patient records
◦ reasoning: needs only the model and new data subjects but…
▪ Inference from the results: One can break privacy using well specified queries and analyzing the results
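A secure sum over semi-honest parties, the simplest building block of the setting described above, can be sketched with additive secret sharing. The sketch simulates all parties in one process; the modulus, the function names and the hospital counts are assumptions, and a real deployment would also need authenticated channels between sites.

```python
import secrets

# Public modulus; assumed to be larger than any possible true sum.
MODULUS = 2**61 - 1


def make_shares(value: int, n_parties: int) -> list:
    """Split a private value into n random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: list) -> int:
    """Each party shares its value; only sums of received shares are published."""
    n = len(private_values)
    # shares[i][j] is the share that party i sends to party j.
    shares = [make_shares(v, n) for v in private_values]
    # Party j adds the shares it received and reveals only this partial sum.
    partial_sums = [sum(shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS


# Invented per-hospital patient counts for some cohort.
hospital_counts = [120, 75, 310]
total = secure_sum(hospital_counts)
```

Each individual share is uniformly random, so a party that follows the protocol learns nothing about another party's count beyond what the final total implies.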
Privacy Preserving Data Mining
• Distributed elastic execution
• Iterative dataflow execution: Support ML algorithms
• Powerful data programming paradigm: SQL with User Defined Functions
• Privacy-aware query processing
Distributed Privacy Preserving Data Mining: EXAREME
Query Federation
Decompose query into local and global parts
Dataflow Execution Example
[Figure: nodes 1 … N each hold a local table (id, m-name, m-value).]
• Request: “count, avg, std”
• L (local queries): “Σx, Σx², N”. Each node runs the local queries and returns partial aggregated results (m-name, Σx, Σx², N).
• G (global queries): “N, avg, std”. The global queries run on the partial aggregated results and return (m-name, N, avg, std).
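The local/global decomposition above works because N, the average and the standard deviation can all be derived from Σx, Σx² and N. A minimal sketch follows; the function names and measurements are invented, and the population standard deviation is an assumption, since the slides do not say which variant is computed.

```python
import math


def local_query(values: list) -> dict:
    """Runs inside each node: only these three aggregates leave the site."""
    return {
        "sum_x": sum(values),
        "sum_x2": sum(v * v for v in values),
        "n": len(values),
    }


def global_query(partials: list) -> dict:
    """Runs at the coordinator on the partial aggregated results."""
    n = sum(p["n"] for p in partials)
    sum_x = sum(p["sum_x"] for p in partials)
    sum_x2 = sum(p["sum_x2"] for p in partials)
    avg = sum_x / n
    variance = sum_x2 / n - avg * avg        # population variance (assumption)
    return {"n": n, "avg": avg, "std": math.sqrt(max(variance, 0.0))}


# Invented measurements held by two nodes.
node_1 = [2.0, 4.0, 4.0, 4.0]
node_2 = [5.0, 5.0, 7.0, 9.0]
result = global_query([local_query(node_1), local_query(node_2)])
```

No individual measurement crosses a node boundary, which matches the policy that only overall aggregated data are transferred between nodes.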
Data Cleaning, Exploration & Analytics
• Data curation & profiling, knowledge discovery and statistical simulation framework
◦ Process driven by bottom-up evidence AND top-down models/knowledge
◦ Data profiling, cleaning & exploration: Statistical analysis, advanced visualization, rule based cleaning
◦ Data Mining, pattern discovery and similarity analysis: Well established ML
◦ Statistical simulation: Dependency analysis/reasoning based on Bayesian Nets
Data Cleaning, Exploration & Analytics
Data
Query System
Action
Results
Analytics System
Curation System
XYZ System
• Abstraction
• Analytics (data mining, machine learning, discovery)
• Cleaning and curation
• Homogenization and integration
• Querying and searching
• Transformation
• Visualization
• Zooming
• …
Data Cleaning, Exploration & Analytics
Data analytics flow to P. Medicine
[Figure: layered flow with TOP-DOWN and BOTTOM-UP directions]
• Data Management & Harmonisation: Biomarker based personalized acquisition
• Data Curation & Exploration: Cleaning, profiling & pre-processing; Transformed & validated data
• Data Analysis & Modelling: Knowledge Discovery & Model training; Disease signatures & patient groups; Variables dependencies & prediction models
• Precision Medicine Support: Reasoning, Simulation & DSS; for a particular patient with unknown / missing data, predict value of missing variable
• Individualized diagnosis, prognosis & treatment plan
• Domain knowledge & assumptions; Clinical workflows
[Figure: Final DAG (based on MCMC&DP, threshold=0.5). Nodes on both axes: Age, ParCHD, Procedures, ExIntoler, Cyanosis, CPBP, CPArrhy, CPConcl, CPTermRsn, BSA, TPVRegurg, TriRegurg, RVD, RedRV, PSMotion, RestrPatt, AVBlock, SupravArrhy, VentricArrhy; scale 0 to 1.]
• Data profiling:
◦ ensure and assess the actual content, structure and quality of the data
◦ reveal their characteristics, strengths and weaknesses
• Types:
◦ Structural: Schema, Type (e.g., numeric or text), Format (e.g., mm/dd/yyyy)
◦ Statistical: distribution, missing values, tails
◦ Logical: rules, constraints
◦ Identity: deduplication / resolution, ref. table matching
◦ Security / privacy Data Profiling: assessing relevance, sensitivity, risk for the individual and practical value
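The structural and statistical profile types above can be illustrated on a single column of raw string values. This is a minimal sketch with invented data; the helper names and the choice of statistics are assumptions and do not describe any specific profiling tool.

```python
import re
import statistics

# Structural check for the mm/dd/yyyy format mentioned in the slides.
DATE_PATTERN = re.compile(r"^\d{2}/\d{2}/\d{4}$")


def infer_type(value: str) -> str:
    """Classify one raw value as 'date', 'numeric' or 'text'."""
    if DATE_PATTERN.match(value):
        return "date"
    try:
        float(value)
        return "numeric"
    except ValueError:
        return "text"


def profile_column(values: list) -> dict:
    """Structural profile (types) plus statistical profile (missing, distinct, mean)."""
    present = [v for v in values if v not in (None, "")]
    numeric = [float(v) for v in present if infer_type(v) == "numeric"]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "types": sorted({infer_type(v) for v in present}),
        "distinct": len(set(present)),
        "mean": statistics.mean(numeric) if numeric else None,
    }


# Invented body-temperature column with a missing value and a stray text entry.
column = ["36.5", "37.1", "", "n/a", "36.5", None]
report = profile_column(column)
```

A column that reports more than one type, as here, is a natural candidate for a user-defined cleaning rule.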
Data Profiling and Curation
Data Profiling and Curation
DCV: semi-automatic tool
• data profiling
• data cleaning, validation & transformation
• privacy preserving data analysis
• interactive and efficient web-based interface
• workflow support (rerun experiments, reproduce results)
Interactive Visualisations
User-defined Cleaning Rules
Click on red piece of pie to see violations
Variable discretisation
• Disease signatures: latent factors characterizing disease
◦ Patterns over the most relevant disease variables, e.g., biomarkers
◦ Several approaches (probabilistic latent factor analysis, well established ML algorithms)
• Predictive analysis: Patient classification or regression for categorization and outcome analysis
• Descriptive analysis: clustering algorithms & probabilistic (mixed) membership models
• Similarity Analysis: patients “like” me or mine (patient/clinician role)
Data Mining & KDD
Classification (model training & pattern discovery)
• NEUROLOGICAL AND NEUROMUSCULAR DISEASE (NND) Use-case: Automatic classification of 7 Joint Movement Patterns based on kinematic data.
• Training: on specific extracted features or raw gait analysis waveforms (time series)
• Cross-Validation: Stratified 10-fold
• Method: Random Forests, kNN
• Results: Model prediction accuracies > 85%
Classification (categorization)
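The evaluation protocol named above, stratified 10-fold cross-validation with a kNN classifier, can be sketched in plain Python. The data below are synthetic two-class points invented for illustration, not gait features, and the reported study also used Random Forests, which this sketch does not cover. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels: list, k: int, seed: int = 0) -> list:
    """Split indices into k folds that keep the class proportions of the full set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def knn_predict(train_x: list, train_y: list, query: tuple, n_neighbours: int = 3):
    """Majority vote among the n nearest training points (squared Euclidean distance)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:n_neighbours])
    return votes.most_common(1)[0][0]


def cross_validate(features: list, labels: list, k: int = 10) -> float:
    """Accuracy over all held-out samples of a stratified k-fold split."""
    correct = 0
    for fold in stratified_folds(labels, k):
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        train_x = [features[i] for i in train]
        train_y = [labels[i] for i in train]
        correct += sum(knn_predict(train_x, train_y, features[i]) == labels[i] for i in fold)
    return correct / len(labels)


# Synthetic data: two well-separated classes of 30 samples each.
data_rng = random.Random(1)
features = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(30)]
features += [(data_rng.gauss(5, 1), data_rng.gauss(5, 1)) for _ in range(30)]
labels = [0] * 30 + [1] * 30
accuracy = cross_validate(features, labels, k=10)
```

Stratification matters when some movement patterns are rare: every fold then still contains examples of each class.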
Aim: To predict early disease outcome in JIA using baseline variables
Analysis: Random Forest algorithm on three datasets
• Clinical (acc=0.6)
• Clinical with Luminex (acc=0.57)
• Clinical with microbiota (acc=0.52)
Conclusion: Difficulty identifying patients who remained active
Classification (outcome prediction)
Probabilistic Modeling for statistical simulation
• Modelling
• Dependency Analysis
• Inference
Finding most important dependencies and independencies: e.g., disDur, neutro, pga are almost uncorrelated and excluded
Qualitative dependency analysis: Learning the structure (DAG)
Quantitative analysis: Learning model parameters (Cond. Prob.)
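Once a DAG is fixed, the quantitative step (learning conditional probabilities) and a what-if query can be sketched for a tiny discrete network. The structure, variable names, values and patient rows below are all invented for illustration; they are not the JIA or CHD models from the slides, and structure learning itself is not shown.

```python
from collections import Counter
from itertools import product

# Hypothetical DAG: joints -> outcome <- therapy.
PARENTS = {"joints": [], "therapy": [], "outcome": ["joints", "therapy"]}
VALUES = {
    "joints": ["few", "many"],
    "therapy": ["inject", "MTX"],
    "outcome": ["remission", "active"],
}


def learn_cpts(rows: list, alpha: float = 1.0) -> dict:
    """Estimate P(var | parents) by counting, with add-alpha smoothing."""
    cpts = {}
    for var, parents in PARENTS.items():
        counts = Counter((tuple(r[p] for p in parents), r[var]) for r in rows)
        table = {}
        for pa in product(*(VALUES[p] for p in parents)):
            total = sum(counts[(pa, v)] for v in VALUES[var]) + alpha * len(VALUES[var])
            for v in VALUES[var]:
                table[(pa, v)] = (counts[(pa, v)] + alpha) / total
        cpts[var] = table
    return cpts


def joint_probability(cpts: dict, assignment: dict) -> float:
    """Chain rule over the DAG: product of P(var | parents)."""
    p = 1.0
    for var, parents in PARENTS.items():
        pa = tuple(assignment[q] for q in parents)
        p *= cpts[var][(pa, assignment[var])]
    return p


def query(cpts: dict, target: str, evidence: dict) -> dict:
    """P(target | evidence) by enumerating the unobserved variables."""
    hidden = [v for v in PARENTS if v != target and v not in evidence]
    scores = {}
    for t in VALUES[target]:
        scores[t] = sum(
            joint_probability(cpts, {**evidence, target: t, **dict(zip(hidden, combo))})
            for combo in product(*(VALUES[h] for h in hidden))
        )
    norm = sum(scores.values())
    return {t: s / norm for t, s in scores.items()}


def rows_of(joints, therapy, outcome, n):
    return [{"joints": joints, "therapy": therapy, "outcome": outcome}] * n


# Invented patient rows.
rows = (rows_of("few", "inject", "remission", 6) + rows_of("few", "inject", "active", 2)
        + rows_of("many", "inject", "remission", 1) + rows_of("many", "inject", "active", 5)
        + rows_of("few", "MTX", "remission", 3) + rows_of("few", "MTX", "active", 1)
        + rows_of("many", "MTX", "remission", 4) + rows_of("many", "MTX", "active", 2))
cpts = learn_cpts(rows)
what_if_mtx = query(cpts, "outcome", {"joints": "many", "therapy": "MTX"})
what_if_inject = query(cpts, "outcome", {"joints": "many", "therapy": "inject"})
```

Changing the evidence and re-running `query` is the kind of what-if exploration a clinician would use to compare therapies for one patient profile.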
JIA clinical model
Sensitivity Analysis on Outcome
tmj active
very small sample
- Bad prognosis
- Aggressive treatment
What if... a new patient with 2 act. knee joints & symmetry?
What if... Therapy = inject?
What if... Therapy = MTX?
- same percentage
- worse prognosis
What about domain knowledge??
BIO-KNOWLEDGE ASSOCIATION MAP
Inputs:
• Full text PubMed papers and meta data including MESH
• Semantic bio terms (PDBCodes, chem2Bio2RDF, LODD)
Steps:
• NLP & Named Entity Recognition: annotate publications with bioterms (genes, pdbCodes etc)
• Multi View Topic Modelling: identify multi modal topics & quantify associations
• Generate specific association maps for different types of entities (e.g., genes, MESH, proteins, drugs)
Example topics (Text | pdbCode):
• cancer tumor growth breast lines apoptosis tumors prostate kinase | 1m17 2ity 1qcf
• binding dna brca brct cancer res mutations domain results | 1tsr 1ycs 2ac0 1gzh
• nef hiv vpr ssb virus felv hck replication ssdna | 1eyg 2nef 1jmc 1m8l 1efn 1izn
• Analyze large collections of documents and meta-data to:
◦ identify active areas of research: discover hidden themes (topics)
◦ understand what is actually produced: project the output to the reduced topic space (calculate topic distributions per document or other entity, e.g., gene or protein)
◦ create association maps (interaction networks) among different entities (e.g., genes, drugs, diseases, proteins)
▪ promote target identification: “Pathway expansion” for non-‘druggable’ targets, multi-target drugs, drug repositioning (indication expansion)
◦ identify emerging research areas, e.g., target identification, or the understanding of disease mechanisms: create new therapeutic opportunities
◦ assess coverage, identify gaps or new therapeutic opportunities: compare funded research, patents
Mining scientific literature
WHY
What is involved…
1. ENRICH & PRE-PROCESS
• Extract features and annotate (enrich) content using NLP, Named Entity Recognition & Semantic Annotation
• Tokenize, remove stop words
• Refine stop words for specific domain
2. FIND TOPICS
• Identify topics: distribution over words & “side” information
• Automatic topic curation & entitling
• Assign topics to publications
• Evaluate & categorize topics
• Assess topic labels
3. CALCULATE TRENDS & SIMILARITIES
• Calculate topic proportions & trends of objects based on their publications
• Calculate similarity among different entities based on various metrics
• Analyze & Validate the results
4. VISUALIZE
• Create WEB interactive visualization with data driven graphs, charts and layouts
• Design optimal views
• Validate modeling results
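The pre-processing stage of this pipeline (tokenize, remove stop words, refine stop words for the domain) can be sketched as follows. The stop-word lists and the sample sentence are invented for illustration; a real pipeline would use much larger lists and add named entity recognition, which is not shown here.

```python
import re

# Small illustrative lists; both are assumptions, not the lists used in the project.
GENERIC_STOP_WORDS = {"the", "of", "and", "in", "a", "to", "is", "for", "with", "we"}
DOMAIN_STOP_WORDS = {"figure", "results", "study"}


def tokenize(text: str) -> list:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def preprocess(text: str, extra_stop_words=frozenset()) -> list:
    """Tokenize, then drop generic and domain stop words and one-character tokens."""
    stop_words = GENERIC_STOP_WORDS | set(extra_stop_words)
    return [t for t in tokenize(text) if t not in stop_words and len(t) > 1]


# Invented sentence in the style of a paper abstract.
abstract = "Results of the study: AAV vectors for gene therapy in the retina (Figure 2)."
tokens = preprocess(abstract, DOMAIN_STOP_WORDS)
```

Words such as "figure" carry little topical signal in full-text papers, which is why refining the stop-word list for the domain is listed as its own step.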
• Probabilistic Multi-View Topic Modeling of Text-Augmented Heterogeneous Information Networks
◦ interconnected (linked) entities which are characterized by TEXT and related side information & links (e.g., taxonomies, venues, projects / research areas, citations, authors)
◦ side-information:
▪ structured or unstructured attributes and meta-data
▪ links / relations: e.g., authorship network, citation network
▪ incomplete, noisy or not related to textual attributes
Methodology
Multi-View Topic Modeling
Text: gene, cells, expression, vector, aav, vectors, dna, therapy, figure, cell, target, gfp, targeting, delivery, diseases
Phrases: gene therapy, gene transfer, aav vectors, lentiviral vectors
Grants: PERSIST: Persisting Transgenesis; AAVEYE: GENE THERAPY FOR INHERITED SEVERE PHOTORECEPTOR DISEASES
MESH Descriptors: Genetic Vectors, Lentivirus, Genetic Therapy, Dependovirus, Green Fluorescent Proteins
Journals: Molecular therapy
Research Areas: Biotechnology, generic tools and medical technologies for human health
Expert: What is this Topic about??
Diagnostics and treatment development: Gene therapy & genetic vectors
Multi-View Topic Modeling
Infectious diseases: HIV and NEF protein
Text: hiv, cells, cell, nef, viral, virus, bst, gag, infected, drug, vpu, gfp, assembly, surface, cellular
Phrases: gfp cells, hela cells, infected cells, plasma membrane
Grants: HIV ACE: Targeting assembly of infectious HIV particles; INEF: Inhibiting Nef: a novel drug target for HIV-host interactions
MESH Descriptors: HIV-1; Antigens, CD; Cell Membrane; Membrane Glycoproteins
Journals: plos pathogens
Research Areas: HEALTH-2007-2 [Translating research for human health]
PDB codes: 2NEF, 1M8ML, 1EFN
Similarity & Graph clustering
Topics & allocations
Modelling
• LINKS represent topic based similarity
• NODES may represent drugs, PDBCodes, genes or MeSH terms (size: ~ # of publications)
• Categories may represent Anatomical Therapeutic Chemical (ATC) class, Biological Process, MeSH hierarchy etc
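Links based on topic similarity can be derived by comparing the topic proportions of two entities and keeping pairs above a threshold. The sketch below assumes cosine similarity and a threshold of 0.8; the slides only say "various metrics", and the topic proportions attached to the PDB codes are invented.

```python
import math
from itertools import combinations


def cosine(p: list, q: list) -> float:
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def similarity_links(topic_mix: dict, threshold: float = 0.8) -> list:
    """Return (entity, entity, score) edges for pairs above the threshold."""
    links = []
    for a, b in combinations(sorted(topic_mix), 2):
        score = cosine(topic_mix[a], topic_mix[b])
        if score >= threshold:
            links.append((a, b, round(score, 3)))
    return links


# Invented proportions over three topics (cancer, DNA repair, HIV).
topic_mix = {
    "2NEF": [0.05, 0.05, 0.90],
    "1EFN": [0.10, 0.05, 0.85],
    "1M17": [0.85, 0.10, 0.05],
}
links = similarity_links(topic_mix)
```

The resulting edge list can be passed to any graph clustering or visualization layer; node size would then come from publication counts, as described above.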
e-Infrastructures & data repositories
Use Domain Knowledge to
• Enhance Patient Similarity Analysis
• Promote Decision Support
Clinical data clouds
• Analyze clinical data to validate findings
More Related Content

What's hot

Scalable and secure sharing of personal health records in cloud computing usi...
Scalable and secure sharing of personal health records in cloud computing usi...Scalable and secure sharing of personal health records in cloud computing usi...
Scalable and secure sharing of personal health records in cloud computing usi...
Harilal Punalur
 
secured storage of Personal health record in cloude
secured storage of Personal health record in cloudesecured storage of Personal health record in cloude
secured storage of Personal health record in cloude
Mahaveer kandgule
 
Scalable and secure sharing of personal health records in cloud computing us...
Scalable and secure sharing of personal health
records in cloud computing us...Scalable and secure sharing of personal health
records in cloud computing us...
Scalable and secure sharing of personal health records in cloud computing us...
Duraiyarasan S
 
Paper MIE2016 from Proceedings pags 122-126
Paper MIE2016 from Proceedings pags 122-126Paper MIE2016 from Proceedings pags 122-126
Paper MIE2016 from Proceedings pags 122-126
vilaltajo
 
IHE-XDS_XDR-XDM Understanding
IHE-XDS_XDR-XDM UnderstandingIHE-XDS_XDR-XDM Understanding
IHE-XDS_XDR-XDM Understanding
Raghu Kodumuri
 

Knowing me, knowing you, knowing your disease

  • 1. Knowing me, knowing you, knowing your disease: A new paradigm in healthcare privacy-preserving data sharing and big data analytics Omiros Metaxas ATHENA Research Center & University of Athens
  • 2. Research Areas: Database and Information Systems; Human-Computer Interaction; Scientific Systems; Personalization & Social Networks; Electronic Infrastructures; Applications • Query Optimization • Cloud Query Processing • Heterogeneous Systems • Data mining / analytics • Data curation • Database User Interfaces • Complex Data Visualization • Scientific Experiment Management • Scientific Databases • Workflow Management • Distributed Systems • Cultural Heritage • Life Sciences • Physical Sciences • User Modeling • User Profiling • Adaptivity • Digital Libraries • Data Repositories • Interoperability • Open Access Policies • Cloud Data Services
  • 3. BioMed Oceans Space & Earth Culture Environment OA Policies Data Proc
  • 4. From big data to new medical practice
    • 1. Big Data Management
      ◦ Manage heterogeneous, federated biomedical data sources & models
      ◦ Data provenance & on-line transformation (ETL)
      ◦ “Sanitization” (Anonymisation)
      ◦ Semi-automatic data profiling & curation
      ◦ Decentralization: Use Blockchain to manage access to sensitive Data
    • 2. Big Data Analytics & Model-Based Analysis
      ◦ Address High Dimensionality & heterogeneity
      ◦ Scaling through Distributed processing
      ◦ Twofold Similarity Analysis (patients like mine & patients like me)
      ◦ KDD, statistical simulation & DSS based on BIG - routine - DATA
      ◦ Biomedical & Imaging Model-Based Analysis
      ◦ Privacy preserving algorithms & mechanisms
    • 3. Clinicians, Researchers & Patients Support
      ◦ Scientific workflow support
      ◦ Collaboration, data sharing and 2nd opinion support
      ◦ Personalized, Unified Access to internal & external well-organized data, information, models & knowledge
      ◦ DSS Tools & Applications for every role
    • 4. Medical Practice Reengineering
      ◦ Ethics & Privacy
      ◦ Transform daily routine’s data into useful information & knowledge
      ◦ Promote Model-Guided Personalized Medicine utilizing similarity analysis, simulation models and DSS tools
      ◦ Tools & Models validation based on clinicians’ feedback
    • 5. Create & Support an Ecosystem
      ◦ Organize communities (clinicians, researchers & patients)
      ◦ Save, organize and diffuse information & knowledge
      ◦ Promote health (self care, awareness – patients like me, similarity)
      ◦ Market Place for everything and everyone (data, models, services and applications)
    • Big Data
      ◦ Volume (high)
      ◦ Velocity (high)
      ◦ Variety (great)
      ◦ Veracity (lack of)
      ◦ Value (hard to extract)
    • Big Data Analytics
      ◦ Capture (multi source)
      ◦ Aggregate (distributed storage)
      ◦ Process (distributed processing)
    • Privacy by Design & Privacy by Default
      ◦ Privacy preserving data publication & sharing
      ◦ Privacy preserving complex data flow execution
      ◦ Secure Data Access
      ◦ Privacy & Security data profiling
  • 5. Platform architecture (diagram)
    • Quality Assurance, Quality of Service, Compliance & Dissemination
      ◦ LYN [WP11]: Coordination & Management
      ◦ LYN [WP10]: Dissemination and Exploitation
      ◦ CNR [WP9]: Penetration & Re-Identification Challenge
      ◦ NCTM [WP2]: Regulatory and Compliance Study
      ◦ HES-SO [WP1]: Requirements Analysis
      ◦ LYN [WP7]: Platform-driven Assessment
    • WHY: Application Layer (WEB & Apps)
      ◦ SIEMENS, ATHENA, HES-SO [WP2, WP8]: Data Exploration, Analytics & Cohort Builder based on advanced Similarity & Semantic Search
      ◦ HWC, DigiMe [WP3]: Personal Data Account (PDA) & Dynamic consent management
    • HOW: Privacy by design middleware Layer
      ◦ ATHENA, GNUBILA [WP5]: Privacy preserving distributed data processing
      ◦ GNUBILA, ATHENA, HWC [WP6, WP3]: Blockchain Integration & Smart contracts management
      ◦ ATHENA, GNUBILA [WP5, WP6]: API (for SaaS applications)
    • WHAT: Federated Data Management & Data Harmonisation Layer
      ◦ HES-SO, ATHENA [WP4]: Semantic Modeling and data integration
      ◦ HES-SO, GNUBILA [WP4]: Persistent Identifiers Cataloguing (PID)
    • WHERE: Private Data Sources
      ◦ Hospitals: Electronic Medical Records
      ◦ Personal Data Subjects: social media accounts, clinical data repositories, personal drives, wearable devices
    • DigiMe, HWC [WP3]: Personal Data acquisition and management
    • ATHENA [WP5]: Data Profiling & curation (quality, privacy & analysis)
  • 6. Platform architecture diagram, repeated (same components as slide 5)
  • 7. Data Collection and Management
    • Data collection / origin
      ◦ Pseudonymised (de-identified) clinical (routine) data
      ◦ Personal data, including machine-generated data from the Internet of Things (IoT)
      ◦ Derived data related to the usage and processing of the data
    • Data storage & preservation
      ◦ Federated data management for clinical data → ETL, pre-processing and pseudonymisation flow
      ◦ DIGI.me Personal Data Account (PDA) application → retrieves personal data to an encrypted local library, which users can then add to a personal cloud
    • Data Modelling, Harmonisation, Cataloguing and Integration
      ◦ Global dynamic Subjective-Objective-Assessment-Plan (SOAP) model
      ◦ Use of biomedical taxonomies and ontologies such as LOINC, SNOMED CT, ICD-10-CM, CPT, MeSH
      ◦ Persistent Identifiers (PIDs)
    • Secure data access, sharing and processing in line with GDPR legislation
  • 8. Hospitals OPBG - Vatican UCL/GOSH – London DH – Berlin IGG – Genova KU - Leuven CHUV – Lausanne …
  • 9. Platform architecture diagram, repeated (same components as slide 5)
  • 10. Data access & Privacy preservation
    • Security / privacy breaches:
      ◦ Avoid a single point of failure (i.e., data warehouse, trusted third party (TTP)): decentralize data (transactions, patient data) and control using federation and blockchain
      ◦ Offer multiple levels of privacy preservation
    • Ownership: users should control their data and be able to join or leave easily
    • Transparency: users should be able to audit the usage of their data
    • Privacy is important
  • 11. Blockchain as an access-control manager (Blockchain integration @ MHMD)
    • Diagram labels: MDPSeC, CDP, Patient, PIDs, Digital Object Architecture (DOA), PI, (Anonymous) Medical Data, consent, New cohort Request, Smart Contract, Bio-medical model
      ◦ (1) Initiates a Data Access request
      ◦ (2) Re-identification & consent request
      ◦ (3a,b) Sharing of (Anonymous) EHRs: privacy preserving data publishing
      ◦ (3c) Execute a privacy preserving computation: privacy preserving distributed complex data flow execution
      ◦ Transaction: Actors (WHO): data controllers, data processors, data subjects; Data (WHAT); Functions (WHY); Methods (HOW); Output (WHAT)
    • A decentralized personal data management platform focused on privacy
    • Combines blockchain and off-blockchain storage
    • Users own, control and monitor their data and data usage
    • Utilizes blockchain & smart contracts as an automated access-control manager
    • Does not require trust in a third party
    • Pointers to de-identified data
    • Suitable for random queries
    • Supports full data processing through PPDM
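The idea of using a ledger as an automated access-control manager can be illustrated with a toy, single-process sketch. Everything here is hypothetical (the `ConsentLedger` class, its method names, the record fields); it is not the MHMD smart contract, only a hash-chained log in which the latest consent record decides each access request and every request is itself recorded for auditing.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConsentLedger:
    """Toy append-only log; each block stores the hash of the previous one."""
    blocks: List[dict] = field(default_factory=list)

    def _append(self, payload: dict) -> None:
        prev = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        record = {"prev": prev, **payload}
        digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
        self.blocks.append({**record, "hash": digest})

    def set_consent(self, pid: str, requester: str, granted: bool) -> None:
        """Data subject grants or revokes consent for one requester."""
        self._append({"type": "consent", "pid": pid,
                      "requester": requester, "granted": granted})

    def request_access(self, pid: str, requester: str) -> bool:
        """The latest matching consent record wins; the request is logged too."""
        allowed = False
        for block in self.blocks:
            if (block["type"] == "consent" and block["pid"] == pid
                    and block["requester"] == requester):
                allowed = block["granted"]
        self._append({"type": "access", "pid": pid,
                      "requester": requester, "granted": allowed})
        return allowed

    def verify(self) -> bool:
        """Recompute every hash to detect tampering with earlier records."""
        prev = "0" * 64
        for block in self.blocks:
            record = {k: v for k, v in block.items() if k != "hash"}
            digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
            if block["prev"] != prev or block["hash"] != digest:
                return False
            prev = block["hash"]
        return True


ledger = ConsentLedger()
ledger.set_consent("pid-001", "research-team", True)
first = ledger.request_access("pid-001", "research-team")
ledger.set_consent("pid-001", "research-team", False)
second = ledger.request_access("pid-001", "research-team")
```

Only pointers (here a made-up `pid-001`) appear in the log, in line with the "pointers to de-identified data" bullet; the data itself would stay off-chain.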
  • 12. Blockchain integration: Smart Contract (diagram labels). WHO: subjects & controllers; processors & requesters. WHAT & WHY; HOW. Data; Functions; Output. DMP & (privacy) profiling. PPDM: MPC, DP, Encryption (on pseudonymised data). Predictions. Publishing (external parties). Mining (within MHMD). Models. EHR data. Publishing: Anonymization & Watermarking. Blockchain & Smart contracts (control & trace data usage). Personal data access.
  • 13. Encryption and privacy preserving policies. Three main use cases:
    • Personal Data Access
      ◦ Patient accessing his/her EHR
    • Data publishing
      ◦ Research vs. other purposes
      ◦ Anonymization requirements
      ◦ Watermarking
    • Privacy Preserving Data Mining (within platform)
      ◦ Move data (authorized applications get and process the data, i.e., MDP / Cardioproof)
      ◦ Move computation to data: secure multiparty computation (SMC, DP) on federated data / distrustful parties (MHMD, HBP)
      ◦ Other encryption techniques (homomorphic)
  • 14. Encryption and privacy preserving policies
    • Static data publishing: “Sanitization” (Anonymization)
    • Secure multi-party computation: only overall aggregated data are transferred between nodes
    • Interactive anonymization: Differential Privacy & Crowd-Blending privacy
    • Encryption: Fully/Partially Homomorphic Encryption (FHE)
    • Decentralization: Use Blockchain to Protect Personal Data
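As an illustration of the differential privacy item, here is a minimal sketch of the Laplace mechanism for a counting query. The function names and the sensitivity of 1 are assumptions made for the example, not something the slides specify, and a production system would need a hardened noise sampler rather than plain floating-point arithmetic.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: the difference of two exponentials with mean `scale`."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(true_count: int, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> float:
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    Assumes adding or removing one patient changes the count by at most
    `sensitivity`; a smaller epsilon means more noise and more privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical usage: a cohort of 42 patients released with epsilon = 0.5.
noisy = private_count(42, epsilon=0.5, rng=random.Random())
```

The trade-off is explicit in the `sensitivity / epsilon` scale: halving epsilon doubles the expected noise on the released count.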
  • 15. Encryption and privacy preserving policies
    • Privacy & Sensitivity Data Profiling:
      ◦ Define privacy profiles per data type & usage scenario
    • Trade-offs among efficiency, accuracy & privacy
    • Define a formal methodology to describe the “privacy budget” in terms of expected accuracy
    • Automate privacy preserving method selection based on the privacy & sensitivity profile and efficiency / accuracy trade-offs
  • 16. Secure Data publishing
    • Different dangers
      ◦ Identity leakage
      ◦ Attribute leakage
      ◦ Participation leakage
    • Different transformations
      ◦ Generalization
      ◦ Suppression
      ◦ Perturbation
      ◦ Partitioning
      ◦ Noise addition
    • “Sanitization” (Anonymisation): hiding individual information (ensuring k-anonymity) while preserving aggregated (sufficient) statistics
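A small sketch of what "ensuring k-anonymity" means in practice: every combination of quasi-identifier values must be shared by at least k records, and generalization or partial suppression is applied until that holds. The records, column names and the 10-year age bands below are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]


def generalize_age(age: str, width: int = 10) -> str:
    """Generalization: replace an exact age with a band such as '30-39'."""
    low = (int(age) // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(rows: Sequence[Record], quasi_ids: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(count >= k for count in groups.values())


# Hypothetical records: age and zip are quasi-identifiers, dx is the sensitive value.
raw: List[Record] = [
    {"age": "31", "zip": "11527", "dx": "I10"},
    {"age": "36", "zip": "11528", "dx": "E11"},
    {"age": "42", "zip": "11741", "dx": "I10"},
    {"age": "47", "zip": "11743", "dx": "J45"},
]

# Generalize ages into bands and suppress the last two zip digits.
generalized: List[Record] = [
    {"age": generalize_age(r["age"]), "zip": r["zip"][:3] + "**", "dx": r["dx"]}
    for r in raw
]

raw_ok = is_k_anonymous(raw, ["age", "zip"], k=2)
generalized_ok = is_k_anonymous(generalized, ["age", "zip"], k=2)
```

The sensitive column is left untouched, so per-group diagnosis counts (the aggregated statistics) survive the transformation.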
  • 17. Secure Data publishing  Amnesia anonymization tool ◦ It offers several versions of k-anonymity ◦ It allows the user to select and customize possible solutions ◦ It offers graphical tools that allow the user to analyze the anonymized dataset ◦ It is scalable and uses all available CPU cores in the anonymization process  Watermarking techniques
  • 18. Privacy Preserving Data Mining ▪ The setting: data is horizontally distributed at different sites on a Private Data Network (PDN) of mutually distrustful parties ▪ The aim: compute the data mining algorithm on the data so that nothing but the output is learned ◦ Use secure computation based on SMPC, encryption, DP etc. ◦ Assume semi-honest adversaries that follow the protocol ▪ Makes sense where the participating parties really trust each other (e.g., hospitals) ▪ Training (learning) vs. reasoning: different requirements and privacy-related issues ◦ training: needs access to patient records ◦ reasoning: needs only the model and new data subjects, but… ▪ Inference from the results: one can break privacy by using well-specified queries and analyzing the results
  • 19. Distributed Privacy Preserving Data Mining: EXAREME ▪ Distributed elastic execution ▪ Iterative dataflow execution: support for ML algorithms ▪ Powerful data programming paradigm: SQL with User Defined Functions ▪ Privacy-aware query processing
  • 20. Query Federation: Dataflow Execution Example. Decompose the query into local and global parts. Input: nodes 1…N each hold a table with columns (id, m-name, m-value). Requested query: “count, avg, std” per m-name. Local query (L): “Σx, Σx², N”, run at each node; each node returns partial aggregated results (m-name, Σx, Σx², N). Global query (G): “N, avg, std”, run over the partial aggregated results; the final result is (m-name, N, avg, std).
  • 21. Quality Assurance, Quality of Service, Compliance & Dissemination; LYN [WP11]: Coordination & Management; LYN [WP10]: Dissemination and Exploitation; CNR [WP9]: Penetration & Re-Identification Challenge; NCTM [WP2]: Regulatory and Compliance Study; HES-SO [WP1]: Requirements Analysis; LYN [WP7]: Platform-driven Assessment; WHERE; Private Data Sources; Federated Data Management & Data Harmonisation Layer; WHAT; HES-SO, ATHENA [WP4]: Semantic Modeling and data integration; HES-SO, GNUBILA [WP4]: Persistent Identifiers Cataloguing (PID); Hospitals: Electronic Medical Records; Personal Data Subjects: social media accounts, clinical data repositories, personal drives, wearable devices; Privacy by design middleware Layer; ATHENA, GNUBILA [WP5]: Privacy preserving distributed data processing; HOW; GNUBILA, ATHENA, HWC [WP6, WP3]: Blockchain Integration & Smart contracts management; API (for SaaS applications): ATHENA, GNUBILA [WP5, WP6]; DigiMe, HWC [WP3]: Personal Data acquisition and management; Application Layer (WEB & Apps); SIEMENS, ATHENA, HES-SO [WP2, WP8]: Data Exploration, Analytics & Cohort Builder based on advanced Similarity & Semantic Search; HWC, DigiMe [WP3]: Personal Data Account (PDA) & Dynamic consent management; WHY; ATHENA [WP5]: Data Profiling & curation (quality, privacy & analysis)
  • 22. Data Cleaning, Exploration & Analytics  Data curation & profiling, knowledge discovery and statistical simulation framework ◦ Process driven by bottom-up evidence AND top-down models/knowledge ◦ Data profiling, cleaning & exploration: Statistical analysis, advanced visualization, rule based cleaning ◦ Data Mining, pattern discovery and similarity analysis: Well established ML ◦ Statistical simulation: Dependency analysis/reasoning based on Bayesian Nets
  • 23. Data Cleaning, Exploration & Analytics Data Query System Action Results Analytics System Curation System XYZ System  Abstraction  Analytics (data mining, machine learning, discovery)  Cleaning and curation  Homogenization and integration  Querying and searching  Transformation  Visualization  Zooming  …
  • 25. Data analytics flow to Precision Medicine (diagram; labels only). Stages: Data Management & Harmonisation; Data Curation & Exploration (cleaning, profiling & pre-processing; transformed & validated data); Data Analysis & Modelling (knowledge discovery & model training; disease signatures & patient groups; variable dependencies & prediction models); Precision Medicine Support (reasoning, simulation & DSS; individualized diagnosis, prognosis & treatment plan; for a particular patient with unknown / missing data, predict the value of the missing variable). Other labels: TOP-DOWN; BOTTOM-UP; domain knowledge & assumptions; clinical workflows; biomarker-based personalized acquisition. Inset figure: final DAG (based on MCMC&DP, threshold = 0.5), scale 0 to 1, over the variables Age, ParCHD, Procedures, ExIntoler, Cyanosis, CPBP, CPArrhy, CPConcl, CPTermRsn, BSA, TPVRegurg, TriRegurg, RVD, RedRV, PSMotion, RestrPatt, AVBlock, SupravArrhy, VentricArrhy.
  • 26. Data analytics flow to Precision Medicine (the diagram from slide 25, repeated with the same labels).
  • 27. Data Profiling and Curation ▪ Data profiling: ◦ assesses and ensures the actual content, structure and quality of the data ◦ reveals their characteristics, strengths and weaknesses ▪ Types: ◦ Structural: schema, type (e.g., numeric or text), format (e.g., mm/dd/yyyy) ◦ Statistical: distribution, missing values, tails ◦ Logical: rules, constraints ◦ Identity: deduplication / resolution, reference table matching ◦ Security / privacy ▪ Data profiling: assessing relevance, sensitivity, risk for the individual and practical value
  • 28. Data Profiling and Curation DCV: semi-automatic tool  data profiling  data cleaning, validation & transformation  privacy preserving data analysis  interactive and efficient web-based interface  workflow support (rerun experiments, reproduce results)
  • 30. User-defined Cleaning Rules Click on red piece of pie to see violations
  • 32. Data analytics flow to Precision Medicine (the diagram from slide 25, repeated with the same labels).
  • 33. Data Mining & KDD ▪ Disease signatures: latent factors characterizing disease ◦ Patterns over the most relevant disease variables, e.g., biomarkers ◦ Several approaches (probabilistic latent factor analysis, well-established ML algorithms) ▪ Predictive analysis: patient classification or regression for categorization and outcome analysis ▪ Descriptive analysis: clustering algorithms & probabilistic (mixed) membership models ▪ Similarity analysis: patients “like” me or mine (patient/clinician role)
  • 34. Classification (model training & pattern discovery)
  • 35. Data analytics flow to Precision Medicine (the diagram from slide 25, repeated with the same labels).
  • 36. Classification (categorization) ▪ NEUROLOGICAL AND NEUROMUSCULAR DISEASE (NND) use case: automatic classification of 7 joint movement patterns based on kinematic data ▪ Training: on specific extracted features or raw gait analysis waveforms (time series) ▪ Cross-validation: stratified 10-fold ▪ Method: Random Forests, kNN ▪ Results: model prediction accuracies > 85%
  • 37. Classification (outcome prediction). Aim: to predict early disease outcome in JIA using baseline variables. Analysis: Random Forest algorithm on three datasets. Accuracy per dataset: Clinical (acc = 0.6), Clinical with Luminex (0.57), Clinical with microbiota (0.52). Conclusion: difficulty identifying patients who remained active.
  • 38. Data analytics flow to Precision Medicine (the diagram from slide 25, repeated with the same labels).
  • 39. Probabilistic Modeling for statistical simulation Modelling Dependency Analysis Inference
  • 40. Probabilistic Modeling for statistical simulation Finding most important dependencies and independencies: e.g. disDur, neutro,pga are almost uncorrelated and excluded Qualitative dependency analysis: Learning the structure (DAG) Quantitative analysis: Learning model parameters (Cond. Prob.)
  • 41. Data analytics flow to Precision Medicine (the diagram from slide 25, repeated with the same labels).
  • 44. tmj active very small sample - Bad prognosis - Aggressive treatment
  • 45. What if.. A new patient with 2 act. knee joints & symmetry
  • 46. What if.. Therapy = inject ?
  • 47. What if.. Therapy = MTX -same percentage -worse prognosis
  • 48. What about domain knowledge? (The data analytics flow diagram from slide 25, repeated with the same labels, including “Domain knowledge & assumptions”.)
  • 49. BIO-KNOWLEDGE ASSOCIATION MAP: Multi View Topic Modelling. Diagram labels: Full text PubMed papers and meta data including MESH; NLP & Named Entity Recognition; Semantic bio terms (PDBCodes, chem2Bio2RDF, LODD); Annotate publications with bioterms (genes, pdbCodes etc.); Identify multi modal topics & quantify associations; Generate specific association maps for different types of entities (e.g., genes, MESH, proteins, drugs). Example topics (Text | pdbCode): cancer tumor growth breast lines apoptosis tumors prostate kinase | 1m17 2ity 1qcf; binding dna brca brct cancer res mutations domain results | 1tsr 1ycs 2ac0 1gzh; nef hiv vpr ssb virus felv hck replication ssdna | 1eyg 2nef 1jmc 1m8l 1efn 1izn
  • 50. Mining scientific literature: WHY ▪ Analyze large collections of documents and metadata to: ▪ identify active areas of research: discover hidden themes (topics) ▪ understand what is actually produced: project the output to the reduced topic space (calculate topic distributions per document or other entity, e.g., gene or protein) ▪ create association maps (interaction networks) among different entities (e.g., genes, drugs, diseases, proteins) • promote target identification: “pathway expansion” for non-‘druggable’ targets, multi-target drugs, drug repositioning (indication expansion) ▪ identify emerging research areas, e.g., target identification or the understanding of disease mechanisms: create new therapeutic opportunities ▪ assess coverage, identify gaps or new therapeutic opportunities: compare funded research, patents
  • 51. What is involved… Extract features and annotate (enrich) content using NLP, Named Entity Recognition & Semantic Annotation Tokenize, remove stop words Refine stop words for specific domain 1 ENRICH & PRE-PROCESS Identify topics: distribution over words & “side” information Automatic topic curation & entitling Assign topics to publications Evaluate & categorize topics Assess topic labels 2 FIND TOPICS Calculate topic proportions & trends of objects based on their publications Calculate similarity among different entities based on various metrics Analyze & Validate the results 3 CALCULATE TRENDS & SIMILARITIES Create WEB interactive visualization with data driven graphs, charts and layouts Design optimal views Validate modeling results 4 VISUALIZE
  • 52. Methodology ▪ Probabilistic Multi-View Topic Modeling of Text-Augmented Heterogeneous Information Networks ▪ interconnected (linked) entities which are characterized by TEXT and related side information & links (e.g., taxonomies, venues, projects / research areas, citations, authors) ▪ side information: ▪ structured or unstructured attributes and metadata ▪ links / relations: e.g., authorship network, citation network ▪ incomplete, noisy or not related to textual attributes
  • 53. Multi-View Topic Modeling Text gene cells expression vector aav vectors dna therapy figure cell target gfp targeting delivery diseases Phrases gene therapy gene transfer aav vectors lentiviral vectors Grants PERSIST: Persisting Transgenesis AAVEYE: GENE THERAPY FOR INHERITED SEVERE PHOTORECEPTOR DISEASES MESH Descriptors Genetic Vectors Lentivirus Genetic Therapy Dependovirus Green Fluorescent Proteins Journals Molecular therapy Research Areas Biotechnology, generic tools and medical technologies for human health Expert: What is this Topic about?? Diagnostics and treatment development: Gene therapy & genetic vectors
  • 54. Multi-View Topic Modeling Infectious diseases: HIV and NEF protein Text hiv cells cell nef viral virus bst gag infected drug vpu gfp assembly surface cellular Phrases gfp cells hela cells Infected cells plasma membrane Grants HIV ACE: Targeting assembly of infectious HIV particles INEF: Inhibiting Nef: a novel drug target for HIV-host interactions MESH Descriptors HIV-1 Antigens, CD Cell Membrane Membrane Glycoproteins Journals plos pathogens Research Areas HEALTH-2007-2 [Translating research for human health] PDB codes 2NEF, 1M8ML, 1EFN
  • 55. Similarity & Graph clustering Topics & allocations Modelling LINKS represent topic based similarity NODES may represent drugs, PDBCodes, genes or MeSH terms Size: ~ # of publications Categories may represent Anatomical Therapeutic Chemical (ATC) class, Biological Process, MeSH hierarchy etc
  • 56. e-Infrastructures & data repositories Use Domain Knowledge to • Enhance Patient Similarity Analysis • Promote Decision Support Clinical data clouds • Analyze clinical data to validate findings
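
The local/global query decomposition on slide 20 can be illustrated with a minimal Python sketch. This is not EXAREME code: the function names `local_query` and `global_query`, the tuple layout, and the use of the population (rather than sample) standard deviation are assumptions made for illustration only. Each node returns only Σx, Σx² and N per measure, and the global step derives N, avg and std from those partial aggregates.

```python
import math
from collections import defaultdict


def local_query(rows):
    """Runs at one node (hypothetical helper): per-measure (sum_x, sum_x2, n)."""
    partial = defaultdict(lambda: [0.0, 0.0, 0])
    for _row_id, m_name, m_value in rows:
        acc = partial[m_name]
        acc[0] += m_value            # running sum of x
        acc[1] += m_value * m_value  # running sum of x squared
        acc[2] += 1                  # running count
    return {name: tuple(acc) for name, acc in partial.items()}


def global_query(partials):
    """Merges the partial aggregates of all nodes into (n, avg, std) per measure."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for partial in partials:
        for name, (sum_x, sum_x2, n) in partial.items():
            total = totals[name]
            total[0] += sum_x
            total[1] += sum_x2
            total[2] += n
    result = {}
    for name, (sum_x, sum_x2, n) in totals.items():
        avg = sum_x / n
        # Population variance (an assumption); clamp tiny negative rounding errors.
        variance = max(sum_x2 / n - avg * avg, 0.0)
        result[name] = (n, avg, math.sqrt(variance))
    return result


# Two nodes holding (id, m-name, m-value) rows; no raw row leaves a node.
node_1 = [(1, "bmi", 20.0), (2, "bmi", 22.0)]
node_2 = [(3, "bmi", 24.0)]
stats = global_query([local_query(node_1), local_query(node_2)])
```

Only three numbers per measure travel between nodes, which matches the slide's point that partial aggregated results, not patient rows, are exchanged.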

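The k-anonymity goal mentioned on slide 16 (hide individual information while preserving aggregated statistics) can be sketched in a few lines of Python. This is an illustrative check, not the Amnesia tool's implementation: the helpers `is_k_anonymous` and `generalize_age`, the field names, and the 10-year age bands are all assumptions.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalize_age(record, width=10):
    """Generalization (hypothetical rule): replace an exact age with an age band."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Toy records; field names and values are made up for illustration.
records = [
    {"age": 31, "zip": "115", "dx": "JIA"},
    {"age": 37, "zip": "115", "dx": "CHD"},
    {"age": 42, "zip": "116", "dx": "JIA"},
    {"age": 45, "zip": "116", "dx": "NND"},
]
quasi = ["age", "zip"]

raw_ok = is_k_anonymous(records, quasi, k=2)
generalized = [generalize_age(r) for r in records]
generalized_ok = is_k_anonymous(generalized, quasi, k=2)
```

In this toy data the exact ages make every record unique, while the generalized age bands put each record in a group of two.
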
Editor's Notes

  1. Probably this should be analyzed on the other section
  2. Probably this should be analyzed on the other section
  3. Probably this should be analyzed on the other section
  4. E
  5. 4-8 slides
  6. Probably this should be analyzed on the other section
  7. analyze the content, structure, and relationships within data to uncover patterns and rules, inconsistencies, anomalies, and redundancies and automate curation process using a variety of advanced data cleaning methods
  8. analyze the content, structure, and relationships within data to uncover patterns and rules, inconsistencies, anomalies, and redundancies and automate curation process using a variety of advanced data cleaning methods
  9. Three interactive graphs: (a) a histogram of variable JADAS-71, showing outliers with high values; (b) a plot of JADAS-71 (Juvenile Arthritis Disease Activity Score, on 71 joints) against CHAQ score (Childhood Health Assessment Questionnaire), both indications of disease severity, with outliers in red; (c) a line graph of weight against height, showing their obvious correlation.
  10. DCV data cleaning rule on JIA. A discrepancy was found between the two variables that represent the outcome after 6 months. There is one violation (see red above) in the mapping between columns Outcome[29] and Outcome dichotomised[30]; we want to utilise column 30 for now. The correct mapping between these variables is: clinical inactive disease → 1; persistent activity → 0; disease flare → 0.
  11. Showing discretisation of Microbiota variables in quartiles.
  12. Biomarker: “a characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention”. Thus, biomarkers refer to single measurements able to improve differential diagnosis, track disease progression and measure treatment efficiency, whereas disease signatures involve multiple (multi- or single-modal) measurements that form a specific pattern.
  13. 38
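
The cleaning rule described in note 10 can be expressed as a small Python check. This is a sketch and not the DCV tool's rule syntax: the names `OUTCOME_TO_DICHOTOMISED` and `find_violations` are hypothetical, the sample rows are made up, and treating an unrecognised outcome label as a violation is an assumption.

```python
# Mapping stated in note 10: Outcome -> Outcome dichotomised.
OUTCOME_TO_DICHOTOMISED = {
    "clinical inactive disease": 1,
    "persistent activity": 0,
    "disease flare": 0,
}


def find_violations(rows):
    """Returns the indices of rows whose dichotomised value breaks the mapping."""
    violations = []
    for index, (outcome, dichotomised) in enumerate(rows):
        expected = OUTCOME_TO_DICHOTOMISED.get(outcome.strip().lower())
        # Unknown labels are flagged too (an assumption of this sketch).
        if expected is None or expected != dichotomised:
            violations.append(index)
    return violations


# Made-up rows of (Outcome, Outcome dichotomised); the last one is inconsistent.
rows = [
    ("clinical inactive disease", 1),
    ("persistent activity", 0),
    ("disease flare", 1),
]
violating_rows = find_violations(rows)
```

A rule of this kind only reports the inconsistent rows; deciding which of the two columns to trust remains a curation decision, as the note indicates.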