How Data Commons Are Changing the Way That Large
Biomedical Datasets are Analyzed and Shared
Robert L. Grossman
Center for Data Intensive Science
University of Chicago
& Open Commons Consortium
February 12, 2018
Molecular Med Tri Conf
1. What is a Data Commons?
NCI Genomic Data Commons*
• The GDC makes over 2.5 PB of cancer genomics data available to the research community.
• The data is harmonized by processing with a common set of bioinformatics pipelines.
• Each month, the GDC is used by over 20,000 users and over 2 PB of data is downloaded.
• The GDC is based upon an open source software stack that can be used to build other data commons.*
*See: NCI Genomic Data Commons: Grossman, Robert L., et al. "Toward a shared vision for cancer genomic data." New England Journal of Medicine 375.12 (2016): 1109-1112.
The GDC consists of: 1) a data exploration & visualization portal (DAVE), 2) a
data submission portal, 3) a data analysis and harmonization system, and 4)
an API so that third parties can build applications.
Systems 1 & 2: Data Portals to Explore and Submit Data
See: NCI Genomic Data Commons: Grossman, Robert L., et al. "Toward a shared vision for cancer
genomic data." New England Journal of Medicine 375.12 (2016): 1109-1112.
System 3: Data Harmonization System to Analyze All of the Submitted Data with a Common Set of Pipelines
• MuSE (MD Anderson)
• VarScan2 (Washington Univ.)
• SomaticSniper (Washington Univ.)
• MuTect2 (Broad Institute)
Source: Zhenyu Zhang, et al. and the GDC Project Team, Uniform Genomic Data Analysis in the NCI Genomic Data Commons, to appear.
System 4: An API to Support User Defined Applications and
Notebooks to Create a Data Ecosystem
https://gdc-api.nci.nih.gov/files/5003adf1-1cfd-467d-8234-0d396422a4ee?fields=state
(API URL = endpoint + optional entity ID + query parameters)
• Based upon a (graph-based) data model
• Drives all internally developed applications, e.g. data portal
• Allows third parties to develop their own applications
• Can be used by other commons, by workspaces, by other
systems, by user-developed applications and notebooks
For more about the API, see: Shane Wilson, Michael Fitzsimons, Martin Ferguson, Allison Heath, Mark Jensen, Josh Miller, Mark W. Murphy, James
Porter, Himanso Sahni, Louis Staudt, Yajing Tang, Zhining Wang, Christine Yu, Junjun Zhang, Vincent Ferretti and Robert L. Grossman, Developing
Cancer Informatics Applications and Tools Using the NCI Genomic Data Commons API, Cancer Research, volume 77, number 21, 2017, pages e15-e18.
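As a concrete illustration, here is a minimal Python sketch of the query shown above; the file UUID and the fields parameter come from the slide, and the snippet assumes the response follows the GDC API convention of returning the requested fields under a "data" key.

import requests

GDC_API = "https://gdc-api.nci.nih.gov"           # API base URL shown above
file_id = "5003adf1-1cfd-467d-8234-0d396422a4ee"  # example entity ID from the slide

# Request only the "state" field of this file record.
resp = requests.get(f"{GDC_API}/files/{file_id}", params={"fields": "state"})
resp.raise_for_status()

# Responses are JSON; the requested fields are returned under a "data" key.
print(resp.json()["data"]["state"])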
What is a Data Commons?
Data commons co-locate data with cloud computing infrastructure and
commonly used software services, tools & apps for managing, analyzing and
sharing data to create an interoperable resource for the research community.*
*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson and Walt Wells, A Case for Data Commons: Toward Data Science as a Service, IEEE
Computing in Science & Engineering, 2016. Source of image: the CDIS, GDC, & OCC data commons infrastructure at a University of Chicago data center.
Research ethics committees (RECs) review the ethical acceptability of research involving human participants. Historically, the principal emphases of RECs have been to protect participants from physical harms and to provide assurance as to participants’ interests and welfare.*

[The Framework] is guided by Article 27 of the 1948 Universal Declaration of Human Rights. Article 27 guarantees the rights of every individual in the world "to share in scientific advancement and its benefits" (including to freely engage in responsible scientific inquiry)…*
Protect patients
The right of
patients to benefit
from research.
*GA4GH Framework for Responsible Sharing of Genomic and Health-Related Data, see goo.gl/CTavQR
Data sharing with protections provides the evidence
so patients can benefit from advances in research.
Data commons balance protecting patient data with open
research that benefits patients:
Databases (1982 - present)
• Data repository
• Researchers download data.

Data Clouds (2010 - 2020)
• Supports big data & data intensive computing with cloud computing
• Researchers can analyze data with collaborative tools (workspaces), i.e. data does not have to be downloaded.

Data Commons (2014 - 2024)
• Supports big data
• Workspaces
• Common data models
• Core data services
• Data & Commons Governance
• Harmonized data
• Data sharing
• Reproducible research
2. The Gen3 Data Commons Platform
The evolution of OCC / CDIS data commons and clouds (Gen1 → Gen2 → Gen3):
• OCC – NASA Project Matsu (2009)
• OCC Open Science Data Cloud (2010)
• Bionimbus Protected Data Cloud* (2013)
• NCI Genomic Data Commons* (2016)
• OCC-NOAA Environmental Data Commons (2016)
• OCC BloodPAC (2017)
• Kids First (2017)**
• NCI CRDC (2017)
• Brain Commons (2017)
(OCC is the Open Commons Consortium.)
*Operated under a subcontract from NCI / Leidos Biomedical to the University of Chicago with support from the OCC.
**CHOP is the lead, with the University of Chicago developing a Gen3 Data Commons for the project.
cdis.uchicago.edu
• Open source
• Designed to support project specific data commons
• Designed to support an ecosystem of commons, workspaces, notebooks & applications.
• We are starting to build an open source Gen3 community.
• Cloud agnostic, including your own private cloud.
Gen3 Is Based on Widely Used Open Source Tech
Gen3 is Cloud Agnostic
• The Gen3 service for Digital IDs (indexd) enables Gen3-managed digital objects to be stored on one, two, or more private or public clouds.
• Gen3 client applications can then retrieve a digital object from a local or remote cloud as required.
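A hedged sketch of what this looks like from a client's point of view; the commons URL, the GUID, the /index/{guid} path, and the assumption that an indexd record exposes a "urls" list are all illustrative, not a documented contract.

import requests

COMMONS = "https://example-commons.org"  # hypothetical Gen3 commons
guid = "dg.XXXX/0000-0000"               # hypothetical digital ID (GUID)

# Ask the commons' indexd service for the record behind this digital ID
# (the /index/{guid} path is an assumption made for illustration).
record = requests.get(f"{COMMONS}/index/{guid}").json()

# One digital object may be registered in several clouds; pick a copy by
# URL scheme, e.g. prefer S3, otherwise take whatever is listed first.
urls = record.get("urls", [])
preferred = next((u for u in urls if u.startswith("s3://")), urls[0] if urls else None)
print(preferred)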
The Gen3 Data Model is customizable and extensible; it extends the GDC data model.
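For intuition, a purely illustrative sketch (not the actual GDC/Gen3 dictionary) of how a node in such a graph-based model can be described: each node type carries properties and links (edges) to parent node types, and extending the model means adding node definitions like this one.

# Hypothetical node definition in a graph-based data model.
sample_node = {
    "id": "sample",
    "category": "biospecimen",
    "links": [
        # edge from "sample" to its parent "case" node
        {"name": "cases", "target_type": "case", "multiplicity": "many_to_one"},
    ],
    "properties": {
        "sample_type": {"type": "string"},
        "days_to_collection": {"type": "integer"},
    },
    "required": ["sample_type"],
}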
[Architecture diagram] Gen3 Framework Services are designed to support multiple Gen3 Data Commons: a shared layer of Data Commons Framework Services (Digital ID, Metadata, Authentication, Authorization, etc.) supports multiple data commons (Data Commons 1, Data Commons 2, ...), each with object-based storage with access control lists, database services, scalable workflows, portals for accessing & submitting data, and APIs used by workspaces, notebooks, apps, and community data products.
Core Gen3 Data Commons Framework Services
• Digital ID services
• Metadata services
• Authentication services
• Authorization services
• Data model driven APIs for submitting, searching & accessing data
• Designed to span multiple data commons
• Designed to support multiple private and commercial clouds
• In the future, we will support portable workspaces
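The pattern these services imply for a client is roughly: authenticate against the framework services, then use the data-model-driven APIs. The sketch below is hedged; the endpoint paths and payloads are assumptions made for illustration rather than the documented Gen3 interfaces.

import requests

COMMONS = "https://example-commons.org"        # hypothetical commons URL
API_KEY = {"api_key": "...", "key_id": "..."}  # credentials issued by the commons

# 1. Exchange a long-lived API key for a short-lived access token
#    (hypothetical authentication endpoint).
token = requests.post(
    f"{COMMONS}/user/credentials/api/access_token", json=API_KEY
).json()["access_token"]

# 2. Call a data-model-driven query endpoint with the token
#    (hypothetical GraphQL-style search over the commons' data model).
query = {"query": "{ case(first: 10) { submitter_id } }"}
cases = requests.post(
    f"{COMMONS}/api/v0/submission/graphql",
    json=query,
    headers={"Authorization": f"bearer {token}"},
).json()
print(cases)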
[Ecosystem diagram, labels:] NCI Cloud Resources; compliant apps; Bionimbus, Cancer Collab., etc.; FAIR Principles; Your Data Commons; other data commons; Commons Services Operations Center; Commons services; Commons Services Framework; apps.
BloodPAC Data Commons (bloodpac.org)
BloodPAC is a public-private consortium for liquid biopsy data. Its data commons, developed using Gen3 technology by the Open Commons Consortium (OCC), contains:
1. Datasets from circulating tumor
cells, circulating tumor DNA, and
exosome assays
2. Relevant clinical data (e.g. clinical
diagnosis, treatment history and
outcomes)
3. Sample preparation and handling
protocols from different studies.
• The BRAIN Commons is a commons for PTSD, TBI, major depressive disorder, and other neurological diseases.
• The images above show a meta-analysis of MRI structural measures in PTSD by Dr. R. Morey (Duke University), performed in the commons. (www.braincommons.org)
• Clinical data
• Imaging data
• Genomic data
• Biospecimen data
• Wearable data
Data types supported by the BRAIN Commons
Linked to CDISC &
other relevant
standards
3. Developing Your Own Data Commons Using
the OCC Data Commons Framework
www.occ-data.org
• A U.S.-based 501(c)(3) not-for-profit corporation founded in 2008.
• The OCC manages data commons to support medical and health care
research, including the BloodPAC Data Commons and the BRAIN
Commons.
• The OCC manages data commons and cloud computing infrastructure
to support more general scientific research, including the OCC NOAA
Environmental Data Commons and the Open Science Data Cloud.
• It is international and includes universities, not-for-profits, companies
and government agencies.
Sharing Data with Data Commons – the Main Steps
1. Require data sharing. Put data sharing requirements into your
project or consortium agreements.
2. Build a commons. Set up, work with others to set up, or join an
existing data commons, fund it, and develop an operating plan,
governance structure, and a sustainability plan.
3. Populate the commons. Provide resources to your data
generators to get the data into data commons.
4. Interoperate with other commons. Interoperate with other
commons that can accelerate research discoveries.
5. Support commons use. Support the development of third party
apps that can make discoveries over your commons.
[Diagram: OCC Data Commons Framework (occ-data.org), labels:] open source software for data commons; third party open source apps; third party vendor apps; sponsor developed apps; public clouds; on premise clouds; data commons governance & standards; Commons Services Operations Center; data managed by the data commons; sponsor or co-sponsors.
Example: Building the Data Commons Using the OCC Data Commons Framework
• Set up & configure the data commons (CSOC)
• Put in place the OCC data governance model
• Adapt the data model to your project
• Research groups submit data
• Clean and process the data following the standards
• Researchers use the commons for data analysis
• New research discoveries
Various Data Sharing Models Are Supported by Data Commons
• What data? Counts only; analyzed, higher level data; raw data.
• To whom & when? Researcher / working group embargo; consortium embargo; broad research community; public.
• What service model? Infrastructure as a Service (virtual machines); Platform as a Service (containers); Software as a Service (software applications hosted by the commons); query gateway (counts, approved queries, approved tools & services, approved infrastructure).
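To make the "counts only" tier concrete, here is a minimal sketch of the idea behind a query gateway: callers can ask how many records match a filter, but never retrieve record-level data, and small counts are suppressed. The toy dataset, filter format, and threshold are illustrative assumptions.

# Toy, in-memory stand-in for data held behind the gateway.
RECORDS = [
    {"diagnosis": "PTSD", "age": 41},
    {"diagnosis": "TBI", "age": 35},
    {"diagnosis": "PTSD", "age": 58},
]

MIN_COUNT = 5  # suppress small counts that could identify individuals

def count_only_query(filters):
    """Return only an aggregate count of records matching all filters."""
    n = sum(all(r.get(k) == v for k, v in filters.items()) for r in RECORDS)
    # Report small cells as a threshold rather than an exact count.
    return n if n >= MIN_COUNT else f"< {MIN_COUNT}"

print(count_only_query({"diagnosis": "PTSD"}))  # -> "< 5" with this toy data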
• It is important to note that with the Gen3 platform, multiple geographically
distributed data commons can interoperate in different ways:
o through data peering
o through a FAIR-based set of APIs for applications
o through scattering queries/analyses and gathering the results
o through a controlled and monitored query/analysis gateway
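A hedged sketch of the scatter/gather option: the same counts query is sent to several commons and the results are aggregated. The commons URLs and the /api/counts endpoint are assumptions made for illustration.

import concurrent.futures
import requests

# Hypothetical commons endpoints that accept a filter and return {"count": N}.
COMMONS_APIS = [
    "https://commons-a.example.org/api/counts",
    "https://commons-b.example.org/api/counts",
]

def query_counts(url, filters):
    # Each commons evaluates the query locally and returns only a count.
    return requests.post(url, json=filters, timeout=30).json().get("count", 0)

filters = {"project": "example", "sample_type": "tumor"}
with concurrent.futures.ThreadPoolExecutor() as pool:
    counts = list(pool.map(lambda u: query_counts(u, filters), COMMONS_APIS))

print("Total matching records across commons:", sum(counts))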
The Commons Alliance: Three Large Scale Data Commons Working
Towards Common APIs to Create de Facto Standards
1. NCI Cloud CRDC Framework Services / NCI GDC (UChicago / Broad)
2. NIH All of Us (Broad / Verily)
3. CZI HCA Data Platform (UCSC / Broad)
For more information, see: Josh Denny, David Glazer, Robert L. Grossman, Benedict Paten & Anthony Philippakis, A Data
Biosphere for Biomedical Research, https://medium.com/@benedictpaten/a-data-biosphere-for-biomedical-research-
d212bbfae95d. Also available at: https://goo.gl/9CySeo
Standards for Genomic & Health Data
www.ga4gh.org
Six Options for Building a Data Commons
1. Add your project data to an existing data commons.
2. Join an existing partnership that serves multiple related diseases based
upon the Gen3 Data Commons platform or another platform designed with
similar goals.
3. Build your own data commons and peer with an existing partnership of
data commons developed using the Gen3 Data Commons Platform or
other platform designed with similar goals.
4. Build your own data commons that follow standards for interoperating
data commons, such as the Commons Alliance standards.
5. Build your own data commons that follow standards for research data,
such as the FAIR standards.
6. Build your own data commons with a new architecture and new
standards, and get others to adopt your standards.
4. Summary and Conclusion
Benefits of Data Commons and Data Sharing
1. The data is available to other researchers for discovery, which moves
the research field faster.
2. Data commons support repeatable, reproducible and open research.
3. Research on some diseases depends upon having a critical mass of data to
provide the required statistical power for the scientific evidence (e.g.
to study combinations of rare mutations in cancer).
4. Data commons can interoperate with each other so that over time
data sharing can benefit from a “network effect.”
5. With more data, smaller effects can be studied (e.g. to understand
the effect of environmental factors on disease) and machine learning
techniques that require large datasets can be developed.
Databases (1982 - present) → Data Clouds (2010 - 2020) → Data Commons (2014 - 2024) → Data Ecosystems (2018 - 2028)
Summary
1. Data commons co-locate data with cloud computing infrastructure and
commonly used software services, tools & apps for managing,
analyzing and sharing data to create an interoperable resource for the
research community.
2. Data commons provide a platform for open data, open science and
reproducible research, and they provide the data sharing infrastructure
needed to support precision medicine.
3. The Gen3 platform is an open source platform that is creating an
ecosystem of data commons, applications and resources.
4. The independent not-for-profit 501(c)(3) Open Commons Consortium
can help you set up your own data commons to support data sharing
for precision medicine.
Questions?
To get involved:
• Open Commons Consortium to help you build a data commons
o occ-data.org
• Gen3 Data Commons software stack
o cdis.uchicago.edu
To learn more about some of the data commons:
• NCI Genomic Data Commons
o gdc.cancer.gov
• BRAIN Commons
o braincommons.org
• BloodPAC
o bloodpac.org
• OCC-NOAA Environmental Data Commons
o edc.occ-data.org
For more information:
• To learn more about data commons: Robert L. Grossman, et al., A Case for Data Commons: Toward Data Science
as a Service, Computing in Science & Engineering 18.5 (2016): 10-20. Also https://arxiv.org/abs/1604.02608
• To learn more about large scale, secure, compliant cloud-based computing environments for biomedical data, see:
Heath, Allison P., et al. "Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets." Journal
of the American Medical Informatics Association 21.6 (2014): 969-975. This article describes Bionimbus Gen1.
• To learn more about the NCI Genomic Data Commons: Grossman, Robert L., et al. "Toward a shared vision for
cancer genomic data." New England Journal of Medicine 375.12 (2016): 1109-1112. The GDC was developed
using Bionimbus Gen2.
• To learn more about BloodPAC: Grossman, R. L., et al. "Collaborating to compete: Blood Profiling Atlas in Cancer
(BloodPAC) Consortium." Clinical Pharmacology & Therapeutics (2017). BloodPAC was developed using the GDC
Community Edition (CE), aka Bionimbus Gen3.
• To learn about the GDC / Gen3 API: Shane Wilson, Michael Fitzsimons, Martin Ferguson, Allison Heath, Mark
Jensen, Josh Miller, Mark W. Murphy, James Porter, Himanso Sahni, Louis Staudt, Yajing Tang, Zhining Wang,
Christine Yu, Junjun Zhang, Vincent Ferretti and Robert L. Grossman, Developing Cancer Informatics Applications
and Tools Using the NCI Genomic Data Commons API, Cancer Research, volume 77, number 21, 2017, pages e15-
e18.
• To learn more about the de facto standards being developed by the Commons Alliance: Josh Denny, David Glazer,
Robert L. Grossman, Benedict Paten, Anthony Philippakis, A Data Biosphere for Biomedical Research,
https://medium.com/@benedictpaten/a-data-biosphere-for-biomedical-research-d212bbfae95d
Abstract
Biomedical data has grown so large that most research groups can no longer
host and analyze the data from large projects themselves. Data commons
provide an alternative by co-locating data, storage and computing
resources with commonly used software services, applications and
tools for analyzing, harmonizing and sharing data to create an
interoperable resource for the research community. We give an
overview of data commons and describe some lessons learned from
the NCI Genomic Data Commons, the BloodPAC Data Commons and
the BRAIN Commons. We also give an overview of how an organization
can set up a commons itself.
cdis.uchicago.edu
Robert L. Grossman
rgrossman.com
@BobGrossman
robert.grossman@uchicago.edu
Contact Information
occ-data.org
More Related Content

What's hot

Fine-tuning BERT for Question Answering
Fine-tuning BERT for Question AnsweringFine-tuning BERT for Question Answering
Fine-tuning BERT for Question AnsweringApache MXNet
 
Design and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative RebalancingDesign and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative Rebalancingconfluent
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsKetan Gote
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry confluent
 
Using Kafka to scale database replication
Using Kafka to scale database replicationUsing Kafka to scale database replication
Using Kafka to scale database replicationVenu Ryali
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewenconfluent
 
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and LinkerdService Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and LinkerdKai Wähner
 
Building a Chatbot with Amazon Lex and AWS Lambda Workshop
Building a Chatbot with Amazon Lex and AWS Lambda WorkshopBuilding a Chatbot with Amazon Lex and AWS Lambda Workshop
Building a Chatbot with Amazon Lex and AWS Lambda WorkshopAmazon Web Services
 
MySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentMySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentJean-François Gagné
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache KafkaPaul Brebner
 
RedHat Virtualization Manager
RedHat Virtualization ManagerRedHat Virtualization Manager
RedHat Virtualization ManagerRaz Tamir
 
Build an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfBuild an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfStephenAmell4
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
GPT and other Text Transformers: Black Swans and Stochastic Parrots
GPT and other Text Transformers:  Black Swans and Stochastic ParrotsGPT and other Text Transformers:  Black Swans and Stochastic Parrots
GPT and other Text Transformers: Black Swans and Stochastic ParrotsKonstantin Savenkov
 
Citi Tech Talk Disaster Recovery Solutions Deep Dive
Citi Tech Talk  Disaster Recovery Solutions Deep DiveCiti Tech Talk  Disaster Recovery Solutions Deep Dive
Citi Tech Talk Disaster Recovery Solutions Deep Diveconfluent
 
Pre trained language model
Pre trained language modelPre trained language model
Pre trained language modelJiWenKim
 
Huawei - Making the World Smaller with Small Cell
Huawei  - Making the World Smaller with Small CellHuawei  - Making the World Smaller with Small Cell
Huawei - Making the World Smaller with Small CellSmall Cell Forum
 
HBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraphHBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraphHBaseCon
 
IBM Power9 Features and Specifications
IBM Power9 Features and SpecificationsIBM Power9 Features and Specifications
IBM Power9 Features and Specificationsinside-BigData.com
 

What's hot (20)

Fine-tuning BERT for Question Answering
Fine-tuning BERT for Question AnsweringFine-tuning BERT for Question Answering
Fine-tuning BERT for Question Answering
 
Design and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative RebalancingDesign and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative Rebalancing
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
 
Using Kafka to scale database replication
Using Kafka to scale database replicationUsing Kafka to scale database replication
Using Kafka to scale database replication
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
 
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and LinkerdService Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
 
Building a Chatbot with Amazon Lex and AWS Lambda Workshop
Building a Chatbot with Amazon Lex and AWS Lambda WorkshopBuilding a Chatbot with Amazon Lex and AWS Lambda Workshop
Building a Chatbot with Amazon Lex and AWS Lambda Workshop
 
MySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentMySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated Environment
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 
RedHat Virtualization Manager
RedHat Virtualization ManagerRedHat Virtualization Manager
RedHat Virtualization Manager
 
Build an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdfBuild an LLM-powered application using LangChain.pdf
Build an LLM-powered application using LangChain.pdf
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
GPT and other Text Transformers: Black Swans and Stochastic Parrots
GPT and other Text Transformers:  Black Swans and Stochastic ParrotsGPT and other Text Transformers:  Black Swans and Stochastic Parrots
GPT and other Text Transformers: Black Swans and Stochastic Parrots
 
Citi Tech Talk Disaster Recovery Solutions Deep Dive
Citi Tech Talk  Disaster Recovery Solutions Deep DiveCiti Tech Talk  Disaster Recovery Solutions Deep Dive
Citi Tech Talk Disaster Recovery Solutions Deep Dive
 
Pre trained language model
Pre trained language modelPre trained language model
Pre trained language model
 
Huawei - Making the World Smaller with Small Cell
Huawei  - Making the World Smaller with Small CellHuawei  - Making the World Smaller with Small Cell
Huawei - Making the World Smaller with Small Cell
 
HBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraphHBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraph
 
IBM Power9 Features and Specifications
IBM Power9 Features and SpecificationsIBM Power9 Features and Specifications
IBM Power9 Features and Specifications
 

Similar to How Data Commons Are Changing Biomedical Data Analysis

How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...Robert Grossman
 
A Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchA Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchRobert Grossman
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsVivien Bonazzi
 
Data commons bonazzi bd2 k fundamentals of science feb 2017
Data commons bonazzi   bd2 k fundamentals of science feb 2017Data commons bonazzi   bd2 k fundamentals of science feb 2017
Data commons bonazzi bd2 k fundamentals of science feb 2017Vivien Bonazzi
 
Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Vivien Bonazzi
 
The Commons: Leveraging the Power of the Cloud for Big Data
The Commons: Leveraging the Power of the Cloud for Big DataThe Commons: Leveraging the Power of the Cloud for Big Data
The Commons: Leveraging the Power of the Cloud for Big DataPhilip Bourne
 
BD2K and the Commons : ELIXR All Hands
BD2K and the Commons : ELIXR All Hands BD2K and the Commons : ELIXR All Hands
BD2K and the Commons : ELIXR All Hands Vivien Bonazzi
 
EMBL Australian Bioinformatics Resource AHM - Data Commons
EMBL Australian Bioinformatics Resource AHM   - Data CommonsEMBL Australian Bioinformatics Resource AHM   - Data Commons
EMBL Australian Bioinformatics Resource AHM - Data CommonsVivien Bonazzi
 
Open Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon HodsonOpen Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon HodsonAfrican Open Science Platform
 
Komatsoulis internet2 executive track
Komatsoulis internet2 executive trackKomatsoulis internet2 executive track
Komatsoulis internet2 executive trackGeorge Komatsoulis
 
Recognising data sharing
Recognising data sharingRecognising data sharing
Recognising data sharingJisc RDM
 
NCI Cancer Research Data Commons - Overview
NCI Cancer Research Data Commons - OverviewNCI Cancer Research Data Commons - Overview
NCI Cancer Research Data Commons - Overviewimgcommcall
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformSanjay Padhi, Ph.D
 
Some Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsSome Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsRobert Grossman
 
A Framework for Geospatial Web Services for Public Health by Dr. Leslie Lenert
A Framework for Geospatial Web Services for Public Health by Dr. Leslie LenertA Framework for Geospatial Web Services for Public Health by Dr. Leslie Lenert
A Framework for Geospatial Web Services for Public Health by Dr. Leslie LenertWansoo Im
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptxmantatheralyasriy
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptxmantatheralyasriy
 
Big Data R&D Strategy - Ensure the long term sustainability, access, and deve...
Big Data R&D Strategy - Ensure the long term sustainability, access, and deve...Big Data R&D Strategy - Ensure the long term sustainability, access, and deve...
Big Data R&D Strategy - Ensure the long term sustainability, access, and deve...Sky Bristol
 

Similar to How Data Commons Are Changing Biomedical Data Analysis (20)

How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
A Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchA Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical Research
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data Commons
 
Data commons bonazzi bd2 k fundamentals of science feb 2017
Data commons bonazzi   bd2 k fundamentals of science feb 2017Data commons bonazzi   bd2 k fundamentals of science feb 2017
Data commons bonazzi bd2 k fundamentals of science feb 2017
 
Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2
 
The Commons: Leveraging the Power of the Cloud for Big Data
The Commons: Leveraging the Power of the Cloud for Big DataThe Commons: Leveraging the Power of the Cloud for Big Data
The Commons: Leveraging the Power of the Cloud for Big Data
 
BD2K and the Commons : ELIXR All Hands
BD2K and the Commons : ELIXR All Hands BD2K and the Commons : ELIXR All Hands
BD2K and the Commons : ELIXR All Hands
 
EMBL Australian Bioinformatics Resource AHM - Data Commons
EMBL Australian Bioinformatics Resource AHM   - Data CommonsEMBL Australian Bioinformatics Resource AHM   - Data Commons
EMBL Australian Bioinformatics Resource AHM - Data Commons
 
Open Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon HodsonOpen Science Globally: Some Developments/Dr Simon Hodson
Open Science Globally: Some Developments/Dr Simon Hodson
 
Komatsoulis internet2 executive track
Komatsoulis internet2 executive trackKomatsoulis internet2 executive track
Komatsoulis internet2 executive track
 
Recognising data sharing
Recognising data sharingRecognising data sharing
Recognising data sharing
 
NCI Cancer Research Data Commons - Overview
NCI Cancer Research Data Commons - OverviewNCI Cancer Research Data Commons - Overview
NCI Cancer Research Data Commons - Overview
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
 
Some Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsSome Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data Platforms
 
A Framework for Geospatial Web Services for Public Health by Dr. Leslie Lenert
A Framework for Geospatial Web Services for Public Health by Dr. Leslie LenertA Framework for Geospatial Web Services for Public Health by Dr. Leslie Lenert
A Framework for Geospatial Web Services for Public Health by Dr. Leslie Lenert
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
Shifting the goal post – from high impact journals to high impact data
 Shifting the goal post – from high impact journals to high impact data Shifting the goal post – from high impact journals to high impact data
Shifting the goal post – from high impact journals to high impact data
 
Big Data R&D Strategy - Ensure the long term sustainability, access, and deve...
Big Data R&D Strategy - Ensure the long term sustainability, access, and deve...Big Data R&D Strategy - Ensure the long term sustainability, access, and deve...
Big Data R&D Strategy - Ensure the long term sustainability, access, and deve...
 

More from Robert Grossman

Some Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanySome Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanyRobert Grossman
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataRobert Grossman
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedRobert Grossman
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...Robert Grossman
 
AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016Robert Grossman
 
Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Robert Grossman
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...Robert Grossman
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...Robert Grossman
 
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Robert Grossman
 
Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Robert Grossman
 
Practical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsPractical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsRobert Grossman
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? Robert Grossman
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Robert Grossman
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Robert Grossman
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?Robert Grossman
 
Adversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World TalkAdversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World TalkRobert Grossman
 
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataThe Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataRobert Grossman
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchRobert Grossman
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceRobert Grossman
 
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)Robert Grossman
 

More from Robert Grossman (20)

Some Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanySome Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your Company
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016
 
Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
 
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
 
Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)
 
Practical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsPractical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large Datasets
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?
 
Adversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World TalkAdversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World Talk
 
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataThe Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science Research
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of Science
 
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
 

Recently uploaded

Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 

Recently uploaded (20)

Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 

How Data Commons Are Changing Biomedical Data Analysis

  • 1. How Data Commons Are Changing the Way That Large Biomedical Datasets are Analyzed and Shared Robert L. Grossman Center for Data Intensive Science University of Chicago & Open Commons Consortium February 12, 2018 Molecular Med Tri Conf
  • 2. 1. What is a Data Commons?
  • 3. NCI Genomic Data Commons* • The GDC makes over 2.5 PB of cancer genomics data available to the research community. • The data is harmonized by processing with a common set of bioinformatics pipelines. • Each month, the GDC is used by over 20,000 users and over 2 PB of data is downloaded. • The GDC is based upon an open source software stack that can be used to build other data commons.*See: NCI Genomic Data Commons: Grossman, Robert L., et al. "Toward a shared vision for cancer genomic data." New England Journal of Medicine 375.12 (2016): 1109-1112. The GDC consists of: 1) a data exploration & visualization portal (DAVE), 2) a data submission portal, 3) a data analysis and harmonization system system, 4) an API so third party can build applications.
  • 5. Systems 1 & 2: Data Portals to Explore and Submit Data See: NCI Genomic Data Commons: Grossman, Robert L., et al. "Toward a shared vision for cancer genomic data." New England Journal of Medicine 375.12 (2016): 1109-1112.
  • 6. • MuSE (MD Anderson) • VarScan2 (Washington Univ.) • SomaticSniper (Washington Univ.) • MuTect2 (Broad Institute) Source: Zhenyu Zhang, et. al. and the GDC Project Team, Uniform Genomic Data Analysis in the NCI Genomic Data Commons, to appear. System 3: Data Harmonization System To Analyze all of the Submitted Data with a Common Pipelines
  • 7. System 4: An API to Support User Defined Applications and Notebooks to Create a Data Ecosystem https://gdc-api.nci.nih.gov/files/5003adf1-1cfd-467d-8234-0d396422a4ee?fields=state API URL Endpoint Optional Entity ID Query parameters • Based upon a (graph-based) data model • Drives all internally developed applications, e.g. data portal • Allows third parties to develop their own applications • Can be used by other commons, by workspaces, by other systems, by user-developed applications and notebooks For more about the API, see: Shane Wilson, Michael Fitzsimons, Martin Ferguson, Allison Heath, Mark Jensen, Josh Miller, Mark W. Murphy, James Porter, Himanso Sahni, Louis Staudt, Yajing Tang, Zhining Wang, Christine Yu, Junjun Zhang, Vincent Ferretti and Robert L. Grossman, Developing Cancer Informatics Applications and Tools Using the NCI Genomic Data Commons API, Cancer Research, volume 77, number 21, 2017, pages e15-e18.
  • 8. What is a Data Commons? Data commons co-locate data with cloud computing infrastructure and commonly used software services, tools & apps for managing, analyzing and sharing data to create an interoperable resource for the research community.* *Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson and Walt Wells, A Case for Data Commons Towards Data Science as a Service, IEEE Computing in Science and Engineer, 2016. Source of image: The CDIS, GDC, & OCC data commons infrastructure at a University of Chicago data center.
  • 9. Research ethics committees (RECs) review the ethical acceptability of research involving human participants. Historically, the principal emphases of RECs have been to protect participants from physical harms and to provide assurance as to participants’ interests and welfare.* [The Framework] is guided by, Article 27 of the 1948 Universal Declaration of Human Rights. Article 27 guarantees the rights of every individual in the world "to share in scientific advancement and its benefits" (including to freely engage in responsible scientific inquiry)…* Protect patients The right of patients to benefit from research. *GA4GH Framework for Responsible Sharing of Genomic and Health-Related Data, see goo.gl/CTavQR Data sharing with protections provides the evidence so patients can benefit from advances in research. Data commons balance protecting patient data with open research that benefits patients:
  • 10. From databases to data clouds to data commons: • Databases (1982 - present): a data repository; researchers download data to analyze it. • Data Clouds (2010 - 2020): support big data & data intensive computing with cloud computing; researchers can analyze data with collaborative tools (workspaces), i.e. data does not have to be downloaded. • Data Commons (2014 - 2024): support big data; workspaces; common data models; core data services; data & commons governance; harmonized data; data sharing; reproducible research.
  • 11. 2. The Gen3 Data Commons Platform
  • 12. Data commons built across three generations of the platform (Gen1, Gen2, Gen3): OCC – NASA Project Matsu (2009), OCC Open Science Data Cloud (2010), Bionimbus Protected Data Cloud* (2013), NCI Genomic Data Commons* (2016), OCC-NOAA Environmental Data Commons (2016), OCC BloodPAC (2017), Kids First (2017)**, NCI CRDC (2017), Brain Commons (2017). OCC is the Open Commons Consortium. *Operated under a subcontract from NCI / Leidos Biomedical to the University of Chicago with support from the OCC. **CHOP is the lead, with the University of Chicago developing a Gen3 Data Commons for the project.
  • 13. cdis.uchicago.edu • Open source • Designed to support project specific data commons • Designed to support an ecosystem of commons, workspaces, notebooks & applications. • We are starting to build an open source Gen3 community. • Cloud agnostic, including your own private cloud.
  • 14. Gen3 is Based on Widely Used Open Source Technology
  • 15. Gen3 is Cloud Agnostic • The Gen3 service for digital IDs (indexd) enables Gen3-managed digital objects to be stored on one, two, three or more private or public clouds. • Gen3 client applications can then retrieve the digital object from a local or remote cloud as required, as sketched below.
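A minimal sketch of what cloud-agnostic digital ID resolution can look like from a client's point of view. The record below is invented for illustration; in a live commons it would be returned by the indexd service rather than written out by hand, and the exact field names are an assumption.

```python
# A digital ID (GUID) resolves to an index record that can list copies of the
# same object on several clouds. This record is a made-up example; in a real
# Gen3 commons it would be fetched from the indexd service over HTTP.
record = {
    "did": "00000000-0000-0000-0000-000000000000",  # placeholder digital ID
    "size": 123456789,
    "urls": [
        "s3://example-commons-bucket/genomes/sample1.bam",  # copy on one cloud
        "gs://example-commons-bucket/genomes/sample1.bam",  # copy on another
    ],
}

# A client can pick whichever copy is local or cheapest to read from.
preferred = next(
    (u for u in record["urls"] if u.startswith("s3://")),
    record["urls"][0],
)
print("Available copies:", record["urls"])
print("Retrieving from:", preferred)
```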
  • 16. The Gen3 Data Model is customizable & extensible. The Gen3 Data Model extends the GDC data model.
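To suggest what a customizable, graph-based data model looks like in practice, here is a sketch of a hypothetical node definition that a project might add when extending a base model. The node name, properties, and link are invented for illustration and are not part of the GDC or Gen3 dictionaries.

```python
# A graph-based data model describes entities (nodes), their properties, and
# the links (edges) that connect them. This hypothetical node adds a
# project-specific entity; every name below is illustrative only.
liquid_biopsy_sample = {
    "id": "liquid_biopsy_sample",
    "category": "biospecimen",
    "links": [
        # Each sample must point back to exactly one case (study participant).
        {"name": "cases", "target_type": "case",
         "multiplicity": "many_to_one", "required": True},
    ],
    "properties": {
        "submitter_id": {"type": "string"},
        "collection_date": {"type": "string", "format": "date"},
        "assay_type": {"enum": ["ctDNA", "CTC", "exosome"]},
    },
    "required": ["submitter_id", "assay_type"],
}

# Submission, query, and portal services can be driven from definitions like
# this one rather than being rewritten for each new data type.
print(list(liquid_biopsy_sample["properties"]))
```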
  • 17. Gen3 Framework Services are designed to support multiple Gen3 Data Commons. [Architecture diagram] Data Commons Framework Services (digital ID, metadata, authentication, authorization, etc.) support multiple data commons (Data Commons 1, Data Commons 2, ...), each built on object-based storage with access control lists, scalable workflows, and database services, and exposing portals for accessing & submitting data, APIs, workspaces, notebooks, apps, and community data products.
  • 18. Core Gen3 Data Commons Framework Services • Digital ID services • Metadata services • Authentication services • Authorization services • Data model driven APIs for submitting, searching & accessing data • Designed to span multiple data commons • Designed to support multiple private and commercial clouds • In the future, we will support portable workspaces
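A sketch of how a client might combine two of these framework services: first exchange a credential for a short-lived access token (authentication/authorization), then issue a data-model-driven query. The host name, endpoint paths, request shapes, and the GraphQL-style query are assumptions made for illustration, not a published contract.

```python
import requests

# Hypothetical commons host and endpoint paths, used only for illustration.
COMMONS = "https://data-commons.example.org"
API_KEY = "..."  # placeholder credential; never hard-code real keys

# 1. Authentication / authorization: trade an API key for a short-lived token.
token_response = requests.post(
    f"{COMMONS}/user/credentials/api/access_token",
    json={"api_key": API_KEY},
)
token = token_response.json().get("access_token")

# 2. Data-model-driven query: ask for cases and their samples in one request.
query = "{ case(first: 5) { submitter_id samples { assay_type } } }"
result = requests.post(
    f"{COMMONS}/api/v0/submission/graphql",
    json={"query": query},
    headers={"Authorization": f"Bearer {token}"},
)
print(result.json())
```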
  • 19. [Diagram: a commons ecosystem showing NCI cloud resources (Bionimbus, Cancer Collab., etc.), compliant apps, the FAIR Principles, your data commons, other data commons, a Commons Services Operations Center, commons services, and the Commons Services Framework.]
  • 20. The BloodPAC Data Commons (bloodpac.org). BloodPAC is a public-private consortium for liquid biopsy data; its data commons, developed using Gen3 technology by the Open Commons Consortium (OCC), contains: 1. Datasets from circulating tumor cells, circulating tumor DNA, and exosome assays. 2. Relevant clinical data (e.g. clinical diagnosis, treatment history and outcomes). 3. Sample preparation and handling protocols from different studies.
  • 22. • The BRAIN Commons (www.braincommons.org) is a commons for PTSD, TBI, major depressive disorder, and other neurological diseases. • Example use: a meta-analysis of MRI structural measures in PTSD by Dr. R. Morey, Duke University, performed in the commons.
  • 23. Data types supported by the BRAIN Commons (linked to CDISC & other relevant standards): • Clinical data • Imaging data • Genomic data • Biospecimen data • Wearable data
  • 24. 3. Developing Your Own Data Commons Using the OCC Data Commons Framework
  • 25. www.occ-data.org • U.S.-based 501(c)(3) not-for-profit corporation founded in 2008. • The OCC manages data commons to support medical and health care research, including the BloodPAC Data Commons and the BRAIN Commons. • The OCC manages data commons and cloud computing infrastructure to support more general scientific research, including the OCC NOAA Environmental Data Commons and the Open Science Data Cloud. • It is international and includes universities, not-for-profits, companies and government agencies.
  • 26. Sharing Data with Data Commons – the Main Steps 1. Require data sharing. Put data sharing requirements into your project or consortium agreements. 2. Build a commons. Set up, work with others to set up, or join an existing data commons, fund it, and develop an operating plan, governance structure, and a sustainability plan. 3. Populate the commons. Provide resources to your data generators to get the data into data commons. 4. Interoperate with other commons. Interoperate with other commons that can accelerate research discoveries. 5. Support commons use. Support the development of third party apps that can make discoveries over your commons.
  • 27. [Diagram: Open Source Software for Data Commons (OCC Data Commons Framework, occ-data.org), showing the data managed by the data commons, a Commons Services Operations Center, public and on-premise clouds, data commons governance & standards, the sponsor or co-sponsors, and third-party open source apps, third-party vendor apps, and sponsor-developed apps.]
  • 28. Example: building a data commons using the OCC Data Commons Framework: set up & configure the data commons (CSOC); put in place the OCC data governance model; adapt the data model to your project; research groups submit data; clean and process the data following the standards; researchers use the commons for data analysis; new research discoveries follow.
  • 29. Various data sharing models are supported by data commons. What data? Counts only; analyzed, higher-level data; raw data. To whom & when? Researcher / working group embargo; consortium embargo; broad research community; public. What service model? Query gateway (counts, approved queries); approved tools & services (Software as a Service: software applications hosted by the commons); approved infrastructure (Platform as a Service: containers; Infrastructure as a Service: virtual machines).
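As an illustration of the counts-only end of this spectrum, here is a minimal sketch of the policy logic a query gateway might apply, returning aggregate counts to public users and row-level results only to approved consortium members. The tier names, the suppression threshold, and the toy records are all invented for illustration.

```python
# Toy records and policy; every value here is invented for illustration.
RECORDS = [
    {"case_id": f"case-{i}", "assay_type": "ctDNA" if i % 2 else "CTC"}
    for i in range(40)
]

def run_query(assay_type: str, tier: str) -> dict:
    """Apply a simple data sharing policy before returning query results."""
    matches = [r for r in RECORDS if r["assay_type"] == assay_type]
    if tier == "public":
        # Counts only, with small cells suppressed to reduce re-identification risk.
        n = len(matches)
        return {"count": n if n >= 10 else "<10"}
    if tier == "consortium":
        # Analyzed, higher-level data: case identifiers but no raw data links.
        return {"count": len(matches), "cases": [r["case_id"] for r in matches]}
    raise PermissionError(f"Unknown or unapproved tier: {tier}")

print(run_query("ctDNA", "public"))       # counts only: {'count': 20}
print(run_query("ctDNA", "consortium"))   # counts plus case identifiers
```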
  • 30. • It is important to note that with the Gen3 platform, multiple geographically distributed data commons can interoperate in different ways: o through data peering o through a FAIR-based set of APIs for applications o through scattering queries/analyses and gathering the results o through a controlled and monitored query/analysis gateway
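The scatter/gather option can be sketched as follows: the same query is sent to each participating commons and the per-commons counts are summed locally. The host names and the assumed response shape (a JSON object with a count field) are placeholders for illustration.

```python
import requests

# Hypothetical query endpoints for two peered commons; real deployments would
# substitute the APIs of the participating commons.
COMMONS_APIS = [
    "https://commons-a.example.org/api/query",
    "https://commons-b.example.org/api/query",
]

def scatter_gather(filters: dict) -> int:
    """Scatter the same query to every commons and gather (sum) the counts."""
    total = 0
    for api in COMMONS_APIS:
        resp = requests.post(api, json=filters, timeout=30)
        resp.raise_for_status()
        total += resp.json().get("count", 0)  # response shape is an assumption
    return total

# Example: count cases carrying a given somatic mutation across all peered commons.
print(scatter_gather({"gene": "KRAS", "variant_classification": "Missense_Mutation"}))
```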
  • 31. The Commons Alliance: Three Large-Scale Data Commons Working Towards Common APIs to Create de Facto Standards 1. NCI Cloud CRDC Framework Services / NCI GDC (UChicago / Broad) 2. NIH All of Us (Broad / Verily) 3. CZI HCA Data Platform (UCSC / Broad) For more information, see: Josh Denny, David Glazer, Robert L. Grossman, Benedict Paten & Anthony Philippakis, A Data Biosphere for Biomedical Research, https://medium.com/@benedictpaten/a-data-biosphere-for-biomedical-research-d212bbfae95d. Also available at: https://goo.gl/9CySeo
  • 32. Standards for Genomic & Health Data www.ga4gh.org
  • 33. Six Options for Building a Data Commons 1. Add your project data to an existing data commons. 2. Join an existing partnership that serves multiple related diseases based upon the Gen3 Data Commons platform or another platform designed with similar goals. 3. Build your own data commons and peer with an existing partnership of data commons developed using the Gen3 Data Commons Platform or another platform designed with similar goals. 4. Build your own data commons that follows standards for interoperating data commons, such as the Commons Alliance standards. 5. Build your own data commons that follows standards for research data, such as the FAIR principles. 6. Build your own data commons with a new architecture and new standards, and get others to adopt your standards.
  • 34. 4. Summary and Conclusion
  • 35. Benefits of Data Commons and Data Sharing 1. The data is available to other researchers for discovery, which moves the research field faster. 2. Data commons support repeatable, reproducible and open research. 3. Some diseases require a critical mass of data to provide the statistical power needed for scientific evidence (e.g. to study combinations of rare mutations in cancer). 4. Data commons can interoperate with each other, so that over time data sharing can benefit from a “network effect”. 5. With more data, smaller effects can be studied (e.g. to understand the effect of environmental factors on disease) and machine learning techniques that require large amounts of data can be developed.
  • 36. Databases (1982 - present) → Data Clouds (2010 - 2020) → Data Commons (2014 - 2024) → Data Ecosystems (2018 - 2028).
  • 37. Summary 1. Data commons co-locate data with cloud computing infrastructure and commonly used software services, tools & apps for managing, analyzing and sharing data to create an interoperable resource for the research community. 2. Data commons provide a platform for open data, open science and reproducible research that provide the data sharing infrastructure to support precision medicine. 3. The Gen3 platform is an open source platform that is creating an ecosystem of data commons, applications and resources. 4. The independent not-for-profit 501(c)(3) Open Commons Consortium can help you set up your own data commons to support data sharing for precision medicine.
  • 39. To get involved: • Open Commons Consortium, which can help you build a data commons (occ-data.org) • Gen3 Data Commons software stack (cdis.uchicago.edu). To learn more about some of the data commons: • NCI Genomic Data Commons (gdc.cancer.gov) • BRAIN Commons (braincommons.org) • BloodPAC (bloodpac.org) • OCC-NOAA Environmental Data Commons (edc.occ-data.org)
  • 40. For more information: • To learn more about data commons: Robert L. Grossman, et al., A Case for Data Commons: Toward Data Science as a Service, Computing in Science & Engineering 18.5 (2016): 10-20. Also https://arxiv.org/abs/1604.02608 • To learn more about large scale, secure, compliant cloud-based computing environments for biomedical data, see: Heath, Allison P., et al. "Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets." Journal of the American Medical Informatics Association 21.6 (2014): 969-975. This article describes Bionimbus Gen1. • To learn more about the NCI Genomic Data Commons: Grossman, Robert L., et al. "Toward a shared vision for cancer genomic data." New England Journal of Medicine 375.12 (2016): 1109-1112. The GDC was developed using Bionimbus Gen2. • To learn more about BloodPAC: Grossman, R. L., et al. "Collaborating to compete: Blood Profiling Atlas in Cancer (BloodPAC) Consortium." Clinical Pharmacology & Therapeutics (2017). BloodPAC was developed using the GDC Community Edition (CE), aka Bionimbus Gen3. • To learn about the GDC / Gen3 API: Shane Wilson, Michael Fitzsimons, Martin Ferguson, Allison Heath, Mark Jensen, Josh Miller, Mark W. Murphy, James Porter, Himanso Sahni, Louis Staudt, Yajing Tang, Zhining Wang, Christine Yu, Junjun Zhang, Vincent Ferretti and Robert L. Grossman, Developing Cancer Informatics Applications and Tools Using the NCI Genomic Data Commons API, Cancer Research, volume 77, number 21, 2017, pages e15-e18. • To learn more about the de facto standards being developed by the Commons Alliance: Josh Denny, David Glazer, Robert L. Grossman, Benedict Paten, Anthony Philippakis, A Data Biosphere for Biomedical Research, https://medium.com/@benedictpaten/a-data-biosphere-for-biomedical-research-d212bbfae95d
  • 41. Abstract Biomedical data has grown so large that most research groups cannot host and analyze the data from large projects themselves. Data commons provide an alternative by co-locating data, storage and computing resources with commonly used software services, applications and tools for analyzing, harmonizing and sharing data to create an interoperable resource for the research community. We give an overview of data commons and describe some lessons learned from the NCI Genomic Data Commons, the BloodPAC Data Commons and the BRAIN Commons. We also give an overview of how an organization can set up a commons itself.