Cloud Accelerated Genomics
Allen Day, PhD // Science Advocate
@allenday // #genomics #ml #datascience
Table of Contents
Section 1: Getting from Research to Application… Faster
What are the bottlenecks for translating research into products? Emphasis on information processing.
Section 2: From CompBio Research to CompBio Engineering
Getting results, more of them, and predictably improving.
Section 3: Data Integration - Cutting Edge Use Cases
What’s happening right now in industry and academia?
Throughout: How to use Google Cloud?
I’ll introduce specific cloud services, along with examples of how they’ve been used successfully: Compute Engine, Kubernetes, Dataflow, Cloud ML, Genomics API.
How to Understand?
Linear B is a syllabic script that was used for writing Mycenaean Greek, the earliest attested form of Greek. The script predates the Greek alphabet by several centuries. The oldest Mycenaean writing dates to about 1450 BC.
Hypothetico-Deductive Method (Iterative)
[Diagram: iterative cycle of Choose Data; Acquire; Organize; Analyze, Interpret, and Plan]
Situation: Not enough data. No means to get more. Dead language.
Outcome: Cannot understand.
Also: Passive learning. No feedback.
DNA Sequencing Value Chain
[Chart: % effort (0 to 100) spent on Experiment Design; DNA Sequencing; Secondary Analytics; and Analytics, Interpretation, Planning, across Pre-NGS (~2000), Now, and Future (~2020)]
Sboner, et al, 2011. The real cost of sequencing: higher than you think!
Human Genetics Scenario
[Chart: % effort (0 to 100) spent on Experiment Design; DNA Sequencing; Secondary Analytics; and Analytics, Interpretation, Planning, across Pre-NGS (~2000), Now, and Future (~2020)]
Sboner, et al, 2011. The real cost of sequencing: higher than you think!
Situation: Unlimited Free DNA
Result: Slow to understand.
Q: Why Slow to Understand? A1: Data Processing
[Chart: % effort (0 to 100) spent on Experiment Design; DNA Sequencing; Secondary Analytics; and Analytics, Interpretation, Planning, across Pre-NGS (~2000), Now, and Future (~2020)]
Sboner, et al, 2011. The real cost of sequencing: higher than you think!
Situation: We still have an analysis bottleneck
Result: Slow to understand.
00:20 - Connecting…
01:22 - Link Established
GOOGLE CONFIDENTIAL
Google Cloud Platform lets you run your apps on the same system as Google
So you can focus on what matters to your science
Google confidential │ Do not distribute
Google is good at handling massive volumes of data
uploads per minute: 300hrs
users: 500M+
search index: 100PB+
query response time: 0.25s
Google is good at handling massive volumes of genomic data
uploads per minute: 300hrs (~6 WGS)
users: 500M+ (>100x US PhDs)
search index: 100PB+ (~1M WGS)
query response time: 0.25s (0.25s)
Google Genomics
August 2015
Google Genomics is more than infrastructure
Genomics-specific features: Genomics API
General-purpose cloud infrastructure: Virtual Machines & Storage, Data Services & Tools
Google’s vision to tackle complex health data
BioQuery Analysis Engine
Medical Records, Genomics, Devices, Imaging, Patient Reports
Baseline Study Data, Private Data, Public Data
Pharma, Health Providers, …
CONFIDENTIAL & PROPRIETARY
3.75 TERABYTES PER HUMAN
1.00 TB GENOME
2.00 TB EPIGENOME
0.70 TB TRANSCRIPTOME
0.06 TB METABOLOME
0.04 TB PROTEOME
~1 MB STANDARD LAB TESTS
5-YR LONGITUDINAL STUDY
BASELINE STUDY: BIG DATA ANALYSIS
Validate a pipeline to process complex phenotypic, biochemical,
and genomic data
● Pilot Study (N=200)
○ Determine optimal biospecimen collection strategy for stable sampling
and reproducible assays
○ Determine optimal assay methodology
○ Validate quality control methods
○ Validate device data against surrogate and primary endpoints
● Baseline Study (N=10,000+)
○ 6 cohorts from low to high risk for cardiovascular disease and cancer
○ Characterize human systems biology
○ Define normal values for a given parameter in heterogeneous states
○ Predict meaningful events
○ Validate wearable devices for human monitoring
○ Characterize transitions in disease state
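As a back-of-the-envelope check on the storage these figures imply, the sketch below sums the per-human components listed earlier and scales the headline 3.75 TB to a 10,000-participant cohort. It assumes decimal units (1 PB = 1,000 TB, 1 TB = 10**6 MB) and exactly 10,000 participants, neither of which the slides state. The function names are illustrative only. Note that the listed components add up to about 3.80 TB, slightly above the 3.75 TB headline.

```python
# Per-human data volumes in terabytes, as listed on the slide.
COMPONENTS_TB = {
    "genome": 1.00,
    "epigenome": 2.00,
    "transcriptome": 0.70,
    "metabolome": 0.06,
    "proteome": 0.04,
    "standard lab tests": 1e-6,  # ~1 MB, assuming 1 TB = 10**6 MB
}


def per_human_tb(components=COMPONENTS_TB):
    """Sum the listed per-human components, in TB."""
    return sum(components.values())


def cohort_pb(n_participants, tb_per_human):
    """Total cohort volume in PB, assuming 1 PB = 1,000 TB."""
    return n_participants * tb_per_human / 1000.0


if __name__ == "__main__":
    print(f"Sum of listed components: {per_human_tb():.2f} TB per human")
    print(f"10,000 participants at 3.75 TB each: {cohort_pb(10_000, 3.75):.1f} PB")
```

Under these assumptions the full cohort lands in the tens of petabytes, before counting the repeated measurements of a 5-year longitudinal study.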
Public Datasets Project
https://cloud.google.com/bigquery/public-data/
A public dataset is any dataset that is stored in BigQuery and made available to the general public. This URL lists a
special group of public datasets that Google BigQuery hosts for you to access and integrate into your applications.
Google pays for the storage of these datasets and provides public access to the data via BigQuery. You pay only for the
queries that you perform on the data (the first 1 TB per month is free).
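To make the "first 1 TB per month is free" rule concrete, here is a minimal sketch that works out how many scanned bytes in a month fall outside the free allowance. It assumes 1 TB means 10**12 bytes and that the allowance applies to the monthly total of bytes scanned; the actual billing unit and rules should be checked against the BigQuery pricing page. The function name is hypothetical.

```python
FREE_BYTES_PER_MONTH = 10**12  # assumption: "1 TB" taken as 10**12 bytes


def billable_bytes(bytes_scanned_per_query, free_bytes=FREE_BYTES_PER_MONTH):
    """Return the bytes scanned this month beyond the free allowance."""
    total = sum(bytes_scanned_per_query)
    return max(0, total - free_bytes)


if __name__ == "__main__":
    month = [600 * 10**9, 700 * 10**9]  # two queries scanning 600 GB and 700 GB
    print(billable_bytes(month), "bytes beyond the free allowance")
```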
Population-scale Genome Projects
Medical (Human): Platinum Genomes, 1000 Genomes, Personal Genome Project, Human Microbiome Project, NCBI GEO Human 100K, Cancer Genome Atlas
Veterinary / Agriculture: 1000 Bulls, 10K Dog Genomes
Agriculture: Open Cannabis Project, Genome To Fields, Panzea (1000 Maize)
Many Other Interesting Datasets...
PI / Biologist: variant calls for the 1,000 genomes
Information: principal coordinates analysis (1000 genomes)
Knowledge: populations cluster together
Bioinformatics scientist: BigQuery enables fast tertiary analysis
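As a rough illustration of what tertiary analysis in BigQuery can look like, the sketch below builds a SQL query that counts variant records per chromosome and, optionally, runs it with the google-cloud-bigquery Python client. The table name and the `reference_name` column are assumptions about how the 1000 Genomes variant calls are published, so verify them against the public dataset listing. `build_query` and `run_query` are hypothetical helper names.

```python
# Assumption: the 1000 Genomes variant calls are exposed as this public table.
TABLE = "genomics-public-data.1000_genomes.variants"


def build_query(table=TABLE, limit=25):
    """Build a standard-SQL query counting variant records per chromosome."""
    return (
        "SELECT reference_name, COUNT(*) AS n_variants\n"
        f"FROM `{table}`\n"
        "GROUP BY reference_name\n"
        "ORDER BY n_variants DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql):
    """Run the query; needs the google-cloud-bigquery package and credentials."""
    # Imported lazily so that build_query works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = client.query(sql).result()
    return [(row["reference_name"], row["n_variants"]) for row in rows]


if __name__ == "__main__":
    print(build_query())
```

Calling `run_query(build_query())` scans the table and counts toward the monthly query quota of the billing project.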
Google Cloud Platform
Dataflow + BigQuery
Dataflow: Used for Extract, Transform, Load (ETL), analytics, real-time computation and process orchestration. cloud.google.com/dataflow
BigQuery: Run SQL queries against multi-terabyte datasets in seconds. cloud.google.com/bigquery
Example: GATK Analysis Pipeline
Old way: install applications on host (on a virtual machine)
[Diagram: several apps sharing one set of libs on the host kernel; pipelines described with Makefiles, CWL, WDL]
Example: GATK Analysis Pipeline
Old way: install applications on host (on a virtual machine)
[Diagram: several apps sharing one set of libs on the host kernel; pipelines described with Makefiles, CWL, WDL]
New way: deploy containers
[Diagram: each app packaged with its own libs, all running on a shared kernel]
Benefits
● Decouple process management from host configuration
● Portable across OS distros and clouds
● Consistent environment from development to production
● Immutable images
Dockerflow: Dataflow + Docker
Use Case:
Reproducible Science with Docker
● Objective: Build a mutation-detection pipeline
● Provided to competitors
○ Training data set
○ Evaluation data set
● Competitors submit pipelines as Docker images to DREAM Challenge host, Sage Bionetworks
● Submitted pipelines were used to process unseen data set
● Post-competition, Docker images made public
● Incidentally, Google won this competition with a deep-learning based variant caller called DeepVariant
cloud.google.com/genomics/v1alpha2/deepvariant
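The slides do not say how submitted pipelines were scored on the unseen data. One common way to compare a pipeline's calls against a truth set is precision, recall, and F1, sketched below. Representing a variant as a `(chromosome, position, ref, alt)` tuple and the function name `score_calls` are assumptions made for illustration, not the challenge's actual scoring code.

```python
def score_calls(called, truth):
    """Return (precision, recall, f1) for a set of variant calls against a truth set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data: variants as (chromosome, position, ref, alt) tuples.
    truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")}
    called = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
              ("chr3", 10, "A", "C"), ("chr3", 20, "G", "T")}
    print(score_calls(called, truth))
```

Because every submission runs as an immutable Docker image, the same scoring can be repeated later on the same inputs.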
Threats to reproducible science.
An idealized version of the hypothetico-deductive model of the scientific method is shown. Various potential threats to this model exist (indicated in red), including hypothesizing after the results are known (HARKing) and lack of data sharing. Together these undermine the robustness of results, and may impact on the ability of science to self-correct.
http://www.nature.com/articles/s41562-016-0021
To run it:
java -jar target/dockerflow*dependencies.jar \
    --project=YOUR_PROJECT \
    --workflow-file=hello.yaml \
    --workspace=gs://YOUR_BUCKET/YOUR_FOLDER \
    --runner=DataflowPipelineRunner
[Diagram: pipeline connecting Sequencer, DNA Reads, PubSub Queue, Your Variant Caller, Variant Calls, Genomics API, BigQuery, and Your Other Tool]
GraphConnect SF 2015 / Graphs Are Feeding The World, Tim Williamson, Data Scientist, Monsanto
https://www.youtube.com/watch?v=6KEvLURBenM
Marker-assisted selection for quantitative traits
https://www.sec.gov/Archives/edgar/data/1110783/000095013402011773/c71992exv99w2.htm
Marker-Assisted Breeding Rapidly Increases Frequency of Favorable Genes
https://www.slideshare.net/finance28/monsanto-082305a
Q: Why Slow to Understand? A2: Limited Feedback
[Chart: DNA sequencing value chain stages: Experiment Design; DNA Sequencing; Secondary Analytics; Analytics, Interpretation, Planning]
Sboner, et al, 2011. The real cost of sequencing: higher than you think!
Situation: Data acquisition cost approaches zero.
However, still slow to understand, because:
1. Restricted choice of what can be observed, i.e. controlled modifications and artificial selection
2. Passive learning. Limited feedback => low rate of learning
Contrast with active learning...
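A toy model can show why limited feedback means a low rate of learning. Suppose the unknown is a threshold between 0 and 1 and each experiment reveals only whether a chosen point lies at or above it. A passive learner observes points fixed in advance; an active learner picks each point using the previous answer (bisection). The threshold task, the oracle, and the function names below are illustrative assumptions, not something taken from the slides.

```python
def passive_estimate(oracle, n_queries):
    """Passive: observations fall on a grid fixed before any answer is seen."""
    lo, hi = 0.0, 1.0
    for i in range(1, n_queries + 1):
        x = i / (n_queries + 1)
        if oracle(x):
            hi = min(hi, x)
        else:
            lo = max(lo, x)
    return lo, hi


def active_estimate(oracle, n_queries):
    """Active: each query is chosen from the feedback so far (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        mid = (lo + hi) / 2
        if oracle(mid):
            hi = mid
        else:
            lo = mid
    return lo, hi


if __name__ == "__main__":
    threshold = 0.37  # hidden from the learners; only the oracle knows it

    def oracle(x):
        return x >= threshold

    for name, learner in (("passive", passive_estimate), ("active", active_estimate)):
        lo, hi = learner(oracle, 10)
        print(f"{name}: threshold in ({lo:.4f}, {hi:.4f}], width {hi - lo:.4f}")
```

With the same number of experiments, the passive learner's uncertainty shrinks roughly as 1/n, while the active learner's halves with every query.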
[Diagram: Observe, Orient, Decide, Act loop between Scientist and Biological System]
Molecular Sensors: DNA sequencer, Mass spectrometer, etc.
However... (Technology)-Limited Experimental Capability
Even Moore’s Law / Carlson Curve - also applies to writing DNA
[Diagram: Observe, Orient, Decide, Act loop between Scientist and Biological System]
Molecular Sensors: DNA sequencer, Mass spectrometer, etc.
Bioengineering Tech: DNA synthesizers, CRISPR/Cas9, etc.
[Diagram: Observe, Orient, Decide, Act loop between Scientist and Biological System]
Molecular Sensors: DNA sequencer, Mass spectrometer, etc.
Environmental Sensors: Laser scanners, Hyperspectral scanners, UAVs, etc.
Bioengineering Tech: DNA synthesizers, CRISPR/Cas9, etc.
Regulate/Measure System I/O
Integration with Geospatial, Management, and Terrestrial Sensor Data
anezconsulting.com/precision-agronomy/
Descartes Labs - Google Cloud Customer
medium.com/@stevenpbrumby/corn-in-the-usa-d487dce84ee1
Cloud ML Engine
TensorFlow
Phenomobile, http://www.mdpi.com/2073-4395/4/3/349/htm
See also: http://www.genomes2fields.org/
Temporo-Spatial Imaging of Growing Plants
Verily: Assisting Pathologists in Detecting Cancer with Deep Learning
research.googleblog.com/2017/03/assisting-pathologists-in-detecting.html
Prediction heatmaps produced by the algorithm had improved so much that the localization score (FROC) for the algorithm reached 89%, which significantly exceeded the score of 73% for a pathologist with no time constraint. We were not the only ones to see promising results, as other groups were getting scores as high as 81% with the same dataset.
The model generalized very well, even to images that were acquired from a different hospital using different scanners. For full details, see our paper “Detecting Cancer Metastases on Gigapixel Pathology Images”.
00:20 - Connecting…
01:22 - Link Established
Google Cloud Platform
Cloud Vision, TensorFlow, Google Genomics, Dataflow, Cloud ML Engine, Docker
Baseline Study Data, Private Data, Public Data
Build What’s Next
Thank You!
Allen Day, PhD // Science Advocate // @allenday // #genomics #ml #datascience

More Related Content

What's hot

Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Robert Grossman
 
AdClickFraud_Bigdata-Apic-Ist-2019
AdClickFraud_Bigdata-Apic-Ist-2019AdClickFraud_Bigdata-Apic-Ist-2019
AdClickFraud_Bigdata-Apic-Ist-2019Neha gupta
 
Machine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsMachine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsLarry Smarr
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchRobert Grossman
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? Robert Grossman
 
Machine learning in the life sciences with knime
Machine learning in the life sciences with knimeMachine learning in the life sciences with knime
Machine learning in the life sciences with knimeGreg Landrum
 
Edge-based Discovery of Training Data for Machine Learning
Edge-based Discovery of Training Data for Machine LearningEdge-based Discovery of Training Data for Machine Learning
Edge-based Discovery of Training Data for Machine LearningZiqiang Feng
 
Multipleregression covidmobility and Covid-19 policy recommendation
Multipleregression covidmobility and Covid-19 policy recommendationMultipleregression covidmobility and Covid-19 policy recommendation
Multipleregression covidmobility and Covid-19 policy recommendationKan Yuenyong
 
ICIC 2017: The Next Era: Deep Learning for Biomedical Research
ICIC 2017: The Next Era: Deep Learning for Biomedical ResearchICIC 2017: The Next Era: Deep Learning for Biomedical Research
ICIC 2017: The Next Era: Deep Learning for Biomedical ResearchDr. Haxel Consult
 
wolstencroft-ogf20-astro
wolstencroft-ogf20-astrowolstencroft-ogf20-astro
wolstencroft-ogf20-astrowebuploader
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for ScienceIan Foster
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...Robert Grossman
 
03 why is deep learning taking off
03 why is deep learning taking off 03 why is deep learning taking off
03 why is deep learning taking off Edgar Guevara
 

What's hot (20)

Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
 
AdClickFraud_Bigdata-Apic-Ist-2019
AdClickFraud_Bigdata-Apic-Ist-2019AdClickFraud_Bigdata-Apic-Ist-2019
AdClickFraud_Bigdata-Apic-Ist-2019
 
Machine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsMachine Learning in Healthcare Diagnostics
Machine Learning in Healthcare Diagnostics
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science Research
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
Machine learning in the life sciences with knime
Machine learning in the life sciences with knimeMachine learning in the life sciences with knime
Machine learning in the life sciences with knime
 
Reproducibility for IR evaluation
Reproducibility for IR evaluationReproducibility for IR evaluation
Reproducibility for IR evaluation
 
Cri big data
Cri big dataCri big data
Cri big data
 
Big Data
Big Data Big Data
Big Data
 
Edge-based Discovery of Training Data for Machine Learning
Edge-based Discovery of Training Data for Machine LearningEdge-based Discovery of Training Data for Machine Learning
Edge-based Discovery of Training Data for Machine Learning
 
Multipleregression covidmobility and Covid-19 policy recommendation
Multipleregression covidmobility and Covid-19 policy recommendationMultipleregression covidmobility and Covid-19 policy recommendation
Multipleregression covidmobility and Covid-19 policy recommendation
 
2016 bergen-sars
2016 bergen-sars2016 bergen-sars
2016 bergen-sars
 
ICIC 2017: The Next Era: Deep Learning for Biomedical Research
ICIC 2017: The Next Era: Deep Learning for Biomedical ResearchICIC 2017: The Next Era: Deep Learning for Biomedical Research
ICIC 2017: The Next Era: Deep Learning for Biomedical Research
 
SB'12 - Sean Gourley - Quid
SB'12 - Sean Gourley - Quid SB'12 - Sean Gourley - Quid
SB'12 - Sean Gourley - Quid
 
2016 davis-biotech
2016 davis-biotech2016 davis-biotech
2016 davis-biotech
 
wolstencroft-ogf20-astro
wolstencroft-ogf20-astrowolstencroft-ogf20-astro
wolstencroft-ogf20-astro
 
Future of hpc
Future of hpcFuture of hpc
Future of hpc
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
03 why is deep learning taking off
03 why is deep learning taking off 03 why is deep learning taking off
03 why is deep learning taking off
 

Viewers also liked

Genome Analysis Pipelines with Spark and ADAM
Genome Analysis Pipelines with Spark and ADAMGenome Analysis Pipelines with Spark and ADAM
Genome Analysis Pipelines with Spark and ADAMAllen Day, PhD
 
Cloud Accelerated Genomics
Cloud Accelerated GenomicsCloud Accelerated Genomics
Cloud Accelerated GenomicsIdan Tohami
 
Huawei - Zal Hybrid Cloud de toekomst zijn van de business van een onderneming?
Huawei - Zal Hybrid Cloud de toekomst zijn van de business van een onderneming?Huawei - Zal Hybrid Cloud de toekomst zijn van de business van een onderneming?
Huawei - Zal Hybrid Cloud de toekomst zijn van de business van een onderneming?VITO - Securitas
 
Mark Johnson's AWS Chicago Healthcare Slides - 2016
Mark Johnson's AWS Chicago Healthcare Slides - 2016Mark Johnson's AWS Chicago Healthcare Slides - 2016
Mark Johnson's AWS Chicago Healthcare Slides - 2016AWS Chicago
 
Understanding Kubernetes
Understanding KubernetesUnderstanding Kubernetes
Understanding KubernetesTu Pham
 
Watson genomics
Watson genomicsWatson genomics
Watson genomicsInsideDNA
 
Prospection de textes scientifiques : vision prospective
Prospection de textes scientifiques : vision prospectiveProspection de textes scientifiques : vision prospective
Prospection de textes scientifiques : vision prospectiveGuillaume Cabanac
 
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen ChinaAllen Day, PhD
 
Next generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesNext generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesGuy Coates
 
Life sciences big data use cases
Life sciences big data use casesLife sciences big data use cases
Life sciences big data use casesGuy Coates
 
Future Architectures for genomics
Future Architectures for genomicsFuture Architectures for genomics
Future Architectures for genomicsGuy Coates
 
Declaring a TB outbreak over with genomics
Declaring a TB outbreak over with genomicsDeclaring a TB outbreak over with genomics
Declaring a TB outbreak over with genomicsJennifer Gardy
 
Genomics in Public Health
Genomics in Public HealthGenomics in Public Health
Genomics in Public HealthJennifer Gardy
 
Windows Azure Media Services : des API pour encoder, multiplexer et difuser v...
Windows Azure Media Services : des API pour encoder, multiplexer et difuser v...Windows Azure Media Services : des API pour encoder, multiplexer et difuser v...
Windows Azure Media Services : des API pour encoder, multiplexer et difuser v...Microsoft Technet France
 
La diffusion vidéo avec le Cloud Azure
La diffusion vidéo avec le Cloud AzureLa diffusion vidéo avec le Cloud Azure
La diffusion vidéo avec le Cloud AzureMicrosoft
 
Contrôler les usages de vos informations dans le Cloud avec Windows Azure AD ...
Contrôler les usages de vos informations dans le Cloud avec Windows Azure AD ...Contrôler les usages de vos informations dans le Cloud avec Windows Azure AD ...
Contrôler les usages de vos informations dans le Cloud avec Windows Azure AD ...Microsoft Technet France
 
Why amazon Web Services?
Why amazon Web Services?Why amazon Web Services?
Why amazon Web Services?Bogdan Naydenov
 
동북아 국제 정세(박인휘 교수)
동북아 국제 정세(박인휘 교수)동북아 국제 정세(박인휘 교수)
동북아 국제 정세(박인휘 교수)gilforum
 

Viewers also liked (20)

Genome Analysis Pipelines with Spark and ADAM
Genome Analysis Pipelines with Spark and ADAMGenome Analysis Pipelines with Spark and ADAM
Genome Analysis Pipelines with Spark and ADAM
 
Cloud Accelerated Genomics
Cloud Accelerated GenomicsCloud Accelerated Genomics
Cloud Accelerated Genomics
 
Huawei - Zal Hybrid Cloud de toekomst zijn van de business van een onderneming?
Huawei - Zal Hybrid Cloud de toekomst zijn van de business van een onderneming?Huawei - Zal Hybrid Cloud de toekomst zijn van de business van een onderneming?
Huawei - Zal Hybrid Cloud de toekomst zijn van de business van een onderneming?
 
Mark Johnson's AWS Chicago Healthcare Slides - 2016
Mark Johnson's AWS Chicago Healthcare Slides - 2016Mark Johnson's AWS Chicago Healthcare Slides - 2016
Mark Johnson's AWS Chicago Healthcare Slides - 2016
 
Understanding Kubernetes
Understanding KubernetesUnderstanding Kubernetes
Understanding Kubernetes
 
Watson genomics
Watson genomicsWatson genomics
Watson genomics
 
Prospection de textes scientifiques : vision prospective
Prospection de textes scientifiques : vision prospectiveProspection de textes scientifiques : vision prospective
Prospection de textes scientifiques : vision prospective
 
Genomics
GenomicsGenomics
Genomics
 
Analysis of high grade prostate cancer microarray data
Analysis of high grade prostate cancer microarray dataAnalysis of high grade prostate cancer microarray data
Analysis of high grade prostate cancer microarray data
 
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
2014.06.16 - BGI - Genomics BigData Workloads - Shenzhen China
 
Next generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesNext generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciences
 
Life sciences big data use cases
Life sciences big data use casesLife sciences big data use cases
Life sciences big data use cases
 
Future Architectures for genomics
Future Architectures for genomicsFuture Architectures for genomics
Future Architectures for genomics
 
Declaring a TB outbreak over with genomics
Declaring a TB outbreak over with genomicsDeclaring a TB outbreak over with genomics
Declaring a TB outbreak over with genomics
 
Genomics in Public Health
Genomics in Public HealthGenomics in Public Health
Genomics in Public Health
 
Windows Azure Media Services : des API pour encoder, multiplexer et difuser v...
Windows Azure Media Services : des API pour encoder, multiplexer et difuser v...Windows Azure Media Services : des API pour encoder, multiplexer et difuser v...
Windows Azure Media Services : des API pour encoder, multiplexer et difuser v...
 
La diffusion vidéo avec le Cloud Azure
La diffusion vidéo avec le Cloud AzureLa diffusion vidéo avec le Cloud Azure
La diffusion vidéo avec le Cloud Azure
 
Contrôler les usages de vos informations dans le Cloud avec Windows Azure AD ...
Contrôler les usages de vos informations dans le Cloud avec Windows Azure AD ...Contrôler les usages de vos informations dans le Cloud avec Windows Azure AD ...
Contrôler les usages de vos informations dans le Cloud avec Windows Azure AD ...
 
Why amazon Web Services?
Why amazon Web Services?Why amazon Web Services?
Why amazon Web Services?
 
동북아 국제 정세(박인휘 교수)
동북아 국제 정세(박인휘 교수)동북아 국제 정세(박인휘 교수)
동북아 국제 정세(박인휘 교수)
 

Similar to 20170315 Cloud Accelerated Genomics - Tel Aviv / Phoenix

Bionimbus - Northwestern CGI Workshop 4-21-2011
Bionimbus - Northwestern CGI Workshop 4-21-2011Bionimbus - Northwestern CGI Workshop 4-21-2011
Bionimbus - Northwestern CGI Workshop 4-21-2011Robert Grossman
 
Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Sanjay Padhi, Ph.D
 
Google Cloud Platform: Prototype ->Production-> Planet scale
Google Cloud Platform: Prototype ->Production-> Planet scaleGoogle Cloud Platform: Prototype ->Production-> Planet scale
Google Cloud Platform: Prototype ->Production-> Planet scaleIdan Tohami
 
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...Bonnie Hurwitz
 
The crusade for big data in the AAL domain
The crusade for big data in the AAL domainThe crusade for big data in the AAL domain
The crusade for big data in the AAL domainAALForum
 
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...
Jesse Xiao at CODATA2017: Updates to the GigaDB open access data publishing p...GigaScience, BGI Hong Kong
 
The pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleThe pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleEnis Afgan
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioAlluxio, Inc.
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science researchDenis C. Bauer
 
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013Amazon Web Services
 
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQueryIntro to new Google cloud technologies: Google Storage, Prediction API, BigQuery
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQueryChris Schalk
 
Docker in Open Science Data Analysis Challenges by Bruce Hoff
Docker in Open Science Data Analysis Challenges by Bruce HoffDocker in Open Science Data Analysis Challenges by Bruce Hoff
Docker in Open Science Data Analysis Challenges by Bruce HoffDocker, Inc.
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobus
 
Appistry WGDAS Presentation
Appistry WGDAS PresentationAppistry WGDAS Presentation
Appistry WGDAS Presentationelasticdave
 
The BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchThe BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchBlue BRIDGE
 
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)
Using e-infrastructures for biodiversity conservation - Gianpaolo Coro (CNR)Blue BRIDGE
 

Similar to 20170315 Cloud Accelerated Genomics - Tel Aviv / Phoenix (20)

Bionimbus - Northwestern CGI Workshop 4-21-2011
Bionimbus - Northwestern CGI Workshop 4-21-2011Bionimbus - Northwestern CGI Workshop 4-21-2011
Bionimbus - Northwestern CGI Workshop 4-21-2011
 
Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021
 
Google Cloud Platform: Prototype ->Production-> Planet scale
Google Cloud Platform: Prototype ->Production-> Planet scaleGoogle Cloud Platform: Prototype ->Production-> Planet scale
Google Cloud Platform: Prototype ->Production-> Planet scale
 
20170315 Cloud Accelerated Genomics - Tel Aviv / Phoenix

  • 1. Cloud Accelerated Genomics Allen Day, PhD // Science Advocate @allenday // #genomics #ml #datascience
  • 2. Table of Contents. Section 1: Getting from Research to Application… Faster. What are the bottlenecks for translating research into products? Emphasis on information processing. Section 2: From CompBio Research to CompBio Engineering. Getting results, more of them, and predictably improving. Section 3: Data Integration - Cutting Edge Use Cases. What’s happening right now in industry and academia? Throughout: How to use Google Cloud? I’ll introduce specific cloud services, along with examples of how they’ve been used successfully: Compute Engine, Kubernetes, Dataflow, Cloud ML, Genomics API.
  • 3. How to Understand? Linear B is a syllabic script that was used for writing Mycenaean Greek, the earliest attested form of Greek. The script predates the Greek alphabet by several centuries. The oldest Mycenaean writing dates to about 1450 BC.
  • 5. Hypothetico-Deductive Method (Iterative). Stages: Organize; Analyze, Interpret, and Plan; Choose Data; Acquire. Situation: Not enough data. No means to get more. Dead language. Outcome: Cannot understand. Also: Passive learning. No feedback.
  • 6. DNA Sequencing Value Chain. [Chart: % effort (0 to 100) at three time points (Pre-NGS, ~2000; Now; Future, ~2020), divided among Experiment Design; DNA Sequencing; Secondary Analytics; and Analytics, Interpretation, Planning.] Source: Sboner, et al, 2011. The real cost of sequencing: higher than you think!
  • 7. Human Genetics Scenario. [Chart: % effort (0 to 100) at three time points (Pre-NGS, ~2000; Now; Future, ~2020), divided among Experiment Design; DNA Sequencing; Secondary Analytics; and Analytics, Interpretation, Planning.] Situation: Unlimited free DNA. Result: Slow to understand. Source: Sboner, et al, 2011. The real cost of sequencing: higher than you think!
  • 8. Q: Why Slow to Understand? A1: Data Processing. [Chart: % effort (0 to 100) at three time points (Pre-NGS, ~2000; Now; Future, ~2020), divided among Experiment Design; DNA Sequencing; Secondary Analytics; and Analytics, Interpretation, Planning.] Situation: We still have an analysis bottleneck. Result: Slow to understand. Source: Sboner, et al, 2011. The real cost of sequencing: higher than you think!
  • 9. 00:20 - Connecting… 01:22 - Link Established
  • 11. GOOGLE CONFIDENTIAL Google Cloud Platform lets you run your apps on the same system as Google
  • 12. So you can focus on what matters to your science
  • 13. Google confidential │ Do not distribute. Google is good at handling massive volumes of data. Uploads per minute: 300 hrs. Users: 500M+. Search index: 100PB+. Query response time: 0.25s.
  • 14. Google can handle massive volumes of genomic data. Uploads per minute: 300 hrs (~6 WGS). Users: 500M+ (>100x US PhDs). Search index: 100PB+ (~1M WGS). Query response time: 0.25s (0.25s).
  • 15. Google Genomics, August 2015
  • 16. Google Genomics is more than infrastructure. Genomics-specific features: Genomics API. General-purpose cloud infrastructure: Virtual Machines & Storage; Data Services & Tools.
  • 17. Google’s vision to tackle complex health data. Diagram labels: BioQuery Analysis Engine; Medical Records; Genomics; Devices; Imaging; Patient Reports; Baseline Study Data; Public Data; Private Data; Pharma; Health Providers; …
  • 19. BASELINE STUDY: BIG DATA ANALYSIS. 5-YR LONGITUDINAL STUDY. 3.75 TERABYTES PER HUMAN: 1.00 TB GENOME; 2.00 TB EPIGENOME; 0.70 TB TRANSCRIPTOME; 0.06 TB METABOLOME; 0.04 TB PROTEOME; ~1 MB STANDARD LAB TESTS. Validate a pipeline to process complex phenotypic, biochemical, and genomic data
      ● Pilot Study (N=200)
        ○ Determine optimal biospecimen collection strategy for stable sampling and reproducible assays
        ○ Determine optimal assay methodology
        ○ Validate quality control methods
        ○ Validate device data against surrogate and primary endpoints
      ● Baseline Study (N=10,000+)
        ○ 6 cohorts from low to high risk for cardiovascular disease and cancer
        ○ Characterize human systems biology
        ○ Define normal values for a given parameter in heterogeneous states
        ○ Predict meaningful events
        ○ Validate wearable devices for human monitoring
        ○ Characterize transitions in disease state
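The per-participant volumes on slide 19 lend themselves to a quick back-of-the-envelope check. The sketch below is illustrative only: it assumes decimal units (1 TB = 1000 GB, 1 PB = 1000 TB) and treats the slide's per-human figure as the total stored per participant, which the slide does not state explicitly. The function names are hypothetical. Note that the listed components add up to 3.80 TB rather than the 3.75 TB headline; the slide does not explain the difference, and the ~1 MB of standard lab tests is negligible either way.

```python
# Per-participant data volumes as listed on the slide, in gigabytes.
# Assumption: decimal units (1 TB = 1000 GB); the slide does not say.
PER_HUMAN_GB = {
    "genome": 1000,
    "epigenome": 2000,
    "transcriptome": 700,
    "metabolome": 60,
    "proteome": 40,
}


def per_human_tb(volumes_gb=PER_HUMAN_GB):
    """Sum the per-assay volumes and return terabytes per participant."""
    return sum(volumes_gb.values()) / 1000


def cohort_pb(n_participants, tb_per_human):
    """Total cohort volume in petabytes (assuming 1 PB = 1000 TB)."""
    return n_participants * tb_per_human / 1000


if __name__ == "__main__":
    print("sum of listed components (TB):", per_human_tb())
    # Cohort sizes from the slide, using the slide's 3.75 TB headline figure.
    print("pilot study, N=200 (PB):", cohort_pb(200, 3.75))
    print("baseline study, N=10,000 (PB):", cohort_pb(10_000, 3.75))
```

Under these assumptions the baseline cohort lands in the tens of petabytes, which is the scale the deck's later storage and query slides are addressing.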
  • 20. Public Datasets Project. https://cloud.google.com/bigquery/public-data/ A public dataset is any dataset that is stored in BigQuery and made available to the general public. This URL lists a special group of public datasets that Google BigQuery hosts for you to access and integrate into your applications. Google pays for the storage of these datasets and provides public access to the data via BigQuery. You pay only for the queries that you perform on the data (the first 1TB per month is free).
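The billing rule on slide 20 (pay only for queries, with the first 1TB per month free) can be sketched as simple arithmetic. This is a minimal illustration, not BigQuery's billing logic: it assumes "1TB" means 10^12 bytes and that the quota is counted in bytes scanned, and `billable_bytes` is a hypothetical helper name. No prices are included because the slide gives none.

```python
# Assumption: the slide's "1TB" free tier is read as 10**12 bytes scanned.
FREE_BYTES_PER_MONTH = 10**12


def billable_bytes(query_bytes, used_this_month, free_quota=FREE_BYTES_PER_MONTH):
    """Return the bytes of a query that fall outside the remaining free quota.

    query_bytes: bytes the query would scan.
    used_this_month: bytes already scanned earlier in the same month.
    """
    remaining_free = max(free_quota - used_this_month, 0)
    return max(query_bytes - remaining_free, 0)


if __name__ == "__main__":
    # A 300 GB query early in the month fits inside the free tier.
    print(billable_bytes(3 * 10**11, 0))
    # The same query after 900 GB of earlier usage is only partly free.
    print(billable_bytes(3 * 10**11, 9 * 10**11))
```

In practice the scanned-byte count for a query can be obtained from BigQuery before running it (a dry run), so a check like this could sit in front of an expensive query.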
  • 21. Population-scale Genome Projects. Medical (Human): Platinum Genomes, 1000 Genomes. Veterinary / Agriculture: 1000 Bulls, 10K Dog Genomes. Agriculture: Open Cannabis Project, Genome To Fields, Panzea (1000 Maize). Many Other Interesting Datasets: Personal Genome Project, Human Microbiome Project, NCBI GEO, Human 100K, Cancer Genome Atlas...
  • 22. PI / Biologist: variant calls for the 1,000 genomes
  • 23. Information: principal coordinates analysis (1000 genomes)
  • 24. Knowledge: populations cluster together
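Slides 22 to 24 move from variant calls to a principal coordinates analysis in which populations cluster. As a rough illustration of that technique, here is classical principal coordinates analysis (classical multidimensional scaling) on a tiny invented genotype matrix. Everything here is an assumption for teaching purposes: the toy data, the function names, and the use of power iteration in place of a full eigendecomposition so the sketch stays dependency-free. It is not the pipeline used for the 1000 Genomes result in the deck.

```python
import math


def squared_distances(genotypes):
    """Pairwise count of differing sites.

    For 0/1 genotype vectors this count equals the squared Euclidean distance.
    """
    return [[sum(x != y for x, y in zip(a, b)) for b in genotypes]
            for a in genotypes]


def double_center(d2):
    """Gower double-centering of a symmetric squared-distance matrix."""
    n = len(d2)
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def leading_eigenpair(b, iters=200):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    n = len(b)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    eigenvalue = 0.0
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    return eigenvalue, v


def first_principal_coordinate(genotypes):
    """Coordinate of each sample along the first PCoA axis."""
    b = double_center(squared_distances(genotypes))
    eigenvalue, v = leading_eigenpair(b)
    scale = math.sqrt(eigenvalue)
    return [scale * x for x in v]


# Invented toy data: samples 0-1 resemble each other, as do samples 2-3.
TOY_GENOTYPES = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0],
]

coords = first_principal_coordinate(TOY_GENOTYPES)

if __name__ == "__main__":
    for sample, c in enumerate(coords):
        print(sample, round(c, 3))
```

On the toy data the two similar pairs fall on opposite sides of zero along the first axis, which is the small-scale analogue of populations clustering together. At real scale the distance or covariance computation would be the expensive step, which is where a service such as BigQuery or Dataflow would come in.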
  • 25. Bioinformatics scientist: BigQuery enables fast tertiary analysis
  • 26. Google Cloud Platform: Dataflow + BigQuery. Dataflow: used for Extract, Transform, Load (ETL), analytics, real-time computation and process orchestration (cloud.google.com/dataflow). BigQuery: run SQL queries against multi-terabyte datasets in seconds (cloud.google.com/bigquery).
  • 29. Example: GATK Analysis Pipeline. Old way: install applications on host (diagram: one kernel and one set of libs shared by all apps). Makefiles, CWL, WDL (on a virtual machine).
  • 33. Example: GATK Analysis Pipeline. Old way: install applications on host (diagram: one kernel and one set of libs shared by all apps). New way: deploy containers (diagram: one kernel; each app bundled with its own libs). Makefiles, CWL, WDL (on a virtual machine). Dockerflow: Dataflow + Docker. Benefits:
      ● Decouple process management from host configuration
      ● Portable across OS distros and clouds
      ● Consistent environment from development to production
      ● Immutable images
  • 34. Use Case: Reproducible Science with Docker
      ● Objective: Build a mutation-detection pipeline
      ● Provided to competitors
        ○ Training data set
        ○ Evaluation data set
      ● Competitors submit pipelines as Docker images to DREAM Challenge host, Sage Bionetworks
      ● Submitted pipelines were used to process unseen data set
      ● Post-competition, Docker images made public
      ● Incidentally, Google won this competition with a deep-learning based variant caller called DeepVariant
      cloud.google.com/genomics/v1alpha2/deepvariant
  • 35. Threats to reproducible science. An idealized version of the hypothetico-deductive model of the scientific method is shown. Various potential threats to this model exist (indicated in red), including hypothesizing after the results are known (HARKing) and lack of data sharing. Together these undermine the robustness of results, and may impact on the ability of science to self-correct. http://www.nature.com/articles/s41562-016-0021
  • 36. Diagram labels: Sequencer; DNA Reads; PubSub Queue; Genomics API; Your Variant Caller; Variant Calls; Genomics API; BigQuery; Your Other Tool. To run it: > java -jar target/dockerflow*dependencies.jar --project=YOUR_PROJECT --workflow-file=hello.yaml --workspace=gs://YOUR_BUCKET/YOUR_FOLDER --runner=DataflowPipelineRunner
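The invocation on slide 36 can also be assembled from a script, which makes the placeholders explicit. The sketch below only builds and prints the argument list; it uses exactly the flags shown on the slide and nothing else. The function name `dockerflow_command` and the example jar filename are hypothetical, and the slide's `target/dockerflow*dependencies.jar` is a shell glob, so a concrete path has to be supplied (or resolved with `glob.glob`) when no shell is involved.

```python
import shlex


def dockerflow_command(jar, project, workflow_file, workspace,
                       runner="DataflowPipelineRunner"):
    """Assemble the argv list for the java invocation shown on the slide.

    Only the flags that appear on the slide are used; all values are
    supplied by the caller.
    """
    return [
        "java", "-jar", jar,
        f"--project={project}",
        f"--workflow-file={workflow_file}",
        f"--workspace={workspace}",
        f"--runner={runner}",
    ]


if __name__ == "__main__":
    argv = dockerflow_command(
        jar="target/dockerflow-example-dependencies.jar",  # hypothetical filename
        project="YOUR_PROJECT",
        workflow_file="hello.yaml",
        workspace="gs://YOUR_BUCKET/YOUR_FOLDER",
    )
    # Print a shell-safe rendering; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```

Building a list rather than a single string avoids quoting problems if a bucket path or project name ever contains unusual characters.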
  • 37. GraphConnect SF 2015 / Graphs Are Feeding The World, Tim Williamson, Data Scientist, Monsanto https://www.youtube.com/watch?v=6KEvLURBenM
  • 39. Marker-assisted selection for quantitative traits
  • 40. Marker-assisted selection for quantitative traits https://www.sec.gov/Archives/edgar/data/1110783/000095013402011773/c71992exv99w2.htm
  • 41. Marker-Assisted Breeding Rapidly Increases Frequency of Favorable Genes https://www.slideshare.net/finance28/monsanto-082305a
  • 42. Q: Why Slow to Understand? A1: Data Processing. [Chart: % effort (0 to 100) at three time points (Pre-NGS, ~2000; Now; Future, ~2020), divided among Experiment Design; DNA Sequencing; Secondary Analytics; and Analytics, Interpretation, Planning.] Situation: We still have an analysis bottleneck. Result: Slow to understand. Source: Sboner, et al, 2011. The real cost of sequencing: higher than you think!
  • 43. Q: Why Slow to Understand? A2: Limited Feedback. [Chart: effort divided among Experiment Design; DNA Sequencing; Secondary Analytics; and Analytics, Interpretation, Planning.] Situation: Data acquisition cost approaches zero. However, still slow to understand, because: 1. Restricted choice of what can be observed, i.e. controlled modifications and artificial selection. 2. Passive learning: limited feedback => low rate of learning. Contrast with active learning... Source: Sboner, et al, 2011. The real cost of sequencing: higher than you think!
  • 44. [Diagram: observe / orient / decide / act loop between Biological System and Scientist.] Molecular Sensors: DNA sequencer, Mass spectrometer, Etc. However... (Technology)-Limited Experimental Capability
  • 45. Even Moore’s Law / Carlson Curve
  • 46. Even Moore’s Law / Carlson Curve - also applies to writing DNA
  • 47. [Diagram: observe / orient / decide / act loop between Biological System and Scientist.] Molecular Sensors: DNA sequencer, Mass spectrometer, Etc. Bioengineering Tech: DNA synthesizers, CRISPR/Cas9, Etc.
  • 48. [Diagram: observe / orient / decide / act loop between Biological System and Scientist.] Molecular Sensors: DNA sequencer, Mass spectrometer, Etc. Environmental Sensors: Laser scanners, Hyperspectral scanners, UAVs, Etc. Bioengineering Tech: DNA synthesizers, CRISPR/Cas9, Etc. Regulate/Measure System I/O
  • 49. Integration with Geospatial, Management, and Terrestrial Sensor Data. anezconsulting.com/precision-agronomy/
  • 50. Descartes Labs - Google Cloud Customer. Cloud ML Engine, TensorFlow. medium.com/@stevenpbrumby/corn-in-the-usa-d487dce84ee1
  • 51. Phenomobile, http://www.mdpi.com/2073-4395/4/3/349/htm See also: http://www.genomes2fields.org/
  • 52. Temporo-Spatial Imaging of Growing Plants
  • 53. Verily: Assisting Pathologists in Detecting Cancer with Deep Learning. research.googleblog.com/2017/03/assisting-pathologists-in-detecting.html Prediction heatmaps produced by the algorithm had improved so much that the localization score (FROC) for the algorithm reached 89%, which significantly exceeded the score of 73% for a pathologist with no time constraint. We were not the only ones to see promising results, as other groups were getting scores as high as 81% with the same dataset. Model generalized very well, even to images that were acquired from a different hospital using different scanners. For full details, see our paper “Detecting Cancer Metastases on Gigapixel Pathology Images”.
  • 54. 00:20 - Connecting… 01:22 - Link Established
  • 55. Google Cloud Platform: Cloud Vision, TensorFlow, Google Genomics, Dataflow, Cloud ML Engine, Docker. Data: Baseline Study Data, Private Data, Public Data.
  • 56. Build What’s Next Thank You! Allen Day, PhD // Science Advocate // @allenday // #genomics #ml #datascience