1.
Scaling Galaxy on GCP
Lynn Langit
Cloud and Data Architect
Google Developer Cloud Expert, AWS Community Hero, Microsoft Data Platform MVP
2.
Agenda
• Scaling Up
  • Virtual Machines
  • Hello Galaxy
  • Adding Tools to Galaxy
  • Genomic Data on GCP
• Scaling Out
  • Docker Containers
  • Google Persistent Disks
  • Pipelines
  • Google Genomics APIs
  • BigQuery
Galaxy on Google Cloud Platform
10.
Genomic Data
• Files in Google Cloud Storage (GCS)
  • gs://genomics-public-data
• Query via BigQuery
  • https://bigquery.cloud.google.com/queries/genomics-public-data
• Code via the Genomics API
  • Implements the Global Alliance for Genomics and Health (GA4GH) APIs
• Genome browser: https://gabrowse.appspot.com
• Google Genomics example code on GitHub
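Every file on this slide is addressed as gs://<bucket>/<object>. The Python sketch below splits such a URI into bucket and object name, then maps it to the https://storage.googleapis.com/<bucket>/<object> form, which works only for publicly readable objects such as those in a public dataset bucket. The function names and the object path in the example are hypothetical and chosen for illustration only. Only the standard library is used, so no GCP client library or credentials are needed.

```python
from typing import Tuple
from urllib.parse import quote

GCS_PREFIX = "gs://"


def parse_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split gs://<bucket>/<object> into (bucket, object_name)."""
    if not uri.startswith(GCS_PREFIX):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the object name,
    # which may contain further '/' characters that act like folders.
    bucket, _, object_name = uri[len(GCS_PREFIX):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, object_name


def public_https_url(uri: str) -> str:
    """Map a gs:// URI to its storage.googleapis.com HTTPS URL.

    Only useful for objects that are publicly readable, such as those in a
    public dataset bucket.
    """
    bucket, object_name = parse_gcs_uri(uri)
    # Percent-encode unsafe characters but keep '/' so the path stays readable.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name, safe='/')}"


if __name__ == "__main__":
    # Hypothetical object path inside the public bucket, for illustration only.
    print(public_https_url("gs://genomics-public-data/some/folder/sample.vcf"))
```

For authenticated access to private buckets, use the gs:// path with GCP tooling. The HTTPS mapping is meant only for quick browser or curl checks against public data.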
24.
Resources
• Cloud Storage (files) -- here
• Compute Engine (VMs) -- here
• Container Engine (Docker) -- here
• BigQuery (SQL) -- here
• Cloud Dataflow (pipelines) -- here
• Genomics API -- here
• Genomics Cookbook -- here
• Public datasets on GCP -- here
• Google’s Genomics code samples -- here
• Lynn’s GitHub code samples -- here
30.
GCE Persistence Options: Disks, etc.
Image
  Created from: GCS file or disk
  Notes: file path <bucket>/<folder>/<file>; a source disk must be detached from the VM
Snapshot
  Created from: disk or instance (boot disk)
  Notes: can create an instance FROM a snapshot
Persistent disk
  Created from: image, snapshot, or blank
  Notes: a blank disk must be formatted; can create an instance or snapshot FROM a disk
Bucket
  Created from: GCS console (for files)
  Notes: access via path gs://<bucketName>/<fileName>
VM instance boot disk
  Created from: image, snapshot, or disk
  Notes: image -> OS, application, or custom image; snapshot -> N/A; disk -> from saved disk
VM instance additional disk
  Type: local scratch, standard persistent, or SSD persistent
  Size: local scratch, max 8 at 375 GB each; persistent, 500 GB / 64 TB
  Mode: read/write or read-only
  Attach up to 16 disks* per VM
Documentation to share an image across multiple GCP projects -- https://cloud.google.com/compute/docs/images/sharing-images-across-projects
Documentation to export an image as a file -- https://cloud.google.com/compute/docs/images/export-image
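The "created from" rules and the additional-disk limits in the table above can be encoded as data and used to sanity-check a planned VM layout before provisioning. The Python sketch below does this. The resource and disk-kind names (e.g. "gcs_file", "local_scratch"), the helper functions, and the limit constants are hypothetical labels for this sketch. The numbers are copied from the slide and may not match current Compute Engine quotas. It also assumes the 16-disk limit applies to the additional disks listed, because the slide's asterisk footnote is missing.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Limits copied from the slide; treat them as assumptions and check the
# current Compute Engine documentation, since quotas change over time.
MAX_ATTACHED_DISKS = 16
MAX_LOCAL_SCRATCH_DISKS = 8
LOCAL_SCRATCH_SIZE_GB = 375

ADDITIONAL_DISK_KINDS = {"local_scratch", "standard_persistent", "ssd_persistent"}

# "Created from" column of the table: target resource -> allowed sources.
CREATED_FROM: Dict[str, Set[str]] = {
    "image": {"gcs_file", "disk"},
    "snapshot": {"disk", "instance"},
    "persistent_disk": {"image", "snapshot", "blank"},
    "boot_disk": {"image", "snapshot", "disk"},
}


def can_create_from(target: str, source: str) -> bool:
    """True if the table lists `source` as a way to create `target`."""
    return source in CREATED_FROM.get(target, set())


@dataclass(frozen=True)
class AdditionalDisk:
    kind: str  # one of ADDITIONAL_DISK_KINDS
    read_only: bool = False  # recorded only; this sketch does not validate modes


def check_disk_plan(disks: List[AdditionalDisk]) -> List[str]:
    """Return human-readable problems with a planned set of additional disks."""
    problems: List[str] = []
    for disk in disks:
        if disk.kind not in ADDITIONAL_DISK_KINDS:
            problems.append(f"unknown disk kind: {disk.kind!r}")
    if len(disks) > MAX_ATTACHED_DISKS:
        problems.append(
            f"{len(disks)} disks requested; at most {MAX_ATTACHED_DISKS} per VM"
        )
    scratch = sum(1 for disk in disks if disk.kind == "local_scratch")
    if scratch > MAX_LOCAL_SCRATCH_DISKS:
        problems.append(
            f"{scratch} local scratch disks requested; at most {MAX_LOCAL_SCRATCH_DISKS}"
        )
    return problems


def local_scratch_capacity_gb(disks: List[AdditionalDisk]) -> int:
    """Total local scratch capacity, assuming a fixed size per scratch disk."""
    return LOCAL_SCRATCH_SIZE_GB * sum(1 for d in disks if d.kind == "local_scratch")


if __name__ == "__main__":
    plan = [AdditionalDisk("local_scratch")] * 8 + [AdditionalDisk("ssd_persistent")]
    print(check_disk_plan(plan), local_scratch_capacity_gb(plan))
```

With the slide's numbers, eight local scratch disks give 8 × 375 GB = 3,000 GB of scratch space. A ninth scratch disk, or a seventeenth disk of any kind, is reported as a problem.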