Science and research touch all of our lives in many ways. Whether it is the latest research in personalised medicine to find a cure for cancer, running complex financial forecasts, or remotely managing and controlling robots on distant planets, the impact of the research behind these endeavours is far-reaching and is changing the world.
Researchers need fast, efficient, and cost-effective access to the computational resources required to run their analyses, simulations, and experiments at scale. They need to securely manage their sensitive data and share it across their existing platforms as well as AWS. There is continual pressure to deliver and scale outcomes in line with cost. This session will also showcase a medical research customer whose use of AWS has dramatically improved their ability to do this.
Speaker: Adrian White, Solutions Architect, Amazon Web Services
Featured Customer - University of Tasmania IMOS
2. “It is the tension between creativity and skepticism that has produced the stunning and unexpected findings of science.”
Carl Sagan (1934-1996)
3. Why AWS for Research?
Time to Science: Access research infrastructure in minutes
Low Cost: Pay-as-you-go pricing
Elastic: Easily add or remove capacity
Globally Accessible: Easily collaborate with researchers around the world
Secure: A collection of tools to protect data and privacy
Scalable: Access to effectively limitless capacity
4. Perhaps the Top Three Reasons?
Time to Science
Scalable
Elastic
5. Researchers Are Using AWS For…
Life Sciences & Genomics
Space Research & Astronomy
High Energy Physics
Open Data, e.g. Satellite Imagery
HPC & Grid computing
and much more…
7. Walter and Eliza Hall Institute of Medical Research
DiUS & WEHI have built two medical research solutions:
• Building an auto-scaling R cluster using CfnCluster
• Scientific image processing in the cloud with Fiji/ImageJ
These provide on-demand, templated research services for doing science.
8. Sequencing the Koala Genome
• The Australian Museum, University of New South Wales, and others are sequencing and assembling the Koala Genome using AWS
• They are running large clusters using Spot instances
• They have used over 300,000 core hours and can use new processing techniques previously unavailable to them.
9. Solving the Mysteries of the Universe
• Fermilab is one of the Tier 1 data centers for the CMS experiment
• Looking for the Higgs Boson to understand mass
• Launched the HEP Cloud Project in June 2015
• Recently added 58,000 cores (a 4x increase in Fermilab capacity) to simulate 500 million events over 10 days
Source: https://aws.amazon.com/blogs/aws/experiment-that-discovered-the-higgs-boson-uses-aws-to-probe-nature/
10. Seals with Sensors
Dr Peter Blain
IMOS is a national collaborative research infrastructure, supported by the Australian Government. It is led by the University of Tasmania in partnership with the Australian marine & climate science community.
11. IMOS Observations are Taken by Sensors on Multiple Platforms...
Argo Floats
Ships
Moorings
Gliders
Autonomous Vehicles
Radar
Animals
Satellite
… and others
14. End Users of IMOS Data
IMOS data is targeted at the scientific community.
Research themes:
• Long-term ocean change.
• Climate variability and weather extremes.
• Boundary currents.
• Continental shelf and coastal processes.
• Ecosystem responses.
Thousands of publications have been based on IMOS data.
IMOS data is also available to government, industry and the general public.
18. Backend Systems:
● Ingest Tens of Millions of Fragmented and Highly Heterogeneous Data Files.
● Check each file for Conformance.
● Store and Index each file (spatial and temporal).
● Extract, Transform, Load.
● Publish via Standardised (OGC) Web Services.
● Provide Graphical User Interfaces.
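The store-and-index step above can be illustrated with a small in-memory sketch. It assumes each ingested file carries a bounding box and a time range. `FileRecord`, `SpatioTemporalIndex`, and the sample keys are hypothetical names chosen for illustration and are not part of the IMOS systems, which the slides do not describe at this level.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical index entry: a file key plus its spatial and temporal extent."""
    key: str
    west: float
    south: float
    east: float
    north: float
    start: datetime
    end: datetime


class SpatioTemporalIndex:
    """Toy linear-scan index; a production system would use a spatial database."""

    def __init__(self):
        self._records = []

    def add(self, record):
        self._records.append(record)

    def query(self, west, south, east, north, start, end):
        """Return keys of files overlapping both the box and the time window."""
        return [
            r.key
            for r in self._records
            if r.west <= east and r.east >= west
            and r.south <= north and r.north >= south
            and r.start <= end and r.end >= start
        ]
```

A query is a simple overlap test on each axis, so a file is returned only when its box intersects the requested box and its time range intersects the requested window.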
20. Why We Migrated to AWS
1. Data Durability
2. System Reliability
3. Freedom to Innovate
4. Cost Effectiveness
21. The Challenges
1. Making the Case
2. Data Migration
3. Modify Applications to Serve Data from S3
4. Redesign Data Ingest Process
22. IMOS Enhancements to THREDDS
THREDDS
S3
Web Services
WMS, WCS, OPeNDAP, etc
• THREDDS is an Open Source Application that is popular in the Geospatial Community.
• It serves Scientific Datasets through a Variety of Standards Compliant Web Services.
• IMOS Enhancements to Support S3 will be Merged into Core.
23. The Future
• Continue to Innovate
• E.g. improve processing of large gridded datasets (like satellite data) using MapReduce.
• Build the Australian Ocean Data Network (AODN)
• Starting in July, the IMOS infrastructure running at AWS will become the foundation of the AODN, which is a collaboration between government agencies including:
• Royal Australian Navy
• Geoscience Australia
• CSIRO
• Bureau of Meteorology
• Australian Institute of Marine Science
• Australian Antarctic Division
24. Dr Peter Blain
Contact: peter.blain@utas.edu.au
26. CfnCluster is Familiar
• An HPC Head Node
• Shared NFS Storage
• Compute Nodes
• Common Schedulers (SGE, Torque, Slurm, etc.)
• Bootstrap Mechanism
27. But CfnCluster is Also Different…
• Elastic Compute Nodes
• Amazon S3 Integration
• Scheduler is Integrated with Auto Scaling
Takes 15 minutes to build new clusters.
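To make the idea of a scheduler integrated with Auto Scaling concrete, here is a toy scaling rule. It is an illustrative assumption only and is not CfnCluster's actual logic: the function name, its parameters, and the default limits are all hypothetical.

```python
import math


def desired_compute_nodes(pending_slots, busy_nodes, slots_per_node,
                          min_nodes=0, max_nodes=10):
    """Toy rule: keep busy nodes and add enough nodes to run the queued work.

    pending_slots  -- job slots waiting in the scheduler queue
    busy_nodes     -- nodes currently running jobs
    slots_per_node -- job slots each compute node provides
    """
    if slots_per_node <= 0:
        raise ValueError("slots_per_node must be positive")
    extra = math.ceil(pending_slots / slots_per_node)
    # Clamp the result to the limits configured for the cluster.
    return max(min_nodes, min(max_nodes, busy_nodes + extra))
```

With an empty queue and no busy nodes the rule returns `min_nodes`, which is how an elastic cluster can shrink to a head node only when idle.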
28. Bootstrapping Our Own Software on CfnCluster
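#!/bin/bash
# Refresh the package index before installing anything
sudo apt-get -y update
# Python packages and native build dependencies
sudo apt-get -y install python-pip python-numpy \
    python-scipy libgdal-dev libatlas-base-dev \
    gfortran libfreetype6-dev
# Install landsat-util, then ImageMagick
sudo pip install landsat-util
sudo apt-get -y install imagemagick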
• CfnCluster supports Chef to configure the master and compute nodes
• Or we can write simple scripts and hook into different lifecycle events
• This is how we bootstrap our demo cluster with Landsat imagery tools – we’ll see this later
30. Scientific Computing with AWS Lambda
• Common data processing patterns can be simplified, made more scalable, and run at much lower cost
• AWS Lambda can simplify the compute architecture used for scientific computing as well.
31. Data Processing with MARVEL
MARVEL is the Mars Australian Remote Virtual Experiment Laboratory.
It processes radar data from the Mars Express satellite.
No servers are used in this solution.
www.marvelstem.org
S3 event fires on PUT
Ingest data
Process the payload using Lambda
Store secondary data product in S3 for consumption
Send status to an SNS topic
Send event to an SNS topic
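The S3, Lambda, and SNS steps listed above can be sketched as a single handler function. This is a minimal sketch under stated assumptions, not MARVEL's code: `process_payload` is a placeholder because the radar processing is not shown in the slides, and the `products/` prefix, bucket names, and topic ARN are hypothetical. The `s3` and `sns` arguments are expected to behave like boto3 clients (`get_object`, `put_object`, `publish`).

```python
import json
from urllib.parse import unquote_plus


def process_payload(raw):
    """Placeholder transform standing in for the real data processing."""
    return raw.upper()


def handle_s3_put(event, s3, sns, output_bucket, topic_arn):
    """Process each object named in an S3 event and report status via SNS."""
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # Store the secondary data product under a hypothetical prefix.
        product_key = "products/" + key
        s3.put_object(Bucket=output_bucket, Key=product_key,
                      Body=process_payload(raw))
        status = {"source": key, "product": product_key, "status": "ok"}
        sns.publish(TopicArn=topic_arn, Message=json.dumps(status))
        results.append(status)
    return results
```

Inside Lambda, a `lambda_handler(event, context)` entry point would call this function with `boto3.client("s3")` and `boto3.client("sns")`. Passing the clients in as arguments keeps the logic testable without AWS access.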
32. Scientific Computing with AWS Lambda
CRISPR-Cas9 allows researchers to edit a genome. It is an important technique for modelling disease and curing genetic disorders.
This is a large search problem: looking for a 20-base string, or “guide”.
Benchling runs C++ in Lambda to search multiple genomes at once.
$60/month vs thousands of dollars using traditional architectures.
http://benchling.engineering/crispr-aws-lambda/
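The search problem described above can be illustrated with a naive scan for a 20-base guide that tolerates a small number of mismatches. This is a sketch for intuition only and is not Benchling's implementation, which the slide says is written in C++. The function names and the `max_mismatches` parameter are assumptions.

```python
def hamming(a, b):
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_guide_sites(genome, guide, max_mismatches=0):
    """Return (position, mismatches) for every window close enough to the guide.

    Naive O(len(genome) * len(guide)) scan; real genomes need indexing or
    parallel search across chunks.
    """
    n = len(guide)
    hits = []
    for i in range(len(genome) - n + 1):
        distance = hamming(genome[i:i + n], guide)
        if distance <= max_mismatches:
            hits.append((i, distance))
    return hits
```

Because each window is checked independently, the genome can be split into overlapping chunks and searched in parallel, which is the property that makes the problem a good fit for many short-lived function invocations.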
33. Open Data with AWS
Sharing data on AWS makes it accessible to a large and growing community of researchers who use the AWS cloud.
The Big Data Challenge
It’s typically time-consuming and expensive to acquire, store, and analyse large data sets.
Accessing the full historical archive on demand has been almost impossible.
Our Solution – Shared Open Data on AWS
34. Public Data Sets on AWS
Several high-value datasets are available for anyone to access for free on AWS.
Landsat on AWS
3K Rice Genome
NEXRAD on AWS
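As an illustration of how a public dataset can be addressed, the sketch below builds an HTTPS URL for one band of a Landsat 8 scene. It assumes the pre-collection scene ID format (for example `LC81390452014295LGN00`) and the `L8/<path>/<row>/<scene>/` key layout of the `landsat-pds` bucket as documented when Landsat on AWS launched. Both are assumptions that may no longer hold, and the function name is hypothetical.

```python
def landsat8_band_url(scene_id, band):
    """Build the URL for one band of a Landsat 8 scene (assumed bucket layout).

    The scene ID encodes the WRS path in characters 3-5 and the row in 6-8.
    """
    if len(scene_id) != 21 or not scene_id.startswith("LC8"):
        raise ValueError("unexpected scene ID: " + scene_id)
    path, row = scene_id[3:6], scene_id[6:9]
    return ("https://landsat-pds.s3.amazonaws.com/L8/"
            f"{path}/{row}/{scene_id}/{scene_id}_B{band}.TIF")
```

Deriving the object key from the scene ID means a researcher can go from a scene identifier to a download without first listing the bucket.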
35. Demo: Landsat-util on CfnCluster
Image source: https://developmentseed.org/blog/2014/08/29/landsat-util/
36. Research Data Egress Waiver
Why? Researchers strongly need predictable budgets.
Who? Available to degree-granting / research institutions in APAC (and elsewhere).
What? Waives data egress charges from qualified accounts, capped at 15% of total spend.
How? Contract addendum required. Talk to your account team.
All qualifying research customers should use this!
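The 15% cap above can be worked through with a short calculation. This sketch assumes that "total spend" is a single monthly figure supplied by the caller. How the contract addendum actually defines total spend is not stated in the slides, and the function name is hypothetical.

```python
from decimal import Decimal

WAIVER_CAP = Decimal("0.15")  # cap from the slide: 15% of total spend


def egress_after_waiver(total_spend, egress_charges):
    """Return (waived, billable) egress amounts under the 15% cap."""
    total_spend = Decimal(str(total_spend))
    egress_charges = Decimal(str(egress_charges))
    # The waiver covers egress up to the cap; anything above it stays billable.
    waived = min(egress_charges, WAIVER_CAP * total_spend)
    return waived, egress_charges - waived
```

For example, with a total spend of 1000 and egress charges of 200, the cap is 150, so 150 is waived and 50 remains billable. With egress charges of 100, the full amount is waived.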
37. Spike Your Ideas with Research Credits
AWS Cloud Credits for Research Supports Researchers:
• Proofs of concept or benchmark tests
• Contribute results, code, solutions
• Train the broader community
Apply via https://aws.amazon.com/research-credits/
38. In Review…
• Come and talk to us about your research ideas.
• We want to help you experiment on the AWS platform.
Test your Ideas at Scale!
39. AWS Training & Certification
Intro Videos & Labs: Free videos and labs to help you learn to work with 30+ AWS services – in minutes!
Training Classes: In-person and online courses to build technical skills – taught by accredited AWS instructors
Online Labs: Practice working with AWS services in a live environment – learn how related services work together
AWS Certification: Validate technical skills and expertise – identify qualified IT talent or show you are AWS cloud ready
Learn more: aws.amazon.com/training
40. Your Training Next Steps:
✓ Visit the AWS Training & Certification pod to discuss your training plan & AWS Summit training offer
✓ Register & attend AWS instructor led training
✓ Get Certified
AWS Certified? Visit the AWS Summit Certification Lounge to pick up your swag