Amazon EC2 F1 instances with field programmable gate arrays (FPGAs), combined with improved cloud-based FPGA programming tools, provide researchers, application developers, and startups with a well-tested, standardized, and accessible platform for hardware-accelerated computing. This session introduces you to Amazon EC2 F1 instances with FPGAs, walks you through a typical development and deployment process, and highlights a number of use cases in different domains, including genomics, video processing, text search, and financial computing.
1. FPGA Accelerated Computing Using Amazon EC2 F1 Instances
David Pellerin
Head of WW Business Development, Infotech, AWS
Pieter van Rooyen
CEO and Founder, Edico Genome
Rami Mehio
VP of Engineering, Edico Genome
CMP308
November 30, 2017
AWS re:INVENT
18. DRAGEN on AWS Marketplace
Pieter van Rooyen, CEO and Founder
Rami Mehio, VP of Engineering
19. Edico Genome overview
• 50 employees
• World record for fastest genetic diagnosis
• Founded in Jan 2013
• Located in San Diego, CA
• Patents: 11 issued, 20 pending
• 17 petabytes processed by customers to date
• Lead investors: Qualcomm, Dell EMC
• Major tech partnerships: Cloud, App
20. Genomic big data
By 2025, genomics could well represent the biggest of big data fields
Source: Challenges For Genomics In The Age Of Big Data, July 2015, Forbes
[Chart comparing Twitter, YouTube, Astronomy, and Genomics; scale marker: 1 Zettabyte]
21. Genomic data and Moore’s Law
[Chart: 2016 to 2020]
• Genomic data: doubles every seven months
• Moore’s Law: doubles every two years?
Alternative technologies are needed to address big data challenges
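The gap between the two doubling rates compounds quickly. The following is a minimal sketch of that arithmetic, assuming pure exponential growth from a 2016 baseline (the baseline year and the helper name `growth_factor` are illustrative assumptions, not from the slide):

```python
def growth_factor(months_elapsed: float, doubling_months: float) -> float:
    """Multiplicative growth after months_elapsed, given a fixed doubling period."""
    return 2.0 ** (months_elapsed / doubling_months)


GENOMIC_DOUBLING_MONTHS = 7   # "doubles every seven months" (slide)
MOORE_DOUBLING_MONTHS = 24    # "doubles every two years" (slide)
BASELINE_YEAR = 2016          # assumption: first year on the chart axis

for years in range(1, 5):
    months = 12 * years
    genomic = growth_factor(months, GENOMIC_DOUBLING_MONTHS)
    moore = growth_factor(months, MOORE_DOUBLING_MONTHS)
    print(
        f"{BASELINE_YEAR + years}: genomic data x{genomic:.1f}, "
        f"Moore's Law x{moore:.1f}, gap x{genomic / moore:.1f}"
    )
```

Under these assumptions, four years of growth gives roughly 116x for genomic data against 4x for a two-year doubling, which is the mismatch the slide uses to motivate alternative technologies.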
23. DRAGEN Complete Suite
Available Today!
• Somatic V2: Tumor-Only and Tumor/Normal Analysis
• RNA: Transcriptome Analysis with Splice Junction Alignment
• Germline V2: Clinical Grade End-to-End BCL→VCF, Including Advanced PCR Error Correction
• GATK Best Practices: 100% GATK Concordance
• Population: Flexible Family Trio or Large Scale Joint Genotyping Cohort Analysis
• VLRD: Virtual Long Read Detection on
Available Soon
• CNV: Copy Number Variant Analysis for Somatic Exome
• Methylation: Methyl-Seq or BS-Seq
• RNA V2: Transcriptome Analysis with Splice Junction Alignment (Coming Soon: Differential Expression)
24. Acceleration: How do we do it?
The DRAGEN FPGA platform enables massively parallel processing, resulting in revolutionary data analysis capabilities
25. DRAGEN software/hardware stack
The FPGA accelerator is the foundation and the key driver of revolutionary compute+storage platform applications
[Diagram: DRAGEN software/hardware stack]
• Application host memory: APPLICATION
• SW stack: User Interface Layer, HAL, DMA Driver, IO Layer, Pipeline Layer (axis labels: App specific, Generic)
• PCIe 3.0 x8 interface, N-channel DMA
• FPGA: Arbiter, CROSSBAR, Accelerator Engines 1 to 4, 4x DDR4 controller
• 4x16 GB DDR4 memory
26. DRAGEN architecture and hardware port to F1
Specificity
Architecture key points
• SW HAL to insulate application code from the platform
• Edico DMA SW driver and HW DMA channel to be independent of FPGA device vendor
• Separate HW infrastructure layer from acceleration layer
• Integrate DRAGEN HW infrastructure layer with F1 instance HDK
• Size acceleration clusters for VU9P device
• Trade off cluster size against clock speed
27. DRAGEN run time acceleration over CPU-only solutions
                         Mapping/Aligning     MAP/A/Sort/Dedup/VC
                         Onsite    AWS        Onsite    AWS F1.2X   AWS F1.16X
30X Whole Human Genome   8 min     4 min      20 min    59 min      17 min
Exome                    1 min     30 sec     2 min     3 min       1.5 min
Acceleration over CPU Only, Normalized by Number of Cores
           Current Times   Acceleration over CPU-only solution   Projected Times   Acceleration over CPU-only solution
F1 – 2X    59 min          32x                                   44 min            43x
F1 – 16X   17 min          26x                                   10-13 min         40x
Onsite     20 min          29x                                   14 min            40x
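The current and projected columns can be cross-checked against each other. The sketch below assumes the CPU-only baseline stays fixed, so that a shorter DRAGEN run time scales the acceleration factor proportionally; that assumption and the helper name `implied_speedup` are illustrative, not stated on the slide:

```python
# (platform, current minutes, current speedup,
#  projected minutes as (low, high), projected speedup stated on the slide)
ROWS = [
    ("F1 - 2X", 59, 32, (44, 44), 43),
    ("F1 - 16X", 17, 26, (10, 13), 40),
    ("Onsite", 20, 29, (14, 14), 40),
]


def implied_speedup(current_min: float, current_speedup: float,
                    projected_min: float) -> float:
    """Scale the current speedup by the run time reduction (fixed CPU baseline)."""
    return current_speedup * current_min / projected_min


for platform, cur_min, cur_x, (low, high), stated_x in ROWS:
    best = implied_speedup(cur_min, cur_x, low)    # shortest projected run
    worst = implied_speedup(cur_min, cur_x, high)  # longest projected run
    print(f"{platform}: implied {worst:.1f}x to {best:.1f}x, slide states {stated_x}x")
```

Under this assumption the implied factors land close to the stated ones (for example, about 42.9x for F1 – 2X against the stated 43x), which suggests the projected speedups follow from the projected run times rather than from a different baseline.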
28. DRAGEN Germline Pipeline: Analysis Time for Genomes
DRAGEN Complete Suite, Version 2: Whole Genome, Exome & Panels
[Pipeline diagram: FASTQ → BAM → VCF/gVCF]
• Reference download: hash table on S3 → hash table on instance disk
• Input file download: FASTQ on S3 → FASTQ on instance disk
• DRAGEN execution time
• Output file upload: BAM/VCF on instance disk → BAM/VCF on S3
29. DRAGEN Genome Pipeline execution: F1.2Xlarge
DRAGEN Complete Suite, Version 2: Whole Genome, Exome & Panels
[Pipeline diagram: FASTQ → BAM → VCF/gVCF]
• Reference download (hash table on S3 → hash table on instance disk): 10 min
• Input file download (FASTQ on S3 → FASTQ on instance disk): 20 min
• DRAGEN execution time: 60 min
• Output file upload (BAM/VCF on instance disk → BAM/VCF on S3): 15 min
30. Input streaming: F1.2Xlarge
[Pipeline diagram: FASTQ → BAM → VCF/gVCF]
• Reference download (hash table on S3 → hash table on instance disk): 10 min
• Input S3 streaming: 30 s
• DRAGEN execution time: 60 min
• Output file upload (BAM/VCF on instance disk → BAM/VCF on S3): 15 min
31. Output file streaming to Amazon S3
[Pipeline diagram: FASTQ → BAM → VCF/gVCF]
• Reference download: 2 min
• Input S3 streaming: 30 s
• DRAGEN execution time: 60 min
• Output file streaming: 30 s
32. Optimized solution on F1.16Xlarge
[Pipeline diagram: FASTQ → BAM → VCF/gVCF]
• Reference download: 1 min
• Input S3 streaming: 30 s
• DRAGEN execution time: 17 min
• Output file streaming: 30 s
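Slides 29 to 32 give per-stage times for four configurations. The sketch below adds them up to compare end-to-end wall-clock time. It reads each slide's figures as reference download, input transfer, DRAGEN execution, and output transfer, and assumes the stages run back to back; both the stage mapping and the `Scenario` type are interpretations for illustration, not something the slides state explicitly:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    reference_min: float  # reference (hash table) download
    input_min: float      # FASTQ download or streaming overhead
    execution_min: float  # DRAGEN execution time
    output_min: float     # BAM/VCF upload or streaming overhead

    def total_min(self) -> float:
        """Simple sum, assuming the stages do not overlap."""
        return (self.reference_min + self.input_min
                + self.execution_min + self.output_min)


SCENARIOS = [
    Scenario("f1.2xlarge, download and upload", 10, 20, 60, 15),
    Scenario("f1.2xlarge, input streaming", 10, 0.5, 60, 15),
    Scenario("f1.2xlarge, input and output streaming", 2, 0.5, 60, 0.5),
    Scenario("f1.16xlarge, input and output streaming", 1, 0.5, 17, 0.5),
]

baseline = SCENARIOS[0].total_min()
for scenario in SCENARIOS:
    total = scenario.total_min()
    print(f"{scenario.name}: {total:g} min ({baseline / total:.1f}x vs. first scenario)")
```

On this reading, streaming removes most of the transfer overhead on f1.2xlarge, after which execution time dominates; the move to f1.16xlarge then shortens the execution stage itself.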
34. Product release roadmap
Releases: V1 (Previous), V2 (Available Today!), V3 (Q1 2018)
• Map/Align
• Sort/Dedup
• Variant Calling
Complete Suite
• Alt-Aware Mapping
• Adv. Error Detection
• Next Generation Accuracy
• Discrete VLRD
• Integrated VLRD
• Integrated FRD
• CNV
For Genomes and Exomes: Somatic V2, RNA, Germline V2, GATK Best Practices, Population, VLRD
35. DRAGEN Germline V2 pipeline
• Gain in SNP detection performance
• Large gain in indel detection performance
Comparison against best-performing GATK-HC mode (BQSR)
36. DRAGEN Somatic V2 pipeline
[Four charts comparing DRAGEN Somatic v. 2 with Mutect2]
41. Network architecture
Control
• Web VPC + Database VPC
• No customer data
Compute (region specific)
• Auto-scaled DRAGEN instances
• DRAGEN receives job description from control channel
• DRAGEN streams data from Amazon S3, performs computation, and uploads the results back to S3
• All DRAGEN <=> S3 communication is over HTTPS
• No inter-DRAGEN instance communication
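The stream, compute, upload pattern in the compute tier can be sketched generically. The example below is not DRAGEN or AWS code: it uses in-memory streams as stand-ins for the S3 input and output objects, and the names `read_chunks` and `stream_job` are hypothetical. It only illustrates processing data chunk by chunk so that the full input never has to be staged on instance disk:

```python
import io
from typing import BinaryIO, Callable, Iterator

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary illustrative value


def read_chunks(source: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks from a readable binary stream until it is exhausted."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return
        yield chunk


def stream_job(source: BinaryIO, sink: BinaryIO,
               process: Callable[[bytes], bytes],
               chunk_size: int = CHUNK_SIZE) -> int:
    """Read, process, and write one chunk at a time; return the bytes written."""
    written = 0
    for chunk in read_chunks(source, chunk_size):
        result = process(chunk)  # stand-in for the accelerated computation
        sink.write(result)       # stand-in for the upload back to storage
        written += len(result)
    return written


# Usage with in-memory stand-ins for the input and output objects.
source = io.BytesIO(b"acgt" * 1000)
sink = io.BytesIO()
written = stream_job(source, sink, bytes.upper, chunk_size=256)
print(written, sink.getvalue()[:8])
```

In a real deployment the source and sink would be HTTPS streams to and from Amazon S3, and each instance would run independently, consistent with the slide's note that there is no communication between instances.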
43. Guinness World Record: Analysis Overview
DRAGEN Germline Pipeline V2
1000x f1.2xlarge instances
Upload VCF files to S3
Download FASTQs from S3 to EBS
Average: 111 min
1,020 Genomes Analyzed
44. Summary
• FPGA acceleration results in up to 43X improvement for genomics applications
• Streaming I/O using Amazon S3 greatly increases throughput
• Parallelizing across multiple FPGAs using F1.16xlarge results in another 4X+ acceleration
• Per-second billing and Spot instances provide opportunities for additional cost savings
• Deployment to F1 FPGA instances via Marketplace makes accelerated genomics widely available