Revolutionizing the Datacenter
Join the Conversation #OpenPOWERSummit
Accelerating Genome Assembly with Power8
Seung-Jong Park, Ph.D.
School of EECS, CCT, Louisiana State University
Agenda
The Genome Assembly Problem
Accelerating Graph Construction with POWER8
Accelerating Graph Simplification with IBM CAPI® Flash and Redis NoSQL database
5/8/2016
The Genome Assembly Problem
Challenges for Genome Assemblers
NGS Technologies Outpaced Moore’s Law
Software with Extreme Scalability
HPC Platform
• More Compute Cycles
• Extreme I/O Performance
• Huge Storage Space
[Diagram: Genome → NGS → Reads (TBs) → HPC (data and compute intensive) → Reconstructed Genome (MBs/GBs)]
MapReduce-based Graph Construction
Input reads (k-mer length 4 in this example):
Map: TAGTCGAGGCT, GGCTTTAGATC, TGAGGCTTTAG
Map: TTTAGAGACAG, GATCCGATGAG, TAGTCGAGGCT
Map output: each k-mer is keyed to the base that follows it in the read, with N where the k-mer ends the read. For example, TAGTCGAGGCT yields TAGT:C, AGTC:G, GTCG:A, TCGA:G, CGAG:G, GAGG:C, AGGC:T, GGCT:N.
Reduce output (successor bases merged per k-mer):
Reduce: TAGA:G,T TAGT:C TCCG:A TCGA:G TGAG:G TTAG:A TTTA:G
Reduce: ACAG:N AGAC:A AGAG:A AGAT:C AGGC:T AGTC:G ATCC:G ATGA:G
Reduce: CCGA:T CGAG:G CGAT:G CTTT:A GACA:G GAGA:C GAGG:C GATC:C GATG:A GCTT:T GGCT:T GTCG:A
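The map and reduce steps on this slide can be sketched in a few lines of Python. This is a minimal single-process illustration, not the Hadoop job used in the experiments. The function names `map_read` and `reduce_edges` are hypothetical, and the rule that N is dropped whenever a k-mer also has a real successor is inferred from the reduce output on the slide.

```python
from collections import defaultdict

K = 4  # k-mer length used in the slide's example

def map_read(read, k=K):
    """Emit (k-mer, next base) pairs; 'N' marks a k-mer that ends the read."""
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        next_base = read[i + k] if i + k < len(read) else "N"
        yield kmer, next_base

def reduce_edges(pairs):
    """Merge successor bases per k-mer (assumed rule: drop 'N' if a real base exists)."""
    successors = defaultdict(set)
    for kmer, next_base in pairs:
        successors[kmer].add(next_base)
    graph = {}
    for kmer, bases in successors.items():
        real = sorted(bases - {"N"})
        graph[kmer] = real if real else ["N"]
    return graph

# The six reads shown on the slide.
reads = [
    "TAGTCGAGGCT", "GGCTTTAGATC", "TGAGGCTTTAG",
    "TTTAGAGACAG", "GATCCGATGAG", "TAGTCGAGGCT",
]

graph = reduce_edges(pair for read in reads for pair in map_read(read))
for kmer in sorted(graph):
    print(f"{kmer}:{','.join(graph[kmer])}")
```

Run on the six reads, this produces 27 k-mers, including the branching entry TAGA:G,T and the dead end ACAG:N that appear in the slide's reduce output.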
Accelerating Graph Construction with POWER8
Experimental Test Beds
System Type | IBM PKY Cluster | LSU SuperMikeII
Processor | Two 10-core IBM Power8 | Two 8-core Intel Sandy Bridge Xeon
Maximum #nodes used in various experiments | 40 | 120
#Physical cores/node | 20 (8-way simultaneous multithreading) | 16 (Hyper-Threading disabled)
#vcores/node | 160 | 16
RAM/node (GB) | 256 | 32
#Disks/node | 5 | 3
#Disks/node used for shuffled data | 3 | 1
Total storage space/node used for shuffled data | 1.8 | 0.5
Network | 56 Gbps InfiniBand (non-blocking) | 40 Gbps InfiniBand (2:1 blocking)
Datasets
Genome data set | Input size | Shuffle data size | Output size
Rice genome | 12 GB | 70 GB | 50 GB
Bumble bee genome | 90 GB | 600 GB | 95 GB
Metagenome | 3.2 TB | 20 TB | 8.6 TB
Input data set to stage 2: key-value stores with Redis NoSQL and IBM Power8 CAPI Flash
Hadoop Configurations
Hadoop Parameters | IBM Power8 | SuperMikeII
yarn.nodemanager.resource.cpu-vcores | 120 | 16
yarn.nodemanager.resource.memory-mb | 231000 | 29000
mapreduce.map/reduce.cpu.vcores | 4 | 2
mapreduce.map/reduce.memory.mb | 7000 | 3500
mapreduce.map/reduce.java.opts | 6500m | 3000m
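One way to read these settings is to work out how many map or reduce containers can run at once on a node. The sketch below assumes the scheduler enforces both the vcore and the memory limit, and it ignores the ApplicationMaster container. The helper name `max_containers` is hypothetical.

```python
def max_containers(node_vcores, node_mem_mb, task_vcores, task_mem_mb):
    """Concurrent task containers per node, limited by the scarcer resource."""
    by_cpu = node_vcores // task_vcores
    by_mem = node_mem_mb // task_mem_mb
    return min(by_cpu, by_mem)

# Values taken from the configuration table above.
power8 = max_containers(120, 231000, 4, 7000)
supermike = max_containers(16, 29000, 2, 3500)

print(power8, supermike)  # 30 8
```

Under these assumptions each Power8 node runs up to 30 task containers, where CPU is the limit (memory alone would allow 33). Each SuperMikeII node runs up to 8.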
Hadoop Scalability with POWER8 SMTs
Tested with the small rice genome data set on 2 nodes
Almost linear scalability with increasing SMT threads
Rice Genome
Analyzing the small (12 GB) data set
Eliminates the impact of network and disk I/O
7.5x performance improvement per server
Bumble Bee Genome
Analyzing the medium-size (90 GB) bumble bee genome
7.5x improvement in performance per server
Metagenome Stage 1
Analyzing the huge (3.2 TB) metagenome data set
Only 6.5 hours on the 40-node IBM Power8 cluster
More than 9x improvement in performance per server
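The slides do not define "performance per server". One plausible reading is a ratio of node-hours, sketched below. The function name and the baseline figures (20 hours on 120 nodes) are hypothetical and only illustrate the arithmetic. Only the 6.5 hours on 40 nodes comes from the slide.

```python
def per_server_speedup(base_hours, base_nodes, new_hours, new_nodes):
    """Ratio of node-hours: how much more work each server does per hour."""
    return (base_hours * base_nodes) / (new_hours * new_nodes)

# Hypothetical baseline: 20 h on 120 nodes. Measured run: 6.5 h on 40 nodes.
speedup = per_server_speedup(20.0, 120, 6.5, 40)
print(f"{speedup:.2f}x per server")
```

Under this definition a cluster with a third of the nodes can still show a large per-server gain, because the node count enters the ratio directly.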
IBM Data Engine for NoSQL
Performance and Value
Stage 2 requires large memory access that isn’t readily available via traditional compute processing.
POWER8 CAPI (Coherent Accelerator Processor Interface)
[Diagram labels: Custom Hardware Application, PSL, FPGA or ASIC, POWER8, CAPP, Coherence Bus]
Customizable Hardware Application Accelerator
• Specific system SW, middleware, or user application
• Written to durable interface provided by PSL
PCIe Gen 3
• Transport for encapsulated messages
Processor Service Layer (PSL)
• Present robust, durable interfaces to applications
• Offload complexity / content from CAPP
Virtual Addressing
• Accelerator can work with same memory addresses that the processors use
• Pointers de-referenced same as the host application
• Removes OS & device driver overhead
Hardware Managed Cache Coherence
• Enables the accelerator to participate in “Locks” as a normal thread
• Lowers latency over IO communication model
Redis Labs Exploits the IBM Data Engine for NoSQL
Redis stores key-value pairs
• Key-value pairs may be variable size, in any format (Text, Document, JPEG, Video, etc.)
Basic operations are “SET” and “GET”
> SET 100001 "CAPI is Fast"
> GET 100001
"CAPI is Fast"
> ...
Database Characteristics
• 90 GB max capacity: up to 10 GB RAM and 80 GB Flash
• Key-value pairs are 1,000 bytes of random data
• DB filled with ~50 GB of data (42.5 million keys)
Client Characteristics
• 288 clients, randomly issuing Redis GETs or SETs
• ~50% of keys from RAM, ~50% from CAPI-accelerated Flash
Demo System:
• IBM Power System S812L
• 1 POWER8 socket
• 2 IBM Data Engine for NoSQL CAPI Accelerators
• 1 FlashSystem 840
• Ubuntu 14.10
• Redis Labs Enterprise Cluster (Beta)
Demonstration Platform (POWER8 + CAPI Flash)
[Diagram labels: WWW, Set Key = Value, Retrieve Key, 10Gb Uplinks, Power8 Server, Flash Array w/ up to 56TB]
Infrastructure Attributes
- up to 192 threads in 2U server drawer
- up to 56 TB of memory-based Flash per 2U drawer
- Shared memory & cache for dynamic tuning
OpenPOWER partner Redis Labs’s highly differentiated product offering built on CAPI is available today.
IBM Data Engine for NoSQL + Redis Labs Value
Built on Open APIs
• Leverages IBM Data Engine for NoSQL APIs
Redis Labs Enterprise Cluster provides near speed of RAM, with the capacity of Flash
• Leverages IBM Data Engine for NoSQL CAPI Accelerator for high-speed, low-latency link to Flash
Controls use of Memory, Flash, and Cost!
• Hot data maintained in RAM
• Provides ISPs and MSPs up to 72% cost savings when 80% of data is in Flash
Redis Labs Enterprise Cluster allows the user to select the ratio of RAM and flash with a simple slider, when using POWER8 with the IBM Data Engine for NoSQL.
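The idea of keeping hot data in RAM and the rest in flash can be illustrated with a toy two-tier store. This is only a sketch of the general technique, using least-recently-used eviction as an assumed policy. It is not Redis Labs' implementation, and the class name `TieredStore` and its `ram_capacity` parameter are hypothetical. A plain dictionary stands in for CAPI-attached flash.

```python
from collections import OrderedDict

class TieredStore:
    """Toy key-value store: a bounded RAM tier backed by a larger 'flash' tier."""

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity  # max number of keys held in RAM
        self.ram = OrderedDict()          # least recently used key first
        self.flash = {}                   # stand-in for flash storage

    def set(self, key, value):
        self.flash.pop(key, None)         # keep a single copy of each key
        self.ram[key] = value
        self.ram.move_to_end(key)         # mark as most recently used
        self._evict()

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        if key in self.flash:
            value = self.flash.pop(key)   # promote a flash hit back into RAM
            self.ram[key] = value
            self._evict()
            return value
        return None

    def _evict(self):
        # Demote the coldest keys to flash until RAM is within capacity.
        while len(self.ram) > self.ram_capacity:
            cold_key, cold_value = self.ram.popitem(last=False)
            self.flash[cold_key] = cold_value

store = TieredStore(ram_capacity=2)
store.set("100001", "CAPI is Fast")
store.set("100002", "b")
store.set("100003", "c")          # "100001" is demoted to flash
value = store.get("100001")       # served from flash, then promoted
print(value, list(store.ram), list(store.flash))
```

Changing `ram_capacity` plays the role of the RAM-to-flash ratio: a smaller RAM tier costs less but sends more reads to the slower tier.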
[Diagram labels: WWW, 10Gb Uplink, Load Balancer, 500GB Cache Nodes, Backup Nodes, POWER8 Server, Flash Array w/ up to 56TB]
Differentiated NoSQL (POWER8 + FlashSystem with CAPI)
Infrastructure Attributes
- 192 threads in 4U server drawer
- 56 TB of flash per 2U drawer
- Shared memory & cache for dynamic tuning
- Elimination of I/O and network overhead
- Cluster solution in a box
Today’s NoSQL in memory (x86)
Infrastructure Requirements
- Large distributed (scale out)
- Large memory per node
- Networking bandwidth needs
- Load balancing
Power CAPI-attached FlashSystem for NoSQL regains infrastructure control and reins in the cost to deliver services.
What CAPI Means for NoSQL Solutions
Big Redis w/ CAPI Flash Offers New Performance / Cost Points
Users pick the performance / cost point that meets their solution needs, whether the requirement is IOPS rate or latency.
[Chart: average latency (ms) and IOPS versus DRAM / Flash ratio; *typical workload]
DRAM / Flash ratio: 100% 80% 50% 20% 10%
% implementation savings: 0% 18% 45% 72% 81%
Average latency (ms): 1, 5, 8, 9, 10
IOPS at 1 ms latency: 382K, 208K, 188K, 175K
IOPS at max throughput: 2.5M, 366-750K, 1.35M, 483-950K, 671-1250K
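The savings percentages in this chart are consistent with a simple linear blended-cost model in which flash costs about one tenth as much per GB as DRAM. That cost ratio is inferred from the numbers and is not stated in the slides. The function name `savings_pct` and its parameters are hypothetical.

```python
def savings_pct(ram_pct, flash_cost_pct=10):
    """Savings versus an all-DRAM deployment, in percent.

    ram_pct: share of data held in DRAM (0-100).
    flash_cost_pct: assumed cost of flash per GB as a percentage of DRAM cost.
    """
    flash_pct = 100 - ram_pct
    return flash_pct * (100 - flash_cost_pct) / 100

for ram_pct in (100, 80, 50, 20, 10):
    print(f"DRAM {ram_pct:3d}% -> savings {savings_pct(ram_pct):.0f}%")
```

With the assumed 10% flash cost this prints 0%, 18%, 45%, 72% and 81%, matching the chart, including the 72% savings quoted for 80% of data in flash.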
Stage 2 Graph Simplification with Distributed NoSQL
Input: k-mer graph from stage 1 (successor bases per k-mer)
TAGA:G,T TAGT:C TCCG:A TCGA:G TGAG:G TTAG:A TTTA:G
ACAG:N AGAC:A AGAG:A AGAT:C AGGC:T AGTC:G ATCC:G ATGA:G
GACA:G GAGA:C GAGG:C GATC:C GATG:A GCTT:T GGCT:T GTCG:A
CCGA:T CGAG:G CGAT:G CTTT:A
Simplified paths: TAGTCGAG, GAGGCTTTAGA
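The two simplified paths on this slide can be reproduced by compressing non-branching chains of the k-mer graph. The sketch below is an in-memory illustration, not the distributed Redis-based traversal used in the project. The merge rule (join u to v only when u has exactly one successor and v has exactly one predecessor) is an assumption chosen because it yields the paths shown, and the function name `simplify` is hypothetical. Chains that form a closed cycle with no branch are not reported by this simple version.

```python
from collections import defaultdict

# Stage 1 output from the slide: k-mer -> successor bases ("N" = none).
GRAPH = {
    "TAGA": "GT", "TAGT": "C", "TCCG": "A", "TCGA": "G", "TGAG": "G",
    "TTAG": "A", "TTTA": "G", "ACAG": "N", "AGAC": "A", "AGAG": "A",
    "AGAT": "C", "AGGC": "T", "AGTC": "G", "ATCC": "G", "ATGA": "G",
    "GACA": "G", "GAGA": "C", "GAGG": "C", "GATC": "C", "GATG": "A",
    "GCTT": "T", "GGCT": "T", "GTCG": "A", "CCGA": "T", "CGAG": "G",
    "CGAT": "G", "CTTT": "A",
}

def simplify(graph):
    """Compress maximal non-branching chains of k-mers into contigs."""
    # Successor k-mer = current k-mer shifted left by one, plus the next base.
    succ = {
        u: [u[1:] + b for b in bases if b != "N" and u[1:] + b in graph]
        for u, bases in graph.items()
    }
    pred = defaultdict(list)
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)

    def mergeable(u, v):
        return len(succ[u]) == 1 and len(pred[v]) == 1

    contigs = []
    for u in graph:
        # Skip k-mers that are absorbed into a chain starting earlier.
        if len(pred[u]) == 1 and mergeable(pred[u][0], u):
            continue
        path, cur = u, u
        while len(succ[cur]) == 1 and mergeable(cur, succ[cur][0]):
            cur = succ[cur][0]
            path += cur[-1]
        contigs.append(path)
    return sorted(contigs)

contigs = simplify(GRAPH)
print(contigs)
```

On the slide's graph this yields four paths: the two shown (TAGTCGAG and GAGGCTTTAGA) plus AGAGACAG and AGATCCGATGAG, which correspond to the two branches leaving TAGA.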
Accelerating Simplification with IBM CAPI Flash
[Charts: NoSQL I/O throughput (keys/sec); CAPI Flash I/O throughput (bytes/sec)]
Only 20 Power8 cores + CAPI: 500 GB graph traversal in 7.5 hours
LSU Project Contributors
Arghya Kusum Das, PhD Student
Sayan Goswami, PhD Student
Richard Platania, PhD Student
Terry Leatherland, IBM Systems Architect
Fall/Winter 2015 project

Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry Leatherland, IBM

  • 1. Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit Accelerating Genome Assembly with Power8 Seung-Jong Park, Ph.D. School of EECS, CCT, Louisiana State University Join the Conversation #OpenPOWERSummit
  • 2. Agenda The Genome Assembly Problem Accelerating Graph Construction with POWER8 Accelerating Graph Simplification with IBM CAPIĀ® Flash and Redis NoSQL database. 25/8/2016
  • 3. The Genome Assembly Problem 35/8/2016
  • 4. NGS Technologies Outpaced Mooreā€™s Law Software with Extreme Scalability HPC Platform ā€¢ More Compute Cycles ā€¢ Extreme I/O Performance ā€¢ Huge Storage Space Challenges for Genome Assemblers 45/8/2016 Genome NGS Reads (TBs) HPC Re- constructed Genome (MBs/GBs)Data and Compute Intensive
  • 5. MapReduce-based Graph Construction 55/8/2016 TAGTCGAGG CT TAGTCGAGG CTGGCTTTAGAT C GGCTTTAGAT CTGAGGCTTTA G TGAGGCTTTA G Map TTTAGAGACA G TTTAGAGACA GGATCCGATGA G GATCCGATGA GTAGTCGAGG CT TAGTCGAGG CT Map TTTA:G TAGT:C TTAG:A TAGA:G TCCG: A TCCG: ATGAG: N TGAG: N TCGA: G TCGA: G AGAG: A AGAG: A AGAC:A ACAG: N ACAG: NATCC: G ATCC: GCCGA: T CCGA: TCGAT: G CGAT: G ATGA:G AGTC: G AGTC: GCGAG: G CGAG: GAGGC: T AGGC: T GATC: C GATC: C GAGA: C GAGA: CGACA: G GACA: G GATG:A GTCG: A GTCG: AGAGG: C GAGG: CGGCT: N GGCT: N GGCT: T GGCT: T GTCG: A GTCG: AGAGG: C GAGG: CGGCT: N GGCT: N GCTT:T GATC: N GATC: NGAGG: C GAGG: CGGCT: T GGCT: T GCTT:T AGTC: G AGTC: GCGAG: G CGAG: GAGGC: T AGGC: T CTTT:A AGAT:C AGGC: T AGGC: T CTTT:A TAGT:C TGAG: G TGAG: G TCGA: G TCGA: G TTTA:G TTAG:A TAGA:T TTTA:G TTAG:N Reduce Reduce Reduce TAGA:G,T TAGT:C TCCG:A TCGA:G TGAG:G TTAG:A TTTA:G ACAG:N AGAC:A AGAG:A AGAT:C AGGC:T AGTC:G ATCC:G ATGA:G CCGA:T CGAG:G CGAT:G CTTT:A GACA:G GAGA:C GAGG:C GATC:C GATG:A GCTT:T GGCT:T GTCG:A
  • 6. Accelerating Graph Construction with POWER8
  • 7. Experimental Test Beds
      System Type | IBM PKY Cluster | LSU SuperMikeII
      Processor | Two 10-core IBM Power8 | Two 8-core Intel SandyBridge Xeon
      Maximum #Nodes used in various experiments | 40 | 120
      #Physical cores/node | 20 (8 Simultaneous Multi-Thread) | 16 (Hyper-threading disabled)
      #vcores/node | 160 | 16
      RAM/node (GB) | 256 | 32
      #Disks/node | 5 | 3
      #Disks/node used for shuffled data | 3 | 1
      Total storage space/node used for shuffled data | 1.8 | 0.5
      Network | 56Gbps InfiniBand (non-blocking) | 40Gbps InfiniBand (2:1 blocking)
  • 8. Datasets
      Genome data set | Input size | Shuffle data size | Output size
      Rice genome | 12GB | 70GB | 50GB
      Bumble bee genome | 90GB | 600GB | 95GB
      Metagenome | 3.2TB | 20TB | 8.6TB
      Annotation: Input data set to stage 2, Key-value Stores with Redis NoSQL and IBM Power8 CAPI Flash.
  • 9. Hadoop Configurations
      Hadoop Parameters | IBM Power8 | SuperMikeII
      Yarn.nodemanager.cpu.resource.vcore | 120 | 16
      Yarn.nodemanager.memory.mb | 231000 | 29000
      Mapreduce.map/reduce.cpu.vcore | 4 | 2
      Mapreduce.map/reduce.memory.mb | 7000 | 3500
      Mapreduce.map/reduce.java.opts | 6500m | 3000m
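The Hadoop settings on slide 9 imply an upper bound on how many map or reduce tasks can run at once on each node. The sketch below works that arithmetic out. It assumes the scheduler enforces both the vcore and the memory limit and ignores the ApplicationMaster container; the slide states neither, and the class and function names are hypothetical. The parameter names on the slide appear to be shorthand for the YARN settings `yarn.nodemanager.resource.cpu-vcores` and `yarn.nodemanager.resource.memory-mb`.

```python
from dataclasses import dataclass

@dataclass
class NodeConfig:
    name: str
    node_vcores: int      # vcores offered to YARN per node
    node_memory_mb: int   # memory offered to YARN per node
    task_vcores: int      # vcores requested per map/reduce task
    task_memory_mb: int   # memory requested per map/reduce task

def max_concurrent_tasks(cfg: NodeConfig) -> int:
    """Tasks per node are capped by whichever resource runs out first."""
    by_cpu = cfg.node_vcores // cfg.task_vcores
    by_memory = cfg.node_memory_mb // cfg.task_memory_mb
    return min(by_cpu, by_memory)

# Values copied from the slide's table.
power8 = NodeConfig("IBM Power8", 120, 231000, 4, 7000)
supermike = NodeConfig("SuperMikeII", 16, 29000, 2, 3500)

for cfg in (power8, supermike):
    print(cfg.name, max_concurrent_tasks(cfg))
```

Under these assumptions a Power8 node runs up to 30 tasks at once (CPU-bound: 120 / 4) and a SuperMikeII node up to 8. The JVM heap settings (6500m and 3000m) sit below the container sizes, leaving headroom for non-heap memory.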
  • 10. Hadoop Scalability with POWER8 SMTs. Tested with small-size rice genome data on 2 nodes. Almost linear scalability with increasing SMTs.
  • 11. Rice Genome. Analyzing small-size (12GB) data. Eliminates the impact of network and disk I/O. 7.5x performance improvement per server.
  • 12. Bumble Bee Genome. Analyzing medium-size (90GB) bumble bee genome. 7.5x improvement in terms of performance per server.
  • 13. Metagenome Stage 1. Analyzing huge (3.2TB) metagenome data. Only 6.5 hours on a 40-node IBM Power8 cluster. More than 9x improvement in terms of performance per server.
  • 14. IBM Data Engine for NoSQL: Performance and Value. Stage 2 requires large memory access that isn't readily available via traditional compute processing.
  • 15. POWER8 CAPI (Coherent Accelerator Processor Interface)
      Diagram labels: POWER8, CAPP, Coherence Bus, PSL, Custom Hardware Application, FPGA or ASIC; PCIe Gen 3: transport for encapsulated messages.
      Customizable Hardware Application Accelerator: specific system SW, middleware, or user application; written to durable interface provided by PSL.
      Processor Service Layer (PSL): presents robust, durable interfaces to applications; offloads complexity / content from CAPP.
      Virtual Addressing: accelerator can work with the same memory addresses that the processors use; pointers de-referenced same as the host application; removes OS & device driver overhead.
      Hardware Managed Cache Coherence: enables the accelerator to participate in "locks" as a normal thread; lowers latency over the I/O communication model.
  • 16. Redis Labs Exploits the IBM Data Engine for NoSQL. Demonstration Platform (POWER8 + CAPI Flash).
      Redis stores key-value pairs: key-value pairs may be variable size, in any format (Text, Document, JPEG, Video, etc.).
      Basic operations are "SET" and "GET": > SET 100001 "CAPI is Fast" > GET 100001 "CAPI is Fast" > ...
      Database Characteristics: 90 GB max capacity, up to 10 GB RAM and 80 GB Flash; key-value pairs are 1,000 bytes of random data; DB filled with ~50GB of data (42.5 million keys).
      Client Characteristics: 288 clients, randomly issuing Redis GETs or SETs; ~50% of keys from RAM, ~50% from CAPI-accelerated Flash.
      Demo System: IBM Power System S812L; 1 POWER8 socket; 2 IBM DataEngine for NoSQL CAPI Accelerators; 1 FlashSystem 840; Ubuntu 14.10; Redis Labs Enterprise Cluster (Beta).
      Diagram labels: WWW; 10Gb Uplinks; Power8 Server; Flash Array w/ up to 56TB; Set Key = Value; Retrieve Key.
      Infrastructure Attributes: - up to 192 threads in 2U server drawer - up to 56 TB of memory-based Flash per 2U drawer - shared memory & cache for dynamic tuning
      OpenPower Partner Redis Labs's highly differentiated product offering built on CAPI is available today.
  • 17. IBM Data Engine for NoSQL + Redis Labs Value
      Built on Open APIs: leverages IBM DataEngine for NoSQL APIs.
      Redis Labs Enterprise Cluster provides near speed of RAM, with the capacity of Flash: leverages IBM DataEngine for NoSQL CAPI Accelerator for high-speed, low-latency link to Flash.
      Controls use of Memory, Flash, and Cost: hot data maintained in RAM; provides ISPs and MSPs up to 72% cost savings when 80% of data is in Flash.
      Redis Labs Enterprise Cluster allows the user to select the ratio of RAM and flash with a simple slider, when using POWER8 with the IBM Data Engine for NoSQL.
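The idea of keeping hot data in RAM and the rest on flash, with a user-chosen ratio between the two, can be illustrated with a toy two-tier key-value store. This is only a conceptual sketch and not how Redis Labs Enterprise Cluster or the IBM Data Engine for NoSQL is implemented: the `TieredStore` class is hypothetical, a plain dict stands in for CAPI-attached flash, and least-recently-used eviction is an assumed policy.

```python
from collections import OrderedDict

class TieredStore:
    """Toy SET/GET store: a small hot tier in RAM, everything else on 'flash'."""

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity  # max number of keys kept in RAM
        self.ram = OrderedDict()          # least recently used key comes first
        self.flash = {}                   # stand-in for the flash tier

    def set(self, key, value):
        self.flash.pop(key, None)         # the newest copy lives in RAM only
        self.ram[key] = value
        self.ram.move_to_end(key)
        self._evict()

    def get(self, key):
        if key in self.ram:               # hot hit: refresh recency
            self.ram.move_to_end(key)
            return self.ram[key]
        if key in self.flash:             # cold hit: promote to RAM
            value = self.flash.pop(key)
            self.ram[key] = value
            self._evict()
            return value
        return None

    def _evict(self):
        # Move the coldest keys to flash until RAM is within capacity.
        while len(self.ram) > self.ram_capacity:
            cold_key, cold_value = self.ram.popitem(last=False)
            self.flash[cold_key] = cold_value

store = TieredStore(ram_capacity=2)
store.set("100001", "CAPI is Fast")
store.set("100002", "second value")
store.set("100003", "third value")   # RAM is full, so 100001 moves to flash
print(store.get("100001"))           # served from flash, then promoted to RAM
```

Changing `ram_capacity` relative to the total number of keys plays the role of the RAM/flash slider described on the slide: a smaller RAM tier costs less but sends more reads to the slower tier.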
  • 18. What CAPI Means for NoSQL Solutions
      Today's NoSQL in memory (x86). Infrastructure Requirements: - large distributed (scale out) - large memory per node - networking bandwidth needs - load balancing. Diagram labels: WWW, 10Gb Uplink, Load Balancer, 500GB Cache Node (x5), Backup Nodes.
      Differentiated NoSQL (POWER8 + FlashSystem with CAPI). Infrastructure Attributes: - 192 threads in 4U server drawer - 56 TB of flash per 2U drawer - shared memory & cache for dynamic tuning - elimination of I/O and network overhead - cluster solution in a box. Diagram labels: WWW, 10Gb Uplink, POWER8 Server, Flash Array w/ up to 56TB.
      Power CAPI-attached FlashSystem for NoSQL regains infrastructure control and reins in the cost to deliver services.
  • 19. Big Redis w/ CAPI Flash Offers New Performance / Cost Points. Users pick the performance / cost point that meets their solution needs, be it IOPS rate or latency requirements.
      [Chart, typical workload] DRAM / Flash ratio: 100%, 80%, 50%, 20%, 10%. % Implementation Savings: 0%, 18%, 45%, 72%, 81%.
      Remaining chart labels as extracted: Average Latency (ms) 1 5 8 9 10; IOPS at 1 ms Latency 382K 208K 188K 175K 2.5M 366-750K 1.35M 483-950K 671-1250K; IOPS at Max Throughput.
  • 20. Stage 2 Graph Simplification with Distributed NoSQL
      Input graph: TAGA:G,T TAGT:C TCCG:A TCGA:G TGAG:G TTAG:A TTTA:G ACAG:N AGAC:A AGAG:A AGAT:C AGGC:T AGTC:G ATCC:G ATGA:G GACA:G GAGA:C GAGG:C GATC:C GATG:A GCTT:T GGCT:T GTCG:A CCGA:T CGAG:G CGAT:G CTTT:A
      Example simplified sequences: TAGTCGAG, GAGGCTTTAGA
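One common way to simplify such a graph is to merge every non-branching chain of k-mers into a single sequence. The sketch below applies that idea to the k-mer list on slide 20. It is an assumed reading of the slide rather than the authors' algorithm: the function names are hypothetical, a Python dict stands in for the distributed NoSQL store (each `graph[kmer]` lookup would be a GET), and chains that form a closed cycle are not handled.

```python
# k-mer:successor-base entries copied from the slide.
STAGE1_OUTPUT = """
TAGA:G,T TAGT:C TCCG:A TCGA:G TGAG:G TTAG:A TTTA:G ACAG:N AGAC:A AGAG:A
AGAT:C AGGC:T AGTC:G ATCC:G ATGA:G GACA:G GAGA:C GAGG:C GATC:C GATG:A
GCTT:T GGCT:T GTCG:A CCGA:T CGAG:G CGAT:G CTTT:A
"""

def parse_graph(text):
    """Turn 'KMER:B1,B2' tokens into {kmer: [successor k-mers]}; N means none."""
    graph = {}
    for token in text.split():
        kmer, bases = token.split(":")
        graph[kmer] = [kmer[1:] + b for b in bases.split(",") if b != "N"]
    return graph

def simplify(graph):
    """Merge each non-branching chain of k-mers into one sequence."""
    indegree = {kmer: 0 for kmer in graph}
    predecessor = {}
    for kmer, successors in graph.items():
        for nxt in successors:
            indegree[nxt] = indegree.get(nxt, 0) + 1
            predecessor[nxt] = kmer

    def starts_chain(kmer):
        # A chain starts where the path is not a simple 1-in/1-out continuation.
        if indegree[kmer] != 1:
            return True
        return len(graph[predecessor[kmer]]) != 1

    contigs = []
    for kmer in graph:
        if not starts_chain(kmer):
            continue
        contig, current = kmer, kmer
        while len(graph[current]) == 1:
            nxt = graph[current][0]
            if nxt not in graph or indegree[nxt] != 1:
                break                 # next node is a merge point: stop here
            contig += nxt[-1]         # extend by the one new base
            current = nxt
        contigs.append(contig)
    return contigs

graph = parse_graph(STAGE1_OUTPUT)
contigs = simplify(graph)
print(contigs)
```

On the slide's data this produces four sequences, two of which are the ones shown on the slide (TAGTCGAG and GAGGCTTTAGA). The first ends where two paths merge into GAGG, and the second ends at the branching node TAGA.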
  • 21. Accelerating Simplification with IBM CAPI Flash. [Charts: NoSQL I/O Throughput (keys/sec); CAPI Flash I/O Throughput (bytes/sec).] Only 20 Power8 cores + CAPI: 500GB graph traversal in 7.5 hrs.
  • 22. LSU Project Contributors: Arghaya Kasuum Das, PhD Student; Sayan Goswami, PhD Student; Richard Platania, PhD Student; Terry Leatherland, IBM Systems Architect. Fall/Winter 2015 project.