From Gust To Tempest: Scaling Storm
Presented by Bobby Evans
Hi, I’m Bobby Evans
bobby@apache.org @bobbydata
- Low Latency Data Processing Architect @ Yahoo
  - Apache Storm
  - Apache Spark
  - Apache Kafka
- Committer and PMC member for
  - Apache Storm
  - Apache Hadoop
  - Apache Spark
  - Apache Tez
Agenda
- Apache Storm Architecture
- What Was Done Already
- Current/Future Work
background: https://www.flickr.com/photos/gsfc/15072362777
Storm Concepts
1. Streams
   - Unbounded sequence of tuples
2. Spouts
   - Source of a stream
   - E.g. read from the Twitter streaming API
3. Bolts
   - Process input streams and produce new streams
   - E.g. functions, filters, aggregations, joins
4. Topologies
   - Network of spouts and bolts
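To make the four concepts concrete, here is a minimal single-process sketch that models them with Python generators. It is not Storm's API (Storm topologies are built in Java and run each spout and bolt as many parallel tasks across workers); the function names and the sample sentences are hypothetical.

```python
import itertools
from typing import Callable, Iterable, Iterator, Tuple


def sentence_spout() -> Iterator[Tuple[str]]:
    """Spout: an unbounded source that emits sentence tuples forever."""
    sentences = ["the cow jumped", "over the moon", "the man went"]
    for sentence in itertools.cycle(sentences):
        yield (sentence,)


def split_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Function bolt: each sentence tuple becomes one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def filter_bolt(stream: Iterable[Tuple[str]],
                keep: Callable[[Tuple[str]], bool]) -> Iterator[Tuple[str]]:
    """Filter bolt: passes through only the tuples that satisfy `keep`."""
    for t in stream:
        if keep(t):
            yield t


def count_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str, int]]:
    """Aggregation bolt: emits (word, running count) for each incoming word."""
    counts = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
        yield (word, counts[word])


def word_count_topology() -> Iterator[Tuple[str, int]]:
    """Topology: spout -> split -> filter -> count, wired as lazy generators."""
    words = split_bolt(sentence_spout())
    long_words = filter_bolt(words, lambda t: len(t[0]) > 3)
    return count_bolt(long_words)


# The stream never ends, so take only the first few output tuples.
first = list(itertools.islice(word_count_topology(), 5))
print(first)
```

Because every stage is lazy, the unbounded spout is only pulled as far as the consumer asks, which mirrors how a stream has no natural end.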
Routing of tuples
- Shuffle grouping: pick a random task (but with load balancing)
- Fields grouping: hash on a subset of the tuple's fields, so tuples with equal values always go to the same task
- All grouping: send to all tasks
- Global grouping: pick the task with the lowest id
- Local or shuffle grouping: if the target bolt has a task in the same worker process, use it; otherwise use shuffle
- Partial Key grouping: fields grouping, but with 2 candidate tasks per key for load balancing
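The grouping rules above can be sketched as small task-selection functions. This is a simplified Python model, not Storm's Java implementation: task ids are plain integers, `stable_hash` uses `zlib.crc32` as an assumed stand-in for whatever hash Storm uses, and the partial key grouping follows the "two choices, send to the less-used one" idea from the slide with a hypothetical second hash.

```python
import random
import zlib
from typing import List, Sequence, Tuple


def stable_hash(key: Tuple) -> int:
    # crc32 is stable across processes, unlike Python's salted hash() for str
    return zlib.crc32(repr(key).encode())


class ShuffleGrouping:
    """Random order, but every task is used once before any task repeats."""
    def __init__(self, tasks: Sequence[int], seed: int = 0):
        self.tasks = list(tasks)
        self.rng = random.Random(seed)
        self.pending: List[int] = []

    def choose(self, values: Tuple) -> List[int]:
        if not self.pending:
            self.pending = self.tasks[:]
            self.rng.shuffle(self.pending)
        return [self.pending.pop()]


def fields_grouping(tasks: Sequence[int], values: Tuple, key_idx: Sequence[int]) -> List[int]:
    """Same key fields -> same task."""
    key = tuple(values[i] for i in key_idx)
    return [tasks[stable_hash(key) % len(tasks)]]


def all_grouping(tasks: Sequence[int], values: Tuple) -> List[int]:
    return list(tasks)


def global_grouping(tasks: Sequence[int], values: Tuple) -> List[int]:
    return [min(tasks)]


class LocalOrShuffleGrouping(ShuffleGrouping):
    """Shuffle among tasks in this worker process if there are any, else among all tasks."""
    def __init__(self, tasks: Sequence[int], local_tasks: Sequence[int], seed: int = 0):
        local_set = set(local_tasks)
        local = [t for t in tasks if t in local_set]
        super().__init__(local or list(tasks), seed)


class PartialKeyGrouping:
    """Each key has two candidate tasks; send to whichever has received fewer tuples."""
    def __init__(self, tasks: Sequence[int], key_idx: Sequence[int]):
        self.tasks = list(tasks)
        self.key_idx = list(key_idx)
        self.sent = {t: 0 for t in self.tasks}

    def choose(self, values: Tuple) -> List[int]:
        key = tuple(values[i] for i in self.key_idx)
        n = len(self.tasks)
        first = self.tasks[stable_hash(key) % n]
        second = self.tasks[stable_hash(("second-choice",) + key) % n]
        pick = first if self.sent[first] <= self.sent[second] else second
        self.sent[pick] += 1
        return [pick]
```

The partial key variant is what helps with skew: a single hot key is split across (at most) two tasks instead of overloading one, at the cost of downstream state for that key living in two places.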
Storm Architecture
[Figure: Master node (Nimbus); cluster coordination through a ZooKeeper ensemble (three ZooKeeper servers); worker nodes, each running a Supervisor that launches worker processes.]
[Figure: Inside a worker process: tasks such as Spout A-1, Spout A-5, Spout A-9, Bolt B-3, and an Acker task, with routing of tuples to and from other workers.]
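The liveness path in this architecture (workers write heartbeats into the shared store, Nimbus notices when they stop and moves their executors elsewhere) can be sketched as below. This is a simplified Python model under stated assumptions, not Storm's code: `HeartbeatStore`, the 30-second timeout, the worker ids, and the round-robin reassignment policy are all hypothetical.

```python
from typing import Dict, List


class HeartbeatStore:
    """Hypothetical stand-in for the shared state store (ZooKeeper in this deck)."""
    def __init__(self) -> None:
        self.last_seen: Dict[str, float] = {}

    def write(self, worker_id: str, now: float) -> None:
        # Each worker periodically overwrites its own heartbeat entry.
        self.last_seen[worker_id] = now


def find_dead_workers(store: HeartbeatStore, now: float, timeout_secs: float) -> List[str]:
    """Workers whose last heartbeat is older than the timeout."""
    return sorted(w for w, t in store.last_seen.items() if now - t > timeout_secs)


def reassign(assignment: Dict[str, str], dead: List[str], live: List[str]) -> Dict[str, str]:
    """Move executors off dead workers, spreading them round-robin over live workers."""
    dead_set = set(dead)
    orphans = sorted(e for e, w in assignment.items() if w in dead_set)
    new_assignment = dict(assignment)
    for i, executor in enumerate(orphans):
        new_assignment[executor] = live[i % len(live)]
    return new_assignment


store = HeartbeatStore()
store.write("w1", now=100.0)
store.write("w2", now=100.0)
store.write("w1", now=125.0)  # w2 stops heartbeating after t=100
dead = find_dead_workers(store, now=135.0, timeout_secs=30.0)
plan = reassign({"spout-A-1": "w1", "bolt-B-3": "w2", "acker": "w2"}, dead, live=["w1", "w3"])
print(dead, plan)
```

Every worker writes on every heartbeat interval, which is why, as later slides show, heartbeats end up dominating the store's write load as the cluster grows.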
Current State
What was done already
background: https://www.flickr.com/photos/maf04/14392794749
Largest Topology Growth at Yahoo
[Chart: size of the largest topology by year]
Year        2013   2014   2015
Executors    100  3,000  4,000
Workers       40    400  1,500
background: https://www.flickr.com/photos/68942208@N02/16242761551
Cluster Growth at Yahoo
[Chart: Storm nodes at Yahoo, Jun 2012 to Jun 2015]
Date             Jun-12  Jan-13  Jan-14  Jan-15  Jun-15
Total Nodes          40     170     600   1,100   2,300
Largest Cluster      20      60     120     250     300
background: http://bit.ly/1KypnCN
In the Beginning…
- Mid 2011:
  - Storm is released as open source
- Early 2012:
  - Yahoo evaluation begins
  - https://github.com/yahoo/storm-perf-test
- Mid 2012:
  - Purpose-built clusters of 10+ nodes
- Early 2013:
  - 60-node cluster; largest topology 40 workers, 100 executors
  - ZooKeeper config -Djute.maxbuffer=4194304 (4 MB)
- May 2013:
  - Netty messaging layer
  - http://yahooeng.tumblr.com/post/64758709722/making-storm-fly-with-netty
- Oct 2013:
  - ZooKeeper heartbeat timeout checks
background: https://www.flickr.com/photos/gedas/3618792161
So Far…
- Late 2013:
  - ZooKeeper config -Dzookeeper.forceSync=no
  - Storm enters the Apache Incubator
- Early 2014:
  - 250-node cluster; largest topology 400 workers, 3,000 executors
- June 2014:
  - STORM-376: Compress ZooKeeper data
  - STORM-375: Check for changes before reading data from ZooKeeper
- Sep 2014:
  - Storm becomes an Apache Top-Level Project
- Early 2015:
  - STORM-632: Better grouping for data skew
  - STORM-634: Thrift serialization for ZooKeeper data
  - 300-node cluster (tested 400 nodes; 1,200-node theoretical maximum)
  - Largest topology 1,500 workers, 4,000 executors
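The ideas behind STORM-376 (compress what goes into ZooKeeper) and STORM-375 (check whether data changed before reading it) can be sketched together. This assumes a toy in-memory store in place of ZooKeeper; ZooKeeper nodes do carry a data version that a client can check cheaply, but `VersionedStore`, `CachingReader`, the JSON payload, and the path below are hypothetical and are not how Storm actually implements either change.

```python
import json
import zlib
from typing import Dict, Tuple


class VersionedStore:
    """Toy store: like a ZooKeeper node, each path has a version bumped on every write."""
    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.versions: Dict[str, int] = {}
        self.bytes_read = 0  # payload bytes pulled by readers so far

    def set(self, path: str, payload: bytes) -> None:
        self.data[path] = payload
        self.versions[path] = self.versions.get(path, -1) + 1

    def version(self, path: str) -> int:
        return self.versions[path]  # cheap metadata lookup

    def get(self, path: str) -> bytes:
        self.bytes_read += len(self.data[path])  # the expensive part
        return self.data[path]


def write_state(store: VersionedStore, path: str, state: dict) -> None:
    # Compression idea: shrink the payload before it hits the store's disk.
    store.set(path, zlib.compress(json.dumps(state).encode()))


class CachingReader:
    """Change-check idea: only fetch the payload when the version has changed."""
    def __init__(self, store: VersionedStore) -> None:
        self.store = store
        self.cache: Dict[str, Tuple[int, dict]] = {}

    def read(self, path: str) -> dict:
        current = self.store.version(path)
        cached = self.cache.get(path)
        if cached is not None and cached[0] == current:
            return cached[1]
        state = json.loads(zlib.decompress(self.store.get(path)))
        self.cache[path] = (current, state)
        return state


store = VersionedStore()
path = "/assignments/topology-1"  # hypothetical path
state = {"executors": [{"component": "bolt-B", "worker": "w1"}] * 200}
write_state(store, path, state)
reader = CachingReader(store)
reader.read(path)
reader.read(path)  # version unchanged, so no second payload read
print(len(store.data[path]), store.bytes_read)
```

Both changes attack the same bottleneck from the following slides: bytes moved through ZooKeeper, either written to its disk or read back out by every poller.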
background: http://s0.geograph.org.uk/geophotos/02/27/03/2270317_7653a833.jpg
We still have a ways to go
[Charts: Hadoop vs. Storm node counts at Yahoo]
                        Hadoop   Storm
Largest Cluster Size     5,400     300
Total Nodes             41,000   2,300
We want to get to a 4,000-node Storm cluster.
background: https://www.flickr.com/photos/68397968@N07/14600216228
Future and Current Work
How we are going to get to 4,000
background: https://www.flickr.com/photos/12567713@N00/2859921414
Why Can’t Storm Scale?
It’s all about the data.
State storage (ZooKeeper):
- Limited to disk write speed (80 MB/sec typically)
- Scheduling: O(num_execs * resched_rate)
- Supervisor: O(num_supervisors * hb_rate)
- Topology metrics (worst case): O(num_execs * num_comps * num_streams * hb_rate)
On one 240-node Yahoo Storm cluster, ZooKeeper writes 16 MB/sec, and about 99.2% of that is worker heartbeats.
Theoretical limit:
80 MB/sec / 16 MB/sec * 240 nodes = 1,200 nodes
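The limit above is a linear extrapolation, and it can be written out as a small calculation. The sketch assumes, as the slide does, that ZooKeeper write volume grows in proportion to node count; the function name is hypothetical.

```python
def zk_max_nodes(disk_mb_s: float, observed_mb_s: float, observed_nodes: int) -> float:
    """Cluster size at which ZooKeeper writes would saturate the disk,
    assuming write volume grows linearly with the number of nodes."""
    return disk_mb_s / observed_mb_s * observed_nodes


per_node_kb_s = 16 * 1000 / 240     # ~66.7 KB/s of ZooKeeper writes per node
heartbeat_mb_s = 16 * 0.992         # ~15.9 of the 16 MB/s are worker heartbeats
limit = zk_max_nodes(80, 16, 240)   # 80 / 16 * 240
print(limit)
```

Since heartbeats are nearly all of the write volume and the limit scales only with disk bandwidth, tuning ZooKeeper buys little; moving heartbeats off the disk path is what changes the ceiling.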
background: http://cnx.org/resources/8ab472b9b2bc2e90bb15a2a7b2182ca45a883e0f/Figure_45_07_02.jpg
Pacemaker
Heartbeat server: a simple, secure, in-memory store for worker heartbeats.
- Removes the disk limitation
- Writes scale linearly (but Nimbus still needs to read it all, ideally in 10 sec or less)
A 240-node cluster's complete heartbeat state is 48 MB; Gigabit Ethernet carries about 125 MB/s.
10 s / (48 MB / 125 MB/s) * 240 nodes = 6,250 nodes
[Chart: Theoretical maximum cluster size: ZooKeeper 1,200 nodes; Pacemaker on Gigabit 6,250 nodes]
Highly connected topologies dominate data volume.
10 GigE helps.
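With Pacemaker the ceiling moves from disk writes to how fast Nimbus can read the full heartbeat state. The sketch below reproduces the Gigabit figure and, as an assumption not stated on the slide, plugs in a nominal 10 GigE rate of about 1,250 MB/s. Both numbers assume per-node heartbeat state stays constant, which highly connected topologies violate, so treat them as upper bounds. The function name is hypothetical.

```python
def read_limited_max_nodes(budget_s: float, state_mb: float,
                           link_mb_s: float, observed_nodes: int) -> float:
    """Largest cluster whose heartbeat state Nimbus can read within budget_s,
    assuming state size grows linearly with node count."""
    read_time_s = state_mb / link_mb_s  # time to read the observed cluster's state
    return budget_s / read_time_s * observed_nodes


gigabit = read_limited_max_nodes(10, 48, 125, 240)   # 48 MB over ~125 MB/s
ten_gig = read_limited_max_nodes(10, 48, 1250, 240)  # assumed ~1,250 MB/s for 10 GigE
print(round(gigabit), round(ten_gig))
```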
Why Can’t Storm Scale?
It’s all about the data.
All raw data is serialized, transferred to the UI, deserialized, and aggregated on every page load.
Our largest topology uses about 400 MB in memory.
Aggregate stats for UI/REST in Nimbus
- 10+ minute page loads down to 7 seconds
Jar downloads act as a DDoS on Nimbus
Distributed Cache/Blob Store (STORM-411)
- Pluggable backend with HDFS support
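The stats fix is essentially "aggregate once on the server, ship a summary." The sketch below shows the size difference for a topology with 4,000 executors. The `ExecutorStats` record and its counters are hypothetical stand-ins, not Storm's actual stats structures, and JSON is used only to make payload sizes easy to compare.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ExecutorStats:
    """Hypothetical per-executor counters, standing in for the raw stats."""
    component: str
    executor_id: int
    emitted: int
    acked: int
    failed: int


def aggregate_by_component(raw: List[ExecutorStats]) -> Dict[str, Dict[str, int]]:
    """Roll raw executor stats up to one summary per component, once, server-side."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"executors": 0, "emitted": 0, "acked": 0, "failed": 0})
    for s in raw:
        t = totals[s.component]
        t["executors"] += 1
        t["emitted"] += s.emitted
        t["acked"] += s.acked
        t["failed"] += s.failed
    return dict(totals)


raw = [ExecutorStats("spout-A", i, emitted=100, acked=98, failed=2) for i in range(1000)]
raw += [ExecutorStats("bolt-B", i, emitted=50, acked=50, failed=0) for i in range(3000)]
summary = aggregate_by_component(raw)
raw_bytes = len(json.dumps([asdict(s) for s in raw]))
summary_bytes = len(json.dumps(summary))
print(summary["spout-A"], raw_bytes, summary_bytes)
```

Shipping `summary` instead of `raw` is what turns a per-page-load cost proportional to executor count into one proportional to component count.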
background: https://www.flickr.com/photos/oregondot/15799498927
Why Can’t Storm Scale?
It’s all about the data.
Storm round-robin scheduling
- (R-1)/R of traffic will be off rack, where R is the number of racks
- (N-1)/N of traffic will be off node, where N is the number of nodes
- Does not know when resources (e.g. the network) are full
Resource & Network Topography Aware Scheduling
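The (R-1)/R and (N-1)/N figures follow from spreading executors evenly and sending traffic evenly to every downstream task. The simulation below checks this with exact fractions; the cluster shape (4 racks of 5 nodes) and executor counts are hypothetical, and real shuffle traffic is only approximately uniform.

```python
from fractions import Fraction
from typing import Callable, List, Tuple

Node = Tuple[int, int]  # (rack id, node id within the rack)


def round_robin(num_executors: int, nodes: List[Node]) -> List[Node]:
    """Place executors on nodes in turn, ignoring racks and load."""
    return [nodes[i % len(nodes)] for i in range(num_executors)]


def off_fraction(senders: List[Node], receivers: List[Node],
                 same: Callable[[Node, Node], bool]) -> Fraction:
    """Share of sender->receiver traffic crossing a boundary, assuming every
    sender splits its traffic evenly over all receivers (as shuffle would)."""
    pairs = [(s, r) for s in senders for r in receivers]
    crossing = sum(1 for s, r in pairs if not same(s, r))
    return Fraction(crossing, len(pairs))


racks, nodes_per_rack = 4, 5  # R = 4, N = 20 (hypothetical cluster)
nodes = [(r, n) for r in range(racks) for n in range(nodes_per_rack)]
spouts = round_robin(40, nodes)
bolts = round_robin(60, nodes)
off_node = off_fraction(spouts, bolts, lambda a, b: a == b)        # (N-1)/N
off_rack = off_fraction(spouts, bolts, lambda a, b: a[0] == b[0])  # (R-1)/R
print(off_node, off_rack)
```

With 20 nodes, 95% of tuples cross the network; a topography-aware scheduler tries to shrink both fractions by packing communicating executors onto the same node or rack.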
One slow node slows the entire topology.
Load Aware Routing (STORM-162)
- Intelligent, network-aware routing
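One way to keep a slow node from dragging down the topology is to bias routing away from busy tasks. The sketch weights each downstream task by its spare capacity. This is an illustrative policy under assumed inputs (a per-task load fraction between 0 and 1), not a description of how STORM-162 is implemented; the task names and load values are hypothetical.

```python
import random
from collections import Counter
from typing import Dict


def load_aware_choice(loads: Dict[str, float], rng: random.Random) -> str:
    """Pick a task with probability proportional to its spare capacity (1 - load).

    loads maps task -> fraction of capacity in use (0.0 idle, 1.0 saturated).
    """
    tasks = list(loads)
    weights = [max(0.0, 1.0 - loads[t]) for t in tasks]
    if sum(weights) == 0:
        return rng.choice(tasks)  # everything saturated: fall back to plain shuffle
    return rng.choices(tasks, weights=weights, k=1)[0]


rng = random.Random(42)
loads = {"fast-1": 0.1, "fast-2": 0.1, "slow": 0.9}
picks = Counter(load_aware_choice(loads, rng) for _ in range(10_000))
print(picks)
```

With these loads the slow task receives roughly 0.1 / 1.9, about 5%, of the tuples instead of a third, so its backlog no longer sets the pace for the whole topology.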
How does this compare to…
Heron (Twitter) and Apex (DataTorrent)?
- Code not released yet (June 9, 2015 at 6 am Pacific)
  - So I have not seen it
- And we are not done yet either
- So, it is hard to tell
Google Cloud Dataflow?
- Open-source API, not implementation
- I have not tested it for scale
- Great stream processing concepts
background: http://www.publicdomainpictures.net/view-image.php?image=38889&picture=heron-2&large=1
Questions?
https://www.flickr.com/photos/51029297@N00/5275403364
bobby@apache.org

Scaling Apache Storm (Hadoop Summit 2015)
