SlideShare a Scribd company logo
1 of 26
Download to read offline
Low-Latency Streaming
Data Processing in Hadoop
InSemble Inc.
http://www.insemble.com
Agenda
Reference Architecture for Low Latency Streaming1
Storm4
Kafka3
Flume2
Demo5
Hadoop Ecosystem
Source: Apache Hadoop Documentation
Cloudera Platform
Hortonworks Data Platform(HDP)
Real time Stream Processing
Architecture with Hadoop
Flume Architecture
• Distributed system for
collecting and
aggregating from
multiple data stores to a
centralized data store
• Agent is a JVM that
hosts the Flume
components
• Channel will store
message until
picked by a sink
• Different types of
Flume sources
• Source and Sink are
decoupled
Consolidation Architecture
Multiplexing Architecture
Kafka Introduction
• Messaging System which is distributed, partitioned and replicated
• Kafka brokers run as a cluster
• Producers and Consumers can be written in any language
Topic
• Ordered, immutable sequence numbers
• Retains messages until a period of time
• “Offset” of where they are is controlled by the consumer
• Each partition is replicated and has “leader” and 0 or more “follower”. R/W
only done on leader
Producers and Consumers
• Producer controls which partition messages goes to
• Supports both Queuing and Pub/Sub
– Abstraction called Consumer group
• Ordering within Partition
– Ordering for subscriber has to be done with only one subscriber to that
partition
Storm Introduction
• Distributed real time computational system
– Process unbounded streams of data
– Can use multiple programming languages
– Scalable, fault-tolerant and guarantees that data will be processed
• Use Cases
– Real time analytics, online machine learning
– Continuous Computation
– Distributed RPC
– ETL
• Concepts
– Topology
– Spouts
– Bolts
Concepts
• Storm Cluster
– Master node(Nimbus)
• Distributing code
• Assigns tasks to machines
• Monitors for failures
– Worker nodes(Supervisor)
• Starts/stops worker processes
• Each worker process executes subset of a topology
– Zookeeper
• Coordinates between Nimbus and Supervisors
• Nimbus and Supervisors completely stateless
• State maintained by Zookeeper or local disks
Details
• Stream
– Unbounded sequence of tuples
• Spout(write logic)
– Source of stream. Emits tuples
• Bolt(write logic)
– Processes streams and emits tuples
• Topology
– DAG of spouts and bolts
– Submit a topology to a Storm cluster
– Each node runs in parallel and parallelism is controlled
Stream groupings
• Tells a topology how to send tuples between two components
• Since tasks are executed in parallel, how do we control which tasks the
tuples are being sent to
Demo - Twitter TopN Trending Topic
• Use Flume Twitter Source to ingest data
and publish event to Kafka topic
• Use Storm as an Real-Time event
processing system to calculate TopN
trending topic
• Use Redis to store the TopN Result
• Use Node.js/JQuery for visualization
Flow Chart
Twitter
Twitter Source
Flume Agent
Mem Channel Kafka Sink
KafkaKafka SpoutParse Twitter BoltCount Bolt
TopN Ranker Bolt Report Bolt
Storm
RedisNode.js + JQuery
Twitter Source Mem Channel Kafka Sink
Flume Agent — Source
Flume Agent — Channel
Flume Agent — Sink
Storm Topology Design
Submit Topology to Cluster
ParseTweetBolt Code
ParseTweetBolt Code
ParseTweetBolt Code

More Related Content

What's hot

Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Christopher Curtin
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...confluent
 
Apache pulsar - storage architecture
Apache pulsar - storage architectureApache pulsar - storage architecture
Apache pulsar - storage architectureMatteo Merli
 
Deep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumptionDeep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumptionAlexandre Tamborrino
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafkaemreakis
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streamingdatamantra
 
Introduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed StorageIntroduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed StorageStreamlio
 
Apache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanApache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanStreamNative
 
Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free FridayOtávio Carvalho
 
Pulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scalePulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scaleMatteo Merli
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Gwen (Chen) Shapira
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 

What's hot (20)

Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
 
Apache pulsar - storage architecture
Apache pulsar - storage architectureApache pulsar - storage architecture
Apache pulsar - storage architecture
 
Deep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumptionDeep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumption
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
 
Introduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed StorageIntroduction to Apache BookKeeper Distributed Storage
Introduction to Apache BookKeeper Distributed Storage
 
Apache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! JapanApache Pulsar at Yahoo! Japan
Apache Pulsar at Yahoo! Japan
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka aws
Kafka awsKafka aws
Kafka aws
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free Friday
 
Pulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scalePulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scale
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
Message queues
Message queuesMessage queues
Message queues
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 

Viewers also liked

CST-2015_MWJ-Most Valuable Product
CST-2015_MWJ-Most Valuable ProductCST-2015_MWJ-Most Valuable Product
CST-2015_MWJ-Most Valuable ProductJack Adamson
 
Go beyond analytics
Go beyond analyticsGo beyond analytics
Go beyond analyticsEvalyzer
 
What is user experience
What is user experienceWhat is user experience
What is user experienceEvalyzer
 
Public Sector Show - Speakers Presentation
Public Sector Show  - Speakers PresentationPublic Sector Show  - Speakers Presentation
Public Sector Show - Speakers Presentationacademiesshow
 
Hssc i objective workbook
Hssc i objective workbookHssc i objective workbook
Hssc i objective workbookEngin Basturk
 
VCR Presentation Jessup
VCR Presentation JessupVCR Presentation Jessup
VCR Presentation JessupJames Jessup
 
hivve.me למידה שיתופית
hivve.me   למידה שיתופיתhivve.me   למידה שיתופית
hivve.me למידה שיתופיתhivve
 
JessupPTRComprehensiveAssignmentPt2
JessupPTRComprehensiveAssignmentPt2JessupPTRComprehensiveAssignmentPt2
JessupPTRComprehensiveAssignmentPt2James Jessup
 
Seniors' Stories event - keepsake booklet
Seniors' Stories event - keepsake bookletSeniors' Stories event - keepsake booklet
Seniors' Stories event - keepsake bookletAna-Maria Gheorghiu
 
hivve.me - Collaborative messeneger
hivve.me - Collaborative messeneger hivve.me - Collaborative messeneger
hivve.me - Collaborative messeneger hivve
 

Viewers also liked (19)

Bunga
BungaBunga
Bunga
 
CST-2015_MWJ-Most Valuable Product
CST-2015_MWJ-Most Valuable ProductCST-2015_MWJ-Most Valuable Product
CST-2015_MWJ-Most Valuable Product
 
Go beyond analytics
Go beyond analyticsGo beyond analytics
Go beyond analytics
 
What is user experience
What is user experienceWhat is user experience
What is user experience
 
NEGOSIASI
NEGOSIASINEGOSIASI
NEGOSIASI
 
FINAL RESUME AS ON 010816
FINAL RESUME AS ON 010816FINAL RESUME AS ON 010816
FINAL RESUME AS ON 010816
 
Q distance
Q distanceQ distance
Q distance
 
Public Sector Show - Speakers Presentation
Public Sector Show  - Speakers PresentationPublic Sector Show  - Speakers Presentation
Public Sector Show - Speakers Presentation
 
Bunga
BungaBunga
Bunga
 
Hssc i objective workbook
Hssc i objective workbookHssc i objective workbook
Hssc i objective workbook
 
VCR Presentation Jessup
VCR Presentation JessupVCR Presentation Jessup
VCR Presentation Jessup
 
hivve.me למידה שיתופית
hivve.me   למידה שיתופיתhivve.me   למידה שיתופית
hivve.me למידה שיתופית
 
Apac Digital Trends May 2015
Apac Digital Trends May 2015Apac Digital Trends May 2015
Apac Digital Trends May 2015
 
For reprint
For reprintFor reprint
For reprint
 
MATT CV ROEVIN
MATT CV ROEVINMATT CV ROEVIN
MATT CV ROEVIN
 
ankita cv final (2)
ankita cv final (2)ankita cv final (2)
ankita cv final (2)
 
JessupPTRComprehensiveAssignmentPt2
JessupPTRComprehensiveAssignmentPt2JessupPTRComprehensiveAssignmentPt2
JessupPTRComprehensiveAssignmentPt2
 
Seniors' Stories event - keepsake booklet
Seniors' Stories event - keepsake bookletSeniors' Stories event - keepsake booklet
Seniors' Stories event - keepsake booklet
 
hivve.me - Collaborative messeneger
hivve.me - Collaborative messeneger hivve.me - Collaborative messeneger
hivve.me - Collaborative messeneger
 

Similar to Low Latency Streaming Data Processing in Hadoop

Apache flume - an Introduction
Apache flume - an IntroductionApache flume - an Introduction
Apache flume - an IntroductionErik Schmiegelow
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Cleveland HUG - Storm
Cleveland HUG - StormCleveland HUG - Storm
Cleveland HUG - Stormjustinjleet
 
Introduction to Kafka Streams Presentation
Introduction to Kafka Streams PresentationIntroduction to Kafka Streams Presentation
Introduction to Kafka Streams PresentationKnoldus Inc.
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPCMax Alexejev
 
From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...P. Taylor Goetz
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.pptvijayapraba1
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaAngelo Cesaro
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDataWorks Summit
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesDavid Martínez Rego
 

Similar to Low Latency Streaming Data Processing in Hadoop (20)

Apache flume - an Introduction
Apache flume - an IntroductionApache flume - an Introduction
Apache flume - an Introduction
 
Apache storm
Apache stormApache storm
Apache storm
 
Storm
StormStorm
Storm
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Mhug apache storm
Mhug apache stormMhug apache storm
Mhug apache storm
 
Follow the White Rabbit - Message Queues with PHP
Follow the White Rabbit - Message Queues with PHPFollow the White Rabbit - Message Queues with PHP
Follow the White Rabbit - Message Queues with PHP
 
Apache Spark Components
Apache Spark ComponentsApache Spark Components
Apache Spark Components
 
Apache Storm
Apache StormApache Storm
Apache Storm
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Storm
Apache StormApache Storm
Apache Storm
 
Cleveland HUG - Storm
Cleveland HUG - StormCleveland HUG - Storm
Cleveland HUG - Storm
 
Introduction to Kafka Streams Presentation
Introduction to Kafka Streams PresentationIntroduction to Kafka Streams Presentation
Introduction to Kafka Streams Presentation
 
From Device to Data Center to Insights
From Device to Data Center to InsightsFrom Device to Data Center to Insights
From Device to Data Center to Insights
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPC
 
From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...From Device to Data Center to Insights: Architectural Considerations for the ...
From Device to Data Center to Insights: Architectural Considerations for the ...
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.ppt
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analytics
 
Apache Storm Internals
Apache Storm InternalsApache Storm Internals
Apache Storm Internals
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 

Recently uploaded

Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 

Recently uploaded (20)

Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 

Low Latency Streaming Data Processing in Hadoop

  • 1. Low-Latency Streaming Data Processing in Hadoop InSemble Inc. http://www.insemble.com
  • 2. Agenda Reference Architecture for Low Latency Streaming1 Storm4 Kafka3 Flume2 Demo5
  • 3. Hadoop Ecosystem Source: Apache Hadoop Documentation
  • 6. Real time Stream Processing Architecture with Hadoop
  • 7. Flume Architecture • Distributed system for collecting and aggregating from multiple data stores to a centralized data store • Agent is a JVM that hosts the Flume components • Channel will store message until picked by a sink • Different types of Flume sources • Source and Sink are decoupled
  • 10. Kafka Introduction • Messaging System which is distributed, partitioned and replicated • Kafka brokers run as a cluster • Producers and Consumers can be written in any language
  • 11. Topic • Ordered, immutable sequence numbers • Retains messages until a period of time • “Offset” of where they are is controlled by the consumer • Each partition is replicated and has “leader” and 0 or more “follower”. R/W only done on leader
  • 12. Producers and Consumers • Producer controls which partition messages goes to • Supports both Queuing and Pub/Sub – Abstraction called Consumer group • Ordering within Partition – Ordering for subscriber has to be done with only one subscriber to that partition
  • 13. Storm Introduction • Distributed real time computational system – Process unbounded streams of data – Can use multiple programming languages – Scalable, fault-tolerant and guarantees that data will be processed • Use Cases – Real time analytics, online machine learning – Continuous Computation – Distributed RPC – ETL • Concepts – Topology – Spouts – Bolts
  • 14. Concepts • Storm Cluster – Master node(Nimbus) • Distributing code • Assigns tasks to machines • Monitors for failures – Worker nodes(Supervisor) • Starts/stops worker processes • Each worker process executes subset of a topology – Zookeeper • Coordinates between Nimbus and Supervisors • Nimbus and Supervisors completely stateless • State maintained by Zookeeper or local disks
  • 15. Details • Stream – Unbounded sequence of tuples • Spout(write logic) – Source of stream. Emits tuples • Bolt(write logic) – Processes streams and emits tuples • Topology – DAG of spouts and bolts – Submit a topology to a Storm cluster – Each node runs in parallel and parallelism is controlled
  • 16. Stream groupings • Tells a topology how to send tuples between two components • Since tasks are executed in parallel, how do we control which tasks the tuples are being sent to
  • 17. Demo - Twitter TopN Trending Topic • Use Flume Twitter Source to ingest data and publish event to Kafka topic • Use Storm as an Real-Time event processing system to calculate TopN trending topic • Use Redis to store the TopN Result • Use Node.js/JQuery for visualization
  • 18. Flow Chart Twitter Twitter Source Flume Agent Mem Channel Kafka Sink KafkaKafka SpoutParse Twitter BoltCount Bolt TopN Ranker Bolt Report Bolt Storm RedisNode.js + JQuery Twitter Source Mem Channel Kafka Sink
  • 19. Flume Agent — Source
  • 20. Flume Agent — Channel