SlideShare a Scribd company logo
1 of 14
Rahul Jain
Software Engineer
www.linkedin.com/in/rahuldausa
   Why Kafka
   Introduction
   Design
   Q&A
*Apache ActiveMQ, JBoss HornetQ, Zero MQ, RabbitMQ are respective brands of Apache Software Foundation,
JBoss Inc, iMatix Corporation and Vmware Inc.
   Transportation of logs
   Activity Stream in Real time.
   Collection of Performance Metrics
    ◦ CPU/IO/Memory usage
    ◦ Application Specific
        Time taken to load a web-page.
        Time taken by Multiple Services while building a web-page.
        No of requests.
        No of hits on a particular page/url.
   Scalable: Need to be Highly Scalable. A lot         of Data. It can be
    billions of message.
   Reliability     of messages, What If, I loose a small no. of
    messages. Is it fine with me ?.
   Distributed : Multiple Producers, Multiple Consumers
   High-throughput: Does not require to have JMS Standards,
    as it may be overkill for some use-cases like transportation of logs.
    ◦ As per JMS, each message has to be acknowledged back.
    ◦ Exactly one delivery guarantee requires two-phase commit.
   An Apache Project, initially developed by
    LinkedIn's SNA team.
   A High-throughput distributed Publish-
    Subscribe based messaging system.
   A Kind of Data Pipeline
   Written in Scala.
   Does not follow JMS Standards, neither uses
    JMS APIs.
   Supports both queue and topic semantics.
Credit : http://kafka.apache.org/design.html
Handshake
Producer                Zookeeper

                                                     Consumer




                                      Coordination
Producer
                       Kafka Broker


                          .
                          .
Producer                  .                          Consumer
                          .
                          .


Producer               Kafka Broker
Coordination


           Handshake
                                                   Store Consumed Offset
Producer                    Zookeeper              and Watch for Cluster        Consumer 1
                                                   event                         (groupId1)



Producer
                            Kafka Broker
                            (Partition 1)
                                                                                Consumer 2
                                 .                                               (groupId1)
                                 .
Producer                         .
                                 .
                                 .


Producer                     Kafka Broker
                             (Partition 2)                                  *   Consumer 3
                                                                                 (groupId1)



                       * Consumer 3 would not receive any data, as number
                       of consumers are more than number of partitions.
   Filesystem Cache
   Zero-copy transfer of messages
   Batching of Messages
   Batch Compression
   Automatic Producer Load balancing.
   Broker does not Push messages to Consumer,
    Consumer Polls messages from Broker.
   And Some others.
        Cluster formation of Broker/Consumer using Zookeeper, So on the fly more consumer, broker
         can be introduced. The new cluster rebalancing will be taken care by Zookeeper
        Data is persisted in broker and is not removed on consumption (till retention period), so if one
         consumer fails while consuming, same message can be re-consume again later from broker.
        Simplified storage mechanism for message, not for each message per consumer.
Producer Performance                                               Consumer Performance


Credit : http://research.microsoft.com/en-us/UM/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
Credit: https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
Apache kafka

More Related Content

What's hot

What's hot (20)

Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Kafka pub sub demo
Kafka pub sub demoKafka pub sub demo
Kafka pub sub demo
 
Kafka basics
Kafka basicsKafka basics
Kafka basics
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
Messaging queue - Kafka
Messaging queue - KafkaMessaging queue - Kafka
Messaging queue - Kafka
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
 
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NET
 

Similar to Apache kafka

Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Joe Stein
 
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQCluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Shameera Rathnayaka
 
Consuming, providing and publishing Web Services
Consuming, providing and publishing Web ServicesConsuming, providing and publishing Web Services
Consuming, providing and publishing Web Services
Ioannis Baltopoulos
 

Similar to Apache kafka (20)

Kafka RealTime Streaming
Kafka RealTime StreamingKafka RealTime Streaming
Kafka RealTime Streaming
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code Meetup
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
BDW Chicago 2016 - Jayesh Thakrar, Sr. Software Engineer, Conversant - Data...
BDW Chicago 2016 -  Jayesh Thakrar, Sr. Software Engineer, Conversant -  Data...BDW Chicago 2016 -  Jayesh Thakrar, Sr. Software Engineer, Conversant -  Data...
BDW Chicago 2016 - Jayesh Thakrar, Sr. Software Engineer, Conversant - Data...
 
Flume-Cassandra Log Processor
Flume-Cassandra Log ProcessorFlume-Cassandra Log Processor
Flume-Cassandra Log Processor
 
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQCluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
 
KAFKA Quickstart
KAFKA QuickstartKAFKA Quickstart
KAFKA Quickstart
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Session 23 - Kafka and Zookeeper
Session 23 - Kafka and ZookeeperSession 23 - Kafka and Zookeeper
Session 23 - Kafka and Zookeeper
 
Fault Tolerance with Kafka
Fault Tolerance with KafkaFault Tolerance with Kafka
Fault Tolerance with Kafka
 
Consuming, providing and publishing Web Services
Consuming, providing and publishing Web ServicesConsuming, providing and publishing Web Services
Consuming, providing and publishing Web Services
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Kafka Deep Dive
Kafka Deep DiveKafka Deep Dive
Kafka Deep Dive
 
Kafka overview
Kafka overviewKafka overview
Kafka overview
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
 

More from Rahul Jain

Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
Rahul Jain
 

More from Rahul Jain (14)

Flipkart Strategy Analysis and Recommendation
Flipkart Strategy Analysis and RecommendationFlipkart Strategy Analysis and Recommendation
Flipkart Strategy Analysis and Recommendation
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Introduction to Scala
Introduction to ScalaIntroduction to Scala
Introduction to Scala
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP Theorem
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
Hibernate tutorial for beginners
Hibernate tutorial for beginnersHibernate tutorial for beginners
Hibernate tutorial for beginners
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Apache kafka

  • 2. Why Kafka  Introduction  Design  Q&A
  • 3. *Apache ActiveMQ, JBoss HornetQ, Zero MQ, RabbitMQ are respective brands of Apache Software Foundation, JBoss Inc, iMatix Corporation and Vmware Inc.
  • 4.
  • 5. Transportation of logs  Activity Stream in Real time.  Collection of Performance Metrics ◦ CPU/IO/Memory usage ◦ Application Specific  Time taken to load a web-page.  Time taken by Multiple Services while building a web-page.  No of requests.  No of hits on a particular page/url.
  • 6. Scalable: Need to be Highly Scalable. A lot of Data. It can be billions of message.  Reliability of messages, What If, I loose a small no. of messages. Is it fine with me ?.  Distributed : Multiple Producers, Multiple Consumers  High-throughput: Does not require to have JMS Standards, as it may be overkill for some use-cases like transportation of logs. ◦ As per JMS, each message has to be acknowledged back. ◦ Exactly one delivery guarantee requires two-phase commit.
  • 7. An Apache Project, initially developed by LinkedIn's SNA team.  A High-throughput distributed Publish- Subscribe based messaging system.  A Kind of Data Pipeline  Written in Scala.  Does not follow JMS Standards, neither uses JMS APIs.  Supports both queue and topic semantics.
  • 9. Handshake Producer Zookeeper Consumer Coordination Producer Kafka Broker . . Producer . Consumer . . Producer Kafka Broker
  • 10. Coordination Handshake Store Consumed Offset Producer Zookeeper and Watch for Cluster Consumer 1 event (groupId1) Producer Kafka Broker (Partition 1) Consumer 2 . (groupId1) . Producer . . . Producer Kafka Broker (Partition 2) * Consumer 3 (groupId1) * Consumer 3 would not receive any data, as number of consumers are more than number of partitions.
  • 11. Filesystem Cache  Zero-copy transfer of messages  Batching of Messages  Batch Compression  Automatic Producer Load balancing.  Broker does not Push messages to Consumer, Consumer Polls messages from Broker.  And Some others.  Cluster formation of Broker/Consumer using Zookeeper, So on the fly more consumer, broker can be introduced. The new cluster rebalancing will be taken care by Zookeeper  Data is persisted in broker and is not removed on consumption (till retention period), so if one consumer fails while consuming, same message can be re-consume again later from broker.  Simplified storage mechanism for message, not for each message per consumer.
  • 12. Producer Performance Consumer Performance Credit : http://research.microsoft.com/en-us/UM/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf