Kinesis vs. Kafka –
Kafka Deep Dive
Yifeng Jiang
Solutions Engineer, Hortonworks
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Self-introduction
蒋 逸峰 (Yifeng Jiang)
•  Solutions Engineer, Hortonworks
•  HBase book author
•  Came to Japan 10 years ago
•  Hobby: mountain climbing
•  Twitter: @uprush
About Hortonworks
Customer Momentum
•  556 customers (as of August 5, 2015)
•  119 customers added in Q2 2015
•  Publicly traded on NASDAQ: HDP
Hortonworks Data Platform
•  Completely open multi-tenant platform
for any app and any data
•  Consistent enterprise services for security,
operations, and governance
Partner for Customer Success
•  Leader in open-source community, focused on
innovation to meet enterprise needs
•  Unrivaled Hadoop support subscriptions
Founded in 2011
Original 24 architects, developers,
operators of Hadoop from Yahoo!
740+ employees
1350+ ecosystem partners
Hortonworks Data Platform (HDP)
Deploy on premises and cloud
Kinesis vs. Kafka
Amazon Kinesis -- Introduction
Amazon Kinesis is a fully managed, cloud-based service for real-time data
processing over large, distributed data streams.
Apache Kafka -- Introduction
Messaging system
Real-time
Scalable to handle large data volumes
Low latency
Fault tolerant
Originated at LinkedIn
Aimed at solving data movement across systems
Scala and Java
Open Source (Apache 2.0)
Adopted by many companies
Kinesis vs. Kafka – Features
Similar Features
•  Messaging system for large-scale real-time data processing
•  High performance, highly scalable, low latency
•  Fault tolerant
Differences
•  Fully managed cloud service vs. OSS
•  Data durability and performance trade-off
•  Interface
•  AWS service integration vs. OSS or single-platform (e.g., HDP) integration
Kinesis vs. Kafka – Data Durability
Kinesis
•  Synchronously replicates data across three facilities
•  High durability for free
Kafka
•  Replication across servers in the same DC/AZ. Configurable minimum number of in-sync replicas and ACKs.
•  Asynchronously mirrors data across clusters in different datacenters / AZs
Performance trade-off
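The durability/performance trade-off on the Kafka side can be illustrated with a small model. This is a minimal sketch, not Kafka code: `ack_outcome` is a hypothetical helper, and its parameters only mirror the producer `acks` setting and the `min.insync.replicas` setting. It shows how waiting for more replicas buys durability at the cost of rejecting or delaying writes.

```python
def ack_outcome(acks: str, isr_size: int, min_insync_replicas: int = 1) -> str:
    """Model how one produce request is treated (illustrative only).

    acks: "0" (no wait), "1" (leader only) or "all" (every in-sync replica).
    isr_size: number of replicas currently in sync, leader included.
    """
    if isr_size < 1:
        return "rejected: no leader available"
    if acks == "0":
        # Fastest, least durable: the producer does not wait at all.
        return "sent: producer does not wait for any acknowledgement"
    if acks == "1":
        # The write is lost if the leader dies before followers copy it.
        return "acknowledged: written to the leader only"
    if acks == "all":
        if isr_size < min_insync_replicas:
            return "rejected: not enough in-sync replicas"
        return f"acknowledged: written to all {isr_size} in-sync replicas"
    raise ValueError(f"unknown acks value: {acks!r}")


if __name__ == "__main__":
    for acks in ("0", "1", "all"):
        print(acks, "->", ack_outcome(acks, isr_size=2, min_insync_replicas=2))
    print("all ->", ack_outcome("all", isr_size=1, min_insync_replicas=2))
```

Stricter settings make a write wait for (or fail without) more replicas, which is where the performance cost comes from.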
Kinesis vs. Kafka – Interface
Kinesis
•  REST only
•  Client library wraps REST API
Kafka
•  Low-level API
•  REST API available (wrapping the low-level API)
Impacts throughput and latency
Kinesis vs. Kafka – Processing
Kafka
•  Custom consumers: event monitoring and alerting use cases
•  Storm: fraud detection, simple aggregation
•  Spark Streaming / Storm Trident: micro-batch, near real-time
•  Camus: batch Hadoop ingestion
Kinesis
•  KCL applications on EC2
•  Storm
•  Spark Streaming
•  EMR for batch ingestion, e.g., write to S3
Kinesis vs. Kafka – Deployment & Operation
Kafka
•  HDP: almost one-click deploy with Ambari
•  Basic monitoring with Ambari
•  Expand and rebalance: partition assignment and consumer rebalance
•  ZooKeeper can also be managed by Ambari
Kinesis
•  Fully managed, one-click deploy
•  CloudWatch monitoring
•  Expand and rebalance: resharding a stream
•  Easy operation
Kafka Deep Dive
Kafka – Concepts
* ZK is used by Broker, Consumer
[Diagram: Producer → brokers → Consumer. Topic with 3 partitions and replication factor 2; (L) marks the leader replica]
•  Broker-0: P0.R0 (L), P1.R0
•  Broker-1: P0.R1, P2.R1 (L)
•  Broker-2: P1.R2 (L), P2.R2
Kafka -- Concepts
Topics
Partitions
•  Offset
•  Ordered
Replication
•  Prevents data loss
•  Follower replicas are never read from or written to by clients
•  Does not increase throughput
•  Tolerates up to (replication factor - 1) failures
[ambari-qa@c6401 bin]$ kafka-topics.sh --zookeeper c6401:2181 --describe --topic page_visits
Topic:page_visits  PartitionCount:4  ReplicationFactor:2  Configs:
    Topic: page_visits  Partition: 0  Leader: 1  Replicas: 0,1  Isr: 1,0
    Topic: page_visits  Partition: 1  Leader: 0  Replicas: 1,0  Isr: 0,1
    Topic: page_visits  Partition: 2  Leader: 1  Replicas: 0,1  Isr: 1,0
    Topic: page_visits  Partition: 3  Leader: 0  Replicas: 1,0  Isr: 0,1
Kafka Broker
Store messages (logs) on local disk
•  Messages are appended to log file
•  Log Retention – time and size based
Controller
•  Cluster management
•  Runs on each broker machine
•  One leader, others are followers
Leader Partition
•  Broker that is the leader for certain partitions
Use ZK for coordination
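The append-only log with time- and size-based retention can be sketched as a toy model. This is an illustration under stated assumptions, not broker code: `ToyPartitionLog` and its method names are hypothetical, and it drops individual messages from the head of the log, whereas a real broker removes whole log segment files.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class ToyPartitionLog:
    """Append-only log with time- and size-based retention (illustrative)."""

    def __init__(self, retention_seconds: float, retention_bytes: int) -> None:
        self.retention_seconds = retention_seconds
        self.retention_bytes = retention_bytes
        self.next_offset = 0  # offsets only ever grow
        self.size_bytes = 0
        # Each entry is (offset, append timestamp, payload).
        self.entries: Deque[Tuple[int, float, bytes]] = deque()

    def append(self, payload: bytes, now: float) -> int:
        """Append at the tail and return the offset assigned to the message."""
        offset = self.next_offset
        self.entries.append((offset, now, payload))
        self.next_offset += 1
        self.size_bytes += len(payload)
        return offset

    def enforce_retention(self, now: float) -> None:
        """Drop the oldest entries while the log is too old or too large."""
        while self.entries:
            _, timestamp, payload = self.entries[0]
            too_old = now - timestamp > self.retention_seconds
            too_big = self.size_bytes > self.retention_bytes
            if not (too_old or too_big):
                break
            self.entries.popleft()
            self.size_bytes -= len(payload)

    def earliest_offset(self) -> Optional[int]:
        return self.entries[0][0] if self.entries else None


if __name__ == "__main__":
    log = ToyPartitionLog(retention_seconds=60, retention_bytes=10)
    for i in range(4):
        log.append(b"msg%d" % i, now=float(i))  # four 4-byte messages
    log.enforce_retention(now=4.0)
    print(log.earliest_offset(), log.next_offset)
```

Retention only trims the head of the log; offsets of the remaining messages never change, which is what lets consumers address messages by offset.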
Kafka Producer
New Producer API in 0.8.2
•  kafka-clients.jar
•  New Java API
•  Asynchronous mode by default
Create a new message and publish it to a topic and partition
•  Takes a topic, a value, and an optional key and partition id
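How the optional key and partition id could influence where a message lands can be sketched as follows. This is a minimal sketch under assumptions: `choose_partition` is a hypothetical helper, CRC32 is only a stand-in for whatever hash the real client uses, and the round-robin fallback for keyless messages is an assumed policy.

```python
import itertools
import zlib
from typing import Optional

# Shared counter used to spread keyless messages (assumed policy).
_round_robin = itertools.count()


def choose_partition(
    num_partitions: int,
    key: Optional[bytes] = None,
    partition: Optional[int] = None,
) -> int:
    """Pick a partition for one message (illustrative only)."""
    if partition is not None:
        # An explicit partition id wins over everything else.
        if not 0 <= partition < num_partitions:
            raise ValueError("partition id out of range")
        return partition
    if key is not None:
        # Same key -> same partition, so per-key ordering is preserved.
        return zlib.crc32(key) % num_partitions
    # No key and no partition: rotate across partitions.
    return next(_round_robin) % num_partitions


if __name__ == "__main__":
    print(choose_partition(4, partition=3))
    print(choose_partition(4, key=b"user-42"))
    print([choose_partition(4) for _ in range(4)])
```

Keyed messages keep their relative order because they always share a partition; keyless messages trade ordering for an even spread.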
Kafka Producer API (0.8.2) – Cont.
•  Original messages are partitioned and then split into batches
•  Each split batch is sent to leader broker (and then replicated to ISR)
•  Each send is acknowledged by either leader broker and/or all ISR
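[Diagram: App → Producer Lib (partitioner, split into batches) → brokers. Messages m5 m4 m3 m2 m1 are labeled p3 p2 p1 p2 p1]
Topic with 3 partitions and replication factor 2; (L) marks the leader replica
•  Broker-0: P0.R0 (L), P1.R0
•  Broker-1: P0.R1, P2.R1 (L)
•  Broker-2: P1.R2 (L), P2.R2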
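The partition-then-batch step described on this slide can be sketched in a few lines. This is an illustration, not the producer library: `split_into_batches` is a hypothetical helper, the count-based batch limit is an assumption made for simplicity, and the message-to-partition pairs follow one reading of the slide's diagram.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_into_batches(
    messages: Sequence[Tuple[int, str]], max_batch_size: int
) -> Dict[int, List[List[str]]]:
    """Group (partition, value) pairs by partition, then cut each group into batches."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    per_partition: Dict[int, List[str]] = defaultdict(list)
    for partition, value in messages:
        per_partition[partition].append(value)  # keeps per-partition order
    return {
        partition: [
            values[i : i + max_batch_size]
            for i in range(0, len(values), max_batch_size)
        ]
        for partition, values in per_partition.items()
    }


if __name__ == "__main__":
    # (partition, message) pairs, in send order.
    messages = [(1, "m1"), (2, "m2"), (1, "m3"), (2, "m4"), (3, "m5")]
    for partition, batches in split_into_batches(messages, 2).items():
        # Each batch would be sent to the broker leading that partition.
        print(partition, batches)
```

Batching per partition lets one request carry several messages to the same leader broker, which is a large part of the throughput gain.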
Kafka Consumer
Read data from Kafka brokers
•  JVM APIs supported out of the box by the project
•  Consumers pull data from brokers
•  Consumer apps have to keep track of the topic-partition offsets they have read
Consumer API
Simple API
•  Greater control over consumption of topics/partitions
•  Consumer apps are more complex because they must handle details such as offset management
High-level
•  Uses the Simple API internally
•  Consumer apps are simple to implement because offset tracking works out of the box
•  But not flexible in terms of which partitions to read
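The pull model and the offset bookkeeping that a Simple-API-style consumer has to do itself can be sketched like this. It is a toy model, not the Kafka consumer API: `fetch` and `OffsetTrackingConsumer` are hypothetical names, and a Python list stands in for one topic-partition on a broker.

```python
from typing import Dict, List, Tuple


def fetch(log: List[str], offset: int, max_messages: int) -> List[str]:
    """Pull up to max_messages starting at offset; the broker never pushes."""
    return log[offset : offset + max_messages]


class OffsetTrackingConsumer:
    """Keeps its own read position per (topic, partition)."""

    def __init__(self) -> None:
        self.positions: Dict[Tuple[str, int], int] = {}

    def consume(
        self, topic: str, partition: int, log: List[str], max_messages: int = 2
    ) -> List[str]:
        key = (topic, partition)
        offset = self.positions.get(key, 0)  # start from the beginning if unknown
        batch = fetch(log, offset, max_messages)
        # Advance only by what was actually read, so nothing is skipped.
        self.positions[key] = offset + len(batch)
        return batch


if __name__ == "__main__":
    partition_log = ["a", "b", "c"]
    consumer = OffsetTrackingConsumer()
    while True:
        batch = consumer.consume("page_visits", 0, partition_log)
        if not batch:
            break
        print(batch)
```

A high-level consumer hides exactly this bookkeeping, which is why it is simpler to write but gives less control over which offsets and partitions are read.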
Kafka Consumer – Cont.
Consumer Groups
•  Allow multiple hosts to form a group to access a topic
•  Consumer hosts join a group by using the same group.id
•  Guarantees a message is read by only one consumer in a group
•  Partitions are assigned to consumers in a group
•  A consumer node may get one or more partitions
•  But one partition is assigned to only one consumer host
•  Message order is guaranteed within a partition
•  Max parallelism is determined by the number of topic partitions
•  More consumers than partitions: some consumers will be idle
[Diagram: partitions on two brokers read by two consumer groups]
•  Broker-0: P0, P3
•  Broker-1: P1, P2
•  Consumer Group - 1: C1, C2
•  Consumer Group - 2: C3, C4, C5, C6
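The assignment rules above (each partition goes to exactly one consumer in a group, and surplus consumers stay idle) can be sketched as follows. This is one possible strategy, not Kafka's rebalance algorithm: `assign_partitions` is a hypothetical helper, and the contiguous, range-style split is an assumption.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Give every partition to exactly one consumer, as evenly as possible."""
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {member: [] for member in members}
    if not members:
        return assignment
    ordered = sorted(partitions)
    base, extra = divmod(len(ordered), len(members))
    start = 0
    for index, member in enumerate(members):
        # The first `extra` members take one additional partition.
        count = base + (1 if index < extra else 0)
        assignment[member] = ordered[start : start + count]
        start += count
    return assignment


if __name__ == "__main__":
    topic_partitions = [0, 1, 2, 3]
    print(assign_partitions(topic_partitions, ["C1", "C2"]))
    # Six consumers for four partitions: the last two get nothing to read.
    print(assign_partitions(topic_partitions, ["C1", "C2", "C3", "C4", "C5", "C6"]))
```

Because a partition never has two readers in the same group, adding consumers beyond the partition count adds no parallelism.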
Kafka – Why Kafka is fast
Fast Writes
Writes are appends to file system
Partitions improve performance and throughput
Uses OS buffer cache
Lots of memory on the machine helps
Fast Reads
Memory mapped files
File descriptor to socket descriptor efficient transfer
Linux sendfile(), JVM transferTo() implementation
Why Performance?
Disk flushes are delayed
Durability is guaranteed via replication
When consumers are reading the latest data, they read from the page cache
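The file-descriptor-to-socket transfer mentioned under Fast Reads can be tried out with Python's standard library, whose `socket.sendfile()` uses the operating system's `sendfile()` where available and falls back to plain sends otherwise. This is a minimal sketch of the technique, not Kafka's implementation (which the slide attributes to the JVM's transferTo()); `send_file_zero_copy` is a hypothetical helper, and the payload is assumed small enough to fit in the socket buffer because sender and receiver share one thread.

```python
import socket
import tempfile


def send_file_zero_copy(payload: bytes) -> bytes:
    """Write payload to a temporary file, ship it over a socket, return what arrived."""
    sender, receiver = socket.socketpair()
    try:
        with tempfile.TemporaryFile() as log_file:
            log_file.write(payload)
            log_file.flush()  # make sure the bytes are visible at the OS level
            log_file.seek(0)
            # The file content goes to the socket without being copied
            # through a user-space buffer when os.sendfile() is available.
            sent = sender.sendfile(log_file)
        sender.close()  # signals end-of-stream to the receiver
        if sent != len(payload):
            raise RuntimeError("short transfer")
        chunks = []
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    data = b"page_visit," * 10
    print(send_file_zero_copy(data) == data)
```

The point of the technique is that message bytes go from the page cache to the network socket without being copied into the application's memory.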
Kafka – Cluster Mirroring
Mirror Maker
•  Mirrors data across clusters, even in different DCs / AZs
•  Standalone tool that uses the Consumer and Producer APIs
•  Reads from one or more source clusters and writes to a target cluster
•  Whitelist/blacklist topics
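Topic selection by whitelist or blacklist can be sketched as a simple filter. This is an illustration only: `select_topics` is a hypothetical helper, the patterns use Python regular-expression syntax, and requiring exactly one of the two lists is an assumption of this sketch rather than a statement about the tool.

```python
import re
from typing import Iterable, List, Optional


def select_topics(
    topics: Iterable[str],
    whitelist: Optional[str] = None,
    blacklist: Optional[str] = None,
) -> List[str]:
    """Return the topics to mirror, given one whitelist or one blacklist pattern."""
    if (whitelist is None) == (blacklist is None):
        raise ValueError("specify exactly one of whitelist or blacklist")
    if whitelist is not None:
        pattern = re.compile(whitelist)
        # Keep only topics whose full name matches the whitelist.
        return [topic for topic in topics if pattern.fullmatch(topic)]
    pattern = re.compile(blacklist)
    # Keep everything except topics whose full name matches the blacklist.
    return [topic for topic in topics if not pattern.fullmatch(topic)]


if __name__ == "__main__":
    source_topics = ["page_visits", "clicks", "test_tmp"]
    print(select_topics(source_topics, whitelist="page_visits|clicks"))
    print(select_topics(source_topics, blacklist="test_.*"))
```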
Kafka REST Interface
REST Interface
•  Wraps Producer and Consumer API
Performance Overhead
•  Two hops
•  Extra REST server to maintain
•  Parse JSON payload
Kinesis vs. Kafka -- Terms
Amazon Kinesis | Apache Kafka
Streams | Topics
Data Records | Messages
Producers | Producers
Kinesis Producer Library | Producer API
Consumers | Consumers
Kinesis Applications | Consumer Applications
Kinesis Client Library | Consumer – High level API
N/A | Consumer – Simple API
Shards | Partitions
N/A (built-in MD5 hash on partition keys) | Custom partitioner
Sequence Numbers | Offset
Application Name | Consumer Group ID
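The "built-in MD5 hash on partition keys" entry can be made concrete with a short sketch. Kinesis maps a partition key to a 128-bit integer with MD5 and routes the record to the shard whose hash key range contains that integer. The helpers `even_shard_ranges` and `shard_for_key` are hypothetical, and splitting the hash space evenly across shards is an assumption; real streams can have uneven ranges after resharding.

```python
import hashlib
from typing import List, Tuple

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def even_shard_ranges(num_shards: int) -> List[Tuple[int, int]]:
    """Split the 128-bit hash space into contiguous, inclusive ranges."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    step = HASH_SPACE // num_shards
    ranges = []
    for index in range(num_shards):
        start = index * step
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if index == num_shards - 1 else (index + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains MD5(partition_key)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_value <= end:
            return index
    raise ValueError("hash value not covered by any shard range")


if __name__ == "__main__":
    shard_ranges = even_shard_ranges(4)
    for key in ("user-1", "user-2", "user-42"):
        print(key, "-> shard", shard_for_key(key, shard_ranges))
```

On the Kafka side the equivalent decision is made by the producer's partitioner, which is why the table lists a custom partitioner as the counterpart.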
Tweet: #hadooproadshow
More About Apache Kafka:
http://hortonworks.com/hadoop/kafka/

Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencessuser9e7c64
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxRTS corp
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 

Recently uploaded (20)

Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conference
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 

Kinesis vs-kafka-and-kafka-deep-dive

  • 1. Kinesis vs. Kafka – Kafka Deep Dive Yifeng Jiang Solutions Engineer, Hortonworks © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  • 2. Self-introduction: Yifeng Jiang (蒋 逸峰)
      •  Solutions Engineer, Hortonworks
      •  HBase book author
      •  Has lived in Japan for 10 years…
      •  Hobby: mountain climbing
      •  Twitter: @uprush
  • 3. About Hortonworks
      Founded in 2011; original 24 architects, developers, and operators of Hadoop from Yahoo!
      740+ employees, 1350+ ecosystem partners
      Customer Momentum
      •  556 customers (as of August 5, 2015)
      •  119 customers added in Q2 2015
      •  Publicly traded on NASDAQ: HDP
      Hortonworks Data Platform
      •  Completely open multi-tenant platform for any app and any data
      •  Consistent enterprise services for security, operations, and governance
      Partner for Customer Success
      •  Leader in open-source community, focused on innovation to meet enterprise needs
      •  Unrivaled Hadoop support subscriptions
  • 4. Hortonworks Data Platform (HDP): deploy on premises and in the cloud
  • 5. Kinesis vs. Kafka
  • 6. Amazon Kinesis -- Introduction: Amazon Kinesis is a fully managed, cloud-based service for real-time data processing over large, distributed data streams.
  • 7. Apache Kafka -- Introduction
      •  Messaging system
      •  Real-time
      •  Scalable to handle large data volumes
      •  Low latency
      •  Fault tolerant
      •  Originated at LinkedIn
      •  Aimed at solving data movement across systems
      •  Written in Scala and Java
      •  Open source (Apache 2.0)
      •  Adopted at many companies
  • 8. Kinesis vs. Kafka – Features
      Similar Features
      •  Messaging system for large-scale real-time data processing
      •  High performance, highly scalable, low latency
      •  Fault tolerant
      Differences
      •  Fully managed cloud service vs. OSS
      •  Data durability and performance trade-off
      •  Interface
      •  AWS service integration vs. OSS or single platform (e.g., HDP) integration
  • 9. Kinesis vs. Kafka – Data Durability
      Kinesis
      •  Synchronously replicates data across three facilities
      •  High durability for free
      Kafka
      •  Replication across servers in the same DC/AZ, with a configurable minimum number of in-sync replicas and ACKs
      •  Asynchronously mirrors data across clusters in different datacenters / AZs
      •  Durability is traded off against performance
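The Kafka side of this trade-off can be illustrated with a minimal sketch. It is a simplified model, not broker code: the function name `write_accepted` and its parameters are hypothetical, and it only assumes the documented behaviour that a minimum in-sync replica count is enforced when the producer asks for acknowledgement from all in-sync replicas.

```python
def write_accepted(acks, isr_size, min_insync_replicas):
    """Simplified model of whether a produce request is accepted.

    acks: "0", "1", or "all" (producer acknowledgement setting)
    isr_size: number of replicas currently in sync, leader included
    min_insync_replicas: configured minimum in-sync replica count
    """
    if acks == "all":
        # The strictest setting: refuse the write unless enough
        # replicas are in sync to meet the configured minimum.
        return isr_size >= min_insync_replicas
    # With acks "0" or "1" only the leader matters in this model,
    # so the write goes through as long as a leader exists.
    return isr_size >= 1


# Replication factor 2, one follower has fallen out of sync.
print(write_accepted("all", isr_size=1, min_insync_replicas=2))
print(write_accepted("1", isr_size=1, min_insync_replicas=2))
```

Under these assumptions the stricter setting rejects the write while one replica is out of sync, which is the durability side of the trade-off; the looser setting keeps accepting writes at the risk of losing them if the leader fails.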
  • 10. Kinesis vs. Kafka – Interface
      Kinesis
      •  REST only
      •  Client library wraps the REST API
      Kafka
      •  Low-level API
      •  REST API available (wraps the low-level API), at a cost in throughput and latency
  • 11. Kinesis vs. Kafka – Processing
      Kafka
      •  Custom consumers: event monitoring and alerting use cases
      •  Storm: fraud detection, simple aggregation
      •  Spark Streaming / Storm Trident: micro-batch, near real-time
      •  Camus: batch Hadoop ingestion
      Kinesis
      •  KCL applications on EC2
      •  Storm
      •  Spark Streaming
      •  EMR for batch ingestion, e.g., write to S3
  • 12. Kinesis vs. Kafka – Deployment & Operation
      Kafka
      •  HDP: almost one-click deploy with Ambari
      •  Basic monitoring with Ambari
      •  Expand and rebalance: partition assignment and consumer rebalance
      •  ZooKeeper can also be managed by Ambari
      Kinesis
      •  Fully managed, one-click deploy
      •  CloudWatch monitoring
      •  Expand and rebalance: resharding a stream
      •  Easy operation
  • 13. Kafka Deep Dive
  • 14. Kafka – Concepts
      (Diagram: a Producer writes to, and a Consumer reads from, a topic with 3 partitions and replication factor 2; (L) marks the leader replica.)
      •  Broker-0: P0.R0 (L), P1.R0
      •  Broker-1: P0.R1, P2.R1 (L)
      •  Broker-2: P1.R2 (L), P2.R2
      * ZooKeeper (ZK) is used by brokers and consumers
  • 15. Kafka -- Concepts
      Topics
      Partitions
      •  Offset
      •  Ordered
      Replication
      •  Prevents data loss
      •  Follower replicas are never read from or written to by clients
      •  Does not increase throughput
      •  Tolerates (replication factor - 1) failures
      [ambari-qa@c6401 bin]$ kafka-topics.sh --zookeeper c6401:2181 --describe --topic page_visits
      Topic:page_visits  PartitionCount:4  ReplicationFactor:2  Configs:
        Topic: page_visits  Partition: 0  Leader: 1  Replicas: 0,1  Isr: 1,0
        Topic: page_visits  Partition: 1  Leader: 0  Replicas: 1,0  Isr: 0,1
        Topic: page_visits  Partition: 2  Leader: 1  Replicas: 0,1  Isr: 1,0
        Topic: page_visits  Partition: 3  Leader: 0  Replicas: 1,0  Isr: 0,1
  • 16. Kafka Broker
      Stores messages (logs) on local disk
      •  Messages are appended to a log file
      •  Log retention: time and size based
      Controller
      •  Cluster management
      •  Runs on each broker machine
      •  One leader, the others are followers
      Leader Partition
      •  Broker that is the leader for certain partitions
      Uses ZK for coordination
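Time- and size-based retention on an append-only log can be sketched as follows. This is a simplified illustration and not the broker's actual cleaner: the function `apply_retention`, the tuple layout of a segment, and both limits are hypothetical names chosen for the example, and a real broker treats its active segment specially, which this model ignores.

```python
def apply_retention(segments, now, max_age_seconds, max_total_bytes):
    """Return the log segments that survive retention.

    segments: list of (created_at_seconds, size_bytes), oldest first
    """
    # Time-based retention: drop segments older than the age limit.
    kept = [s for s in segments if now - s[0] <= max_age_seconds]

    # Size-based retention: drop the oldest remaining segments
    # until the total size fits under the limit.
    total = sum(size for _, size in kept)
    while kept and total > max_total_bytes:
        total -= kept[0][1]
        kept.pop(0)
    return kept


segments = [(0, 100), (50, 100), (90, 100)]
print(apply_retention(segments, now=100, max_age_seconds=60, max_total_bytes=150))
```

Because messages are only ever appended, retention can work on whole segments from the old end of the log without touching recent data.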
  • 17. Kafka Producer
      New Producer API in 0.8.2
      •  kafka-clients jar
      •  New Java API
      •  Asynchronous mode by default
      Create a new message and publish it to a topic and partition
      •  Takes a topic, a value, and an optional key and partition ID
  • 18. Kafka Producer API (0.8.2) – Cont.
      •  Original messages are partitioned and then split into batches
      •  Each split batch is sent to the leader broker (and then replicated to the ISR)
      •  Each send is acknowledged by the leader broker and/or all ISR
      (Diagram: App → Producer Lib (partitioner, split batch) → leader brokers. Messages m1 to m5 are grouped into per-partition batches p1 to p3 for a topic with 3 partitions and replication factor 2. Broker-0: P0.R0 (L), P1.R0; Broker-1: P0.R1, P2.R1 (L); Broker-2: P1.R2 (L), P2.R2.)
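The partition-then-batch flow on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the producer library: `choose_partition` uses CRC32 as a stand-in hash (the real partitioner's hash function is not reproduced here), `max_batch` is a hypothetical batch size in messages, and the `leaders` map copies the leader placement from the slide's diagram.

```python
import zlib
from collections import defaultdict


def choose_partition(key, num_partitions):
    # Stand-in hash for illustration only; same key -> same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def split_into_batches(messages, num_partitions, leaders, max_batch=2):
    """Group (key, value) messages by partition, then cut each group
    into batches addressed to that partition's leader broker."""
    per_partition = defaultdict(list)
    for key, value in messages:
        per_partition[choose_partition(key, num_partitions)].append(value)

    batches = []
    for partition, values in sorted(per_partition.items()):
        for start in range(0, len(values), max_batch):
            chunk = values[start:start + max_batch]
            batches.append((leaders[partition], partition, chunk))
    return batches


# Leader placement taken from the diagram: P0 -> Broker-0, P1 -> Broker-2, P2 -> Broker-1.
leaders = {0: "Broker-0", 1: "Broker-2", 2: "Broker-1"}
messages = [("user-%d" % (i % 3), "m%d" % i) for i in range(1, 6)]
for broker, partition, chunk in split_into_batches(messages, 3, leaders):
    print(broker, partition, chunk)
```

Each batch only ever targets one leader, which is why the partitioning step has to happen before batching.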
  • 19. Kafka Consumer
      Read data from Kafka brokers
      •  JVM APIs supported out of the box by the project
      •  Consumers pull data from brokers
      •  Consumer apps have to keep track of the topic-partition offset read
      Consumer API: Simple API
      •  Greater control over consumption of topics/partitions
      •  Consumer apps are more complex because they need to handle things like offsets themselves
      Consumer API: High-level
      •  Uses the Simple API internally
      •  Consumer apps are simple to implement because offset tracking comes out of the box
      •  But not flexible in terms of which partitions to read
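The pull model with client-side offset tracking can be sketched with an in-memory stand-in for one partition. Both classes below are hypothetical and exist only for illustration; they do not mirror either consumer API, but they show why the consumer, not the broker, decides what is read next.

```python
class PartitionLog:
    """In-memory stand-in for one partition's append-only log."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the new message

    def fetch(self, offset, max_messages):
        return self._messages[offset:offset + max_messages]


class OffsetTrackingConsumer:
    """Pulls from the log and remembers the next offset to read."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.next_offset = start_offset

    def poll(self, max_messages=2):
        batch = self.log.fetch(self.next_offset, max_messages)
        self.next_offset += len(batch)  # advance only by what was read
        return batch


log = PartitionLog()
for m in ["m1", "m2", "m3"]:
    log.append(m)

consumer = OffsetTrackingConsumer(log)
print(consumer.poll(), consumer.poll(), consumer.next_offset)
```

Since the offset lives with the consumer, a second consumer can start from any earlier offset and re-read the same messages.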
  • 20. Kafka Consumer – Cont.
      Consumer Groups
      •  Allow multiple hosts to form a group to access a topic
      •  Consumer hosts join a group by using the same group.id
      •  Guarantees a message is read by only one consumer in a group
      •  Partitions are assigned to consumers in a group
      •  A consumer node may get one or more partitions
      •  But one partition is assigned to only one consumer host
      •  Message order is guaranteed within a partition
      •  Max parallelism is determined by the number of topic partitions
      •  More consumers than partitions: some consumers will be idle
      (Diagram: partitions P0 to P3 on Broker-0 and Broker-1, read by Consumer Group 1 and Consumer Group 2, with consumers C1 to C6.)
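The assignment rules above (each partition goes to exactly one consumer in the group, and extra consumers sit idle) can be sketched with a simple round-robin. This is a stand-in for illustration, not Kafka's actual assignment strategy, and `assign_partitions` is a hypothetical helper.

```python
def assign_partitions(partitions, consumers):
    """Assign every partition to exactly one consumer in a group."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]

# Fewer consumers than partitions: each consumer gets several partitions.
print(assign_partitions(partitions, ["C1", "C2"]))

# More consumers than partitions: the extra consumers stay idle.
print(assign_partitions(partitions, ["C1", "C2", "C3", "C4", "C5", "C6"]))
```

Adding consumers beyond the partition count therefore adds no parallelism, which is why the partition count caps a group's throughput.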
  • 21. Kafka – Why Kafka is fast
      Fast Writes
      •  Writes are appends to the file system
      •  Partitions improve performance and throughput
      •  Uses the OS buffer cache
      •  Lots of memory on the machine helps
      Fast Reads
      •  Memory-mapped files
      •  Efficient file descriptor to socket descriptor transfer
      •  Linux sendfile(), JVM transferTo() implementation
      Why Performance?
      •  Disk flushes are delayed
      •  Durability is guaranteed via replication
      •  When consumers are reading the latest data, they read from the page cache
  • 22. Kafka – Cluster Mirroring
      Mirror Maker
      •  Mirrors data across clusters, even in different DCs / AZs
      •  Standalone tool that uses the Consumer and Producer APIs
      •  Reads from one or more source clusters and writes to a target cluster
      •  Topic whitelist/blacklist
  • 23. Kafka REST Interface
      REST Interface
      •  Wraps the Producer and Consumer APIs
      Performance Overhead
      •  Two hops
      •  Extra REST server to maintain
      •  JSON payload parsing
  • 24. Kinesis vs. Kafka -- Terms
      Amazon Kinesis                               | Apache Kafka
      Streams                                      | Topics
      Data Records                                 | Messages
      Producers                                    | Producers
      Kinesis Producer Library                     | Producer API
      Consumers                                    | Consumers
      Kinesis Applications                         | Consumer Applications
      Kinesis Client Library                       | Consumer – High level API
      N/A                                          | Consumer – Simple API
      Shards                                       | Partitions
      N/A (built-in MD5 hash on partition keys)    | Custom partitioner
      Sequence Numbers                             | Offset
      Application Name                             | Consumer Group ID
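The "built-in MD5 hash on partition keys" row can be illustrated with a short sketch. It assumes that a partition key is hashed with MD5 into a 128-bit integer and that each shard owns a contiguous range of that hash space; the even split in `even_shard_ranges` is an assumption made for the example, and all function names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def hash_key(partition_key):
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_shard_ranges(num_shards):
    """Split the 128-bit hash space into contiguous, equal-ish ranges."""
    size = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * size
        end = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the shard whose range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key outside all shard ranges")


ranges = even_shard_ranges(4)
for key in ["user-1", "user-2", "user-3"]:
    print(key, "-> shard", shard_for(key, ranges))
```

The Kafka counterpart in the table is the custom partitioner: there the key-to-partition mapping is a pluggable function rather than a fixed hash.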
  • 25. Tweet: #hadooproadshow. More about Apache Kafka: http://hortonworks.com/hadoop/kafka/