Being Ready for Apache Kafka:
Today’s Ecosystem and Future Roadmap
Michael G. Noll
@miguno
Developer Evangelist, Confluent Inc.
Apache: Big Data Conference, Budapest, Hungary, September 29, 2015
 Developer Evangelist at Confluent since August ‘15
 Previously Big Data lead at .COM/.NET DNS operator Verisign
 Blogging at http://www.michael-noll.com/ (too little time!)
 PMC member of Apache Storm (too little time!)
 michael@confluent.io
2
 Founded in Fall 2014 by the creators of Apache Kafka
 Headquartered in San Francisco bay area
 We provide a stream data platform based on Kafka
 We contribute a lot to Kafka, obviously 
3
4
?
5
Apache Kafka is the distributed, durable equivalent of Unix pipes.
Use it to connect and compose your large-scale data apps.
$ cat < in.txt | grep “apache” | tr a-z A-Z > out.txt
Example: LinkedIn before Kafka
6
Example: LinkedIn after Kafka
7
(Diagram: databases, logs, and sensors feed into Kafka; downstream consumers include log search, monitoring, security, real-time analytics, filter/transform/aggregate jobs, a data warehouse, and Hadoop/HDFS.)
8
Apache Kafka is a high-throughput distributed messaging system.
“1,100,000,000,000 msg/day, totaling 175+ TB/day” (LinkedIn)
= 3 billion messages since the beginning of this talk
9
Apache Kafka is a publish-subscribe messaging
rethought as a distributed commit log.
(Diagram: a Kafka cluster of many brokers, coordinated via ZooKeeper, with large numbers of producers writing to it and consumers reading from it.)
10
Apache Kafka is a publish-subscribe messaging
rethought as a distributed commit log.
Topic, e.g. “user_clicks”
(Diagram: a topic as an ordered log, oldest messages on the left, newest on the right; producers (P) append at the newest end while consumers (C) read from their own positions.)
So where can Kafka help me?
Example, anyone?
11
12
13
Example: Protecting your infrastructure against DDoS attacks
Why is Kafka a great fit here?
 Scalable writes
 Scalable reads
 Low latency
 Time machine
Kafka powers many use cases
• User behavior, click stream analysis
• Infrastructure monitoring and security, e.g. DDoS detection
• Fraud detection
• Operational telemetry data from mobile devices and sensors
• Analyzing system and app logging data
• Internet of Things (IoT) applications
• …and many more
• Yours is missing? Let me know via michael@confluent.io !
14
https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
15
Diverse and rapidly growing user base
across many industries and verticals.
A typical Kafka architecture
Yes, we now begin to approach “production”
16
17
Typical architecture => typical questions
18
(Diagram: a Kafka cluster at the center, surrounded by the apps that write to it and the apps that read from it, with source and destination systems behind them, plus data/schemas and operations as cross-cutting concerns. Each area is labeled with one of Questions 1-6, which the following sections answer in turn.)
19
Wait a minute!
Kafka core
Question 1 or “What are the upcoming improvements to core Kafka?”
20
Kafka core: upcoming changes in 0.9.0
• Kafka 0.9.0 (formerly 0.8.3) expected in November 2015
• ZooKeeper now only required for Kafka brokers
• ZK dependency removed from clients = producers and consumers
• Benefits include less load on ZK, lower operational complexity, and user apps no longer needing to interact with ZK
• New, unified consumer Java API
• We consolidated the previous “high-level” and “simple” consumer APIs
• Old APIs still available and not deprecated (yet)
21
New consumer Java API in 0.9.0
22
Configure
Subscribe
Process
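Pieced together from the points above, the configure / subscribe / process flow of the new unified consumer looks roughly like this. A hedged sketch: the broker address, group id, and topic name are hypothetical, and it needs a running Kafka 0.9 cluster to do anything.

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ClickConsumer {
  public static void main(String[] args) {
    // Configure: note we point at the brokers, not at ZooKeeper
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
    props.put("group.id", "click-consumers");       // hypothetical group
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

    // Subscribe
    consumer.subscribe(Arrays.asList("user_clicks"));

    // Process
    while (true) {
      ConsumerRecords<String, String> records = consumer.poll(100);
      for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
      }
    }
  }
}
```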
Kafka core: upcoming changes in 0.9.0
• Improved data import/export via Copycat
• KIP-26: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767
• Will talk about this later
• Improved security: SSL support for encrypted data transfer
• Yay, finally make your InfoSec team (a bit) happier!
• https://cwiki.apache.org/confluence/display/KAFKA/Deploying+SSL+for+Kafka
• Improved multi-tenancy: quotas aka throttling for producers and consumers
• KIP-13: https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
• Quotas are defined per broker, will slow down clients if needed
• Reduces collateral damage caused by misbehaving apps/teams
23
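As a sketch of how KIP-13 quotas surface in practice: they are broker-side settings expressed in bytes/second per client id, along the lines of the following broker properties (the values here are purely illustrative).

```properties
# Default produce throughput allowed per client id: 10 MB/s.
# Clients exceeding the rate are throttled (responses delayed), not failed.
quota.producer.default=10485760
# Default fetch throughput allowed per client id: 20 MB/s.
quota.consumer.default=20971520
```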
Kafka operations
Question 2 or “How do I deploy, manage, monitor, etc. my Kafka clusters?”
24
Deploying Kafka
• Hardware recommendations, configuration settings, etc.
• http://docs.confluent.io/current/kafka/deployment.html
• http://kafka.apache.org/documentation.html#hwandos
• Deploying Kafka itself = DIY at the moment
• Packages for Debian and RHEL OS families available via Confluent Platform
• http://www.confluent.io/developer
• Straightforward to deploy with orchestration tools like Puppet or Ansible
• Also: options for Docker, Mesos, YARN, …
25
Managing Kafka: CLI tools
• Kafka includes a plethora of CLI tools
• Managing topics, controlling replication, status of clients, …
• Can be tricky to understand which tool to use, when, and how
• Helpful pointers:
• https://cwiki.apache.org/confluence/display/KAFKA/System+Tools
• https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools
• KIP-4 will eventually add better management APIs
26
Monitoring Kafka: metrics
• How to monitor
• Usual tools like Graphite, InfluxDB, statsd, Grafana, collectd, diamond
• What to monitor – some key metrics
• Host metrics: CPU, memory, disk I/O and usage, network I/O
• Kafka metrics: consumer lag, replication stats, message latency, Java GC
• ZooKeeper metrics: latency of requests, #outstanding requests
• Kafka exposes many built-in metrics via JMX
• Use e.g. jmxtrans to feed these metrics into Graphite, statsd, etc.
27
Monitoring Kafka: logging
• You can expect lots of logging data for larger Kafka clusters
• Centralized logging services help significantly
• You have one already, right?
• Elasticsearch/Kibana, Splunk, Loggly, …
• Further information about operations and monitoring at:
• http://docs.confluent.io/current/kafka/monitoring.html
• https://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign
28
Kafka clients #1
Questions 3+4 or “How can my apps talk to Kafka?”
29
Recommended* Kafka clients as of today
30
Language Name Link
Java <built-in> http://kafka.apache.org/
C/C++ librdkafka http://github.com/edenhill/librdkafka
Python kafka-python https://github.com/mumrah/kafka-python
Go sarama https://github.com/Shopify/sarama
Node kafka-node https://github.com/SOHU-Co/kafka-node/
Scala reactive kafka https://github.com/softwaremill/reactive-kafka
… … …
*Opinionated! Full list at https://cwiki.apache.org/confluence/display/KAFKA/Clients
Kafka clients: upcoming improvements
• Current problem: only Java client is officially supported
• A lot of effort and duplication for client maintainers to be compatible
with Kafka changes over time (e.g. protocol, ZK for offset management)
• Wait time for users until “their” client library is ready for latest Kafka
• Idea: use librdkafka (C) as the basis for Kafka clients and
provide bindings + idiomatic APIs per target language
• Benefits include:
• Full protocol support, SSL, etc. needs to be implemented only once
• All languages will benefit from the speed of the C implementation
• Of course you are always free to pick your favorite client!
31
Confluent Kafka-REST
• Open source, included in Confluent Platform
https://github.com/confluentinc/kafka-rest/
• Alternative to native clients
• Supports reading and writing data, status info, Avro, etc.
32
# Get a list of topics
$ curl "http://rest-proxy:8082/topics"
[ { "name":"userProfiles", "num_partitions": 3 },
{ "name":"locationUpdates", "num_partitions": 1 } ]
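Writing works the same way over HTTP; a hedged example of producing via the proxy's v1 binary embedded format (topic name and base64 payload are illustrative, and a running proxy is assumed):

```shell
# Produce one message (value is base64-encoded, here "Kafka")
$ curl -X POST "http://rest-proxy:8082/topics/userProfiles" \
    -H "Content-Type: application/vnd.kafka.binary.v1+json" \
    --data '{"records": [{"value": "S2Fma2E="}]}'
```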
Kafka clients #2
Questions 3+4 or “How can my systems talk to Kafka?”
33
34
Data import/export: status quo
• Until now this has been your problem to solve
• Only a few tools available, e.g. LinkedIn Camus for Kafka → HDFS export
• Typically a DIY solution using the aforementioned client libs
• Kafka 0.9.0 will introduce Copycat
35
36
Copycat is the I/O redirection in your Unix pipelines.
Use it to get your data into and out of Kafka.
$ cat < in.txt | grep “apache” | tr a-z A-Z > out.txt
Data import/export via Copycat
• Copycat is included in upcoming Kafka 0.9.0
• Federated Copycat “connector” development for e.g. HDFS, JDBC
• Light-weight, scales from simple testing and one-off jobs to large-scale
production scenarios serving an entire organization
• Process management agnostic, hence flexible deployment options
• Examples: Standalone, YARN, Mesos, or your own (e.g. Puppet w/ supervisord)
37
(Diagram: <whatever> → Copycat → Kafka → Copycat → <whatever>)
Data and schemas
Question 5 or “Je te comprends pas” (French for “I don’t understand you”)
38
39
40
Data and schemas
• Agree on contracts for data just like you do for, say, APIs
• Producers and consumers of data must understand each other
• Free-for-alls typically degenerate quickly into team deathmatches
• Benefit from clear contract, schema evolution, type safety, etc.
• Organizational problem rather than technical
• Hilarious /facepalm moments
• Been there, done that 
• Take a look at Apache Avro, Thrift, Protocol Buffers
• Cf. Avro homepage, https://github.com/miguno/avro-hadoop-starter
41
42
“Alternative” to schemas
43
Example: Avro schema for tweets
username
text
timestamp
44
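The schema shown on this slide can be reconstructed along these lines; the record name and field types are inferred from the field list above, and the namespace is hypothetical.

```json
{
  "type": "record",
  "name": "Tweet",
  "namespace": "com.example.avro",
  "fields": [
    {"name": "username",  "type": "string"},
    {"name": "text",      "type": "string"},
    {"name": "timestamp", "type": "long"}
  ]
}
```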
“Tweet” = <definition>
“UserProfile” = <definition>
“Alert” = <definition>
<data> = <definition>
<data> = <definition>
<data> = <definition>
<data> = <definition>
<data> = <definition>
Schema registry
• Stores and retrieves your schemas
• Cornerstone for building resilient data pipelines
• Viable registry implementation missing until recently
• AVRO-1124 (2012-2014)
• So everyone started to roll their own schema registry
• Again: been there, done that 
• There must be a better way, right?
45
Confluent Schema Registry
• Open source, included in Confluent Platform
https://github.com/confluentinc/schema-registry/
• REST API to store and retrieve schemas etc.
• Integrates with Kafka clients, Kafka REST, Camus, ...
• Can enforce policies for your data, e.g. backwards compatibility
• Still not convinced you need a schema registry?
• http://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you-really-need-one
46
# List all schema versions registered for topic "foo"
$ curl -X GET -i http://registry:8081/subjects/foo/versions
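Registering a schema is a POST against the same subject; a hedged example (the toy schema is illustrative, and a running registry is assumed):

```shell
# Register a new schema version under subject "foo"
$ curl -X POST http://registry:8081/subjects/foo/versions \
    -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data '{"schema": "{\"type\": \"string\"}"}'
```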
Stream processing
Question 6 or “How do I actually process my data in Kafka?”
47
Stream processing
• Currently three main options
• Storm: arguably powering the majority of production deployments
• Spark Streaming: runner-up, but gaining momentum due to “main” Spark
• DIY: write your own using Kafka client libs, typically with a narrower focus
48
49
Some people, when confronted with a
problem to process data in Kafka, think
“I know, I’ll use [ Storm | Spark | … ].”
Now they have two problems.
Stream processing
• Currently three main options
• Storm: arguably powering the majority of production deployments
• Spark Streaming: runner-up, but gaining momentum due to “main” Spark
• DIY: write your own using Kafka client libs, typically with a narrower focus
• Kafka 0.9.0 will introduce Kafka Streams
50
Four!
51
Kafka Streams is the commands in your Unix pipelines.
Use it to transform data stored in Kafka.
$ cat < in.txt | grep “apache” | tr a-z A-Z > out.txt
Kafka Streams
• Kafka Streams included in Kafka 0.9.0
• KIP-28: https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+processor+client
• No need to run another framework like Storm alongside Kafka
• No need for separate infrastructure and training either
• Library rather than framework
• Won’t dictate your deployment, configuration management, packaging, …
• Use it like you’d use Apache Commons, Google Guava, etc.
• Easier to integrate with your existing apps and services
• 100% compatible with Kafka by definition
52
Kafka Streams
• Initial version will support
• Low-level API as well as higher-level API for Java 7+
• Operations such as join/filter/map/…
• Windowing
• Proper time modeling, e.g. event time vs. processing time
• Local state management with persistence and replication
• Schema and Avro support
• And more to come – details will be shared over the next weeks!
53
54
Example of higher-level API (much nicer with Java 8 and lambdas)
map()
filter()
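The API was still a work in progress at the time, so the following is only an illustrative sketch of the map()/filter() style shown here, written with Java 8 lambdas; class and method names follow the KIP-28 proposal and may differ in the released version.

```java
// Illustrative only: names follow the KIP-28 proposal, not a final API.
KStream<String, String> clicks = builder.stream("user_clicks");
clicks
    .filter((key, value) -> value.contains("apache"))              // filter()
    .map((key, value) -> new KeyValue<>(key, value.toUpperCase())) // map()
    .to("user_clicks_transformed");
```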
Phew, we made it!
55
56
(Diagram: in the Unix analogy, Copycat is the I/O redirection, Kafka is the pipe, and Kafka Streams is the commands.)
$ cat < in.txt | grep “apache” | tr a-z A-Z > out.txt
Want to contribute to Kafka and open source?
57
Join the Kafka community
http://kafka.apache.org/
…in a great team with the creators of Kafka
and also getting paid for it?
Confluent is hiring 
http://confluent.io/
Questions, comments? Tweet with #ApacheBigData and /cc to @ConfluentInc

More Related Content

What's hot

Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Kai Wähner
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Erik Onnen
 

What's hot (20)

Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
 
January 2016 Flink Community Update & Roadmap 2016
January 2016 Flink Community Update & Roadmap 2016January 2016 Flink Community Update & Roadmap 2016
January 2016 Flink Community Update & Roadmap 2016
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision Making
 
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
Distributed and Fault Tolerant Realtime Computation with Apache Storm, Apache...
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016
 
Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data Platforms
 
Spark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics RevisedSpark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics Revised
 
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
 
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10 an integration story
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10  an integration story[Spark Summit EU 2017] Apache spark streaming + kafka 0.10  an integration story
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10 an integration story
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Operationalizing Machine Learning: Serving ML Models
Operationalizing Machine Learning: Serving ML ModelsOperationalizing Machine Learning: Serving ML Models
Operationalizing Machine Learning: Serving ML Models
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka Streams
 
Apache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARN
Apache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARNApache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARN
Apache Samza: Reliable Stream Processing Atop Apache Kafka and Hadoop YARN
 
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
 
Akka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive PlatformAkka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive Platform
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
 
Flink at netflix paypal speaker series
Flink at netflix   paypal speaker seriesFlink at netflix   paypal speaker series
Flink at netflix paypal speaker series
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 

Viewers also liked

Viewers also liked (6)

Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017
 
Tuning kafka pipelines
Tuning kafka pipelinesTuning kafka pipelines
Tuning kafka pipelines
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - Verisign
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
[Big Data Spain] Apache Spark Streaming + Kafka 0.10: an Integration Story
[Big Data Spain] Apache Spark Streaming + Kafka 0.10:  an Integration Story[Big Data Spain] Apache Spark Streaming + Kafka 0.10:  an Integration Story
[Big Data Spain] Apache Spark Streaming + Kafka 0.10: an Integration Story
 

Similar to Being Ready for Apache Kafka - Apache: Big Data Europe 2015

OSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming Apps
Timothy Spann
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
confluent
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Timothy Spann
 

Similar to Being Ready for Apache Kafka - Apache: Big Data Europe 2015 (20)

Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
 
Kafka Explainaton
Kafka ExplainatonKafka Explainaton
Kafka Explainaton
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
 
Data Pipelines with Kafka Connect
Data Pipelines with Kafka ConnectData Pipelines with Kafka Connect
Data Pipelines with Kafka Connect
 
OSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming Apps
 
Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...
Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...
Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...
 
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
 
Sas 2015 event_driven
Sas 2015 event_drivenSas 2015 event_driven
Sas 2015 event_driven
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
 
Distributed messaging through Kafka
Distributed messaging through KafkaDistributed messaging through Kafka
Distributed messaging through Kafka
 
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
Trivadis TechEvent 2016 Apache Kafka - Scalable Massage Processing and more! ...
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
 
Python Kafka Integration: Developers Guide
Python Kafka Integration: Developers GuidePython Kafka Integration: Developers Guide
Python Kafka Integration: Developers Guide
 

Recently uploaded

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
shivangimorya083
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
shambhavirathore45
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
JohnnyPlasten
 

Recently uploaded (20)

Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx

Being Ready for Apache Kafka - Apache: Big Data Europe 2015

  • 1. Being Ready for Apache Kafka: Today’s Ecosystem and Future Roadmap Michael G. Noll @miguno Developer Evangelist, Confluent Inc. 1Apache: Big Data Conference, Budapest, Hungary, September 29, 2015
  • 2. Developer Evangelist at Confluent since August ‘15 • Previously Big Data lead at .COM/.NET DNS operator Verisign • Blogging at http://www.michael-noll.com/ (too little time!) • PMC member of Apache Storm (too little time!) • michael@confluent.io 2
  • 3. Founded in Fall 2014 by the creators of Apache Kafka • Headquartered in the San Francisco Bay Area • We provide a stream data platform based on Kafka • We contribute a lot to Kafka, obviously 3
  • 4. 4 ?
  • 5. 5 Apache Kafka is the distributed, durable equivalent of Unix pipes. Use it to connect and compose your large-scale data apps. $ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt
  • 7. Example: LinkedIn after Kafka 7 [Diagram: databases, logs, and sensors feed Kafka, which fans out to log search, monitoring, security, real-time analytics, filter/transform/aggregate jobs, the data warehouse, and Hadoop HDFS.]
  • 8. 8 Apache Kafka is a high-throughput distributed messaging system. “1,100,000,000,000 msg/day, totaling 175+ TB/day” (LinkedIn) = 3 billion messages since the beginning of this talk
  • 9. 9 Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. [Diagram: many producers write to a cluster of Kafka brokers, coordinated via ZooKeeper; many consumers read from it.]
  • 10. 10 Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. [Diagram: a topic such as “user_clicks” is an ordered log; producers append at the newest end, and each consumer reads from oldest to newest at its own offset.]
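The commit-log model can be sketched in a few lines. The following is a toy, single-partition, in-memory model with illustrative names — not the real Kafka API — but it captures the core idea: producers append at the newest end of the log, while each consumer tracks (and may rewind) its own read position.

```python
# Toy model of one Kafka topic partition: an append-only log where each
# consumer keeps its own offset. All class and method names are
# illustrative, not the actual Kafka client API.

class Partition:
    def __init__(self):
        self.log = []                 # append-only sequence of messages

    def append(self, msg):
        self.log.append(msg)          # producers always write at the newest end
        return len(self.log) - 1      # offset of the message just written

    def read(self, offset):
        return self.log[offset]

class Consumer:
    def __init__(self, partition):
        self.partition = partition
        self.offset = 0               # each consumer tracks its own position

    def poll(self):
        if self.offset < len(self.partition.log):
            msg = self.partition.read(self.offset)
            self.offset += 1
            return msg
        return None                   # caught up: nothing new to read

    def seek(self, offset):
        self.offset = offset          # the "time machine": rewind and re-read

p = Partition()
for click in ["page1", "page2", "page3"]:
    p.append(click)

c1, c2 = Consumer(p), Consumer(p)
print(c1.poll())  # page1 -- consumers read independently of each other
c2.seek(2)
print(c2.poll())  # page3 -- c2 skipped ahead without affecting c1
```

This independence of offsets is why one Kafka topic can serve many downstream teams at once, and why retained data can be replayed after the fact.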
  • 11. So where can Kafka help me? Example, anyone? 11
  • 12. 12
  • 13. 13 Example: Protecting your infrastructure against DDoS attacks. Why is Kafka a great fit here? • Scalable writes • Scalable reads • Low latency • Time machine
  • 14. Kafka powers many use cases • User behavior, click stream analysis • Infrastructure monitoring and security, e.g. DDoS detection • Fraud detection • Operational telemetry data from mobile devices and sensors • Analyzing system and app logging data • Internet of Things (IoT) applications • …and many more • Yours is missing? Let me know via michael@confluent.io ! 14
  • 15. 15https://cwiki.apache.org/confluence/display/KAFKA/Powered+By Diverse and rapidly growing user base across many industries and verticals.
  • 16. A typical Kafka architecture Yes, we now begin to approach “production” 16
  • 17. 17
  • 18. Typical architecture => typical questions 18 [Diagram: Kafka core (Question 1), operations (Question 2), apps and systems that write to and read from Kafka (Questions 3+4), data and schemas (Question 5), stream processing (Questions 6a/6b).]
  • 20. Kafka core Question 1 or “What are the upcoming improvements to core Kafka?” 20
  • 21. Kafka core: upcoming changes in 0.9.0 • Kafka 0.9.0 (formerly 0.8.3) expected in November 2015 • ZooKeeper now only required for Kafka brokers • ZK dependency removed from clients = producers and consumers • Benefits include less load on ZK, lower operational complexity, user apps don’t require interacting with ZK anymore • New, unified consumer Java API • We consolidated the previous “high-level” and “simple” consumer APIs • Old APIs still available and not deprecated (yet) 21
  • 22. New consumer Java API in 0.9.0 22 [Code screenshot: configure the consumer, subscribe to topics, process records in a poll loop.]
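The unified consumer boils down to three steps: configure, subscribe, process. Here is a minimal sketch of that shape using a hypothetical in-memory stand-in — the class, its record store, and the config keys shown are illustrative, not the real 0.9.0 Java client.

```python
# Sketch of the "configure, subscribe, process" pattern of the new
# unified consumer API. MockConsumer and its internals are hypothetical;
# only the three-step shape mirrors the real client.

class MockConsumer:
    def __init__(self, config):
        self.config = config                       # 1) configure
        self.topics = []
        self._pending = {"user_clicks": ["click-1", "click-2"]}

    def subscribe(self, topics):
        self.topics = list(topics)                 # 2) subscribe

    def poll(self):
        records = []                               # 3) process: drain new records
        for topic in self.topics:
            records.extend(self._pending.pop(topic, []))
        return records

consumer = MockConsumer({"bootstrap.servers": "broker1:9092",
                         "group.id": "clicks-app"})
consumer.subscribe(["user_clicks"])
records = consumer.poll()
print(records)  # ['click-1', 'click-2']
```

Note that nothing here touches ZooKeeper: with the 0.9.0 consumer, client apps talk only to the brokers, which is exactly the operational simplification the slide describes.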
  • 23. Kafka core: upcoming changes in 0.9.0 • Improved data import/export via Copycat • KIP-26: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767 • Will talk about this later • Improved security: SSL support for encrypted data transfer • Yay, finally make your InfoSec team (a bit) happier! • https://cwiki.apache.org/confluence/display/KAFKA/Deploying+SSL+for+Kafka • Improved multi-tenancy: quotas aka throttling for Ps and Cs • KIP-13: https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas • Quotas are defined per broker, will slow down clients if needed • Reduces collateral damage caused by misbehaving apps/teams 23
  • 24. Kafka operations Question 2 or “How do I deploy, manage, monitor, etc. my Kafka clusters?” 24
  • 25. Deploying Kafka • Hardware recommendations, configuration settings, etc. • http://docs.confluent.io/current/kafka/deployment.html • http://kafka.apache.org/documentation.html#hwandos • Deploying Kafka itself = DIY at the moment • Packages for Debian and RHEL OS families available via Confluent Platform • http://www.confluent.io/developer • Straightforward to use orchestration tools like Puppet, Ansible • Also: options for Docker, Mesos, YARN, … 25
  • 26. Managing Kafka: CLI tools • Kafka includes a plethora of CLI tools • Managing topics, controlling replication, status of clients, … • Can be tricky to understand which tool to use, when, and how • Helpful pointers: • https://cwiki.apache.org/confluence/display/KAFKA/System+Tools • https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools • KIP-4 will eventually add better management APIs 26
  • 27. Monitoring Kafka: metrics • How to monitor • Usual tools like Graphite, InfluxDB, statsd, Grafana, collectd, diamond • What to monitor – some key metrics • Host metrics: CPU, memory, disk I/O and usage, network I/O • Kafka metrics: consumer lag, replication stats, message latency, Java GC • ZooKeeper metrics: latency of requests, #outstanding requests • Kafka exposes many built-in metrics via JMX • Use e.g. jmxtrans to feed these metrics into Graphite, statsd, etc. 27
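Of the key metrics above, consumer lag deserves a concrete definition: per partition, it is the log end offset minus the consumer group's committed offset. The offsets below are made-up sample values, but the arithmetic is the standard one.

```python
# Consumer lag: how far a consumer group's committed offsets trail the
# newest offset in each partition. Offsets here are illustrative sample
# data; in practice you would fetch them from Kafka's metrics/tools.

def consumer_lag(log_end_offsets, committed_offsets):
    # Lag per partition = log end offset - committed offset.
    # A group that has committed nothing is fully lagged.
    return {tp: end - committed_offsets.get(tp, 0)
            for tp, end in log_end_offsets.items()}

log_end   = {"user_clicks-0": 1500, "user_clicks-1": 1420}
committed = {"user_clicks-0": 1480, "user_clicks-1": 1420}

lag = consumer_lag(log_end, committed)
print(lag)                # {'user_clicks-0': 20, 'user_clicks-1': 0}
print(max(lag.values()))  # 20 -- worth alerting on if it keeps growing
```

A lag that grows without bound means consumers cannot keep up with producers, which is usually the first symptom of a sizing or downstream problem.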
  • 28. Monitoring Kafka: logging • You can expect lots of logging data for larger Kafka clusters • Centralized logging services help significantly • You have one already, right? • Elasticsearch/Kibana, Splunk, Loggly, … • Further information about operations and monitoring at: • http://docs.confluent.io/current/kafka/monitoring.html • https://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign 28
  • 29. Kafka clients #1 Questions 3+4 or “How can my apps talk to Kafka?” 29
  • 30. Recommended* Kafka clients as of today 30
  Java: <built-in> — http://kafka.apache.org/
  C/C++: librdkafka — http://github.com/edenhill/librdkafka
  Python: kafka-python — https://github.com/mumrah/kafka-python
  Go: sarama — https://github.com/Shopify/sarama
  Node: kafka-node — https://github.com/SOHU-Co/kafka-node/
  Scala: reactive kafka — https://github.com/softwaremill/reactive-kafka
  *Opinionated! Full list at https://cwiki.apache.org/confluence/display/KAFKA/Clients
  • 31. Kafka clients: upcoming improvements • Current problem: only Java client is officially supported • A lot of effort and duplication for client maintainers to be compatible with Kafka changes over time (e.g. protocol, ZK for offset management) • Wait time for users until “their” client library is ready for latest Kafka • Idea: use librdkafka (C) as the basis for Kafka clients and provide bindings + idiomatic APIs per target language • Benefits include: • Full protocol support, SSL, etc. needs to be implemented only once • All languages will benefit from the speed of the C implementation • Of course you are always free to pick your favorite client! 31
  • 32. Confluent Kafka-REST 32 • Open source, included in Confluent Platform https://github.com/confluentinc/kafka-rest/ • Alternative to native clients • Supports reading and writing data, status info, Avro, etc.
  # Get a list of topics
  $ curl "http://rest-proxy:8082/topics"
  [ { "name": "userProfiles", "num_partitions": 3 }, { "name": "locationUpdates", "num_partitions": 1 } ]
  • 33. Kafka clients #2 Questions 3+4 or “How can my systems talk to Kafka?” 33
  • 35. Data import/export: status quo • Until now this has been your problem to solve • Only a few tools available, e.g. LinkedIn Camus for Kafka -> HDFS export • Typically a DIY solution using the aforementioned client libs • Kafka 0.9.0 will introduce Copycat 35
  • 36. 36 Copycat is the I/O redirection in your Unix pipelines. Use it to get your data into and out of Kafka. $ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt
  • 37. Data import/export via Copycat • Copycat is included in upcoming Kafka 0.9.0 • Federated Copycat “connector” development for e.g. HDFS, JDBC • Light-weight, scales from simple testing and one-off jobs to large-scale production scenarios serving an entire organization • Process management agnostic, hence flexible deployment options • Examples: Standalone, YARN, Mesos, or your own (e.g. Puppet w/ supervisord) 37 [Diagram: <whatever> -> Copycat -> Kafka -> Copycat -> <whatever>]
  • 38. Data and schemas Question 5 or “Je te comprends pas” (French for “I don’t understand you”) 38
  • 39. 39
  • 40. 40
  • 41. Data and schemas • Agree on contracts for data just like you do for, say, APIs • Producers and consumers of data must understand each other • Free-for-alls typically degenerate quickly into team deathmatches • Benefit from clear contract, schema evolution, type safety, etc. • Organizational problem rather than technical • Hilarious /facepalm moments • Been there, done that • Take a look at Apache Avro, Thrift, Protocol Buffers • Cf. Avro homepage, https://github.com/miguno/avro-hadoop-starter 41
  • 43. 43 Example: Avro schema for tweets [Code screenshot: a record schema with fields username, text, and timestamp.]
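A tweet schema along the lines of the slide's example could look as follows. The field names (username, text, timestamp) come from the slide; the Avro types and the record name are assumptions, and the conformance check is a deliberately minimal stand-in for what an Avro library does properly.

```python
import json

# A plausible Avro record schema for tweets. Types are assumed, not
# taken from the original deck.
tweet_schema = json.loads("""
{
  "type": "record",
  "name": "Tweet",
  "fields": [
    {"name": "username",  "type": "string"},
    {"name": "text",      "type": "string"},
    {"name": "timestamp", "type": "long"}
  ]
}
""")

def conforms(record, schema):
    # Minimal structural check: every declared field must be present.
    # A real Avro library would also validate types and defaults.
    return all(field["name"] in record for field in schema["fields"])

tweet = {"username": "miguno", "text": "Hello Kafka", "timestamp": 1443519600}
print(conforms(tweet, tweet_schema))                   # True
print(conforms({"username": "miguno"}, tweet_schema))  # False
```

The point of the contract is exactly this: a producer that drops a field gets rejected at the boundary instead of silently breaking every downstream consumer.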
  • 44. 44 “Tweet” = <definition> “UserProfile” = <definition> “Alert” = <definition> <data> = <definition> <data> = <definition> <data> = <definition> <data> = <definition> <data> = <definition>
  • 45. Schema registry • Stores and retrieves your schemas • Cornerstone for building resilient data pipelines • Viable registry implementation missing until recently • AVRO-1124 (2012-2014) • So everyone started to roll their own schema registry • Again: been there, done that • There must be a better way, right? 45
  • 46. Confluent Schema Registry • Open source, included in Confluent Platform https://github.com/confluentinc/schema-registry/ • REST API to store and retrieve schemas etc. • Integrates with Kafka clients, Kafka REST, Camus, ... • Can enforce policies for your data, e.g. backwards compatibility • Still not convinced you need a schema registry? • http://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you-really-need-one 46
  # List all schema versions registered for topic "foo"
  $ curl -X GET -i http://registry:8081/subjects/foo/versions
  • 47. Stream processing Question 6 or “How do I actually process my data in Kafka?” 47
  • 48. Stream processing • Currently three main options • Storm: arguably powering the majority of production deployments • Spark Streaming: runner-up, but gaining momentum due to “main” Spark • DIY: write your own using Kafka client libs, typically with a narrower focus 48
  • 49. 49 Some people, when confronted with a problem to process data in Kafka, think “I know, I’ll use [ Storm | Spark | … ].” Now they have two problems.
  • 50. Stream processing • Currently three main options • Storm: arguably powering the majority of production deployments • Spark Streaming: runner-up, but gaining momentum due to “main” Spark • DIY: write your own using Kafka client libs, typically with a narrower focus • Kafka 0.9.0 will introduce Kafka Streams 50 Four!
  • 51. 51 Kafka Streams is the commands in your Unix pipelines. Use it to transform data stored in Kafka. $ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt
  • 52. Kafka Streams • Kafka Streams included in Kafka 0.9.0 • KIP-28: https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+processor+client • No need to run another framework like Storm alongside Kafka • No need for separate infrastructure and trainings either • Library rather than framework • Won’t dictate your deployment, configuration management, packaging, … • Use it like you’d use Apache Commons, Google Guava, etc. • Easier to integrate with your existing apps and services • 100% compatible with Kafka by definition 52
  • 53. Kafka Streams • Initial version will support • Low-level API as well as higher-level API for Java 7+ • Operations such as join/filter/map/… • Windowing • Proper time modeling, e.g. event time vs. processing time • Local state management with persistence and replication • Schema and Avro support • And more to come – details will be shared over the next weeks! 53
  • 54. 54 Example of higher-level API (much nicer with Java 8 and lambdas) [Code screenshot: a chain of map() and filter() operations.]
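The shape of such a higher-level API is easy to convey with a toy filter/map chain over an in-memory stream. This is not the actual Kafka Streams API, and the click-event fields are made up; it only illustrates the style of declaring transformations.

```python
# Toy filter()/map() chain over an in-memory stream of click events,
# illustrating the declarative style of a higher-level streaming API.
# This is not the Kafka Streams API; event fields are invented.

clicks = [
    {"user": "alice", "page": "/apache/kafka"},
    {"user": "bob",   "page": "/pricing"},
    {"user": "carol", "page": "/apache/storm"},
]

# filter(): keep only events whose page mentions "apache"
apache_clicks = [e for e in clicks if "apache" in e["page"]]

# map(): project each surviving event to an uppercased user name
users = [e["user"].upper() for e in apache_clicks]

print(users)  # ['ALICE', 'CAROL']
```

In a real stream processor the same filter/map pipeline would run continuously over an unbounded Kafka topic rather than once over a finite list — the declarations stay the same, only the runtime changes.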
  • 55. Phew, we made it! 55
  • 56. 56 [Diagram mapping the Unix pipeline onto the Kafka ecosystem: Kafka is the pipes, Copycat the I/O redirection, Kafka Streams the commands.] $ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt
  • 57. Want to contribute to Kafka and open source? 57 Join the Kafka community http://kafka.apache.org/ …in a great team with the creators of Kafka and also getting paid for it? Confluent is hiring http://confluent.io/ Questions, comments? Tweet with #ApacheBigData and /cc to @ConfluentInc

Editor's Notes

  1. Audience check: Who has used Kafka before? In production? Happy with it?
  2. Among many other things Kafka allows you to decouple and simplify your architecture.
  3. Writes: Kafka is able to collect large volumes of data. Reads: Kafka supports many downstream consumers because of its great fan-out performance. Different apps, algorithms, teams can collaborate to defend DDoS attacks. Low latency: Kafka allows downstream applications to analyze this data in real-time. For DDoS attacks fast reaction time is crucial to minimize damage. Time machine: Kafka retains your data for days, weeks, or more. You can rewind in time to e.g. perform A/B tests, rectify miscomputation caused by bugs in your apps, etc. For DDoS attacks this helps to optimize algorithms to minimize false positives, which cause collateral damage.
  4. Kafka is an infrastructure tool which is why it's widely applicable. It uses a very specific data model to achieve high performance, but otherwise is very generic.
  5. Given the scope and time limits of the talk we will focus on breadth, not depth. Feel free to find me after the talk or reach out via email or Twitter if you have questions that weren’t answered in the talk! 
  6. We’ll cover many things and areas in the next slides. But don’t let this mislead you – Kafka is actually one of the easiest Big Data tools to get started with and to operate in production. For example, in my experience you are significantly more likely to run into problems with any stream processing tools that integrate with Kafka, such as Storm or Spark Streaming.
  7. We’ll talk about Kafka Streams later.
  8. If you think the consensus problem in distributed systems is hard…
  9. Process and resource management is orthogonal to Copycat's runtime model. The framework is not responsible for starting/stopping/restarting processes and can be used with any process management strategy, including YARN, Mesos, Kubernetes, or none if you prefer to manage processes manually or through other orchestration tools. Copycat supports standalone mode. You could very easily use this to deliver logs to Kafka using standalone mode instances on all your servers.
  10. If you think the consensus problem in distributed systems is hard…
  11. If you think the consensus problem in distributed systems is hard…
  12. The alternative to schemas: meetings between teams, tribal knowledge, … = the opposite of being agile and fast
  13. Similar to Java API of Spark Streaming.