Confluent building a real-time streaming platform using kafka streams and kafka connect-20min-version

Jeremy Custenborder from Confluent talked about how Kafka brings an event-centric approach to building streaming applications, and how to use Kafka Connect and Kafka Streams to build them.

Building a real-time streaming platform using Kafka Connect + Kafka Streams
Jeremy Custenborder, Systems Engineer, Confluent

• Everything in the company is a real-time stream
• > 1.2 trillion messages written per day
• > 3.4 trillion messages read per day
• ~ 1 PB of stream data
• Thousands of engineers
• Tens of thousands of producer processes

Resources
• Confluent
• Company website: http://www.confluent.io
• Blog: http://www.confluent.io/blog
• Free Ebook “Making Sense of Stream Processing”
http://www.confluent.io/making-sense-of-stream-processing-ebook
• Apache Kafka
• http://kafka.apache.org
• Kafka Connect
• http://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines
• Kafka Streams
• http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple

Thanks!
Jeremy Custenborder | jeremy@confluent.io |
Download Kafka
and Confluent Platform
www.confluent.io/download

Training!
http://www.confluent.io/training
Discount Code: BELLEVUE10
Operations Training in Seattle December 10th.

Editor's Notes

Hi, I’m Neha Narkhede… There is a big paradigm shift happening around the world where companies are moving rapidly towards leveraging data in real-time and fundamentally moving away from batch-oriented computing. But how do you do that? Well that is what today’s talk is about. I’m going to summarize 6 years of work in 15 mins, so let’s get started.
Unordered, unbounded and large-scale datasets are increasingly common in day-to-day business. Stream data means different things for different businesses. For retail, it might mean streams of orders and shipments; for finance, streams of stock ticker data; for web companies, streams of user activity data. Stream data is everywhere. At the same time, there is a huge push towards getting faster results: instant credit card fraud detection, instant credit card payment processing instead of only 5 times a day, and detecting and alerting within seconds, rather than a day later, on a problem that causes retail sales to dip (you can only imagine what that would do to retail companies over Black Friday).
So the takeaway is that businesses operate in real time, not in batch. If you go to a store to buy something, you don’t wait there for several hours to get it. So the data processing required to make key business decisions and to operate a business effectively should also happen in real time. Here are some examples to support that claim…
Event = something that happened. Different for different businesses.
Log files are also event streams. For instance, every line in a log file is an event that in this case tells you how the service is being used.
There is an inherent duality between tables and streams. Traditional databases are all about tables full of state, but they are not designed to respond to the streams of events that modify those tables.
Tables have rows that store the latest value for a unique key, but they have no notion of time.
If you look at how a table gets constructed over time, you will notice that…
The operations are actually a stream of events, where each event is just an operation that modifies the table. Every database does this internally, and it is called a changelog.
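The table/changelog relationship can be sketched in a few lines. This is only an illustration, not how any particular database or Kafka implements it: the `ChangeEvent` shape, the `materialize` helper, and the convention that a `None` value means a delete are all assumptions made for the example.

```python
from typing import Iterable, Optional

# Assumed event shape: (key, value); a value of None is treated here as a delete.
ChangeEvent = tuple[str, Optional[int]]


def materialize(changelog: Iterable[ChangeEvent]) -> dict[str, int]:
    """Replay a stream of change events into a table holding the latest value per key."""
    table: dict[str, int] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # delete event: drop the row if it exists
        else:
            table[key] = value    # insert or update: keep only the latest value
    return table


changelog = [("alice", 1), ("bob", 5), ("alice", 3), ("bob", None)]
table = materialize(changelog)
print(table)
```

Replaying the same changelog always rebuilds the same table, which is why the stream can be treated as the source of truth and the table as a view over it.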
So events are everywhere, what next? We need to fundamentally move to event-centric thinking. For a retail website, there are possibly various avenues that generate the “product view” event. A standard thing to do is to ensure that all product view data ends up in Hadoop so you can run analytics on user interest to power various business functions from marketing to product positioning and so on.
Reality is about 100x more complex. In some corner, you are using some messaging system for app-to-app communication. You might have a custom way of loading data from various databases into Hadoop. But then more destinations appear over time, and now you have to feed the same data to a search system, various caches, etc. This is a common reality, and a simplified version of it. 300 services, ~100 databases, multi-datacenter. Trolling: load into Oracle, search, etc.
The core insight is that a data pipeline is also an event stream.
What you need instead of that scary picture is a central streaming platform at the heart of a datacenter. A central nervous system that collects data from various sources and feeds all other systems and apps that need to consume and process data in real-time. Why does this make sense?
Why is a streaming platform needed? Because data sources and destinations add up over time. Initially you might have just the web app that produces the product view event and maybe you’ve only thought about analyzing it in Hadoop.
But over time, the mobile app shows up that also produces the same data and several more applications as destinations for search, recommendations, security etc. Event centric thinking involves building a forward-compatible architecture. You will never be able to foresee what future apps might show up that will need the same data. So capture it in a central, scalable streaming platform that asynchronously feeds downstream systems.
So how do you build such a streaming platform?
That journey starts with Apache Kafka.
At a high level, Kafka is a pub-sub messaging system. Producers capture events, which are sent to and stored locally on a central cluster of brokers. Consumers subscribe to topics, which are named categories of data. End to end, the producer-to-consumer data flow is real-time.
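The pub-sub model described here can be illustrated with a toy in-memory topic. This is not the Kafka client API: the `Topic` class and its `publish` and `poll` methods are hypothetical names, and a real cluster also partitions, replicates, and persists the log across brokers. The sketch only shows the idea of an append-only log that each consumer reads at its own position.

```python
class Topic:
    """Hypothetical in-memory stand-in for a topic: a single append-only log."""

    def __init__(self) -> None:
        self._log: list[str] = []
        self._offsets: dict[str, int] = {}  # next offset to read, per consumer name

    def publish(self, event: str) -> int:
        """Append an event to the log and return its offset."""
        self._log.append(event)
        return len(self._log) - 1

    def poll(self, consumer: str) -> list[str]:
        """Return the events this consumer has not yet seen and advance its offset."""
        start = self._offsets.get(consumer, 0)
        events = self._log[start:]
        self._offsets[consumer] = len(self._log)
        return events


topic = Topic()
topic.publish("product_view:42")
topic.publish("product_view:7")

search_batch = topic.poll("search")        # both events published so far
topic.publish("product_view:13")
search_next = topic.poll("search")         # only the event added since the last poll
analytics_batch = topic.poll("analytics")  # all three: each consumer tracks its own offset
```

Because events stay in the log after being read, a consumer added later (here, `analytics`) still sees everything, without the producer knowing it exists.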
The magic of Kafka is in the implementation. It is not just a pub-sub messaging system; it is a modern distributed platform… How so?
All that means you can throw lots of data at Kafka and have it made available throughout the company within milliseconds. At LinkedIn and several other companies, Kafka is deployed at a large scale…
In the last 5 years since it was open-sourced, it has been widely adopted by 1000s of companies worldwide.
So Kafka is the foundation of the central streaming platform.
Infrastructure is really only as useful as the data it has. The next step in moving to a data architecture based on a streaming platform is solving the ETL problem.
0.9
REST APIs for management
Core: Data pipeline Venture bet: Stream processing
Most people think they know…
Doesn’t mean you drop everything on the floor if anything slows down. Streaming algorithms: online space. Can compute median.
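As one illustration of computing a median over a stream, here is a sketch of the common two-heap approach. The notes do not say which algorithm was meant, so this choice is an assumption. It gives an exact median but keeps every value seen, so memory grows with the stream; bounded-memory approaches have to settle for an approximation. The `RunningMedian` class name is made up for the example.

```python
import heapq


class RunningMedian:
    """Exact running median using two heaps (keeps all values seen so far)."""

    def __init__(self) -> None:
        self._low: list[float] = []   # max-heap (values negated): the smaller half
        self._high: list[float] = []  # min-heap: the larger half

    def add(self, x: float) -> None:
        # Push into the smaller half, then move that half's largest value up.
        heapq.heappush(self._low, -x)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance so the smaller half is never shorter than the larger half.
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self) -> float:
        if not self._low:
            raise ValueError("no values seen yet")
        if len(self._low) > len(self._high):
            return float(-self._low[0])
        return (-self._low[0] + self._high[0]) / 2


rm = RunningMedian()
medians = []
for value in [5, 1, 9, 2]:
    rm.add(value)
    medians.append(rm.median())  # median is available after every event
print(medians)
```

Each `add` costs O(log n), and the current median can be read at any time without waiting for the input to end.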
About how inputs are translated into outputs (very fundamental)
HTTP/REST. All databases. Runs all the time. Each request totally independent, no real ordering. Can fail individual requests if you want. Very simple! About the future!
“Ed, the MapReduce job never finishes if you watch it like that.” Job kicks off at a certain time. Cron! Processes all the input, produces all the output. Data is usually static. Hadoop! DWH, JCL. Archaic but powerful. Can do analytics! Complex algorithms! Also can be really efficient! Inherently high latency.
Generalizes request/response and batch. Program takes some inputs and produces some outputs. Could be all inputs. Could be one at a time. Runs continuously forever!
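A small sketch of that generalization: the same processing function can be run over a finite input (batch style) or over an input that never ends (streaming style). The `running_total` function and the sample inputs are illustrative assumptions, not something from the talk.

```python
import itertools
from typing import Iterable, Iterator


def running_total(amounts: Iterable[int]) -> Iterator[int]:
    """Consume inputs one at a time and emit an updated total after each one."""
    total = 0
    for amount in amounts:
        total += amount
        yield total


# Batch style: all inputs are known up front, so the program runs to completion.
batch_result = list(running_total([10, 20, 30]))

# Streaming style: the input is unbounded, so we only ever take what has arrived so far.
endless = running_total(itertools.count(1))
first_three = list(itertools.islice(endless, 3))

print(batch_result, first_three)
```

The processing logic is identical in both cases; only the shape of the input differs, which is the sense in which stream processing covers batch as a special case.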
For some time, stream processing was thought of as a faster map-reduce layer useful for faster analytics, requiring deployment of a central cluster much like Hadoop. But in my experience, I’ve learnt that the most compelling applications that do stream processing look much more like an event-driven microservice and less like a Hive query or Spark job.
Companies == streams. What a retail store does. Streams in retail: sales, shipments and logistics, pricing, re-ordering, analytics, fraud and theft.
Let’s dive into the real-time analytics and apps area
There is only one thing you can do if you think the world needs to change and you live in Silicon Valley: quit your job and do it. Mission: Build a Streaming Platform. Product: Confluent Platform.