SlideShare a Scribd company logo
1 of 36
Apache Kafka, Tiered Storage and TensorFlow for
Streaming Machine Learning without a Data Lake
Kai Waehner
Technology Evangelist
contact@kai-waehner.de
LinkedIn
@KaiWaehner
www.confluent.io
www.kai-waehner.de
Disclaimer – Status for Tiered Storage in August 2020
KIP-405 –
Add Tiered Storage Support to Kafka
Confluent is actively working on this
with the open source community -
Uber is leading this initiative
Confluent Tiered Storage is available
today in Confluent Platform and used
under the hood in Confluent Cloud
https://cwiki.apache.org/confluence/display/KAFKA/KIP-
405%3A+Kafka+Tiered+Storage
(in the works)
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
STREAM
PROCESSING
Create and store
materialized views
Filter
Analyze in-flight
Time
C CC
Event Streaming
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Machine Learning to Improve Traditional
and to Build New Use Cases
5www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Real Time
Tracking
Predictive
Maintenance
Fraud
Detection
Cross
Selling
Transportation
Rerouting
Customer
Service
Inventory
ManagementAutonomous
Driving
Face
Recognition
Robotics
Speech
Translation
Video
Generation
Supply Chain
Optimization Simulations
Real Time Information Digital Transformation Strategic Goals
Customer
Churn
Global Automotive Company
Builds Connected Car Infrastructure
6
Digital Transformation
● Improve Customer
Experience
● Increase Revenue
● Reduce Risk
3 years ago Today 2 years in the future
Project begins Connected car infrastructure
in production for first use
cases
Improved processes leveraging
machine learning (predictive
maintenance, cross-selling)
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Streaming Analytics for
Predictive Maintenance at Scale
7
IoT
Integration
Layer
Batch
Analytics
Platform
BI
Dashboard
Streaming
Platform
Big Data
Integration
Layer
Car Sensors
Streaming Platform
Other Components
Real Time
Monitoring
System
All
Data
Critical
Data
Ingest
Data
Human
Intelligence
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Machine Learning (ML)
...allows computers to find hidden insights without
being programmed where to look
8
Machine Learning
● Decision Trees
● Naïve Bayes
● Clustering
● Neural
Networks
● Etc.
Deep Learning
● CNN
● RNN
● Transformer
● Autoencoder
● Etc.
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Streaming Analytics for
Predictive Maintenance at Scale
9
IoT
Integration
Layer
Batch
Analytics
Platform
BI
Dashboard
Streaming
Platform
Big Data
Integration
Layer
Car Sensors
Streaming Platform
Analytics Platform
Other Components
Real Time
Monitoring
System
All
Data
Critical
Data
Ingest
Data
Potential
DetectAnalytics
Platform
Train
Analytic
Model
Data
Processing
Analytic
Model
Preprocess
Data
Consume
Data
Deploy
Analytic Model
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
The First
Analytic Models
10
How to deploy the models
in production?
…real-time processing?
…at scale?
…24/7 zero uptime?
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Hidden Technical Debt
in Machine Learning Systems
11
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Scalable, Technology-Agnostic ML Infrastructures
What is this
thing used everywhere?
https://www.infoq.com/presentations/netflix-ml-meson
https://eng.uber.com/michelangelo
https://www.infoq.com/presentations/paypal-data-service-fraud
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
A Streaming Platform -
The Underpinning of an Event-Driven Architecture
15
Microservices
DBs
SaaS apps
Mobile
Customer 360
Real-time fraud
detection
Data warehouse
Producers
Consumers
Database
change
Microservices
events
SaaS
data
Customer
experiences
Streams of real time events
Stream processing apps
Connectors
Connectors
Stream processing apps
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Apache Kafka as Infrastructure for ML
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Apache Kafka’s Open Ecosystem as Infrastructure for ML
Kafka
Streams/
ksqlDB
Kafka Connect
Confluent REST Proxy
Confluent Schema Registry
Go/.NET/Python
Kafka Producer
ksqlDB
Python
Client
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Ingestion of
IoT Data
20
Replication
MirrorMaker /
Confluent Replicator
Kafka
Connect
Analytics /
Machine
Learning
Ca
rsCa
rsCa
rsCa
rs
Cars
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Data
Preprocessing
21
Preprocessing
Filter, transform, anonymize, extract features
Streams
Data Ready
For Model Training
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Preprocessing
with ksqlDB
22
SELECT car_id, event_id, car_model_id, sensor_input
FROM car_sensor c
LEFT JOIN car_models m ON c.car_model_id = m.car_model_id
WHERE m.car_model_type ='Audi_A8';
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Data Ingestion
into a Data Store for Model Training
(and Consumption by other Decoupled Applications)
23
Connect
Preprocessed
Data
Batch Near
Real Time
Real
Time
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Extreme scale
usingTensorFlow
and TPUs
in the cloud!
Analytic
Model
Model Training
Using an Elastic
Infrastructure in
the Cloud
24www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
TensorFlow Model —
Autoencoder for Anomaly Detection
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake 25
Direct streaming ingestion
for model training
with TensorFlow I/O + Kafka Plugin
(no additional data storage
like S3 or HDFS required!)
Time
Model BModel A
Producer
Distributed
Commit Log
Streaming Ingestion and Model Training
with TensorFlow IO
https://github.com/tensorflow/io
26
Model X
(at a later time)
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Store Data Long-Term
in Kafka?
Today, Kafka works well for recent events,
short horizon storage, and manual data
balancing.
Kafka’s present-day design offers
extraordinarily low messaging latency by
storing topic data on fast disks that are
collocated with brokers. This is usually
good.
But sometimes, you need to store a huge
amount of data for a long time.
Kafka
Processing
App
Storage
Transactions,
auth, quota
enforcement,
compaction, ...
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Simplified Data Lake Architecture
Tiered Storage for Kafka provides
● one platform for all data processing
● an event-based source of truth for
materialized views
● no need for a pipeline between Kafka and
a Data Lake like Hadoop
Benefits
● cost reduction
● long-term backup
● performance isolation
(real-time and historical analysis in the same cluster)
Confluent Tiered Storage for Kafka
Object Store
Processing Storage
Transactions,
auth, quota
enforcement,
compaction, ...
Local
Remote
Kafka
Apps
Store Forever
Older data is offloaded to inexpensive object
storage, permitting it to be consumed at any time.
Save $$$
Storage limitations, like capacity and duration,
are effectively uncapped.
Instantaneously scale up and down
Your Kafka clusters will be able to automatically
self-balance load and hence elastically scale
(Only available in Confluent Platform)
www.kai-waehner.de | @KaiWaehner
Confluent Tiered Storage for Kafka
30www.kai-waehner.de | @KaiWaehner
(Only available in Confluent Platform)
Use Cases for Reprocessing Historical Events
Give me all events from time A to time B
Real-time Producer
Time
• New consumer application
• Error-handling
• Compliance / regulatory processing
• Query and analyze existing events
• Model training
Real-time Consumer
Consumer of
Historical Data
www.kai-waehner.de | @KaiWaehner
Local Predictions
Model Training
in Cloud
Model Deployment
at the Edge
Analytic Model
Separation of
Model Training and Model Inference
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake 32
Streams
Input Event
Prediction
Request
Response
Model Serving
TensorFlow Serving
gRPC / HTTP
Application
Stream Processing
with External Model and RPC
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake 33
Prediction
Stream Processing
Model
doPrediction()
return
value
Stream Processing
with Embedded Model
Streams
Input Event
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake 34
“CREATE STREAM AnomalyDetection AS
SELECT sensor_id,
detectAnomaly(sensor_values)
FROM car_engine;“
User Defined Function (UDF)
Model Deployment with
Apache Kafka, ksqlDB
and TensorFlow
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake 35
Streaming Analytics with
Kafka and TensorFlow
36www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
MQTT Proxy
MongoDB
Storage
MongoDB
Dashboards
Search
Analytics
Kafka
Cluster
Kafka
Connect
Car Sensors
Kafka Ecosystem
TensorFlow
Other Components
Kafka
Streams
Application
All
Data
Critical
Data
Ingest
Data
Potential
DetectTensorFlow
Train
Analytic
Model
ksqlDB
Analytic
Model
Preprocess
Data
Consume
Data
Deploy
Analytic Model
Tiered
Storage
Mobile App
BI Tool
Demo: 100,000 Connected Cars
(Kafka + ksqlDB + MQTT + TensorFlow)
https://github.com/kaiwaehner/hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake 37
Live
Demo
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake 38
Machine Learning + Apache Kafka
à Examples @ Github
39
https://github.com/kaiwaehner
One pipeline to rule them all
Real-time model scoring, batch model training, near-real time BI analytics
Give me all events from time A to time B
Car sensors
(MQTT connector)
Time
Production
infrastructure
(Java)
Data science / analytics infrastructure
(Python + Jupyter)
www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
Kai Waehner
Technology Evangelist
contact@kai-waehner.de
@KaiWaehner
www.confluent.io
www.kai-waehner.de
www.confluent.io
LinkedIn
Questions? Feedback?
Let’s connect!

More Related Content

What's hot

What's hot (20)

Modern Data Pipelines
Modern Data PipelinesModern Data Pipelines
Modern Data Pipelines
 
AWS Cloud Day Prague 2023 - Serverless tRPC - API protocol for modern TypeScr...
AWS Cloud Day Prague 2023 - Serverless tRPC - API protocol for modern TypeScr...AWS Cloud Day Prague 2023 - Serverless tRPC - API protocol for modern TypeScr...
AWS Cloud Day Prague 2023 - Serverless tRPC - API protocol for modern TypeScr...
 
Designing Microservices
Designing MicroservicesDesigning Microservices
Designing Microservices
 
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
DevOps for Applications in Azure Databricks: Creating Continuous Integration ...
 
Smart erp solutions oracle cloud services overview - 2021 - 2022
Smart erp solutions   oracle cloud services overview - 2021 - 2022Smart erp solutions   oracle cloud services overview - 2021 - 2022
Smart erp solutions oracle cloud services overview - 2021 - 2022
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoop
 
Oracle EPM/BI Overview
Oracle EPM/BI OverviewOracle EPM/BI Overview
Oracle EPM/BI Overview
 
Domain Driven Data: Apache Kafka® and the Data Mesh
Domain Driven Data: Apache Kafka® and the Data MeshDomain Driven Data: Apache Kafka® and the Data Mesh
Domain Driven Data: Apache Kafka® and the Data Mesh
 
Heroku 101 py con 2015 - David Gouldin
Heroku 101   py con 2015 - David GouldinHeroku 101   py con 2015 - David Gouldin
Heroku 101 py con 2015 - David Gouldin
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?
 
Next generation business automation with the red hat decision manager and red...
Next generation business automation with the red hat decision manager and red...Next generation business automation with the red hat decision manager and red...
Next generation business automation with the red hat decision manager and red...
 
DevOps for Databricks
DevOps for DatabricksDevOps for Databricks
DevOps for Databricks
 
Simplify DevOps with Microservices and Mobile Backends.pptx
Simplify DevOps with Microservices and Mobile Backends.pptxSimplify DevOps with Microservices and Mobile Backends.pptx
Simplify DevOps with Microservices and Mobile Backends.pptx
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
 
Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...
 
Deploying SAP Solutions on AWS
Deploying SAP Solutions on AWSDeploying SAP Solutions on AWS
Deploying SAP Solutions on AWS
 
Azure Arc Overview from Microsoft
Azure Arc Overview from MicrosoftAzure Arc Overview from Microsoft
Azure Arc Overview from Microsoft
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare Industry
 
Automate data warehouse etl testing and migration testing the agile way
Automate data warehouse etl testing and migration testing the agile wayAutomate data warehouse etl testing and migration testing the agile way
Automate data warehouse etl testing and migration testing the agile way
 
Introduction to Amazon Web Services
Introduction to Amazon Web ServicesIntroduction to Amazon Web Services
Introduction to Amazon Web Services
 

Similar to Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning without a Data Lake

Unleashing Apache Kafka and TensorFlow in Hybrid Cloud Architectures
Unleashing Apache Kafka and TensorFlow in Hybrid Cloud ArchitecturesUnleashing Apache Kafka and TensorFlow in Hybrid Cloud Architectures
Unleashing Apache Kafka and TensorFlow in Hybrid Cloud Architectures
Kai Wähner
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
Kai Wähner
 
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Kai Wähner
 
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Kai Wähner
 

Similar to Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning without a Data Lake (20)

2019 04 seattle_meetup___kafka_machine_learning___kai_waehner
2019 04 seattle_meetup___kafka_machine_learning___kai_waehner2019 04 seattle_meetup___kafka_machine_learning___kai_waehner
2019 04 seattle_meetup___kafka_machine_learning___kai_waehner
 
Unleashing Apache Kafka and TensorFlow in Hybrid Cloud Architectures
Unleashing Apache Kafka and TensorFlow in Hybrid Cloud ArchitecturesUnleashing Apache Kafka and TensorFlow in Hybrid Cloud Architectures
Unleashing Apache Kafka and TensorFlow in Hybrid Cloud Architectures
 
Streaming Machine Learning with Python, Jupyter, TensorFlow, Apache Kafka and...
Streaming Machine Learning with Python, Jupyter, TensorFlow, Apache Kafka and...Streaming Machine Learning with Python, Jupyter, TensorFlow, Apache Kafka and...
Streaming Machine Learning with Python, Jupyter, TensorFlow, Apache Kafka and...
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
 
Apache Kafka Open Source Ecosystem for Machine Learning at Extreme Scale (Apa...
Apache Kafka Open Source Ecosystem for Machine Learning at Extreme Scale (Apa...Apache Kafka Open Source Ecosystem for Machine Learning at Extreme Scale (Apa...
Apache Kafka Open Source Ecosystem for Machine Learning at Extreme Scale (Apa...
 
Kai Waehner - Deep Learning at Extreme Scale in the Cloud with Apache Kafka a...
Kai Waehner - Deep Learning at Extreme Scale in the Cloud with Apache Kafka a...Kai Waehner - Deep Learning at Extreme Scale in the Cloud with Apache Kafka a...
Kai Waehner - Deep Learning at Extreme Scale in the Cloud with Apache Kafka a...
 
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
Deep Learning at Extreme Scale (in the Cloud) 
with the Apache Kafka Open Sou...
 
How to Leverage the Apache Kafka Ecosystem to Productionize Machine Learning ...
How to Leverage the Apache Kafka Ecosystem to Productionize Machine Learning ...How to Leverage the Apache Kafka Ecosystem to Productionize Machine Learning ...
How to Leverage the Apache Kafka Ecosystem to Productionize Machine Learning ...
 
Connected Vehicles and V2X with Apache Kafka
Connected Vehicles and V2X with Apache KafkaConnected Vehicles and V2X with Apache Kafka
Connected Vehicles and V2X with Apache Kafka
 
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita...
 
IoT and Event Streaming at Scale with Apache Kafka
IoT and Event Streaming at Scale with Apache KafkaIoT and Event Streaming at Scale with Apache Kafka
IoT and Event Streaming at Scale with Apache Kafka
 
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
 
Apache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep LearningApache Kafka Streams + Machine Learning / Deep Learning
Apache Kafka Streams + Machine Learning / Deep Learning
 
Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Str...
Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Str...Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Str...
Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Str...
 
Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...
Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...
Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFl...
 
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
 
Apache Kafka® and Analytics in a Connected IoT World
Apache Kafka® and Analytics in a Connected IoT WorldApache Kafka® and Analytics in a Connected IoT World
Apache Kafka® and Analytics in a Connected IoT World
 
Apache Kafka as Event Streaming Platform for Microservice Architectures
Apache Kafka as Event Streaming Platform for Microservice ArchitecturesApache Kafka as Event Streaming Platform for Microservice Architectures
Apache Kafka as Event Streaming Platform for Microservice Architectures
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
 

More from Kai Wähner

The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022
Kai Wähner
 
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Kai Wähner
 
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureServerless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Kai Wähner
 

More from Kai Wähner (20)

Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
 
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping MetaverseKafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
 
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareApache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
 
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity IndustryData Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare Industry
 
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
Apache Kafka for Real-time Supply Chainin the Food and Retail IndustryApache Kafka for Real-time Supply Chainin the Food and Retail Industry
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid Cloud
 
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
 
Apache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and ManufacturingApache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and Manufacturing
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022
 
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesEvent Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
 
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
 
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
 
Apache Kafka in the Transportation and Logistics
Apache Kafka in the Transportation and LogisticsApache Kafka in the Transportation and Logistics
Apache Kafka in the Transportation and Logistics
 
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
Apache Kafka for Cybersecurity and SIEM / SOAR ModernizationApache Kafka for Cybersecurity and SIEM / SOAR Modernization
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
 
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
 
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureServerless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
Serverless Kafka on AWS as Part of a Cloud-native Data Lake Architecture
 
IBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache KafkaIBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
 
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?
 

Recently uploaded

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 

Recently uploaded (20)

Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 

Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning without a Data Lake

  • 1. Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning without a Data Lake Kai Waehner Technology Evangelist contact@kai-waehner.de LinkedIn @KaiWaehner www.confluent.io www.kai-waehner.de
  • 2. Disclaimer – Status for Tiered Storage in August 2020 KIP-405 – Add Tiered Storage Support to Kafka Confluent is actively working on this with the open source community - Uber is leading this initiative Confluent Tiered Storage is available today in Confluent Platform and used under the hood in Confluent Cloud https://cwiki.apache.org/confluence/display/KAFKA/KIP- 405%3A+Kafka+Tiered+Storage (in the works) www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 3. STREAM PROCESSING Create and store materialized views Filter Analyze in-flight Time C CC Event Streaming www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 4. Machine Learning to Improve Traditional and to Build New Use Cases 5www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake Real Time Tracking Predictive Maintenance Fraud Detection Cross Selling Transportation Rerouting Customer Service Inventory ManagementAutonomous Driving Face Recognition Robotics Speech Translation Video Generation Supply Chain Optimization Simulations Real Time Information Digital Transformation Strategic Goals Customer Churn
  • 5. Global Automotive Company Builds Connected Car Infrastructure 6 Digital Transformation ● Improve Customer Experience ● Increase Revenue ● Reduce Risk 3 years ago Today 2 years in the future Project begins Connected car infrastructure in production for first use cases Improved processes leveraging machine learning (predictive maintenance, cross-selling) www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 6. Streaming Analytics for Predictive Maintenance at Scale 7 IoT Integration Layer Batch Analytics Platform BI Dashboard Streaming Platform Big Data Integration Layer Car Sensors Streaming Platform Other Components Real Time Monitoring System All Data Critical Data Ingest Data Human Intelligence www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 7. Machine Learning (ML) ...allows computers to find hidden insights without being programmed where to look 8 Machine Learning ● Decision Trees ● Naïve Bayes ● Clustering ● Neural Networks ● Etc. Deep Learning ● CNN ● RNN ● Transformer ● Autoencoder ● Etc. www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 8. Streaming Analytics for Predictive Maintenance at Scale 9 IoT Integration Layer Batch Analytics Platform BI Dashboard Streaming Platform Big Data Integration Layer Car Sensors Streaming Platform Analytics Platform Other Components Real Time Monitoring System All Data Critical Data Ingest Data Potential DetectAnalytics Platform Train Analytic Model Data Processing Analytic Model Preprocess Data Consume Data Deploy Analytic Model www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 9. The First Analytic Models 10 How to deploy the models in production? …real-time processing? …at scale? …24/7 zero uptime? www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 10. Hidden Technical Debt in Machine Learning Systems 11 https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 11. Scalable, Technology-Agnostic ML Infrastructures What is this thing used everywhere? https://www.infoq.com/presentations/netflix-ml-meson https://eng.uber.com/michelangelo https://www.infoq.com/presentations/paypal-data-service-fraud www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 12. A Streaming Platform - The Underpinning of an Event-Driven Architecture 15 Microservices DBs SaaS apps Mobile Customer 360 Real-time fraud detection Data warehouse Producers Consumers Database change Microservices events SaaS data Customer experiences Streams of real time events Stream processing apps Connectors Connectors Stream processing apps www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 13. Apache Kafka as Infrastructure for ML www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 14. Apache Kafka’s Open Ecosystem as Infrastructure for ML Kafka Streams/ ksqlDB Kafka Connect Confluent REST Proxy Confluent Schema Registry Go/.NET/Python Kafka Producer ksqlDB Python Client www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 15. Ingestion of IoT Data 20 Replication MirrorMaker / Confluent Replicator Kafka Connect Analytics / Machine Learning Ca rsCa rsCa rsCa rs Cars www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 16. Data Preprocessing 21 Preprocessing Filter, transform, anonymize, extract features Streams Data Ready For Model Training www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 17. Preprocessing with ksqlDB 22 SELECT car_id, event_id, car_model_id, sensor_input FROM car_sensor c LEFT JOIN car_models m ON c.car_model_id = m.car_model_id WHERE m.car_model_type ='Audi_A8'; www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 18. Data Ingestion into a Data Store for Model Training (and Consumption by other Decoupled Applications) 23 Connect Preprocessed Data Batch Near Real Time Real Time www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 19. Extreme scale usingTensorFlow and TPUs in the cloud! Analytic Model Model Training Using an Elastic Infrastructure in the Cloud 24www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 20. TensorFlow Model — Autoencoder for Anomaly Detection www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake 25
  • 21. Direct streaming ingestion for model training with TensorFlow I/O + Kafka Plugin (no additional data storage like S3 or HDFS required!) Time Model BModel A Producer Distributed Commit Log Streaming Ingestion and Model Training with TensorFlow IO https://github.com/tensorflow/io 26 Model X (at a later time) www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 22. Store Data Long-Term in Kafka? Today, Kafka works well for recent events, short horizon storage, and manual data balancing. Kafka’s present-day design offers extraordinarily low messaging latency by storing topic data on fast disks that are collocated with brokers. This is usually good. But sometimes, you need to store a huge amount of data for a long time. Kafka Processing App Storage Transactions, auth, quota enforcement, compaction, ... www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake
  • 23. Simplified Data Lake Architecture Tiered Storage for Kafka provides ● one platform for all data processing ● an event-based source of truth for materialized views ● no need for a pipeline between Kafka and a Data Lake like Hadoop Benefits ● cost reduction ● long-term backup ● performance isolation (real-time and historical analysis in the same cluster)
  • 24. Confluent Tiered Storage for Kafka Object Store Processing Storage Transactions, auth, quota enforcement, compaction, ... Local Remote Kafka Apps Store Forever Older data is offloaded to inexpensive object storage, permitting it to be consumed at any time. Save $$$ Storage limitations, like capacity and duration, are effectively uncapped. Instantaneously scale up and down Your Kafka clusters will be able to automatically self-balance load and hence elastically scale (Only available in Confluent Platform) www.kai-waehner.de | @KaiWaehner
  • 25. Confluent Tiered Storage for Kafka 30www.kai-waehner.de | @KaiWaehner (Only available in Confluent Platform)
  • 26. Use Cases for Reprocessing Historical Events Give me all events from time A to time B Real-time Producer Time • New consumer application • Error-handling • Compliance / regulatory processing • Query and analyze existing events • Model training Real-time Consumer Consumer of Historical Data www.kai-waehner.de | @KaiWaehner
  • 27. Local Predictions Model Training in Cloud Model Deployment at the Edge Analytic Model Separation of Model Training and Model Inference www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake 32
  • 28. Streams Input Event Prediction Request Response Model Serving TensorFlow Serving gRPC / HTTP Application Stream Processing with External Model and RPC www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake 33
  • 29. Prediction Stream Processing Model doPrediction() return value Stream Processing with Embedded Model Streams Input Event www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake 34
  • 30. “CREATE STREAM AnomalyDetection AS SELECT sensor_id, detectAnomaly(sensor_values) FROM car_engine;“ User Defined Function (UDF) Model Deployment with Apache Kafka, ksqlDB and TensorFlow www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake 35
  • 31. Streaming Analytics with Kafka and TensorFlow 36www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake MQTT Proxy MongoDB Storage MongoDB Dashboards Search Analytics Kafka Cluster Kafka Connect Car Sensors Kafka Ecosystem TensorFlow Other Components Kafka Streams Application All Data Critical Data Ingest Data Potential DetectTensorFlow Train Analytic Model ksqlDB Analytic Model Preprocess Data Consume Data Deploy Analytic Model Tiered Storage Mobile App BI Tool
  • 32. Demo: 100,000 Connected Cars (Kafka + ksqlDB + MQTT + TensorFlow) https://github.com/kaiwaehner/hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake 37
  • 33. Live Demo www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake 38
  • 34. Machine Learning + Apache Kafka à Examples @ Github 39 https://github.com/kaiwaehner
  • 35. One pipeline to rule them all Real-time model scoring, batch model training, near-real time BI analytics Give me all events from time A to time B Car sensors (MQTT connector) Time Production infrastructure (Java) Data science / analytics infrastructure (Python + Jupyter) www.kai-waehner.de | @KaiWaehner | Streaming Machine Learning without a Data Lake