Machine Learning is separated into model training and model inference. ML frameworks typically load historical data from a data store like HDFS or S3 to train models. This talk shows how you can completely avoid such a data store by ingesting streaming data directly via Apache Kafka from any source system into TensorFlow for model training and model inference using the capabilities of “TensorFlow I/O” add-on.
The talk compares this modern streaming architecture to traditional batch and big data alternatives and explains benefits like the simplified architecture, the ability of reprocessing events in the same order for training different models, and the possibility to build a scalable, mission-critical, real time ML architecture with muss less headaches and problems.
Key takeaways for the audience
• Scalable open source Machine Learning infrastructure
• Streaming ingestion into TensorFlow without the need for another data store like HDFS or S3 (leveraging TensorFlow I/O and its Kafka plugin)
• Stream Processing using analytic models in mission-critical deployments to act in Real Time
• Learn how Apache Kafka open source ecosystem including Kafka Connect, Kafka Streams and KSQL help to build, deploy, score and monitor analytic models
• Comparison and trade-offs between this modern streaming approach and traditional batch model training infrastructures
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Simplified Machine Learning Architecture with an Event Streaming Platform (Apache Kafka + TensorFlow I/O)
1. 1
Simplified Machine Learning Architecture
with an Event Streaming Platform
Kai Waehner | Technology Evangelist, Confluent
contact@kai-waehner.de | LinkedIn | @KaiWaehner | www.confluent.io | www.kai-waehner.de
2. 2
Machine Learning to Improve Traditional
and to Build New Use Cases
Seconds Minutes Hours
Windows of Opportunity
Real Time
Tracking
Predictive
Maintenance
Fraud
Detection
Cross Selling
Transportation
Rerouting
Customer
Service
Inventory
Management
Autonomous
Driving
Face
Recognition
Robotics
Speech
Translation
Video
Generation
Supply Chain
Optimization
Strategic
Planning
3. 3
Global Automotive Company
Builds Connected Car Infrastructure
Digital Transformation
• Improve customer experience
• Increase revenue
• Reduce risk
Time
Today 2 years in the future3 years ago
Project begins Connected car infrastructure
in production for first use cases
Improved processes leveraging
machine learning (predictive
maintenance, cross-selling)
4. 4
Streaming Analytics for
Predictive Maintenance at Scale
IoT
Integration
Layer
Batch
Analytics
Platform
BI
Dashboard
Streaming
Platform
Big Data
Integration
Layer
Car Sensors
Streaming Platform
Other Components
Real Time
Monitoring
System
All
Data
Critical
Data
Ingest
Data
Human
Intelligence
5. 5
Machine Learning (ML)
...allows computers to find hidden insights without
being explicitly programmed where to look.
Machine
Learning
• Decision Trees
• Naïve Bayes
• Clustering
• Neural Networks
• Etc.
Deep
Learning
• CNN
• RNN
• Transformer
• Autoencoder
• Etc.
6. 6
Streaming Analytics for
Predictive Maintenance at Scale
IoT
Integration
Layer
Batch
Analytics
Platform
BI
Dashboard
Streaming
Platform
Big Data
Integration
Layer
Car Sensors
Streaming Platform
Analytics Platform
Other Components
Real Time
Monitoring
System
All
Data
Critical
Data
Ingest
Data
Potential
Detect
Data
Processing
Analytics
Platform
Train Analytic
Model
Consume
Data
Preprocess
Data
Analytic Model
Deploy Analytic
Model
12. 12
A Streaming Platform
is the Underpinning of an Event-driven Architecture
Microservices
DBs
SaaS apps
Mobile
Customer 360
Real-time fraud
detection
Data warehouse
Producers
Consumers
Database
change
Microservices
events
SaaS
data
Customer
experiences
Streams of real time events
Stream processing apps
Connectors
Connectors
Stream processing apps
13. 13
Apache Kafka at Scale
at Tech Giants
> 4.5 trillion messages / day > 6 Petabytes / day
“You name it”
* Kafka Is not just used by tech giants
** Kafka is not just used for big data
14. 14Business Value per Use Case
Business
Value
Improve
Customer
Experience
(CX)
Increase
Revenue
(make money)
Decrease
Costs
(save money)
Core Business
Platform
Increase
Operational
Efficiency
Migrate to
Cloud
Mitigate Risk
(protect money)
Key Drivers
Strategic Objectives
(sample)
Fraud
Detection
IoT sensor
ingestion
Digital
replatforming/
Mainframe Offload
Connected Car: Navigation & improved in-
car experience: Audi
Customer 360
Simplifying Omni-channel Retail at Scale:
Target
Faster transactional
processing / analysis
incl. Machine Learning / AI
Mainframe Offload: RBC
Microservices
Architecture
Online Fraud Detection
Online Security
(syslog, log aggregation,
Splunk replacement)
Middleware
replacement
Regulatory
Digital
Transformation
Application Modernization: Multiple
Examples
Website / Core
Operations
(Central Nervous System)
The [Silicon Valley] Digital Natives;
LinkedIn, Netflix, Uber, Yelp...
Predictive Maintenance: Audi
Streaming Platform in a regulated
environment (e.g. Electronic Medical
Records): Celmatix
Real-time app
updates
Real Time Streaming Platform for
Communications and Beyond: Capital One
Developer Velocity - Building Stateful
Financial Applications with Kafka Streams:
Funding Circle
Detect Fraud & Prevent Fraud in Real
Time: PayPal
Kafka as a Service - A Tale of Security and
Multi-Tenancy: Apple
Example Use Cases
$↑
$↓
$↔
Example Case Studies
(of many)
19. 19
SELECT car_id, event_id, car_model_id, sensor_input
FROM car_sensor c
LEFT JOIN car_models m ON c.car_model_id =
m.car_model_id
WHERE m.car_model_type ='Audi_A8';
Preprocessing
with KSQL
20. 20
Data Ingestion into a Data Store for Model Training
(and Consumption by other Decoupled Applications)
Connect
Preprocessed
Data
Batch Near Real Time Real Time
23. 23
Direct streaming ingestion
for model training
with TensorFlow I/O + Kafka Plugin
(no additional data storage
like S3 or HDFS required!)
Time
Model BModel A
Producer
Distributed Commit Log
Streaming Ingestion and Model Training
with TensorFlow IO
https://github.com/tensorflow/io
27. 27
“CREATE STREAM AnomalyDetection AS
SELECT sensor_id, detectAnomaly(sensor_values)
FROM car_engine;“
User Defined Function (UDF)
Model Deployment with
Apache Kafka, KSQL
and TensorFlow
28. 28
Streaming Analytics with
Kafka and TensorFlow
MQTT
Proxy
Elastic
Search
Grafana
Kafka
Cluster
Kafka
Connect
Car Sensors
Kafka Ecosystem
TensorFlow
Other Components
Kafka
Streams
Application
All
Data
Critical
Data
Ingest
Data
Potential
Detect
KSQL
TensorFlow
Train
Analytic Model
Consume
Data
Preprocess
Data
Analytic Model
Deploy Analytic
Model
31. 31
Key Takeaways
Don’t underestimate the
Hidden Technical Debt
in Machine Learning
Systems
Leverage the Apache
Kafka Open Source
Ecosystem as scalable
and flexible Event
Streaming Platform
Use Streaming Machine
Learning with Kafka
and TensorFlow IO to
simplify your Big Data
Architecture
32. 3232
11. November 2019
Steigenberger Frankfurter Hof
13. November 2019
NOVOTEL Zürich City West
Ben Stopford
Office of the CTO
Confluent
Axel Löhn
Senior Project Manager
Deutsche Bahn
Kai Waehner,
Technologist
Confluent
Ralph Debusmann
IoT Solution Architect
Bosch Power Tools
cnfl.io/cse19frankfurt cnfl.io/cse19zurich