SlideShare a Scribd company logo
1 of 35
Schemas Beyond The Edge
Alexei Zenin
Platform Engineer, Uken Games
August 17th, 2021
Journey
Mobile Analytics at Uken Games
Key Features of Schema Registry
Revamped Mobile Analytics Pipeline: JSON to Protobuf
2
Mobile Analytics at Uken Games
Collection of data to drive:
● Retention & engagement analysis (AB tests, DAU)
● Operational debugging & investigation
(purchases, rewards)
● Lower-level metric analysis (HTTP, CPU, Memory)
Focus will be on the first two
3
Current Ingestion Pipeline Design
4
~200 million events per day
Schema Silos
5
● Each silo has its own processes and
tools
● Several teams managing the same data
asset definitions
● Duplication of work across the
boundaries
○ Structure
○ Data types
○ Naming
6
Schema Drift
Pros & Cons
Pros
● Adding a new event is quick
● JSON is easy to use
● JSON well supported across systems
Cons
● Duplicated effort across teams for data management
● Repeated schema definitions
● Manual processes to keep things in sync
● JSON retransmits schema information each time
7
Schema Registry Refresher
8
Overview of Schema Registry
● Provides a central place to upload your
schemas
● Immutable & Idempotent API
● Acts as “barcode system”
9
https://bit.ly/3ikXUci
Default Wire Format (Confluent)
10
Single Primary Architecture
11
● Primary is elected via
Kafka Group Protocol
● Each Secondary is informed of
the primary’s address
● Primary is responsible for writes
(e.g. registering new schemas)
● Every node can serve reads or
forward write requests
Ingestion Pipeline 2.0: Protobuf
12
Ingestion Pipeline 2.0
Goals:
● Decrease boilerplate work
● Leverage automation
● Increase transparency
● Single source of truth for schemas
Solution:
Gitops paradigm with Protobuf & Schema Registry
13
JSON to Protobuf: Schema Structures
● Legacy JSON schema structure use the concept of an envelope with various subdivisions
● Convert into Protobuf, preserve schema structure
14
Protobuf Equivalent
Protobuf features:
● Can express custom types and use
composition to glue together
schemas in 1 top level schema
● Has support for some native types
like google.protobuf.Timestamp
● Ability to generate code from schema
(e.g. Java, C#)
15
Envelope per payload?
● Uken has between 200-300 event
payloads per game
● One approach is to copy paste the
envelope per payload per game
Envelopes = Payloads X Games
● Downsides are duplication in schemas
and generated code
● Leads to a poor developer experience
16
“Oneof” Branching
● Try using the oneof construct to
enumerate all possible payloads at
the EventPayload level
● Only need an envelope per game or 1
mega envelope for all games
Envelopes = O(Games)
● Similar to Avro Unions
● Still has duplication of envelope and
EventPayload
17
How to get around strict Protobuf schemas
● Each definition needs to be defined upfront and explicitly in Protobuf
Solution:
Defer schema attached until runtime to be able to reuse envelope across games
18
Protobuf’s Any: Dynamic Message Container
● Allows you to use any embedded
Protobuf type
● Uses special “packing” code
● The type_url only accepts Protobuf
package names
● Requires class to be present in
application
19
https://bit.ly/2Uo6RcS
Schema Registry Compatible “Any”
● Enables attaching schemas at runtime
● Removes type_url for schema_id
● The value field is Proto3 encoded bytes
20
Dynamic Envelope
● Can define one envelope for
all games
● Use the AnyUken type for
EventPayload
● Tradeoff explicitness for
flexibility
● Elevate schema IDs to a first
class concept within the
schemas themselves
(schema pointers)
21
22
Protobuf View
How do we get these schemas
past the edge?
23
Integrating Schema Registry with Mobile clients
Problems:
● Generated Protobuf classes are not aware of schema IDs
● Client device needs access to schema IDs
24
“Online” Approach (traditional)
25
● Expose Schema Registry (SR) to
mobile clients directly
● Fetch required schema IDs
during app runtime
Disadvantages:
● Need to setup security for
schema registry
● Need to scale SR to millions of
clients
● Point of failure for client
● Adds network overhead to
client
“Embedded” Approach
26
Embedded Tradeoffs
● No need to expose SR to millions of clients
● Leverage the immutable property of schema
IDs
● Custom build process
● Total snapshot of schema IDs built into app
27
Serverless + Schema Registry
28
EventBatch wire format
29
● Define a generic envelope as the API contract
between API Gateway & Mobile client
● Able to take any resolvable schema, return
error if bad data
● Avoids hardcoding a specific version of
AnalyticsEvent
● Keeps wire format compliant to Proto3
Nesting Schema IDs Visualized
30
Performance Comparison
31
● 3x improvement in latency to ingest a
batch of events
● 2x reduction in size per event
● 2x increase in possible number of
events buffered on device
JSON Latency
Protobuf Latency
Schema Management: GitOps paradigm
● Version control Protobuf schemas in Git monorepo
● Use Merge Requests to collaborate on impending changes
● Run CI/CD on commits
● Place people into the right spots, let automation do the rest
32
33
34
Next Steps
● Adding a Data Dictionary
● Improving visibility with integrations to Slack
● Iterating on RACI matrix for data management
● Migrating Spark Jobs to utilize new data management process
35
Takeaways
● GitOps allows for data management
automation
● Schema Registry can empower devices outside
the data center
● Schema IDs allow flexible envelope designs that
operate better at scale
● Protobuf can help reduce costs by several times

More Related Content

What's hot

Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaJiangjie Qin
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introductioncolorant
 
Kafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer ConsumersKafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer ConsumersJean-Paul Azar
 
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMwareEvent Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMwareHostedbyConfluent
 
Building a Web Application with Kafka as your Database
Building a Web Application with Kafka as your DatabaseBuilding a Web Application with Kafka as your Database
Building a Web Application with Kafka as your Databaseconfluent
 
Présentation de Apache Zookeeper
Présentation de Apache ZookeeperPrésentation de Apache Zookeeper
Présentation de Apache ZookeeperMichaël Morello
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services
 
Admission Control in Impala
Admission Control in ImpalaAdmission Control in Impala
Admission Control in ImpalaCloudera, Inc.
 
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan ZhangExperiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan ZhangDatabricks
 
Cloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and AnalysisCloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and AnalysisYue Chen
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaGuido Schmutz
 

What's hot (20)

Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
kafka
kafkakafka
kafka
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Apache flink
Apache flinkApache flink
Apache flink
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
Kafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer ConsumersKafka Intro With Simple Java Producer Consumers
Kafka Intro With Simple Java Producer Consumers
 
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMwareEvent Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
Event Streaming with Kafka Streams and Spring Cloud Stream | Soby Chacko, VMware
 
Building a Web Application with Kafka as your Database
Building a Web Application with Kafka as your DatabaseBuilding a Web Application with Kafka as your Database
Building a Web Application with Kafka as your Database
 
Présentation de Apache Zookeeper
Présentation de Apache ZookeeperPrésentation de Apache Zookeeper
Présentation de Apache Zookeeper
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
Admission Control in Impala
Admission Control in ImpalaAdmission Control in Impala
Admission Control in Impala
 
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan ZhangExperiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
 
Cloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and AnalysisCloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and Analysis
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 

Similar to Schemas Beyond The Edge

Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleDmytro Semenov
 
Building Kick Ass Video Games for the Cloud
Building Kick Ass Video Games for the CloudBuilding Kick Ass Video Games for the Cloud
Building Kick Ass Video Games for the CloudChris Schalk
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesSeungYong Oh
 
Node.js Presentation
Node.js PresentationNode.js Presentation
Node.js PresentationExist
 
RPC in Smalltalk
 RPC in Smalltalk RPC in Smalltalk
RPC in SmalltalkESUG
 
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire - RAGE:...
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire -  RAGE:...Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire -  RAGE:...
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire - RAGE:...eMadrid network
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning InfrastructureSigOpt
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2Linaro
 
mloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game developmentmloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game developmentDavid Galeano
 
Mobile game architecture on GCP
Mobile game architecture on GCPMobile game architecture on GCP
Mobile game architecture on GCP명근 최
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesAlexander Penev
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Guglielmo Iozzia
 
Electron JS | Build cross-platform desktop applications with web technologies
Electron JS | Build cross-platform desktop applications with web technologiesElectron JS | Build cross-platform desktop applications with web technologies
Electron JS | Build cross-platform desktop applications with web technologiesBethmi Gunasekara
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kevin Lynch
 
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...David Geurts
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kuberneteskloia
 

Similar to Schemas Beyond The Edge (20)

Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scale
 
ARISE
ARISEARISE
ARISE
 
Building Kick Ass Video Games for the Cloud
Building Kick Ass Video Games for the CloudBuilding Kick Ass Video Games for the Cloud
Building Kick Ass Video Games for the Cloud
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
 
Node.js Presentation
Node.js PresentationNode.js Presentation
Node.js Presentation
 
RPC in Smalltalk
 RPC in Smalltalk RPC in Smalltalk
RPC in Smalltalk
 
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire - RAGE:...
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire -  RAGE:...Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire -  RAGE:...
Seminario eMadrid 2015 09 10 sobre Serious Games (UCM) Manuel Freire - RAGE:...
 
Machine Learning Infrastructure
Machine Learning InfrastructureMachine Learning Infrastructure
Machine Learning Infrastructure
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2
 
mloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game developmentmloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game development
 
Mobile game architecture on GCP
Mobile game architecture on GCPMobile game architecture on GCP
Mobile game architecture on GCP
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE Architectures
 
COM+ & MSMQ
COM+ & MSMQCOM+ & MSMQ
COM+ & MSMQ
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
 
Electron JS | Build cross-platform desktop applications with web technologies
Electron JS | Build cross-platform desktop applications with web technologiesElectron JS | Build cross-platform desktop applications with web technologies
Electron JS | Build cross-platform desktop applications with web technologies
 
resume
resumeresume
resume
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...
Unite2014 Bunny Necropsy - Servers, Syncing Game State, Security and Optimiza...
 
Ultimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on KubernetesUltimate Guide to Microservice Architecture on Kubernetes
Ultimate Guide to Microservice Architecture on Kubernetes
 
Rashmi_Resume
Rashmi_ResumeRashmi_Resume
Rashmi_Resume
 

More from confluent

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

More from confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Recently uploaded

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 

Recently uploaded (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 

Schemas Beyond The Edge

  • 1. Schemas Beyond The Edge Alexei Zenin Platform Engineer, Uken Games August 17th, 2021
  • 2. Journey Mobile Analytics at Uken Games Key Features of Schema Registry Revamped Mobile Analytics Pipeline: JSON to Protobuf 2
  • 3. Mobile Analytics at Uken Games Collection of data to drive: ● Retention & engagement analysis (AB tests, DAU) ● Operational debugging & investigation (purchases, rewards) ● Lower-level metric analysis (HTTP, CPU, Memory) Focus will be on the first two 3
  • 4. Current Ingestion Pipeline Design 4 ~200 million events per day
  • 5. Schema Silos 5 ● Each silo has its own processes and tools ● Several teams managing the same data asset definitions ● Duplication of work across the boundaries ○ Structure ○ Data types ○ Naming
  • 7. Pros & Cons Pros ● Adding a new event is quick ● JSON is easy to use ● JSON well supported across systems Cons ● Duplicated effort across teams for data management ● Repeated schema definitions ● Manual processes to keep things in sync ● JSON retransmits schema information each time 7
  • 9. Overview of Schema Registry ● Provides a central place to upload your schemas ● Immutable & Idempotent API ● Acts as “barcode system” 9 https://bit.ly/3ikXUci
  • 10. Default Wire Format (Confluent) 10
  • 11. Single Primary Architecture 11 ● Primary is elected via Kafka Group Protocol ● Each Secondary is informed of the primary’s address ● Primary is responsible for writes (e.g. registering new schemas) ● Every node can serve reads or forward write requests
  • 12. Ingestion Pipeline 2.0: Protobuf 12
  • 13. Ingestion Pipeline 2.0 Goals: ● Decrease boilerplate work ● Leverage automation ● Increase transparency ● Single source of truth for schemas Solution: Gitops paradigm with Protobuf & Schema Registry 13
  • 14. JSON to Protobuf: Schema Structures ● Legacy JSON schema structure use the concept of an envelope with various subdivisions ● Convert into Protobuf, preserve schema structure 14
  • 15. Protobuf Equivalent Protobuf features: ● Can express custom types and use composition to glue together schemas in 1 top level schema ● Has support for some native types like google.protobuf.Timestamp ● Ability to generate code from schema (e.g. Java, C#) 15
  • 16. Envelope per payload? ● Uken has between 200-300 event payloads per game ● One approach is to copy paste the envelope per payload per game Envelopes = Payloads X Games ● Downsides are duplication in schemas and generated code ● Leads to a poor developer experience 16
  • 17. “Oneof” Branching ● Try using the oneof construct to enumerate all possible payloads at the EventPayload level ● Only need an envelope per game or 1 mega envelope for all games Envelopes = O(Games) ● Similar to Avro Unions ● Still has duplication of envelope and EventPayload 17
  • 18. How to get around strict Protobuf schemas ● Each definition needs to be defined upfront and explicitly in Protobuf Solution: Defer schema attached until runtime to be able to reuse envelope across games 18
  • 19. Protobuf’s Any: Dynamic Message Container ● Allows you to use any embedded Protobuf type ● Uses special “packing” code ● The type_url only accepts Protobuf package names ● Requires class to be present in application 19 https://bit.ly/2Uo6RcS
  • 20. Schema Registry Compatible “Any” ● Enables attaching schemas at runtime ● Removes type_url for schema_id ● The value field is Proto3 encoded bytes 20
  • 21. Dynamic Envelope ● Can define one envelope for all games ● Use the AnyUken type for EventPayload ● Tradeoff explicitness for flexibility ● Elevate schema IDs to a first class concept within the schemas themselves (schema pointers) 21
  • 23. How do we get these schemas past the edge? 23
  • 24. Integrating Schema Registry with Mobile clients Problems: ● Generated Protobuf classes are not aware of schema IDs ● Client device needs access to schema IDs 24
  • 25. “Online” Approach (traditional) 25 ● Expose Schema Registry (SR) to mobile clients directly ● Fetch required schema IDs during app runtime Disadvantages: ● Need to setup security for schema registry ● Need to scale SR to millions of clients ● Point of failure for client ● Adds network overhead to client
  • 27. Embedded Tradeoffs ● No need to expose SR to millions of clients ● Leverage the immutable property of schema IDs ● Custom build process ● Total snapshot of schema IDs built into app 27
  • 28. Serverless + Schema Registry 28
  • 29. EventBatch wire format 29 ● Define a generic envelope as the API contract between API Gateway & Mobile client ● Able to take any resolvable schema, return error if bad data ● Avoids hardcoding a specific version of AnalyticsEvent ● Keeps wire format compliant to Proto3
  • 30. Nesting Schema IDs Visualized 30
  • 31. Performance Comparison 31 ● 3x improvement in latency to ingest a batch of events ● 2x reduction in size per event ● 2x increase in possible number of events buffered on device JSON Latency Protobuf Latency
  • 32. Schema Management: GitOps paradigm ● Version control Protobuf schemas in Git monorepo ● Use Merge Requests to collaborate on impending changes ● Run CI/CD on commits ● Place people into the right spots, let automation do the rest 32
  • 33. 33
  • 34. 34 Next Steps ● Adding a Data Dictionary ● Improving visibility with integrations to Slack ● Iterating on RACI matrix for data management ● Migrating Spark Jobs to utilize new data management process
  • 35. 35 Takeaways ● GitOps allows for data management automation ● Schema Registry can empower devices outside the data center ● Schema IDs allow flexible envelope designs that operate better at scale ● Protobuf can help reduce costs by several times