In the online gaming industry we receive a vast number of transactions that must be handled in real time. Our customers choose from hundreds or even thousands of options, and providing a seamless experience is crucial in our industry. Recommendation systems can be the answer in such cases, but they require handling large volumes of data and substantial processing power. Toward this goal, over the last two years we have gone down the road of machine learning and AI in order to transform our customers' daily experience and upgrade our internal services.
3. About
▪ Kaizen is a top GameTech company in Greece and one of the fastest growing in Europe.
▪ At Kaizen we use technology to offer the best possible products and services to those who trust us for their entertainment.
5. A bit of history - initial workflow
▪ Several data sources
▪ Data Warehouse, DB’s, Files etc.
▪ Training on local workstation
▪ Model / application deployment (Docker)
6. Architecture Bottlenecks and Challenges
▪ Data
▪ Data availability
▪ Time traveling
▪ Noisy label / no label
▪ Features
▪ Recalculation
▪ Model
▪ Versioning
▪ Experiment tracking / logs
▪ Dedicated VMs
▪ Scalability
▪ Application dockerization
▪ Model versioning
(Diagram: bottlenecks split between the Application side and the Machine learning side)
7. Journey Log: Day 210
▪ Databricks & Azure
▪ Real-time Data flows
▪ Feature creation
▪ Model predictions
▪ Batch Data flows
▪ Model training
▪ ETL
▪ MLflow
▪ Experiment Tracking
▪ Model registry
▪ Delta Lake
▪ Single Source of Truth
▪ ACID transactions
▪ Time travel
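The Delta Lake properties above (versioned ACID commits and time travel) can be illustrated with a minimal stdlib sketch. This is a toy stand-in, not the Delta API: `ToyVersionedTable` and its methods are hypothetical names invented for this illustration.

```python
from copy import deepcopy

class ToyVersionedTable:
    """Toy stand-in for a Delta table: every commit creates a new
    immutable snapshot, so older versions stay readable (time travel)."""

    def __init__(self):
        self._versions = []  # snapshot after each commit

    def commit(self, rows):
        # All-or-nothing append: the new snapshot becomes visible only
        # once the copy is complete (ACID-style atomicity).
        base = deepcopy(self._versions[-1]) if self._versions else []
        base.extend(rows)
        self._versions.append(base)
        return len(self._versions) - 1  # version number of this commit

    def read(self, version_as_of=None):
        # Default: latest snapshot; otherwise read an older version.
        if version_as_of is None:
            version_as_of = len(self._versions) - 1
        return self._versions[version_as_of]

table = ToyVersionedTable()
v0 = table.commit([{"bet_id": 1, "odds": 2.5}])
v1 = table.commit([{"bet_id": 2, "odds": 1.8}])

print(len(table.read()))                   # latest snapshot: 2 rows
print(len(table.read(version_as_of=v0)))   # time travel: 1 row
```

The real-world counterpart in Databricks is reading a Delta table with `spark.read.format("delta").option("versionAsOf", 0).load(path)`.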
8. Designing Data Pipelines (What, Why) => How
▪ What, Why
▪ Input:
▪ Structured Data stored in Kafka in avro format
▪ Latency up to 10 sec
▪ Output:
▪ avro messages dispatched in Kafka
▪ directly consumed from microservices
▪ How
▪ Use structured streaming for both:
▪ feature generation
▪ model prediction
▪ Use Kafka for low latency and pipelining between data flows
Use case 1. Pipelines with low latency
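The low-latency flow above (Kafka in, feature generation, model prediction, Kafka out) can be sketched per message. This is a plain-Python stand-in, not Spark Structured Streaming code; the event fields, `extract_features`, and the stub model are all hypothetical, and JSON stands in for the Avro encoding.

```python
import json

def extract_features(event):
    # Hypothetical feature generation step: turn a raw event into a
    # numeric feature vector keyed by customer.
    return {
        "customer_id": event["customer_id"],
        "stake": float(event["stake"]),
        "n_selections": len(event["selections"]),
    }

def predict(features):
    # Stub standing in for the deployed model: flag high-stake,
    # multi-selection bets (illustration only, not a real rule).
    score = 1.0 if features["stake"] > 100 and features["n_selections"] > 3 else 0.0
    return {"customer_id": features["customer_id"], "score": score}

def process_stream(messages):
    # Each message is handled independently, which is what keeps
    # end-to-end latency low; output is dispatched back as messages.
    out = []
    for raw in messages:
        event = json.loads(raw)  # in production: Avro decode
        out.append(json.dumps(predict(extract_features(event))))
    return out

incoming = [json.dumps({"customer_id": 7, "stake": 150, "selections": [1, 2, 3, 4]})]
print(process_stream(incoming))
```

In the actual pipeline both stages run as Structured Streaming queries, with Kafka topics connecting feature generation to prediction.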
9. Designing Data Pipelines (What, Why) => How
▪ What, Why
▪ Input:
▪ Structured Data stored in Kafka in avro format
▪ Delta Tables
▪ Latency few minutes
▪ Output:
▪ Delta Tables
▪ PostgreSQL tables
▪ How
▪ Use structured streaming for both:
▪ feature generation
▪ model prediction
▪ Use Batch processing for feature vector generation
Use case 2. Pipelines with average latency
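The batch feature vector generation mentioned above can be sketched as a fold over events per customer. This is a hypothetical stdlib illustration of the idea, not the production job; the field names and the aggregates chosen are assumptions.

```python
from collections import defaultdict

def build_feature_vectors(events):
    """Hypothetical batch job: fold a window of events into one
    feature vector per customer (counts, totals, averages)."""
    acc = defaultdict(lambda: {"n_bets": 0, "total_stake": 0.0})
    for e in events:
        row = acc[e["customer_id"]]
        row["n_bets"] += 1
        row["total_stake"] += e["stake"]
    # Derive ratio features once the aggregates are complete.
    return {
        cid: {**row, "avg_stake": row["total_stake"] / row["n_bets"]}
        for cid, row in acc.items()
    }

events = [
    {"customer_id": 1, "stake": 10.0},
    {"customer_id": 1, "stake": 30.0},
    {"customer_id": 2, "stake": 5.0},
]
vectors = build_feature_vectors(events)
print(vectors[1]["avg_stake"])  # 20.0
```

In the pipeline above, the same aggregation would run as a Spark batch job reading from Delta Tables and writing the vectors to Delta and PostgreSQL.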
10. Personalization Journey
▪ Some numbers
▪ ~3K unique games per day
▪ each of which breaks down further per market
▪ ~300K unique events per year
▪ Our aim is to
▪ provide personalized content
▪ improve the customer experience
▪ increase loyalty
Sportsbook Personalization
12. Personalization Journey
▪ Reward increases loyalty
▪ ~ 40% of customer support communication
▪ ~ 4.5M bonus reward assessments per year
▪ Manual and periodic assessments
▪ Real-time decision on bonus eligibility and allocation
Real Time Bonus Computation
13. Architecture and technical overview
▪ Feature / prediction streaming
▪ Binary Classification / MLlib Gradient Boosting
▪ MLflow
▪ Experiment tracking
▪ Model deployment
▪ Model registry
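What experiment tracking and a model registry record can be shown with a toy stdlib sketch. This is not the MLflow API; `ToyTracker` and its methods are hypothetical names illustrating the two concepts: runs log their parameters and metrics, and the registry maps a stable model name to the run that currently backs it.

```python
import time

class ToyTracker:
    """Toy stand-in for MLflow-style tracking: each run records its
    parameters and metrics; the registry resolves a model name to a run."""

    def __init__(self):
        self.runs = []
        self.registry = {}

    def log_run(self, params, metrics):
        run = {"run_id": len(self.runs), "params": params,
               "metrics": metrics, "ts": time.time()}
        self.runs.append(run)
        return run["run_id"]

    def register_model(self, name, run_id):
        # "Promote" a run's model: consumers look it up by name, not
        # by run, so deployment stays decoupled from training.
        self.registry[name] = run_id

tracker = ToyTracker()
rid = tracker.log_run({"max_depth": 5, "n_trees": 200}, {"auc": 0.87})
tracker.register_model("bonus-eligibility", rid)
print(tracker.registry["bonus-eligibility"])  # 0
```

With MLflow itself, the equivalents are `mlflow.log_param` / `mlflow.log_metric` inside a run and registering the resulting model in the Model Registry.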
14. Future steps
▪ Real-time applications
▪ Feature store and reusability
▪ Cassandra
▪ MLflow Model Serving
▪ Use Redis for key-value lookup use cases
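The planned feature store with Redis-backed key-value lookup can be sketched with a stdlib stand-in. This is a toy illustration, not Redis or any feature-store product: `ToyFeatureStore`, its TTL behavior, and the key layout are all assumptions made for the example.

```python
import time

class ToyFeatureStore:
    """Toy stand-in for a Redis-backed online feature store: pipelines
    write feature vectors by key, prediction services fetch them by
    key, and a TTL lets stale features expire."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._data = {}  # key -> (features, written_at)

    def put(self, key, features, now=None):
        self._data[key] = (features, now if now is not None else time.time())

    def get(self, key, now=None):
        if key not in self._data:
            return None
        features, written_at = self._data[key]
        now = now if now is not None else time.time()
        if now - written_at > self.ttl:
            return None  # expired, like a key with a Redis TTL
        return features

store = ToyFeatureStore(ttl_seconds=60)
store.put("customer:42", {"avg_stake": 20.0}, now=0)
print(store.get("customer:42", now=30))   # {'avg_stake': 20.0}
print(store.get("customer:42", now=120))  # None (expired)
```

The point of the design is reusability: the same keyed feature vectors serve both real-time prediction and batch training, instead of each pipeline recalculating them.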