SlideShare a Scribd company logo
1 of 30
Feature store:
Solving anti-patterns
in ML-systems
About
Synerise
Synerise is a European data company that collects,
interprets and leverages online and offline data with
the use of AI to power 1:1 Customer Engagement.
Our technology helps to power brands in all major
B2C verticals including retail, consumer banking,
telecommunications, public and automotive.
AI: a powerful
engine of growth
Customer
Engagement
Empower
Employee
Innovation
Cost
Optimization
Product
Transformation
Challanges
to address
Old
Combine available datasets for each
customer
Perform regression, scoring, ranking,
segmentation, anomaly detection, …
Do all of that in real-time
Support non-stationary, evolving data
distributions
Support evolving feature spaces
1.
2.
3.
4.
5.
Support incremental improvement when new
data sources become available6.
7.
8.
9.
Achieve performance on-par with
or better than dedicated
single use-case
models
Low latency, high throughput!
Data safety - all data can be obfuscated via
hashing, quantization etc.
Observation
Reality of ML
system
Source: Hidden Technical Debt in Machine Learning Systems, D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd
Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison
"…a mature system might end up being
(at most) 5% machine learning code
and (at least) 95% glue code”
Source: Hidden Technical Debt in Machine Learning Systems, D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips,
Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison
ML Systems
Anti-Patterns
Old
Glue Code1.
ML Systems
Anti-Patterns
Glue Code
Pipeline jungles
Dead experimental code paths
1.
2.
3.
ML Systems
Anti-Patterns
Glue Code
Pipeline jungles
Dead experimental code paths
Reproducibility debt & inconsistency
between training and serving
Multi-model systems
1.
2.
3.
4.
5.
ML Systems
Anti-Patterns
Old
Glue Code
Pipeline jungles
Dead experimental code paths
Reproducibility debt & inconsistency
between training and serving
Multi-model systems
1.
2.
3.
4.
5.
Data-processing doesn’t scale6.
7. Real-time Feature requires engineers
ML Systems
Anti-Patterns
Old
Glue Code
Pipeline jungles
Dead experimental code paths
Reproducibility debt & inconsistency
between training and serving
Multi-model systems
1.
2.
3.
4.
5.
Data-processing doesn’t scale6.
7.
9.
10.
Real-time Feature requires engineers
Lack of Feature discovery
Lack of standardization
Lack of data testing8.
11. Multi-language issue
„Data is the hardest part of ML and the most important piece to get right.
Modelers spend most of their time selecting and transforming Feature at training time
and then building the pipelines to deliver those Feature to production models.”
Source: Scaling Machine Learning at Uber with Michelangelo, Jeremy Hermann and Mike Del Balso
Machine Learning & Data science are in the same place
where software engineering was 20 years ago...
Remedy
First-class
entity
Machine learning and data science is about data, but often data is not a first-class entity
in such systems.
So:
1. Let's make the data a first-class entity as code is for software engineering
2. Let's make Feature a first-class entity as functions/modules are for software engineering
3. Let's think about models as compiled software libraries
First-class
entity
Let people be creative, do the awesome job, free them from the usual and boring,
but necessary:
o data access & ingestion
o data processing & cleaning
o feature engineering & management
o data modeling & building processing pipelines
First-class
entity
Source: Hidden Technical Debt in Machine Learning Systems, D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd
Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison
Feature
store
Feature store is:
o a place to store unified, versioned, tested and documented Feature
o an interface between data engineering and model development
o an interface for feature discovery and analysis
Raw/Structered
Data
Feature store Models
Future
Engineering
Training & Serving
Feature
store
Model
1
Model
2
Model
3
Data
set 1
Data
set 2
Data
set 3
Feature
engineering 2
Feature
engineering 3
Feature
engineering 1
Feature
store
Model
1
Model
2
Model
3
Data
set 1
Data
set 2
Data
set 3
Feature Store
Feature
store gives:
Old
Feature versioning
Feature trust – can be tested
Feature consistency
Feature discovery and reuse
Feature documentation and analytics
1.
2.
3.
4.
5.
Standardized access to Feature between
training and serving
– also reproducibility of results
6.
8.
9.
Feature can be access controlled
Production model results can be Feature for
other models
Automatic backfilling of Feature –
avoid expensive re computations7.
Feature
store
Avg.CostofaNewML
Project
Num. Curated Feature
in Feature Store
Source: The Feature Store in Hopsworks, Jim Dowling
Feature
store architecture
Source Create Ingest Store Access
Event
Stream
Batch
Data
Stream
Transform
Batch
Transform
Ingest
Feature
Storage
ModelAPI
Discovery
API
Model
Serving
Model
Training
Feature
Metadata
Feature
store architecture
Source Create Ingest Store Access
Event
Stream
Batch
Data
Stream
Transform
Batch
Transform
Ingest
Feature
Storage
ModelAPI
Discovery
API
Model
Serving
Model
Training
Feature
Metadata
Feature
store - storage:
Old
Clickhouse:
o Scalable big data column-oriented
database
o Easy to use
o Handle large and sparse feature
spaces
o ASOF join - joining sequences with
a non-exact match
1. SSDB2.
o Persistent high performace key-
value database
o Implements Redis protocol
o Designed to store collection data
o Replication(master-slave), load
balance
Feature
store architecture
Source Create Ingest Store Access
Event
Stream
Batch
Data
Stream
Transform
Batch
Transform
Ingest
Feature
Storage
ModelAPI
Discovery
API
Model
Serving
Model
Training
Feature
Metadata
SSDB
Feature
store
Thanks to the Feature store, we are able to:
o cut down new model development time
o cut down model training time
o easily test new ideas
In one word:
focus on interesting and creative parts of machine learning based systems.
Next steps
and future work
o unify streaming part
o implement feature analytics and monitoring
o improve feature documentation
Andrzej Michałowski
Head of AI Research and Development
andrzej.michalowski@synerise.com
Thank you
Questions?

More Related Content

What's hot

Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformDatabricks
 
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentUnified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentDatabricks
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowWes McKinney
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearchpmanvi
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best PracticesKafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practicesconfluent
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in RustAndrew Lamb
 
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022HostedbyConfluent
 
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and ManageEnd to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and ManageAnimesh Singh
 
Deep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDatabricks
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®confluent
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardParis Data Engineers !
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 
Zipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkZipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkDatabricks
 

What's hot (20)

Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Elk
Elk Elk
Elk
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
 
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentUnified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model Deployment
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best PracticesKafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practices
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in Rust
 
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
 
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and ManageEnd to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
 
Deep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.x
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
Apache Atlas: Governance for your Data
Apache Atlas: Governance for your DataApache Atlas: Governance for your Data
Apache Atlas: Governance for your Data
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Zipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkZipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering Framework
 

Similar to Feature store: Solving anti-patterns in ML-systems

Real time data processing and model inferncing platform with Kafka streams (N...
Real time data processing and model inferncing platform with Kafka streams (N...Real time data processing and model inferncing platform with Kafka streams (N...
Real time data processing and model inferncing platform with Kafka streams (N...KafkaZone
 
Practical machine learning
Practical machine learningPractical machine learning
Practical machine learningFaizan Javed
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 
Software Analytics: Towards Software Mining that Matters (2014)
Software Analytics:Towards Software Mining that Matters (2014)Software Analytics:Towards Software Mining that Matters (2014)
Software Analytics: Towards Software Mining that Matters (2014)Tao Xie
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Databricks
 
Artificial Intelligence As a Service
Artificial Intelligence As a ServiceArtificial Intelligence As a Service
Artificial Intelligence As a ServiceJohn Liu
 
databricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringdatabricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringMohamed MEJDOUBI
 
Introducción al Machine Learning Automático
Introducción al Machine Learning AutomáticoIntroducción al Machine Learning Automático
Introducción al Machine Learning AutomáticoSri Ambati
 
Shiva Amiri, Chief Product Officer, RTDS Inc. at MLconf SEA - 5/01/15
Shiva Amiri, Chief Product Officer, RTDS Inc. at MLconf SEA - 5/01/15Shiva Amiri, Chief Product Officer, RTDS Inc. at MLconf SEA - 5/01/15
Shiva Amiri, Chief Product Officer, RTDS Inc. at MLconf SEA - 5/01/15MLconf
 
Making Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedMaking Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedLaurenz Wuttke
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1Bill Liu
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningProvectus
 
Open, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesOpen, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesNick Pentreath
 
Managing the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflowManaging the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflowDatabricks
 
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPython + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPaige_Roberts
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software EngineeringMiroslaw Staron
 
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Matt Stubbs
 

Similar to Feature store: Solving anti-patterns in ML-systems (20)

Real time data processing and model inferncing platform with Kafka streams (N...
Real time data processing and model inferncing platform with Kafka streams (N...Real time data processing and model inferncing platform with Kafka streams (N...
Real time data processing and model inferncing platform with Kafka streams (N...
 
Practical machine learning
Practical machine learningPractical machine learning
Practical machine learning
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Software Analytics: Towards Software Mining that Matters (2014)
Software Analytics:Towards Software Mining that Matters (2014)Software Analytics:Towards Software Mining that Matters (2014)
Software Analytics: Towards Software Mining that Matters (2014)
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
 
Artificial Intelligence As a Service
Artificial Intelligence As a ServiceArtificial Intelligence As a Service
Artificial Intelligence As a Service
 
Architecting for Data Science
Architecting for Data ScienceArchitecting for Data Science
Architecting for Data Science
 
databricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringdatabricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineering
 
Introducción al Machine Learning Automático
Introducción al Machine Learning AutomáticoIntroducción al Machine Learning Automático
Introducción al Machine Learning Automático
 
Shiva Amiri, Chief Product Officer, RTDS Inc. at MLconf SEA - 5/01/15
Shiva Amiri, Chief Product Officer, RTDS Inc. at MLconf SEA - 5/01/15Shiva Amiri, Chief Product Officer, RTDS Inc. at MLconf SEA - 5/01/15
Shiva Amiri, Chief Product Officer, RTDS Inc. at MLconf SEA - 5/01/15
 
Making Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedMaking Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons Learned
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
Open, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesOpen, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI Pipelines
 
Technovision
TechnovisionTechnovision
Technovision
 
Managing the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflowManaging the Machine Learning Lifecycle with MLflow
Managing the Machine Learning Lifecycle with MLflow
 
AI meets Big Data
AI meets Big DataAI meets Big Data
AI meets Big Data
 
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPython + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
 
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
 

Recently uploaded

The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 

Recently uploaded (20)

CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 

Feature store: Solving anti-patterns in ML-systems

  • 2. About Synerise Synerise is a European data company that collects, interprets and leverages online and offline data with the use of AI to power 1:1 Customer Engagement. Our technology helps to power brands in all major B2C verticals including retail, consumer banking, telecommunications, public and automotive.
  • 3. AI: a powerful engine of growth Customer Engagement Empower Employee Innovation Cost Optimization Product Transformation
  • 4. Challanges to address Old Combine available datasets for each customer Perform regression, scoring, ranking, segmentation, anomaly detection, … Do all of that in real-time Support non-stationary, evolving data distributions Support evolving feature spaces 1. 2. 3. 4. 5. Support incremental improvement when new data sources become available6. 7. 8. 9. Achieve performance on-par with or better than dedicated single use-case models Low latency, high throughput! Data safety - all data can be obfuscated via hashing, quantization etc.
  • 6. Reality of ML system Source: Hidden Technical Debt in Machine Learning Systems, D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison
  • 7. "…a mature system might end up being (at most) 5% machine learning code and (at least) 95% glue code” Source: Hidden Technical Debt in Machine Learning Systems, D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison
  • 9. ML Systems Anti-Patterns Glue Code Pipeline jungles Dead experimental code paths 1. 2. 3.
  • 10. ML Systems Anti-Patterns Glue Code Pipeline jungles Dead experimental code paths Reproducibility debt & inconsistency between training and serving Multi-model systems 1. 2. 3. 4. 5.
  • 11. ML Systems Anti-Patterns Old Glue Code Pipeline jungles Dead experimental code paths Reproducibility debt & inconsistency between training and serving Multi-model systems 1. 2. 3. 4. 5. Data-processing doesn’t scale6. 7. Real-time Feature requires engineers
  • 12. ML Systems Anti-Patterns Old Glue Code Pipeline jungles Dead experimental code paths Reproducibility debt & inconsistency between training and serving Multi-model systems 1. 2. 3. 4. 5. Data-processing doesn’t scale6. 7. 9. 10. Real-time Feature requires engineers Lack of Feature discovery Lack of standardization Lack of data testing8. 11. Multi-language issue
  • 13. „Data is the hardest part of ML and the most important piece to get right. Modelers spend most of their time selecting and transforming Feature at training time and then building the pipelines to deliver those Feature to production models.” Source: Scaling Machine Learning at Uber with Michelangelo, Jeremy Hermann and Mike Del Balso
  • 14. Machine Learning & Data science are in the same place where software engineering was 20 years ago...
  • 16. First-class entity Machine learning and data science is about data, but often data is not a first-class entity in such systems. So: 1. Let's make the data a first-class entity as code is for software engineering 2. Let's make Feature a first-class entity as functions/modules are for software engineering 3. Let's think about models as compiled software libraries
  • 17. First-class entity Let people be creative, do the awesome job, free them from the usual and boring, but necessary: o data access & ingestion o data processing & cleaning o feature engineering & management o data modeling & building processing pipelines
  • 18. First-class entity Source: Hidden Technical Debt in Machine Learning Systems, D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison
  • 19. Feature store Feature store is: o a place to store unified, versioned, tested and documented Feature o an interface between data engineering and model development o an interface for feature discovery and analysis Raw/Structered Data Feature store Models Future Engineering Training & Serving
  • 20. Feature store Model 1 Model 2 Model 3 Data set 1 Data set 2 Data set 3 Feature engineering 2 Feature engineering 3 Feature engineering 1
  • 22. Feature store gives: Old Feature versioning Feature trust – can be tested Feature consistency Feature discovery and reuse Feature documentation and analytics 1. 2. 3. 4. 5. Standardized access to Feature between training and serving – also reproducibility of results 6. 8. 9. Feature can be access controlled Production model results can be Feature for other models Automatic backfilling of Feature – avoid expensive re computations7.
  • 23. Feature store Avg.CostofaNewML Project Num. Curated Feature in Feature Store Source: The Feature Store in Hopsworks, Jim Dowling
  • 24. Feature store architecture Source Create Ingest Store Access Event Stream Batch Data Stream Transform Batch Transform Ingest Feature Storage ModelAPI Discovery API Model Serving Model Training Feature Metadata
  • 25. Feature store architecture Source Create Ingest Store Access Event Stream Batch Data Stream Transform Batch Transform Ingest Feature Storage ModelAPI Discovery API Model Serving Model Training Feature Metadata
  • 26. Feature store - storage: Old Clickhouse: o Scalable big data column-oriented database o Easy to use o Handle large and sparse feature spaces o ASOF join - joining sequences with a non-exact match 1. SSDB2. o Persistent high performace key- value database o Implements Redis protocol o Designed to store collection data o Replication(master-slave), load balance
  • 27. Feature store architecture Source Create Ingest Store Access Event Stream Batch Data Stream Transform Batch Transform Ingest Feature Storage ModelAPI Discovery API Model Serving Model Training Feature Metadata SSDB
  • 28. Feature store Thanks to the Feature store, we are able to: o cut down new model development time o cut down model training time o easily test new ideas In one word: focus on interesting and creative parts of machine learning based systems.
  • 29. Next steps and future work o unify streaming part o implement feature analytics and monitoring o improve feature documentation
  • 30. Andrzej Michałowski Head of AI Research and Development andrzej.michalowski@synerise.com Thank you Questions?