2. The InferenceService architecture consists of a static graph of components which coordinate
requests for a single model. Advanced features such as Ensembling, A/B testing, and
Multi-Armed Bandits should compose InferenceServices together.
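As a rough illustration of composing InferenceServices for A/B testing, the sketch below routes a share of requests to a second service. The names make_ab_router, model_v1, and model_v2 are hypothetical, and each "service" is just a Python callable standing in for a deployed InferenceService; this is not the KFServing API.

```python
import random
from typing import Callable, Dict

# Hypothetical stand-in for a deployed InferenceService: any callable that
# maps a request dict to a response dict.
Service = Callable[[Dict], Dict]


def make_ab_router(service_a: Service, service_b: Service,
                   share_b: float, rng: random.Random) -> Service:
    """Compose two services; send roughly share_b of requests to service_b."""
    def route(request: Dict) -> Dict:
        chosen = service_b if rng.random() < share_b else service_a
        return chosen(request)
    return route


def model_v1(request: Dict) -> Dict:
    # Placeholder model: always predicts 0.
    return {"model": "v1", "predictions": [0 for _ in request["instances"]]}


def model_v2(request: Dict) -> Dict:
    # Placeholder model: always predicts 1.
    return {"model": "v2", "predictions": [1 for _ in request["instances"]]}


# Seeded generator so the traffic split is reproducible.
router = make_ab_router(model_v1, model_v2, share_b=0.2, rng=random.Random(0))
responses = [router({"instances": [[1.0, 2.0]]}) for _ in range(1000)]
share_v2 = sum(r["model"] == "v2" for r in responses) / len(responses)
```

Each single-model service stays unchanged; the experiment logic lives only in the composing layer.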
Inference Service Control Plane
4. Feature
● What is a feature?
/feature/
A feature is a measurable property of the object you’re trying to analyze.
Features are the basic building blocks of models. The
quality of the features in your dataset has a major impact
on the quality of the insights you will gain when you use
that dataset for machine learning.
Importance of Features
5. Hidden Technical Debt in Machine Learning Systems
Tech/User
Trends
Coming up with features is difficult, time-consuming,
requires expert knowledge. "Applied machine
learning" is basically feature engineering.
- Andrew Ng, Founder of deeplearning.ai
...some machine learning projects succeed and
some fail. What makes the difference? Easily the
most important factor is the features used.
- Pedro Domingos, author of ‘The Master Algorithm’
The algorithms we used are very standard for Kagglers. […]
We spent most of our efforts in feature engineering.
[...]
- Xavier Conort, Chief Data Scientist DataRobot
Feature Engineering is essential, difficult, and costly
6. The Feature problem
● Different ML models typically use some common set of features
● Examples of common features:
○ Average Loan default rate by zip code: Used by models which predict who should be targeted for a marketing offer, models which predict who should be offered a loan, etc.
○ Average property prices in an area
○ Credit history of customers: Used by models which predict anything that is related to clients.
○ Average traffic in an area: Used by models which deal with finding best route
● Finding the right features for an ML model requires:
○ Thinking of which features will be relevant for building the model
○ Identifying the right data from the data catalog/data lake for building the feature
○ Feature engineering to get the feature in the right format from the source data
○ This is repeated across teams by data scientists!
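A minimal sketch of one shared feature from the examples above, average loan default rate by zip code, computed once and reused by two consumers. The function name, the rows, and the zip codes are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def default_rate_by_zip(loans: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Average loan default rate per zip code from (zip_code, defaulted) rows."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for zip_code, defaulted in loans:
        totals[zip_code] += 1
        defaults[zip_code] += int(defaulted)
    return {z: defaults[z] / totals[z] for z in totals}


# Made-up rows for illustration only.
loans = [("94105", False), ("94105", True), ("10001", False), ("10001", False)]
feature = default_rate_by_zip(loans)

# Two hypothetical models (marketing offer, loan approval) reuse the same
# feature values instead of each team re-deriving them from raw data.
marketing_input = {"zip_code": "94105", "default_rate": feature["94105"]}
loan_input = {"zip_code": "10001", "default_rate": feature["10001"]}
```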
7. Feature Management is a Huge Pain Point
Spend more time on data prep
Lack of data consistency between training and serving
Duplicate work because they do not know a feature already exists
Manage fragmented data infrastructure
Deal with more requests as the data science team scales
Hard to get features into production
Data Scientists
Data Engineer
8. Poor Feature Management Leads to…
● Long Development Time
● Poor Data Quality
● Difficulty in Production
9. Feature Store
The feature store is the central place to store curated features for machine learning pipelines.
10. FEAST
Feast is a Feature Store Catalog that attempts to solve the key data challenges with production machine learning
11. Feature stores are a critical piece of ML infra
‘17 Uber Michelangelo (Proprietary, original feature store)
‘18 Feast (Open source)
‘18 Logical Clocks (Open source, ML platform)
‘19 Airbnb’s Zipline (Closed source)
‘19 Spotify’s Feature Store on Kubeflow (Closed source)
‘20 Pinterest (Closed source)
‘20 Twitter Feature Store (Closed source, library based)
‘20 Tecton Feature Store (Closed source)
12. What is a Feature Catalog?
● A feature catalog can be thought of as “Master Data” used for building and serving machine learning models
● It stores different features which can be used across different teams for building ML models
● It is not just a feature repository, but also includes two serving mechanisms for:
○ Batch access
○ Real time access
● Feature Update: Feature values will get updated over time
○ Some will be updated in real time. E.g., average traffic in an area
○ Some will be updated infrequently. E.g., credit rating of a customer
○ Features need to be synced between repositories used for batch access and real time access.
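The two serving mechanisms and the sync requirement can be sketched with a toy in-memory catalog. ToyFeatureCatalog and its method names are hypothetical; a real catalog would back batch access with a warehouse and real-time access with a key-value store.

```python
from typing import Dict, List, Tuple


class ToyFeatureCatalog:
    """Keeps full history for batch access and latest values for real-time access."""

    def __init__(self) -> None:
        # Batch side: every (feature, entity_id, timestamp, value) ever written.
        self.offline: List[Tuple[str, str, int, float]] = []
        # Real-time side: latest (timestamp, value) per (feature, entity_id).
        self.online: Dict[Tuple[str, str], Tuple[int, float]] = {}

    def update(self, feature: str, entity_id: str, timestamp: int, value: float) -> None:
        # Writing to both sides in one call keeps the repositories in sync.
        self.offline.append((feature, entity_id, timestamp, value))
        current = self.online.get((feature, entity_id))
        if current is None or timestamp >= current[0]:
            self.online[(feature, entity_id)] = (timestamp, value)

    def batch_access(self, feature: str) -> List[Tuple[str, int, float]]:
        return [(e, ts, v) for f, e, ts, v in self.offline if f == feature]

    def realtime_access(self, feature: str, entity_id: str) -> float:
        return self.online[(feature, entity_id)][1]


catalog = ToyFeatureCatalog()
catalog.update("avg_traffic", "area_1", timestamp=1, value=120.0)   # updated often
catalog.update("avg_traffic", "area_1", timestamp=2, value=95.0)
catalog.update("credit_rating", "customer_7", timestamp=1, value=700.0)  # updated rarely
history = catalog.batch_access("avg_traffic")
latest = catalog.realtime_access("avg_traffic", "area_1")
```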
13. What does Feast provide?
Registry: A common catalog with which to explore, develop, collaborate on, and publish new feature definitions within
and across teams.
Ingestion: A means for continually ingesting batch and streaming data and storing consistent copies in both an offline
and online store
Serving: A feature-retrieval interface which provides a temporally consistent view of features for both training and online
serving.
Monitoring: Tools that allow operational teams to monitor and act on the quality and accuracy of data reaching models.
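A "temporally consistent view" for training can be read as a point-in-time lookup: each training event sees only the feature value known at or before its timestamp. The sketch below shows that idea in plain Python; the function name and the history values are illustrative assumptions, not Feast code.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def point_in_time_value(history: List[Tuple[int, float]], event_ts: int) -> Optional[float]:
    """Return the latest value with timestamp <= event_ts, or None if none exists.

    history must be sorted by timestamp in ascending order.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_ts)
    return history[idx - 1][1] if idx > 0 else None


# Made-up (timestamp, credit_rating) history for one customer.
credit_rating_history = [(10, 640.0), (20, 700.0), (30, 720.0)]

# Each training event is joined with the value that was valid at that time,
# so no future information leaks into the training rows.
training_events = [5, 10, 25, 40]
training_rows = [(ts, point_in_time_value(credit_rating_history, ts))
                 for ts in training_events]
```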
14. Feast on K8s
[Architecture diagram: a Feature Repo is deployed with feast apply, which configures infrastructure based on feature definitions and “provider”. Components shown: registry (GCS/S3), Offline Store (BQ/S3/GCS/Other), Kafka, Spark on K8s, Ingestion API, Redis, Serving API. Legend: Exists / Planning phase.]
TBD what the scope of apply would be for a K8s provider. It may be that it only spins up jobs and updates stores.
18. KFServing with Feast
Feast transformer as a new type of transformer for preprocess
○ Has a custom container image with generic implementation to interact with Feast online serving
○ Properties: entity IDs, feature refs, project, Feast serving URL…
○ Specify IDs in inference service yaml
■ Entity ids → FeatureStore.get_online_features(entity_rows…)
■ Feature refs → FeatureStore.get_online_features(feature_refs…)
○ The initial request will be augmented with features from Feast online store and sent to predictor as the final input
○ Postprocess is a pass-through, not implemented in this transformer
[Diagram: a model serving request (predict or explain) passes through the Transformer (Preprocess), then the Predictor (Predict) or Explainer (Explain), then the Transformer (Postprocess), and returns as the model serving response. Data between steps is a Python dict. During preprocess the Transformer calls Feast Online Serving (Online Store (Redis), Registry) via Python or gRPC.]
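The preprocess step described on this slide can be sketched as below. This is not the KFServing or Feast implementation: FeastTransformerSketch, fake_lookup, the request shape, and the feature values are assumptions, and the lookup callable stands in for a call such as FeatureStore.get_online_features so the example runs without a Feast deployment.

```python
from typing import Any, Callable, Dict, List

# Hypothetical lookup signature: (entity_rows, feature_refs) -> {ref: [values]},
# one value per entity row, in the same order as entity_rows.
OnlineLookup = Callable[[List[Dict[str, Any]], List[str]], Dict[str, List[Any]]]


class FeastTransformerSketch:
    def __init__(self, lookup: OnlineLookup, entity_id_name: str,
                 feature_refs: List[str]) -> None:
        self.lookup = lookup
        self.entity_id_name = entity_id_name
        self.feature_refs = feature_refs

    def preprocess(self, request: Dict) -> Dict:
        # Assumed request shape: {"instances": [{"<entity id name>": <id>}, ...]}.
        entity_rows = [{self.entity_id_name: inst[self.entity_id_name]}
                       for inst in request["instances"]]
        features = self.lookup(entity_rows, self.feature_refs)
        # Replace each entity id with its feature vector for the predictor.
        instances = [[features[ref][i] for ref in self.feature_refs]
                     for i in range(len(entity_rows))]
        return {"instances": instances}

    def postprocess(self, response: Dict) -> Dict:
        # Pass-through, as stated on the slide.
        return response


def fake_lookup(entity_rows: List[Dict[str, Any]],
                feature_refs: List[str]) -> Dict[str, List[Any]]:
    # Made-up online store contents; illustrative feature names and values.
    table = {
        1001: {"driver_hourly_stats:conv_rate": 0.5, "driver_hourly_stats:acc_rate": 0.9},
        1002: {"driver_hourly_stats:conv_rate": 0.3, "driver_hourly_stats:acc_rate": 0.8},
    }
    return {ref: [table[row["driver_id"]][ref] for row in entity_rows]
            for ref in feature_refs}


transformer = FeastTransformerSketch(
    fake_lookup, "driver_id",
    ["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"])
augmented = transformer.preprocess(
    {"instances": [{"driver_id": 1001}, {"driver_id": 1002}]})
```

In a real deployment the entity ID name, feature refs, project, and Feast serving URL would come from the inference service yaml rather than being hard-coded.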
19. KFServing with Feast – Phased Approach
Phase 1: Provide a sample Feast transformer
○ Illustrate how online features in Feast feature stores can be retrieved and used for model serving
○ As a sample in KFServing docs folder
○ Use the driver ranking data and model from Feast tutorial, https://github.com/feast-dev/feast-driver-ranking-tutorial
○ Use a custom container image
○ Interact with Feast online serving via python API
Phase 2: Provide a generic Feast transformer
○ Support a variety of Feast feature stores in preprocessing and model serving
○ As a general transformer in KFServing python folder
○ Include test, instructions, and examples
○ Provide a common Feast base image
○ Interact with Feast online serving via gRPC API
21. Better precision
● Data Asset 1 → Model 1 (poor quality)
● Data Asset 2 → Model 2 (poor quality)
● Data Asset 3 → Model 3 (poor quality)
Difficult to identify the features that lead to poor quality models
22. Better precision
[Diagram: a Feature Store linking features to models. Feature 1 (poor quality), Feature 2 (moderate quality); Model 1 (poor quality), Model 2 (poor quality), Model 3 (poor quality), Model 4 (good quality).]
● Feature 1: used in 3 models, all of which have poor quality
● Feature 2: used in 2 models, which have good and poor quality
● Feature 3: used in 1 model, which has good quality
Easy to identify feature quality
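One way to read this slide: once the feature store records which models use which features, a rough feature quality signal can be derived from model quality. The sketch below assumes a specific feature-to-model mapping that matches the counts on the slide; the mapping, the scoring rule, and the labels are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List

# Model quality as shown on the slide.
model_quality: Dict[str, str] = {
    "Model 1": "poor", "Model 2": "poor", "Model 3": "poor", "Model 4": "good",
}

# Assumed lineage: which features each model uses (not stated on the slide).
model_features: Dict[str, List[str]] = {
    "Model 1": ["Feature 1", "Feature 2"],
    "Model 2": ["Feature 1"],
    "Model 3": ["Feature 1"],
    "Model 4": ["Feature 2", "Feature 3"],
}


def feature_quality(quality: Dict[str, str],
                    lineage: Dict[str, List[str]]) -> Dict[str, str]:
    """Label each feature by the share of good-quality models that use it."""
    used_by = defaultdict(list)
    for model, features in lineage.items():
        for feature in features:
            used_by[feature].append(quality[model])
    labels = {}
    for feature, qualities in used_by.items():
        good_share = qualities.count("good") / len(qualities)
        if good_share == 0.0:
            labels[feature] = "poor"
        elif good_share == 1.0:
            labels[feature] = "good"
        else:
            labels[feature] = "moderate"
    return labels


labels = feature_quality(model_quality, model_features)
```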