SlideShare a Scribd company logo
1 of 12
Download to read offline
Confidential and Proprietary to Daugherty Business Solutions
Feature Store Overview
Adam Doyle
St. Louis Big Data IDEA
August 2020
Confidential and Proprietary to Daugherty Business Solutions
The Data Science Process
Confidential and Proprietary to Daugherty Business Solutions
“A feature is an individual measurable property or characteristic of a phenomenon being observed… Feature data is
used both as input to models during training and when models are served in production.”
Key takeaways
• Features are not data
• Features enumerate information
• Not all features are equal
Features
https://docs.feast.dev/user-guide/features
Confidential and Proprietary to Daugherty Business Solutions
Feature Engineering is the process of extracting features from raw data.
Feature Engineering Techniques
• Imputation
• Handling Outliers
• Binning
• Numerical Transform
• One-Hot Encoding
• Grouping
• Extraction
• Scaling
Feature Engineering
Confidential and Proprietary to Daugherty Business Solutions
• Feature Reuse Between Models
• Consistent Feature Definitions
• Latency / Recency
• Environmental Variation
• Unstable Dependencies
• Governance
• Versioning
Feature Challenges
Confidential and Proprietary to Daugherty Business Solutions
Feature Store
API
Metadata /
Model /
Predictions
Offline
Data Store
Online
Data Store
Batch Engine
Stream Engine
Batch Prediction
Stream Prediction
Confidential and Proprietary to Daugherty Business Solutions
• Retrieve Feature Metadata
• Retrieve Feature Values
• Remove Features
• Store Features
• Stream Store Features
• Stream Retrieve Features
• Feature Versioning
• Model Versioning
• Record Predictions
Feature Store Use Cases
Confidential and Proprietary to Daugherty Business Solutions
• Data engineers interact with a feature store by creating
data pipeline definitions.
• Data pipeline definitions combine
– Data Sources
– Business definitions
– Transformation rule
– Streaming/Batch definitions
– Scheduling
• Data pipelines are executed by the feature store engines
and stored in online and offline data stores.
Data Pipeline
Confidential and Proprietary to Daugherty Business Solutions
• Data scientists interact with the feature store through the Feature Registry.
• They can search for and browse feature definitions.
• They can register data science models as a class of data pipeline.
Feature Registry
Confidential and Proprietary to Daugherty Business Solutions
• Feature stores can assist with versioning and monitoring data
science applications.
• Predictions are recorded in the feature store API including
source data, model used, version of that model, and the
rendered prediction.
• Predictions can be compared with reality to determine the
accuracy of the models.
• Models and versions are tracked and can be used to determine
the lift provided by a particular instance of a model.
Versioning and Monitoring
Confidential and Proprietary to Daugherty Business Solutions
• Open Source
– GoJEK/Google FEAST
• Product Offerings
– Logical Clocks Hopsworks
– Scribble Enrich
• Presentations Only
– Uber Michaelangelo
– Airbnb Zipline
– Survey Monkey ML Feature Store
– Netflix MetaFlow
Feature Store Implementations
Confidential and Proprietary to Daugherty Business Solutions
• http://featurestore.org/
• https://www.scribbledata.io/resources-feature-store-guide
• https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
• https://towardsdatascience.com/feature-stores-components-of-a-data-science-factory-f0f1f73d39b8
• https://towardsdatascience.com/what-are-feature-stores-and-why-are-they-critical-for-scaling-data-science-
3f9156f7ab4
• https://www.logicalclocks.com/hopsworks-featurestore
• https://eng.uber.com/michelangelo-machine-learning-platform/
• https://technology.condenast.com/story/accelerating-machine-learning-with-the-feature-store-service
• https://cloud.google.com/blog/products/ai-machine-learning/introducing-feast-an-open-source-feature-store-for-
machine-learning
• https://databricks.com/session/zipline-airbnbs-machine-learning-data-management-platform
• https://engineering.linkedin.com/blog/2017/06/building-the-activity-graph--part-i
• https://databricks.com/session/fact-store-scale-for-netflix-recommendations
• https://medium.com/@changshe/rethinking-feature-stores-74963c2596f0
Links

More Related Content

What's hot

201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine LearningMark Tabladillo
 
Accelerate Data Science Initiatives: Databricks & Privacera
Accelerate Data Science Initiatives: Databricks & PrivaceraAccelerate Data Science Initiatives: Databricks & Privacera
Accelerate Data Science Initiatives: Databricks & PrivaceraDatabricks
 
Using Redash for SQL Analytics on Databricks
Using Redash for SQL Analytics on DatabricksUsing Redash for SQL Analytics on Databricks
Using Redash for SQL Analytics on DatabricksDatabricks
 
Northwestern Mutual Journey – Transform BI Space to Cloud
Northwestern Mutual Journey – Transform BI Space to CloudNorthwestern Mutual Journey – Transform BI Space to Cloud
Northwestern Mutual Journey – Transform BI Space to CloudDatabricks
 
Part 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure SynapsePart 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure SynapseNilesh Gule
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAlex Tumanoff
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Databricks
 
Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory Sergio Zenatti Filho
 
Integration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data LakeIntegration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data LakeTom Kerkhove
 
Azure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsAzure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsThomas Sykes
 
Azure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekAzure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekMark Kromer
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeTorsten Steinbach
 
Automating Data Quality Processes at Reckitt
Automating Data Quality Processes at ReckittAutomating Data Quality Processes at Reckitt
Automating Data Quality Processes at ReckittDatabricks
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Michael Rys
 
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Databricks
 
Machine Learning Data Lineage with MLflow and Delta Lake
Machine Learning Data Lineage with MLflow and Delta LakeMachine Learning Data Lineage with MLflow and Delta Lake
Machine Learning Data Lineage with MLflow and Delta LakeDatabricks
 
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...Microsoft Tech Community
 

What's hot (20)

201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning
 
Accelerate Data Science Initiatives: Databricks & Privacera
Accelerate Data Science Initiatives: Databricks & PrivaceraAccelerate Data Science Initiatives: Databricks & Privacera
Accelerate Data Science Initiatives: Databricks & Privacera
 
Using Redash for SQL Analytics on Databricks
Using Redash for SQL Analytics on DatabricksUsing Redash for SQL Analytics on Databricks
Using Redash for SQL Analytics on Databricks
 
Northwestern Mutual Journey – Transform BI Space to Cloud
Northwestern Mutual Journey – Transform BI Space to CloudNorthwestern Mutual Journey – Transform BI Space to Cloud
Northwestern Mutual Journey – Transform BI Space to Cloud
 
Part 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure SynapsePart 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure Synapse
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene Polonichko
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory
 
Integration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data LakeIntegration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data Lake
 
Azure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsAzure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data Flows
 
Azure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekAzure Data Factory for Azure Data Week
Azure Data Factory for Azure Data Week
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lake
 
Automating Data Quality Processes at Reckitt
Automating Data Quality Processes at ReckittAutomating Data Quality Processes at Reckitt
Automating Data Quality Processes at Reckitt
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
 
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
 
Machine Learning Data Lineage with MLflow and Delta Lake
Machine Learning Data Lineage with MLflow and Delta LakeMachine Learning Data Lineage with MLflow and Delta Lake
Machine Learning Data Lineage with MLflow and Delta Lake
 
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
 

Similar to Feature store Overview St. Louis Big Data IDEA Meetup aug 2020

Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningProvectus
 
Deagital smart data proposal en
Deagital smart data proposal enDeagital smart data proposal en
Deagital smart data proposal enJose Torres
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in ProductionDataWorks Summit
 
SPSChicagoBurbs 2019 - What is CDM and CDS?
SPSChicagoBurbs 2019 - What is CDM and CDS?SPSChicagoBurbs 2019 - What is CDM and CDS?
SPSChicagoBurbs 2019 - What is CDM and CDS?Nicolas Georgeault
 
Data Con LA 2022 - Pre- Recorded - Simplifying AI/ML using Databricks feature...
Data Con LA 2022 - Pre- Recorded - Simplifying AI/ML using Databricks feature...Data Con LA 2022 - Pre- Recorded - Simplifying AI/ML using Databricks feature...
Data Con LA 2022 - Pre- Recorded - Simplifying AI/ML using Databricks feature...Data Con LA
 
Gobblin' Big Data With Ease @ QConSF 2014
Gobblin' Big Data With Ease @ QConSF 2014Gobblin' Big Data With Ease @ QConSF 2014
Gobblin' Big Data With Ease @ QConSF 2014Lin Qiao
 
IBM InfoSphere Data Architect 9.1 - Francis Arnaudiès
IBM InfoSphere Data Architect 9.1 - Francis ArnaudièsIBM InfoSphere Data Architect 9.1 - Francis Arnaudiès
IBM InfoSphere Data Architect 9.1 - Francis ArnaudièsIBMInfoSphereUGFR
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseDatabricks
 
Technically Speaking: How Self-Service Analytics Fosters Collaboration
Technically Speaking: How Self-Service Analytics Fosters CollaborationTechnically Speaking: How Self-Service Analytics Fosters Collaboration
Technically Speaking: How Self-Service Analytics Fosters CollaborationInside Analysis
 
Conceptual vs. Logical vs. Physical Data Modeling
Conceptual vs. Logical vs. Physical Data ModelingConceptual vs. Logical vs. Physical Data Modeling
Conceptual vs. Logical vs. Physical Data ModelingDATAVERSITY
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning ModelsTash Bickley
 
Mapping Manager Product Overview
Mapping Manager Product OverviewMapping Manager Product Overview
Mapping Manager Product OverviewRakesh Kumar
 
Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...DataWorks Summit
 
Database Tools - ER Studio Facts and Features
Database Tools - ER Studio Facts and FeaturesDatabase Tools - ER Studio Facts and Features
Database Tools - ER Studio Facts and FeaturesMichael Findling
 
What Data Do You Have and Where is It?
What Data Do You Have and Where is It? What Data Do You Have and Where is It?
What Data Do You Have and Where is It? Caserta
 
Contexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
 
Ca e rwin state of the union 09082010
Ca e rwin state of the union 09082010Ca e rwin state of the union 09082010
Ca e rwin state of the union 09082010ERwin Modeling
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Caserta
 

Similar to Feature store Overview St. Louis Big Data IDEA Meetup aug 2020 (20)

Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
Deagital smart data proposal en
Deagital smart data proposal enDeagital smart data proposal en
Deagital smart data proposal en
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
SPSChicagoBurbs 2019 - What is CDM and CDS?
SPSChicagoBurbs 2019 - What is CDM and CDS?SPSChicagoBurbs 2019 - What is CDM and CDS?
SPSChicagoBurbs 2019 - What is CDM and CDS?
 
Data Con LA 2022 - Pre- Recorded - Simplifying AI/ML using Databricks feature...
Data Con LA 2022 - Pre- Recorded - Simplifying AI/ML using Databricks feature...Data Con LA 2022 - Pre- Recorded - Simplifying AI/ML using Databricks feature...
Data Con LA 2022 - Pre- Recorded - Simplifying AI/ML using Databricks feature...
 
Gobblin' Big Data With Ease @ QConSF 2014
Gobblin' Big Data With Ease @ QConSF 2014Gobblin' Big Data With Ease @ QConSF 2014
Gobblin' Big Data With Ease @ QConSF 2014
 
IBM InfoSphere Data Architect 9.1 - Francis Arnaudiès
IBM InfoSphere Data Architect 9.1 - Francis ArnaudièsIBM InfoSphere Data Architect 9.1 - Francis Arnaudiès
IBM InfoSphere Data Architect 9.1 - Francis Arnaudiès
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
 
Technically Speaking: How Self-Service Analytics Fosters Collaboration
Technically Speaking: How Self-Service Analytics Fosters CollaborationTechnically Speaking: How Self-Service Analytics Fosters Collaboration
Technically Speaking: How Self-Service Analytics Fosters Collaboration
 
Conceptual vs. Logical vs. Physical Data Modeling
Conceptual vs. Logical vs. Physical Data ModelingConceptual vs. Logical vs. Physical Data Modeling
Conceptual vs. Logical vs. Physical Data Modeling
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning Models
 
Mapping Manager Product Overview
Mapping Manager Product OverviewMapping Manager Product Overview
Mapping Manager Product Overview
 
Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...
 
Database Tools - ER Studio Facts and Features
Database Tools - ER Studio Facts and FeaturesDatabase Tools - ER Studio Facts and Features
Database Tools - ER Studio Facts and Features
 
What Data Do You Have and Where is It?
What Data Do You Have and Where is It? What Data Do You Have and Where is It?
What Data Do You Have and Where is It?
 
Contexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to Production
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
Ca e rwin state of the union 09082010
Ca e rwin state of the union 09082010Ca e rwin state of the union 09082010
Ca e rwin state of the union 09082010
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics
 

More from Adam Doyle

Managed Cluster Services
Managed Cluster ServicesManaged Cluster Services
Managed Cluster ServicesAdam Doyle
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architectureAdam Doyle
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations PresentationAdam Doyle
 
May 2021 Spark Testing ... or how to farm reputation on StackOverflow
May 2021 Spark Testing ... or how to farm reputation on StackOverflowMay 2021 Spark Testing ... or how to farm reputation on StackOverflow
May 2021 Spark Testing ... or how to farm reputation on StackOverflowAdam Doyle
 
Automate your data flows with Apache NIFI
Automate your data flows with Apache NIFIAutomate your data flows with Apache NIFI
Automate your data flows with Apache NIFIAdam Doyle
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAAdam Doyle
 
Localized Hadoop Development
Localized Hadoop DevelopmentLocalized Hadoop Development
Localized Hadoop DevelopmentAdam Doyle
 
Snowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleSnowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleAdam Doyle
 
Operationalizing Data Science St. Louis Big Data IDEA
Operationalizing Data Science St. Louis Big Data IDEAOperationalizing Data Science St. Louis Big Data IDEA
Operationalizing Data Science St. Louis Big Data IDEAAdam Doyle
 
Retooling on the Modern Data and Analytics Tech Stack
Retooling on the Modern Data and Analytics Tech StackRetooling on the Modern Data and Analytics Tech Stack
Retooling on the Modern Data and Analytics Tech StackAdam Doyle
 
Stl meetup cloudera platform - january 2020
Stl meetup   cloudera platform  - january 2020Stl meetup   cloudera platform  - january 2020
Stl meetup cloudera platform - january 2020Adam Doyle
 
How stlrda does data
How stlrda does dataHow stlrda does data
How stlrda does dataAdam Doyle
 
Tailoring machine learning practices to support prescriptive analytics
Tailoring machine learning practices to support prescriptive analyticsTailoring machine learning practices to support prescriptive analytics
Tailoring machine learning practices to support prescriptive analyticsAdam Doyle
 
Synthesis of analytical methods data driven decision-making
Synthesis of analytical methods data driven decision-makingSynthesis of analytical methods data driven decision-making
Synthesis of analytical methods data driven decision-makingAdam Doyle
 
Big Data IDEA 101 2019
Big Data IDEA 101 2019Big Data IDEA 101 2019
Big Data IDEA 101 2019Adam Doyle
 
Data Engineering and the Data Science Lifecycle
Data Engineering and the Data Science LifecycleData Engineering and the Data Science Lifecycle
Data Engineering and the Data Science LifecycleAdam Doyle
 
Data engineering Stl Big Data IDEA user group
Data engineering   Stl Big Data IDEA user groupData engineering   Stl Big Data IDEA user group
Data engineering Stl Big Data IDEA user groupAdam Doyle
 
Cloudera - Docker on hadoop
Cloudera - Docker on hadoopCloudera - Docker on hadoop
Cloudera - Docker on hadoopAdam Doyle
 
Big Data Retrospective - STL Big Data IDEA Jan 2019
Big Data Retrospective - STL Big Data IDEA Jan 2019Big Data Retrospective - STL Big Data IDEA Jan 2019
Big Data Retrospective - STL Big Data IDEA Jan 2019Adam Doyle
 

More from Adam Doyle (20)

ML Ops.pptx
ML Ops.pptxML Ops.pptx
ML Ops.pptx
 
Managed Cluster Services
Managed Cluster ServicesManaged Cluster Services
Managed Cluster Services
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations Presentation
 
May 2021 Spark Testing ... or how to farm reputation on StackOverflow
May 2021 Spark Testing ... or how to farm reputation on StackOverflowMay 2021 Spark Testing ... or how to farm reputation on StackOverflow
May 2021 Spark Testing ... or how to farm reputation on StackOverflow
 
Automate your data flows with Apache NIFI
Automate your data flows with Apache NIFIAutomate your data flows with Apache NIFI
Automate your data flows with Apache NIFI
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
Localized Hadoop Development
Localized Hadoop DevelopmentLocalized Hadoop Development
Localized Hadoop Development
 
Snowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleSnowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at Scale
 
Operationalizing Data Science St. Louis Big Data IDEA
Operationalizing Data Science St. Louis Big Data IDEAOperationalizing Data Science St. Louis Big Data IDEA
Operationalizing Data Science St. Louis Big Data IDEA
 
Retooling on the Modern Data and Analytics Tech Stack
Retooling on the Modern Data and Analytics Tech StackRetooling on the Modern Data and Analytics Tech Stack
Retooling on the Modern Data and Analytics Tech Stack
 
Stl meetup cloudera platform - january 2020
Stl meetup   cloudera platform  - january 2020Stl meetup   cloudera platform  - january 2020
Stl meetup cloudera platform - january 2020
 
How stlrda does data
How stlrda does dataHow stlrda does data
How stlrda does data
 
Tailoring machine learning practices to support prescriptive analytics
Tailoring machine learning practices to support prescriptive analyticsTailoring machine learning practices to support prescriptive analytics
Tailoring machine learning practices to support prescriptive analytics
 
Synthesis of analytical methods data driven decision-making
Synthesis of analytical methods data driven decision-makingSynthesis of analytical methods data driven decision-making
Synthesis of analytical methods data driven decision-making
 
Big Data IDEA 101 2019
Big Data IDEA 101 2019Big Data IDEA 101 2019
Big Data IDEA 101 2019
 
Data Engineering and the Data Science Lifecycle
Data Engineering and the Data Science LifecycleData Engineering and the Data Science Lifecycle
Data Engineering and the Data Science Lifecycle
 
Data engineering Stl Big Data IDEA user group
Data engineering   Stl Big Data IDEA user groupData engineering   Stl Big Data IDEA user group
Data engineering Stl Big Data IDEA user group
 
Cloudera - Docker on hadoop
Cloudera - Docker on hadoopCloudera - Docker on hadoop
Cloudera - Docker on hadoop
 
Big Data Retrospective - STL Big Data IDEA Jan 2019
Big Data Retrospective - STL Big Data IDEA Jan 2019Big Data Retrospective - STL Big Data IDEA Jan 2019
Big Data Retrospective - STL Big Data IDEA Jan 2019
 

Recently uploaded

Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 

Feature store Overview St. Louis Big Data IDEA Meetup aug 2020

  • 1. Confidential and Proprietary to Daugherty Business Solutions Feature Store Overview Adam Doyle St. Louis Big Data IDEA August 2020
  • 2. Confidential and Proprietary to Daugherty Business Solutions The Data Science Process
  • 3. Confidential and Proprietary to Daugherty Business Solutions “A feature is an individual measurable property or characteristic of a phenomenon being observed… Feature data is used both as input to models during training and when models are served in production.” Key takeaways • Features are not data • Features enumerate information • Not all features are equal Features https://docs.feast.dev/user-guide/features
  • 4. Confidential and Proprietary to Daugherty Business Solutions Feature Engineering is the process of extracting features from raw data. Feature Engineering Techniques • Imputation • Handling Outliers • Binning • Numerical Transform • One-Hot Encoding • Grouping • Extraction • Scaling Feature Engineering
  • 5. Confidential and Proprietary to Daugherty Business Solutions • Feature Reuse Between Models • Consistent Feature Definitions • Latency / Recency • Environmental Variation • Unstable Dependencies • Governance • Versioning Feature Challenges
  • 6. Confidential and Proprietary to Daugherty Business Solutions Feature Store API Metadata / Model / Predictions Offline Data Store Online Data Store Batch Engine Stream Engine Batch Prediction Stream Prediction
  • 7. Confidential and Proprietary to Daugherty Business Solutions • Retrieve Feature Metadata • Retrieve Feature Values • Remove Features • Store Features • Stream Store Features • Stream Retrieve Features • Feature Versioning • Model Versioning • Record Predictions Feature Store Use Cases
  • 8. Confidential and Proprietary to Daugherty Business Solutions • Data engineers interact with a feature store by creating data pipeline definitions. • Data pipeline definitions combine – Data Sources – Business definitions – Transformation rule – Streaming/Batch definitions – Scheduling • Data pipelines are executed by the feature store engines and stored in online and offline data stores. Data Pipeline
  • 9. Confidential and Proprietary to Daugherty Business Solutions • Data scientists interact with the feature store through the Feature Registry. • They can search for and browse feature definitions. • They can register data science models as a class of data pipeline. Feature Registry
  • 10. Confidential and Proprietary to Daugherty Business Solutions • Feature stores can assist with versioning and monitoring data science applications. • Predictions are recorded in the feature store API including source data, model used, version of that model, and the rendered prediction. • Predictions can be compared with reality to determine the accuracy of the models. • Models and versions are tracked and can be used to determine the lift provided by a particular instance of a model. Versioning and Monitoring
  • 11. Confidential and Proprietary to Daugherty Business Solutions • Open Source – GoJEK/Google FEAST • Product Offerings – Logical Clocks Hopsworks – Scribble Enrich • Presentations Only – Uber Michaelangelo – Airbnb Zipline – Survey Monkey ML Feature Store – Netflix MetaFlow Feature Store Implementations
  • 12. Confidential and Proprietary to Daugherty Business Solutions • http://featurestore.org/ • https://www.scribbledata.io/resources-feature-store-guide • https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf • https://towardsdatascience.com/feature-stores-components-of-a-data-science-factory-f0f1f73d39b8 • https://towardsdatascience.com/what-are-feature-stores-and-why-are-they-critical-for-scaling-data-science- 3f9156f7ab4 • https://www.logicalclocks.com/hopsworks-featurestore • https://eng.uber.com/michelangelo-machine-learning-platform/ • https://technology.condenast.com/story/accelerating-machine-learning-with-the-feature-store-service • https://cloud.google.com/blog/products/ai-machine-learning/introducing-feast-an-open-source-feature-store-for- machine-learning • https://databricks.com/session/zipline-airbnbs-machine-learning-data-management-platform • https://engineering.linkedin.com/blog/2017/06/building-the-activity-graph--part-i • https://databricks.com/session/fact-store-scale-for-netflix-recommendations • https://medium.com/@changshe/rethinking-feature-stores-74963c2596f0 Links