Feature Store Overview: St. Louis Big Data IDEA Meetup, August 2020

Slides from the August 2020 St. Louis Big Data IDEA Meetup on feature stores, presented by Adam Doyle.

  1. Feature Store Overview
     Adam Doyle, St. Louis Big Data IDEA, August 2020
     Confidential and Proprietary to Daugherty Business Solutions
  2. The Data Science Process
  3. Features
     “A feature is an individual measurable property or characteristic of a phenomenon being observed… Feature data is used both as input to models during training and when models are served in production.” (https://docs.feast.dev/user-guide/features)
     Key takeaways:
     • Features are not data
     • Features enumerate information
     • Not all features are equal
  4. Feature Engineering
     Feature engineering is the process of extracting features from raw data.
     Feature engineering techniques:
     • Imputation
     • Handling Outliers
     • Binning
     • Numerical Transform
     • One-Hot Encoding
     • Grouping
     • Extraction
     • Scaling
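
Five of the eight techniques listed on slide 4 can be illustrated in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the presenter's method: missing values are represented as `None`, outliers are handled by clipping to fixed bounds (one option among several), and the sample ages, bounds, bin edges, and colors are made up for the example. Numerical transforms, grouping, and extraction are not shown.

```python
from bisect import bisect_right
from statistics import mean


def impute_mean(values):
    """Imputation: replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, lower, upper):
    """Handling outliers: cap every value to the closed range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def bin_values(values, edges):
    """Binning: bin 0 holds values below edges[0], bin i holds [edges[i-1], edges[i]),
    and the last bin holds values >= edges[-1]. Edges must be sorted ascending."""
    return [bisect_right(edges, v) for v in values]


def one_hot(categories, vocabulary):
    """One-hot encoding: one 0/1 column per category in a fixed vocabulary."""
    return [[1 if c == term else 0 for term in vocabulary] for c in categories]


def min_max_scale(values):
    """Scaling: rescale values linearly into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical sample data: ages with one missing value and one implausible outlier.
raw_ages = [25, None, 35, 120]
imputed = impute_mean(raw_ages)           # [25, 60, 35, 120]
clipped = clip_outliers(imputed, 0, 100)  # [25, 60, 35, 100]
age_bins = bin_values(clipped, [30, 65])  # [0, 1, 1, 2]
age_scaled = min_max_scale(clipped)       # 25 maps to 0.0, 100 maps to 1.0
colors = one_hot(["red", "blue", "red"], ["red", "green", "blue"])
print(age_bins, colors)
```

Order matters in a pipeline like this: imputing before clipping means the fill value is computed from data that still contains the outlier, so in practice one might clip first or use a median instead of a mean.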
  5. Feature Challenges
     • Feature Reuse Between Models
     • Consistent Feature Definitions
     • Latency / Recency
     • Environmental Variation
     • Unstable Dependencies
     • Governance
     • Versioning
  6. [Architecture diagram: Feature Store API; Metadata / Model / Predictions; Offline Data Store; Online Data Store; Batch Engine; Stream Engine; Batch Prediction; Stream Prediction]
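
Slide 6 names the components of a feature store but not how they connect. One common arrangement, assumed here rather than taken from the slide, is that the offline store keeps the full timestamped history used for training and batch prediction, while the online store keeps only the latest value per entity for low-latency stream prediction. The hypothetical `ToyFeatureStore` class is an in-memory sketch of that split; the metadata, model, and prediction components are left out, and the entity and feature names are invented.

```python
from bisect import bisect_right
from collections import defaultdict


class ToyFeatureStore:
    """Hypothetical in-memory sketch of the offline/online split (not a real product API)."""

    def __init__(self):
        # Offline store: full history per (entity, feature) as [(ts, value), ...] sorted by ts.
        self.offline = defaultdict(list)
        # Online store: only the newest value per (entity, feature), used for serving.
        self.online = {}

    def _append(self, entity, feature, ts, value):
        history = self.offline[(entity, feature)]
        history.append((ts, value))
        history.sort(key=lambda row: row[0])  # keep history ordered by timestamp
        return history

    def batch_ingest(self, rows):
        """Batch engine path: load many rows offline, then materialize them online."""
        for entity, feature, ts, value in rows:
            self._append(entity, feature, ts, value)
        self.materialize()

    def stream_ingest(self, entity, feature, ts, value):
        """Stream engine path: write one event offline and refresh the online value at once."""
        history = self._append(entity, feature, ts, value)
        # Newest value by timestamp, so a late-arriving older event does not overwrite it.
        self.online[(entity, feature)] = history[-1][1]

    def materialize(self):
        """Copy the newest offline value of every (entity, feature) into the online store."""
        for key, history in self.offline.items():
            self.online[key] = history[-1][1]

    def get_online(self, entity, feature):
        """Low-latency lookup of the current value, e.g. for stream prediction."""
        return self.online.get((entity, feature))

    def get_as_of(self, entity, feature, ts):
        """Point-in-time lookup: the value as it was at time ts, for building training sets."""
        history = self.offline.get((entity, feature), [])
        i = bisect_right([t for t, _ in history], ts)
        return history[i - 1][1] if i else None


store = ToyFeatureStore()
store.batch_ingest([("cust_1", "orders_7d", 1, 3), ("cust_1", "orders_7d", 5, 4)])
store.stream_ingest("cust_1", "orders_7d", 7, 6)
print(store.get_online("cust_1", "orders_7d"))    # 6
print(store.get_as_of("cust_1", "orders_7d", 3))  # 3
```

The point-in-time lookup is the reason to keep history offline: a training row labelled at time 3 should see the feature value as of time 3, not a later value that would leak future information into the model.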
  7. Feature Store Use Cases
     • Retrieve Feature Metadata
     • Retrieve Feature Values
     • Remove Features
     • Store Features
     • Stream Store Features
     • Stream Retrieve Features
     • Feature Versioning
     • Model Versioning
     • Record Predictions
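
Several use cases on slide 7 (storing, retrieving, and removing features, retrieving feature metadata, and feature versioning) can be pictured as operations on values keyed by feature name and definition version. The `VersionedFeatureValues` class is a hypothetical in-memory sketch, not the API of any product on slide 11; it assumes versions are increasing integers and that callers either pin a version or take the latest.

```python
class VersionedFeatureValues:
    """Hypothetical in-memory store of feature values, kept per definition version."""

    def __init__(self):
        # feature name -> version number -> {entity id: value}
        self._values = {}

    def store(self, feature, version, entity, value):
        """Store Features: write one value under a specific feature version."""
        self._values.setdefault(feature, {}).setdefault(version, {})[entity] = value

    def versions(self, feature):
        """Retrieve Feature Metadata: the versions that exist for a feature."""
        return sorted(self._values.get(feature, {}))

    def retrieve(self, feature, entity, version=None):
        """Retrieve Feature Values: from a pinned version, or the latest one if none is given."""
        by_version = self._values.get(feature, {})
        if not by_version:
            return None
        if version is None:
            version = max(by_version)
        return by_version.get(version, {}).get(entity)

    def remove(self, feature, version=None):
        """Remove Features: drop one version, or the whole feature if no version is given."""
        if version is None:
            self._values.pop(feature, None)
        else:
            self._values.get(feature, {}).pop(version, None)


features = VersionedFeatureValues()
features.store("avg_basket", 1, "cust_1", 42.0)  # original definition
features.store("avg_basket", 2, "cust_1", 39.5)  # hypothetical revised definition
print(features.retrieve("avg_basket", "cust_1"))             # 39.5 (latest version)
print(features.retrieve("avg_basket", "cust_1", version=1))  # 42.0 (pinned)
```

Pinning lets a model trained on version 1 of a feature keep receiving version 1 values while version 2 is rolled out to newer models.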
  8. Data Pipeline
     • Data engineers interact with a feature store by creating data pipeline definitions.
     • Data pipeline definitions combine:
       – Data sources
       – Business definitions
       – Transformation rules
       – Streaming/batch definitions
       – Scheduling
     • Data pipelines are executed by the feature store engines, and their output is stored in the online and offline data stores.
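
A data pipeline definition, as described on slide 8, bundles a data source, a business definition, a transformation rule, a batch or streaming mode, and a schedule. The dataclass is a hypothetical way to express that bundle in Python, paired with a minimal batch runner that writes every transformed row to an offline list and the last row per entity to an online dict. The field names, the `"entity"` key, the sample orders, and the cron-style schedule string are illustrative assumptions; a real feature store engine would also act on the schedule and run streaming pipelines, which this sketch does not.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class PipelineDefinition:
    """Hypothetical pipeline definition bundling the parts listed on slide 8."""
    name: str
    source: Callable[[], Iterable[dict]]        # data source: yields raw records
    description: str                             # business definition of the feature
    transform: Callable[[dict], Optional[dict]]  # transformation rule; None drops a record
    mode: str = "batch"                          # "batch" or "streaming"
    schedule: Optional[str] = None               # e.g. a cron expression; not interpreted here


def run_batch(pipeline: PipelineDefinition,
              offline_store: List[dict],
              online_store: Dict[str, dict]) -> int:
    """Run a batch pipeline once and return the number of rows written.

    Every transformed row is appended to the offline store; the online store keeps
    the last row seen per entity (rows are assumed to carry an "entity" key).
    """
    if pipeline.mode != "batch":
        raise ValueError(f"{pipeline.name} is not a batch pipeline")
    written = 0
    for record in pipeline.source():
        row = pipeline.transform(record)
        if row is None:
            continue
        offline_store.append(row)
        online_store[row["entity"]] = row
        written += 1
    return written


def orders_source():
    """Toy data source standing in for an orders table."""
    yield {"customer": "c1", "amount": 20.0}
    yield {"customer": "c1", "amount": -5.0}  # a refund, dropped by the rule below
    yield {"customer": "c2", "amount": 12.5}


def order_amount_rule(record):
    """Transformation rule: keep completed orders only and rename the fields."""
    if record["amount"] <= 0:
        return None
    return {"entity": record["customer"], "order_amount": record["amount"]}


pipeline = PipelineDefinition(
    name="order_amount",
    source=orders_source,
    description="Value of a completed (non-refund) order, in dollars",
    transform=order_amount_rule,
    schedule="0 2 * * *",
)
offline, online = [], {}
written = run_batch(pipeline, offline, online)
print(written, online["c1"]["order_amount"])  # 2 20.0
```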
  9. Feature Registry
     • Data scientists interact with the feature store through the Feature Registry.
     • They can search for and browse feature definitions.
     • They can register data science models as a class of data pipeline.
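
The registry on slide 9 can be sketched as a catalog of feature definitions that supports browsing, keyword search, and registering a model together with the features it consumes. `FeatureDefinition` and `FeatureRegistry` are hypothetical names, the sample features are invented, and treating a model as a list of registered input features is a simplification of registering it "as a class of data pipeline."

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FeatureDefinition:
    name: str
    description: str
    owner: str
    tags: List[str] = field(default_factory=list)


class FeatureRegistry:
    """Hypothetical registry that data scientists search and browse."""

    def __init__(self):
        self._features: Dict[str, FeatureDefinition] = {}
        self._models: Dict[str, List[str]] = {}  # model name -> names of its input features

    def register_feature(self, definition: FeatureDefinition) -> None:
        self._features[definition.name] = definition

    def browse(self) -> List[str]:
        """All registered feature names, in alphabetical order."""
        return sorted(self._features)

    def search(self, term: str) -> List[str]:
        """Case-insensitive substring match on names, descriptions, and tags."""
        term = term.lower()
        return sorted(
            d.name
            for d in self._features.values()
            if term in d.name.lower()
            or term in d.description.lower()
            or any(term in tag.lower() for tag in d.tags)
        )

    def register_model(self, model_name: str, inputs: List[str]) -> None:
        """Register a model by the registered features it consumes."""
        missing = [f for f in inputs if f not in self._features]
        if missing:
            raise KeyError(f"unregistered features: {missing}")
        self._models[model_name] = list(inputs)

    def model_inputs(self, model_name: str) -> List[str]:
        return list(self._models[model_name])


registry = FeatureRegistry()
registry.register_feature(FeatureDefinition(
    "orders_7d", "Orders placed in the last 7 days", "data-eng", ["customer", "orders"]))
registry.register_feature(FeatureDefinition(
    "avg_basket", "Average basket value in dollars", "data-eng", ["customer"]))
registry.register_model("churn_v1", ["orders_7d", "avg_basket"])
print(registry.search("orders"))  # ['orders_7d']
```

Rejecting models whose inputs are not registered is one way a registry can support the "Consistent Feature Definitions" challenge from slide 5: every model refers to a feature by a single shared definition.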
  10. Versioning and Monitoring
      • Feature stores can assist with versioning and monitoring data science applications.
      • Predictions are recorded through the feature store API, including the source data, the model used, the version of that model, and the rendered prediction.
      • Predictions can be compared with actual outcomes to determine the accuracy of the models.
      • Models and versions are tracked, so they can be used to determine the lift provided by a particular instance of a model.
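
The monitoring loop on slide 10 (record each prediction with its source data, model, and model version, then compare it with the actual outcome) can be sketched as follows. The class names and sample records are hypothetical, accuracy assumes a classification model whose prediction can be compared for equality with the outcome, and lift is computed here as the relative accuracy gain of one version over another, which is only one of several definitions of lift in use.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class PredictionRecord:
    """One prediction as recorded through a (hypothetical) feature store API."""
    prediction_id: str
    source_data: Dict[str, Any]  # feature values the model was given
    model: str
    model_version: str
    prediction: Any
    actual: Optional[Any] = None  # filled in later, once the real outcome is known


class PredictionLog:
    def __init__(self):
        self._records: Dict[str, PredictionRecord] = {}

    def record(self, rec: PredictionRecord) -> None:
        self._records[rec.prediction_id] = rec

    def record_outcome(self, prediction_id: str, actual: Any) -> None:
        """Attach the observed outcome to an earlier prediction."""
        self._records[prediction_id].actual = actual

    def accuracy_by_version(self, model: str) -> Dict[str, float]:
        """Per version of `model`: share of predictions with a known outcome that were correct."""
        hits: Dict[str, int] = defaultdict(int)
        totals: Dict[str, int] = defaultdict(int)
        for r in self._records.values():
            if r.model == model and r.actual is not None:
                totals[r.model_version] += 1
                hits[r.model_version] += int(r.prediction == r.actual)
        return {version: hits[version] / totals[version] for version in totals}


def relative_lift(candidate_accuracy: float, baseline_accuracy: float) -> float:
    """Relative accuracy gain of a candidate version over a baseline version."""
    return candidate_accuracy / baseline_accuracy - 1.0


log = PredictionLog()
log.record(PredictionRecord("p1", {"orders_7d": 3}, "churn", "v1", "stay"))
log.record(PredictionRecord("p2", {"orders_7d": 0}, "churn", "v1", "churn"))
log.record(PredictionRecord("p3", {"orders_7d": 5}, "churn", "v2", "stay"))
log.record(PredictionRecord("p4", {"orders_7d": 0}, "churn", "v2", "churn"))
for pid, outcome in [("p1", "stay"), ("p2", "stay"), ("p3", "stay"), ("p4", "churn")]:
    log.record_outcome(pid, outcome)
acc = log.accuracy_by_version("churn")      # {'v1': 0.5, 'v2': 1.0}
print(relative_lift(acc["v2"], acc["v1"]))  # 1.0, i.e. v2 is 100% more accurate than v1
```

Keeping `source_data` alongside each prediction is what makes later debugging possible: when a version underperforms, the exact feature values it saw can be replayed against another version.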
  11. Feature Store Implementations
      • Open Source
        – Gojek/Google Feast
      • Product Offerings
        – Logical Clocks Hopsworks
        – Scribble Enrich
      • Presentations Only
        – Uber Michelangelo
        – Airbnb Zipline
        – SurveyMonkey ML Feature Store
        – Netflix Metaflow
  12. Links
      • http://featurestore.org/
      • https://www.scribbledata.io/resources-feature-store-guide
      • https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
      • https://towardsdatascience.com/feature-stores-components-of-a-data-science-factory-f0f1f73d39b8
      • https://towardsdatascience.com/what-are-feature-stores-and-why-are-they-critical-for-scaling-data-science-3f9156f7ab4
      • https://www.logicalclocks.com/hopsworks-featurestore
      • https://eng.uber.com/michelangelo-machine-learning-platform/
      • https://technology.condenast.com/story/accelerating-machine-learning-with-the-feature-store-service
      • https://cloud.google.com/blog/products/ai-machine-learning/introducing-feast-an-open-source-feature-store-for-machine-learning
      • https://databricks.com/session/zipline-airbnbs-machine-learning-data-management-platform
      • https://engineering.linkedin.com/blog/2017/06/building-the-activity-graph--part-i
      • https://databricks.com/session/fact-store-scale-for-netflix-recommendations
      • https://medium.com/@changshe/rethinking-feature-stores-74963c2596f0
