© Copyright 2018 Pivotal Software, Inc. All rights Reserved. Version 1.0
Ambarish Joshi, Senior Data Scientist at Pivotal
June 21, 2018
Using Data Science to Build an End-to-End Recommendation System
True Digital Transformation requires modern software informed by data science-driven insights.
Context
End-to-end recommendation system, from data to insights
Customer
Power utility company seeking to build an end-to-end recommendation system for ancillary products that will integrate with a mobile app and call center systems
Solution
●  Machine learning techniques and rich data to build models that recommend products
●  Microservices-based architecture to integrate data science results into the mobile app and call center systems
●  Agile development practices to build high-quality software
Outcome
✓  End-to-end product recommendation solution
✓  Model results exposed via API
✓  Enablement of the data science team
Technology and Data Overview
Data Sources
●  Electric charges
●  Account
●  Demographic data (Acxiom)
●  Product eligibility
●  Product participation
●  6.5+ million customers
●  150+ million rows
Tools
Platform
Agile Data Science
Pair Programming
Retros
Test Driven Development
Continuous Integration /
API First
Tracker
Standups
Agile Data Science
Discovery Phase
✓  Data exploration to understand the context of the data and its business implications
✓  Data cleansing, transformation and feature engineering
✓  Training, validation and evaluation of ML algorithms
✓  Multiple iterations of the above steps to reach the desired model performance
Operationalization (O16n) Phase
✓  Test-driven development of data cleansing and feature engineering scripts
✓  Set up automated data pipelines to cleanse and score new data
✓  Set up monitoring code that checks incoming data to identify when remodeling is needed
✓  Build APIs to consume model output
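As one illustration of the monitoring step above, the sketch below compares an incoming batch against a training-time baseline and flags features whose mean has shifted. The function name `flag_drifted_features`, the standard-deviation threshold and the sample values are assumptions for illustration, not the checks used in the project.

```python
from statistics import mean, stdev


def flag_drifted_features(baseline, incoming, threshold=3.0):
    """Return names of features whose incoming mean is far from the baseline mean.

    baseline / incoming: dict mapping feature name -> list of numeric values.
    A feature is flagged when the incoming mean lies more than `threshold`
    baseline standard deviations away from the baseline mean.
    """
    drifted = []
    for name, base_values in baseline.items():
        new_values = incoming.get(name)
        if not new_values:
            # A missing or empty feature in the new data is itself a warning sign.
            drifted.append(name)
            continue
        base_mean = mean(base_values)
        base_sd = stdev(base_values)
        if base_sd == 0:
            shifted = mean(new_values) != base_mean
        else:
            shifted = abs(mean(new_values) - base_mean) / base_sd > threshold
        if shifted:
            drifted.append(name)
    return sorted(drifted)


# Hypothetical values: monthly charges seen at training time vs. a new batch.
baseline = {"monthly_charge": [80.0, 95.0, 110.0, 102.0, 90.0]}
incoming = {"monthly_charge": [310.0, 295.0, 330.0]}
drifted = flag_drifted_features(baseline, incoming)
print(drifted)
```

A non-empty result would be a prompt to investigate the data and consider retraining, rather than an automatic trigger.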
End to End
●  Data exploration, feature generation and ad-hoc ML modeling
●  Use test-driven development (TDD) to create production-quality PySpark scripts
●  Build an automated scoring workflow using PySpark scripts to generate recommendations
●  Recommendation microservice on Pivotal Cloud Foundry to serve customer recommendations
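The last stage above, serving scored recommendations through a microservice, could look roughly like the following. This is a minimal sketch in plain Python that only builds the response body; the function names, the JSON field names and the product names are all hypothetical, and the deck does not describe the actual API contract.

```python
import json


def top_recommendations(scores, eligible, n=3):
    """Rank eligible products by model score, highest first.

    scores: dict mapping product name -> propensity score from the model.
    eligible: set of product names the customer is eligible for.
    """
    ranked = sorted(
        ((product, score) for product, score in scores.items() if product in eligible),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [{"product": product, "score": score} for product, score in ranked[:n]]


def recommendation_response(customer_id, scores, eligible, n=3):
    """Build the JSON body a recommendation endpoint might return."""
    payload = {
        "customer_id": customer_id,
        "recommendations": top_recommendations(scores, eligible, n),
    }
    return json.dumps(payload)


# Hypothetical scores for one customer, as produced by a scoring workflow.
scores = {"surge_protection": 0.62, "home_warranty": 0.18, "led_bundle": 0.41}
eligible = {"surge_protection", "led_bundle"}
body = recommendation_response("C-001", scores, eligible, n=2)
print(body)
```

Filtering by eligibility before ranking keeps ineligible products out of the response even when the model scores them highly.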
Discovery Phase: Data Exploration
We worked with subject matter experts (SMEs) to understand how the data is generated, how it is used, and its business implications.
Takeaways
●  The context and business impact of the data gained here are very valuable to the eventual success of the machine learning model
●  Stakeholders may resist this activity (“not real work”)
●  Mitigate this resistance by sharing the data exploration insights and their business implications
Discovery Phase: Feature Engineering
●  Our goal was to predict the propensity of a customer to buy a particular ancillary product
●  We only had information about when a customer bought the product
●  We did not have any solicitation history
●  For our positive (+ve) examples, we took all the buy events and calculated features for each event using a backward-looking window
●  We sampled negative (-ve) events randomly and calculated features using the same backward-looking window
[Figure: timeline showing a buy event and the backward-looking window for features]
Takeaways
●  Setting up data to run a machine learning algorithm is more of an art than a science
●  Balance +ve and -ve examples, especially for rare events
●  Be aware of biases that may affect the data; these biases have modeling implications
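The example construction described above can be sketched as follows. This is plain Python rather than the project's PySpark code, and the feature names, the 365-day window, and the choice to draw negatives from customers with no buy event are all assumptions made for illustration.

```python
import random
from datetime import date, timedelta


def window_features(charges, as_of, window_days=365):
    """Aggregate a customer's charges in the window ending just before `as_of`.

    charges: list of (charge_date, amount) tuples for one customer.
    Only charges strictly before `as_of` are used, so the features never
    include information from the event date itself or later.
    """
    start = as_of - timedelta(days=window_days)
    amounts = [amt for day, amt in charges if start <= day < as_of]
    total = sum(amounts)
    return {
        "n_charges": len(amounts),
        "total_charges": total,
        "avg_charge": total / len(amounts) if amounts else 0.0,
    }


def build_examples(charges_by_customer, buy_events, candidate_dates, n_negative, seed=0):
    """Build labeled rows: buy events are positives, random dates are negatives."""
    rng = random.Random(seed)
    rows = []
    for customer, buy_date in buy_events:
        feats = window_features(charges_by_customer[customer], buy_date)
        rows.append({"customer": customer, "as_of": buy_date, "label": 1, **feats})
    # Assumption: negatives come from customers with no recorded buy event.
    buyers = {customer for customer, _ in buy_events}
    non_buyers = sorted(c for c in charges_by_customer if c not in buyers)
    for _ in range(n_negative):
        customer = rng.choice(non_buyers)
        as_of = rng.choice(candidate_dates)
        feats = window_features(charges_by_customer[customer], as_of)
        rows.append({"customer": customer, "as_of": as_of, "label": 0, **feats})
    return rows


# Hypothetical data: two customers, one buy event.
charges_by_customer = {
    "A": [(date(2017, 1, 15), 100.0), (date(2017, 6, 15), 120.0), (date(2018, 1, 15), 90.0)],
    "B": [(date(2017, 3, 1), 80.0), (date(2017, 9, 1), 85.0)],
}
buy_events = [("A", date(2017, 12, 1))]
candidate_dates = [date(2017, 12, 1), date(2018, 3, 1)]
rows = build_examples(charges_by_customer, buy_events, candidate_dates,
                      n_negative=len(buy_events))
for row in rows:
    print(row)
```

Setting `n_negative` equal to the number of buy events gives a 1:1 balance of positive and negative examples; other ratios may suit rarer events better.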
Discovery Phase: ML Modeling Iterations
●  The accompanying figure shows the ML model iteration process
●  We tried many algorithms with various hyperparameters
●  Elastic net models were the most viable and were chosen for deployment during the operationalization phase
Takeaways
●  Getting feedback from SMEs on the model results is very important
●  Sharing impactful features is a great way to get feedback and build SME trust in ML models
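For readers unfamiliar with elastic net, the sketch below shows the penalty term it adds to a model's loss and a small hyperparameter grid of the kind such iterations search over. The parameter names `alpha` and `l1_ratio` follow a common convention and the grid values are hypothetical; the deck does not state which library or settings were used.

```python
from itertools import product


def elastic_net_penalty(weights, alpha, l1_ratio):
    """Elastic net penalty: a blend of L1 (lasso) and L2 (ridge) terms.

    penalty = alpha * (l1_ratio * sum|w| + 0.5 * (1 - l1_ratio) * sum w^2)
    l1_ratio = 1 gives a pure lasso penalty, l1_ratio = 0 a pure ridge penalty.
    """
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)


# Hypothetical search grid; each (alpha, l1_ratio) pair is one model to train.
alphas = [0.01, 0.1, 1.0]
l1_ratios = [0.0, 0.5, 1.0]
grid = list(product(alphas, l1_ratios))

weights = [0.5, -2.0, 0.0]
for alpha, l1_ratio in grid:
    print(alpha, l1_ratio, elastic_net_penalty(weights, alpha, l1_ratio))
```

The L1 part tends to drive some coefficients to exactly zero, which yields a shorter list of impactful features to review with SMEs.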
O16n: Production Scripts Using TDD
After the discovery phase, we used TDD to write production scripts for data cleansing, feature generation and model scoring.
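To make the TDD workflow concrete, here is a minimal sketch of a test-first data cleansing function. It uses plain Python and the standard `unittest` module rather than PySpark, and the function `clean_charge` and its input format are hypothetical, not taken from the project.

```python
import unittest


def clean_charge(raw):
    """Parse a raw charge string such as ' $1,234.50 ' into a float.

    Returns None for blank or unparseable values so that downstream
    feature code can decide how to handle missing data.
    """
    if raw is None:
        return None
    text = raw.strip().replace("$", "").replace(",", "")
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None


class CleanChargeTest(unittest.TestCase):
    # In TDD these tests are written first and fail until clean_charge exists.
    def test_strips_currency_formatting(self):
        self.assertEqual(clean_charge(" $1,234.50 "), 1234.5)

    def test_blank_is_missing(self):
        self.assertIsNone(clean_charge("   "))

    def test_garbage_is_missing(self):
        self.assertIsNone(clean_charge("n/a"))


# Run the suite programmatically so the example works in a script or notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleanChargeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each new cleansing rule starts as another failing test case, which then guards against regressions once the rule is implemented.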
Why Pairing and TDD?
Test Driven Development
“Time spent writing a test beforehand is rarely wasted. Code written to pass a test takes much less time to debug.” – client 1
“TDD gives me the confidence that I won’t commit code that breaks existing functionality, no matter what I change” – client 2
Pair Programming
“Pairing instills critical thinking, builds confidence, distributes knowledge, and gets work done. Most methods of work only do one of those things.” – client 1
“Pairing was an educational experience for me, as well as a real-time validator. If my pair catches a problem with my code, I’ll know about it in real time.” – client 2
Summary of Enablement
Before
●  Ad-hoc model building in SAS Enterprise Miner
●  Minimal data science rigor
●  Manual data upload to SAS environment for modeling
●  Model results shared using Excel
●  Results used only for forecasting
After
✓  Data science on modern open source tools
✓  Data science rigor
✓  Automated workflow for data cleansing, feature generation and scoring
✓  Robust logging and validation of data and model results
✓  Recommendation microservice up in production to be consumed by app developers
Transforming How The World Builds Software
There are no shortcuts
THINNEST IMPACTFUL SLICE
https://hackernoon.com/the-ai-hierarchy-of-needs

Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 


Using Data Science to Build an End-to-End Recommendation System

  • 1. Using Data Science to Build an End-to-End Recommendation System
      Ambarish Joshi, Senior Data Scientist at Pivotal
      June 21, 2018
      © Copyright 2018 Pivotal Software, Inc. All rights Reserved. Version 1.0
  • 2.
  • 3. True Digital Transformation requires modern software informed by data science-driven insights.
  • 4. Context: End-to-End Recommendation System, from Data to Insights
      Customer:
      Power utility company seeking to build an end-to-end recommendation system for ancillary products, integrated with a mobile app and call center systems
      Solution:
      ●  Machine learning techniques and rich data to build models that recommend products
      ●  Microservices-based architecture to integrate data science results into the mobile app and call center systems
      ●  Agile development practices to build high-quality software
      Outcome:
      ✓  End-to-end product recommendation solution
      ✓  Model results exposed via API
      ✓  Enablement of the Data Science team
  • 5. Technology and Data Overview
      Data Sources:
      ●  Electric charges
      ●  Account
      ●  Demographic data (Acxiom)
      ●  Product eligibility
      ●  Product participation
      ●  6.5+ million customers
      ●  150+ million rows
      Tools, Platform: [not captured in the transcript]
  • 6. Agile Data Science
      ●  Pair Programming
      ●  Retros
      ●  Test Driven Development
      ●  Continuous Integration / API First
      ●  Tracker
      ●  Standups
  • 7. Agile Data Science
      Discovery Phase:
      ✓  Data exploration to understand the context of the data and its business implications
      ✓  Data cleansing, transformation and feature engineering
      ✓  Training, validation and evaluation of ML algorithms
      ✓  Multiple iterations of the above steps to reach the desired model performance
      Operationalization (O16n) Phase:
      ✓  Test driven development of data cleansing and feature engineering scripts
      ✓  Set up automatic data pipelines to cleanse and score new data
      ✓  Set up monitoring code that checks incoming data to identify when remodeling is needed
      ✓  Build APIs to consume model output
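The deck does not show what the monitoring code on slide 7 checked. As a minimal sketch, assuming a simple rule (flag a feature when too many values are missing or when its mean moves several baseline standard deviations away from the training data), a check might look like the following. The function name `drift_report`, the thresholds and the feature name `monthly_charge` are all hypothetical, and plain Python stands in for the PySpark scripts the team used.

```python
from statistics import mean, pstdev


def drift_report(baseline, incoming, max_shift=3.0, max_missing=0.2):
    """Return {feature: reason} for incoming features that look off.

    baseline: {feature: list of numbers seen at training time}
    incoming: {feature: list of numbers or None for the new batch}
    Both thresholds are illustrative assumptions, not values from the deck.
    """
    flagged = {}
    for feature, base_values in baseline.items():
        values = incoming.get(feature, [])
        present = [v for v in values if v is not None]
        missing_rate = 1 - len(present) / len(values) if values else 1.0
        if not present or missing_rate > max_missing:
            flagged[feature] = "too many missing values"
            continue
        base_mean, base_std = mean(base_values), pstdev(base_values)
        shift = abs(mean(present) - base_mean)
        # A constant baseline has zero spread, so any movement counts as drift.
        drifted = shift > 0 if base_std == 0 else shift / base_std > max_shift
        if drifted:
            flagged[feature] = "mean shifted"
    return flagged


baseline = {"monthly_charge": [100, 110, 90, 105, 95]}
stable = drift_report(baseline, {"monthly_charge": [98, 102, 101]})
shifted = drift_report(baseline, {"monthly_charge": [200, 210, 190]})
sparse = drift_report(baseline, {"monthly_charge": [100, None, None]})
```

A non-empty report would be the signal to investigate the data and consider retraining; a production check would likely cover more statistics than the mean.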
  • 8. End to End
      ●  Data exploration, feature generation and ad-hoc ML modeling
      ●  Use test driven development (TDD) to create production-quality PySpark scripts
      ●  Build an automated scoring workflow using PySpark scripts to generate recommendations
      ●  TDD recommendation microservice on Pivotal Cloud Foundry to serve customer recommendations
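The deck does not describe how scores become recommendations. One plausible reading, sketched here under stated assumptions, is that the service ranks a customer's products by propensity score after dropping products the customer is not eligible for or already participates in (both data sources are listed on slide 5). The function `recommend`, its parameters and the product names are hypothetical.

```python
def recommend(scores, eligible, enrolled, top_n=3):
    """Rank one customer's products by propensity score.

    scores:   {product: propensity score from the scoring workflow}
    eligible: set of products the customer may buy
    enrolled: set of products the customer already participates in
    """
    candidates = [
        (product, score)
        for product, score in scores.items()
        if product in eligible and product not in enrolled
    ]
    # Highest score first; ties broken by product name for stable output.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [
        {"product": product, "score": round(score, 4)}
        for product, score in candidates[:top_n]
    ]


# Hypothetical products and scores for a single customer.
scores = {"surge_protection": 0.2, "home_warranty": 0.9,
          "appliance_repair": 0.5, "ev_charger": 0.7}
top = recommend(
    scores,
    eligible={"surge_protection", "home_warranty", "appliance_repair"},
    enrolled={"home_warranty"},
    top_n=1,
)
```

Keeping the ranking as a pure function makes it easy to test first and then wrap in whatever HTTP layer the microservice uses.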
  • 9. Discovery Phase: Data Exploration
      Worked with subject matter experts (SMEs) to understand how the data is generated, how it is used, and its business implications
      Takeaways:
      ●  The context and business impact of the data gained here are very valuable to the eventual success of the machine learning model
      ●  Stakeholders may resist such activity (“not real work”)
      ●  Mitigate this resistance by sharing the data exploration insights and their business implications
  • 10. Discovery Phase: Feature Engineering
      ●  Our goal was to predict the propensity of a customer to buy a particular ancillary product
      ●  We only had information on when a customer bought the product
      ●  We did not have any solicitation history
      ●  We took all the buy events and, for each event, calculated features over a backward-looking window; these were our positive (+ve) examples
      ●  We sampled negative (-ve) events randomly and calculated features using the same backward-looking window
      [Figure: timeline showing the buy event and the window for features]
      Takeaways:
      ●  Setting up data to run a machine learning algorithm is more of an art than a science
      ●  Balance +ve and -ve examples, especially for rare events
      ●  Be aware of biases that may affect the data; these biases have modeling implications
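The example construction on slide 10 can be sketched roughly as follows. This is an illustration under assumptions, not the team's PySpark code: the 90-day window, the feature names, the one-negative-per-positive ratio and the helper names `window_features` and `build_examples` are all hypothetical. Features use only charges dated strictly before the anchor date, so nothing from the buy event or later leaks in.

```python
import random
from datetime import date, timedelta


def window_features(charges, customer_id, anchor, window_days=90):
    """Aggregate a customer's charges in [anchor - window_days, anchor)."""
    start = anchor - timedelta(days=window_days)
    amounts = [
        amount
        for cid, day, amount in charges
        if cid == customer_id and start <= day < anchor
    ]
    total = sum(amounts)
    return {
        "n_charges": len(amounts),
        "total_charge": total,
        "mean_charge": total / len(amounts) if amounts else 0.0,
    }


def build_examples(charges, buy_events, customers, date_range,
                   negatives_per_positive=1, seed=0):
    """Positives: one per buy event. Negatives: random (customer, date) pairs."""
    rng = random.Random(seed)
    first_day, last_day = date_range
    span = (last_day - first_day).days
    positives = set(buy_events)
    examples = [
        {"customer_id": cid, "anchor": day, "label": 1,
         **window_features(charges, cid, day)}
        for cid, day in buy_events
    ]
    wanted = negatives_per_positive * len(buy_events)
    found = attempts = 0
    while found < wanted and attempts < 100 * wanted:
        attempts += 1
        cid = rng.choice(customers)
        day = first_day + timedelta(days=rng.randint(0, span))
        if (cid, day) in positives:
            continue  # never label an actual buy event as negative
        examples.append({"customer_id": cid, "anchor": day, "label": 0,
                         **window_features(charges, cid, day)})
        found += 1
    return examples


# Toy data: (customer_id, charge date, amount).
charges = [
    ("a", date(2018, 1, 10), 100.0),
    ("a", date(2018, 2, 10), 120.0),
    ("a", date(2018, 3, 20), 90.0),   # after the buy event, so excluded
    ("b", date(2018, 2, 1), 80.0),
]
buys = [("a", date(2018, 3, 15))]
examples = build_examples(charges, buys, ["a", "b"],
                          (date(2018, 1, 1), date(2018, 3, 31)))
```

Sampling negatives at random dates is exactly where the biases mentioned in the takeaways can creep in, for instance if some customers were never offered the product.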
  • 11. Discovery Phase: ML Modeling Iterations
      ●  The figure alongside shows the ML model iteration process
      ●  We tried many algorithms with various hyperparameters
      ●  Elastic net models were the most viable and were chosen for deployment during the operationalization phase
      Takeaways:
      ●  Getting feedback from SMEs on the model results is very important
      ●  Sharing impactful features is a great way to get feedback and build SME trust in ML models
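The deck names elastic net models without giving a formula. Assuming a logistic (buy / no-buy) formulation, which the deck does not confirm, the standard elastic net objective in the widely used glmnet parameterization is shown below. Here y_i is the label of example i, x_i its feature vector, λ the overall penalty strength and α the mix between the L1 and L2 penalties; λ and α are the kind of hyperparameters that would be varied across iterations.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{n}\sum_{i=1}^{n}\Bigl[\,y_i \log p_i + (1-y_i)\log(1-p_i)\Bigr]
+ \lambda\Bigl(\alpha\,\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Bigr),
\qquad
p_i = \frac{1}{1+e^{-(\beta_0 + x_i^\top\beta)}}
```

With α = 1 this reduces to the lasso and with α = 0 to ridge regression; the L1 term drives some coefficients to exactly zero, which helps when sharing the most impactful features with SMEs.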
  • 12. O16n: Production Scripts Using TDD
      After the discovery phase, we used TDD to write production scripts for data cleansing, feature generation and model scoring
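As an illustration of the test-first workflow on slide 12, the sketch below pairs a small cleansing function with the unit tests that would have been written before it. The function `clean_charge_row`, its rules and the field names are hypothetical, and plain Python with `unittest` stands in for the team's PySpark scripts.

```python
import math
import unittest


def clean_charge_row(row):
    """Normalize one raw charge record; return None if it cannot be used."""
    customer_id = str(row.get("customer_id") or "").strip()
    if not customer_id:
        return None
    try:
        amount = float(row.get("amount"))
    except (TypeError, ValueError):
        return None
    # Reject NaN, infinities and negative charges.
    if not math.isfinite(amount) or amount < 0:
        return None
    return {"customer_id": customer_id, "amount": round(amount, 2)}


class CleanChargeRowTest(unittest.TestCase):
    """In TDD these tests exist first and fail until the function is written."""

    def test_strips_whitespace_and_parses_amount(self):
        row = {"customer_id": " 42 ", "amount": "12.50"}
        self.assertEqual(clean_charge_row(row),
                         {"customer_id": "42", "amount": 12.5})

    def test_rejects_missing_customer_id(self):
        self.assertIsNone(clean_charge_row({"customer_id": "", "amount": "5"}))

    def test_rejects_unparseable_or_negative_amount(self):
        self.assertIsNone(clean_charge_row({"customer_id": "42", "amount": "n/a"}))
        self.assertIsNone(clean_charge_row({"customer_id": "42", "amount": "-1"}))
```

Saved as a module, the tests run with `python -m unittest`; each new cleansing rule starts as a failing test case.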
  • 13. Why Pairing and TDD?
      Test Driven Development:
      “Time spent writing a test beforehand is rarely wasted. Code written to pass a test takes much less time to debug.” – client 1
      “TDD gives me the confidence that I won’t commit code that breaks existing functionality, no matter what I change” – client 2
      Pair Programming:
      “Pairing instills critical thinking, builds confidence, distributes knowledge, and gets work done. Most methods of work only do one of those things.” – client 1
      “Pairing was an educational experience for me, as well as a real-time validator. If my pair catches a problem with my code, I’ll know about it in real time.” – client 2
  • 14. End to End
      ●  Data exploration, feature generation and ad-hoc ML modeling
      ●  Use test driven development (TDD) to create production-quality PySpark scripts
      ●  Build an automated scoring workflow using PySpark scripts to generate recommendations
      ●  TDD recommendation microservice on Pivotal Cloud Foundry to serve customer recommendations
  • 15. Summary of Enablement
      Before:
      ●  Ad-hoc model building in SAS Enterprise Miner
      ●  Minimal data science rigor
      ●  Manual data upload to SAS environment for modeling
      ●  Model results shared using Excel
      ●  Results used only for forecasting
      After:
      ✓  Data science on modern open source tools
      ✓  Data science rigor
      ✓  Automated workflow for data cleansing, feature generation and scoring
      ✓  Robust logging and validation of data and model results
      ✓  Recommendation microservice up in production to be consumed by app developers
  • 16. Transforming How The World Builds Software
  • 17. There are no shortcuts: THINNEST IMPACTFUL SLICE
      https://hackernoon.com/the-ai-hierarchy-of-needs