Before we begin…
• Please mute your phone and turn off your video. There are over eighty people who don’t want to see or hear you chewing.
• If you have any suggestions for future topics that you would like this group to cover, please send them to Scott Shaw using Webex’s chat feature.
• We will send out the presentation deck after the meeting. Look for an announcement in the Meetup link for this meeting.
• If you have questions during the presentation, also send them to Scott Shaw using Webex’s chat feature. We will get to as many questions as we can.
Operationalizing Data Science
Adam Doyle
Daugherty Business Solutions
April 1, 2020
Thanks for coming out, everyone!
April Fools!
Operationalizing Data Science
Adam Doyle
Daugherty Business Solutions
April 1, 2020
Begin with the end in mind.
Pause in the middle to make sure that you can get to where you are going.
• What is the business intention that you are trying to achieve?
  • Minimize Cost
  • Maximize Return
  • Minimize Risk
  • Realize Opportunity
  • Engage Stakeholders
• POC vs. production-ready, valued product
Identify your thesis.
Goal vs. intention
SMART goal
Refine to a question that can be answered with data science
Data science: predict, explain, evaluate
Decision science: combination of data science and data engineering
Acquire data.
Sources: Third-Party Data, Internal, API, Streaming
General: Amount, Access, Quality, Labeled?
Third Party
o Assess Data Quality (Value Range, Adherence, Representativeness)
o Data Format (automatic vs. hand-generated; similar data from different partners can arrive in vastly different formats)
o Governance (appropriate use: avoid re-identification, TTL, contractual terms, track access, renewals)
Internal
API (Data size limits, unreliability, costs)
Streaming (CDC, Device Data, Standardized?)
Explore the data.
Data Exploration
Statistical
Relationships and Correlations
Profiling
Textual: Words, Stop Words, Bigrams, Trigrams
Clustering
Check in with SME
Every block of stone has a statue inside it, and it is the task of the sculptor to discover it.
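The textual exploration listed above (word counts, stop words, bigrams, trigrams) can be sketched with the standard library alone. The tiny stop-word list and the `profile_text` helper are assumptions for illustration; real projects usually use a curated stop-word list.

```python
import re
from collections import Counter

# Deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for", "on"}


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Consecutive n-token windows (n=2 gives bigrams, n=3 gives trigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def profile_text(docs: list[str], top: int = 5) -> dict[str, list]:
    """Most common words, bigrams, and trigrams after stop-word removal."""
    words, bigrams, trigrams = Counter(), Counter(), Counter()
    for doc in docs:
        tokens = [t for t in tokenize(doc) if t not in STOP_WORDS]
        words.update(tokens)
        bigrams.update(ngrams(tokens, 2))
        trigrams.update(ngrams(tokens, 3))
    return {
        "words": words.most_common(top),
        "bigrams": bigrams.most_common(top),
        "trigrams": trigrams.most_common(top),
    }


notes = ["The claim was denied",
         "Claim denied for missing receipt",
         "Missing receipt and claim denied"]
print(profile_text(notes))
```

Stop words are removed before n-grams are formed, so a phrase like "denied for missing" becomes the bigram ("denied", "missing"). Forming n-grams first is an equally valid choice; which one helps depends on the text, which is a good question to take to the SME.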
Cleanse data.
Data profiling
Deduplication
Outliers
Filter
Imputation
Source Corrections
Data shaping
Sort
Project
Enrichment
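A few of the cleansing steps above (deduplication, imputation, outlier filtering) can be chained as a minimal sketch. It assumes rows are dicts with a unique key and one numeric column; median imputation and Tukey-style IQR fences are common choices swapped in for illustration, not techniques the talk prescribes.

```python
import statistics
from typing import Optional


def deduplicate(rows: list[dict], key: str) -> list[dict]:
    """Keep the first row seen for each value of `key`."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def impute_median(values: list[Optional[float]]) -> list[float]:
    """Replace missing values with the median of the observed ones."""
    fill = statistics.median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def iqr_bounds(values: list[float], k: float = 1.5) -> tuple[float, float]:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] are treated as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cleanse(rows: list[dict], key: str, column: str) -> list[dict]:
    """Deduplicate, impute, then filter outliers; returns new dicts."""
    rows = deduplicate(rows, key)
    filled = impute_median([row[column] for row in rows])
    low, high = iqr_bounds(filled)
    return [{**row, column: value}
            for row, value in zip(rows, filled)
            if low <= value <= high]
```

Imputing with the median before computing the fences keeps a single extreme value from pulling the fill value. Whether a flagged outlier should be dropped, capped, or sent back as a source correction is a business decision rather than a statistical one.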
Create the model and features.
Types of Models (Supervised, Unsupervised, Reinforcement Learning, Neural Networks)
Feature Engineering (Transformations and Aggregations)
Encode Indicator Variables
Binning/Bucketing
Sparse Classes
Interaction Features
Extract Elements (e.g., Time)
Normalization
Feature Selection
Testing your features
Testing your model
Check in with SME
Check in with Business
Does what you’ve created address the concerns of the business?
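Several of the feature-engineering steps listed above (sparse classes, indicator variables, binning, normalization, interaction features, extracting time elements) fit in a short stdlib sketch. The function names, the `OTHER` bucket label, and the `min_count` threshold are assumptions for illustration.

```python
from bisect import bisect_right
from collections import Counter
from datetime import datetime


def group_sparse(values: list[str], min_count: int = 2, other: str = "OTHER") -> list[str]:
    """Collapse categories seen fewer than `min_count` times into one bucket."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


def one_hot(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """Encode a categorical column as 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


def bucket(value: float, edges: list[float]) -> int:
    """Bin index for `value` given ascending edges (a value equal to an edge goes up)."""
    return bisect_right(edges, value)


def min_max(values: list[float]) -> list[float]:
    """Rescale to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]


def interaction(a: list[float], b: list[float]) -> list[float]:
    """Element-wise product, so a linear model can use the combination of two features."""
    return [x * y for x, y in zip(a, b)]


def time_parts(ts: datetime) -> dict[str, int]:
    """Extract timestamp elements that often carry signal (Monday is weekday 0)."""
    return {"hour": ts.hour, "weekday": ts.weekday(), "month": ts.month}
```

Grouping sparse classes before one-hot encoding keeps rare categories from each getting their own nearly empty column. The category list, bin edges, and min/max learned here have to be saved and reused at scoring time, which is where this step meets deployment.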
Batch vs. Real-time?
Batch vs. real-time for:
- Training
- Evaluation
Evaluate the model.
Accuracy
Precision
Recall
MSE
Alignment to Business
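The evaluation measures above can be computed directly from labels and predictions. This sketch assumes binary labels with 1 as the positive class for accuracy, precision, and recall, and numeric targets for MSE. The `business_cost` function is a hypothetical illustration of alignment to business: it prices each error type with costs the business would have to supply.

```python
def confusion_counts(y_true: list[int], y_pred: list[int],
                     positive: int = 1) -> tuple[int, int, int, int]:
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, fn, len(pairs) - tp - fp - fn


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: list[int], y_pred: list[int]) -> float:
    """Of everything flagged positive, how much really was positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true: list[int], y_pred: list[int]) -> float:
    """Of everything really positive, how much was flagged."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def mse(y_true: list[float], y_pred: list[float]) -> float:
    """Mean squared error for regression models."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def business_cost(y_true: list[int], y_pred: list[int],
                  cost_fp: float, cost_fn: float) -> float:
    """Hypothetical: total cost of errors, given a price for each error type."""
    _, fp, fn, _ = confusion_counts(y_true, y_pred)
    return fp * cost_fp + fn * cost_fn
```

Two models with identical accuracy can differ a lot in `business_cost` when a missed positive is far more expensive than a false alarm, which is why the business needs to weigh in on which metric matters.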
Deploy the model.
Automation
Scaling
SLAs
Versioning
Data Pipelines
Ongoing Data Acquisition
Ongoing Data Cleaning
Ongoing Feature Encoding
Integration in application
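Versioning and ongoing feature encoding go together: the application should score records with exactly the encoding the model was trained with. A minimal sketch, assuming a simple logistic model and a JSON artifact per version; the `ModelBundle` class and the `segment` and `amount` fields are hypothetical.

```python
import json
import math
from dataclasses import asdict, dataclass


@dataclass
class ModelBundle:
    """A model version plus the feature encoding it was trained with."""
    version: str
    categories: list[str]   # category vocabulary captured at training time
    weights: list[float]    # one weight per indicator, then one for the numeric feature
    bias: float

    def encode(self, record: dict) -> list[float]:
        # Same encoding at training and serving time; an unseen category
        # produces all-zero indicators instead of failing.
        indicators = [1.0 if record["segment"] == c else 0.0 for c in self.categories]
        return indicators + [float(record["amount"])]

    def predict(self, record: dict) -> float:
        z = self.bias + sum(w * x for w, x in zip(self.weights, self.encode(record)))
        return 1.0 / (1.0 + math.exp(-z))  # logistic score between 0 and 1


def to_json(bundle: ModelBundle) -> str:
    """Serialize the bundle so it can be stored under its version."""
    return json.dumps(asdict(bundle))


def from_json(text: str) -> ModelBundle:
    """Rebuild a bundle inside the application that serves predictions."""
    return ModelBundle(**json.loads(text))
```

In practice the serialized text would be stored keyed by `version` (for example, a dict or object store of version to artifact), so a rollback means loading an earlier key rather than retraining.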
Monitor the model.
Drift
Degrading the model
Predictions and their effects
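One common way to quantify input drift, swapped in here for illustration rather than taken from the talk, is the Population Stability Index: bin a feature, compare the share of baseline values e_i with the share of recent values a_i in each bin, and sum (a_i − e_i) · ln(a_i / e_i). The bin edges and the small `eps` floor for empty bins are assumptions. A frequently quoted rule of thumb treats values below 0.1 as stable and above 0.25 as significant drift, but thresholds should be tuned per model.

```python
import math
from bisect import bisect_right


def bin_shares(values: list[float], edges: list[float]) -> list[float]:
    """Share of values in each bin defined by ascending interior edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(expected: list[float], actual: list[float],
        edges: list[float], eps: float = 1e-4) -> float:
    """Population Stability Index of `actual` against the `expected` baseline."""
    total = 0.0
    for e, a in zip(bin_shares(expected, edges), bin_shares(actual, edges)):
        e, a = max(e, eps), max(a, eps)  # floor empty bins to avoid log(0)
        total += (a - e) * math.log(a / e)
    return total


baseline = [1, 2, 3, 4, 5, 6, 7, 8]   # e.g. a feature at training time
recent = [7, 7, 8, 8, 8, 6, 5, 9]     # the same feature in production
print(psi(baseline, recent, edges=[2.5, 4.5, 6.5]))
```

PSI only watches inputs. Degradation of the model itself still has to be measured by comparing predictions with outcomes once labels arrive.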
Optimize the model.
Feature Optimization
Retraining
Remodeling
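One possible way to connect monitoring to retraining and remodeling is a champion/challenger policy: retrain when monitoring trips a threshold, promote the retrained model only if it clearly beats the current one, and otherwise revisit features or model design. The thresholds and function names below are hypothetical defaults for illustration, not recommendations from the talk.

```python
def should_retrain(drift_psi: float, metric_drop: float,
                   psi_limit: float = 0.25, drop_limit: float = 0.05) -> bool:
    """Trigger retraining when inputs drift or the tracked metric degrades."""
    return drift_psi > psi_limit or metric_drop > drop_limit


def next_step(champion_score: float, challenger_score: float,
              min_gain: float = 0.01) -> str:
    """Promote the retrained challenger only if it clearly beats the champion."""
    if challenger_score >= champion_score + min_gain:
        return "promote challenger"
    # Fresh data alone did not help enough: consider feature optimization or remodeling.
    return "keep champion; consider feature optimization or remodeling"
```

Requiring a minimum gain avoids swapping models over noise in the evaluation set.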
Conclusion
What does it mean to be done?
Explanation as a Result
Questions?
Links
• https://www.dataengineeringpodcast.com/
• https://dataengweekly.com/
• https://www.logicalclocks.com/blog/feature-store-the-missing-data-layer-in-ml-pipelines
• https://www.imperva.com/blog/deployment-isnt-the-final-step-monitoring-machine-learning-models-in-production/

Operationalizing Data Science St. Louis Big Data IDEA

  • 1. Before we begin… • Please mute your phone and turn off your video. There are over eighty people who don’t want to see or hear you chewing. • If you have suggestions for future topics you would like this group to cover, please send them to Scott Shaw using Webex’s chat feature. • We will send out the presentation deck after the meeting. Look for an announcement in the Meetup link for this meeting. • If you have questions during the presentation, also send them to Scott Shaw using Webex’s chat feature. We will get to as many questions as we can.
  • 2. Operationalizing Data Science Adam Doyle Daugherty Business Solutions April 1, 2020
  • 3. Thanks for coming out everyone!
  • 5. Operationalizing Data Science Adam Doyle Daugherty Business Solutions April 1, 2020
  • 6. Begin with the end in mind. Pause in the middle to make sure that you can get to where you are going. • What is the business intention you are trying to achieve? • Minimize cost • Maximize return • Minimize risk • Realize opportunity • Engage stakeholders • POC vs. production-ready and valued product
  • 7. Identify your thesis. Goal vs. intention. Set a SMART goal, then refine it into a question that can be answered with data science. Data science: predict, explain, evaluate. Decision science: a combination of data science and data engineering.
  • 8. Acquire data. Sources: third-party data, internal data, APIs, streaming. General: amount, access, quality, labeled? Third party: assess data quality (value range, adherence, representativeness); data format (automatically vs. hand-generated; similar data from different partners are vastly different); governance (appropriate use, avoiding reidentification, TTL, contractual terms, tracking access, renewals). Internal. API: data size limits, unreliability, costs. Streaming: CDC, device data, standardized?
  • 9. Explore the data. Data exploration: statistical relationships and correlations, profiling, textual analysis (words, stop words, bigrams, trigrams), clustering. Check in with the SME.
  • 10. Cleanse data. “Every block of stone has a statue inside it, and it is the task of the sculptor to discover it.” Data profiling, deduplication, outliers, filtering, imputation, source corrections. Data shaping: sort, project, enrichment.
  • 11. Create the model and features. Types of models: supervised, unsupervised, reinforcement learning, neural networks. Feature engineering (transformations and aggregations): encode indicator variables, binning/bucketing, sparse classes, interaction features, extracting elements (e.g., time), normalization. Feature selection. Testing your features. Testing your model. Check in with the SME.
  • 12. Check in with the business. Does what you’ve created address the concerns of the business?
  • 13. Batch vs. real-time? Batch training vs. real-time training; batch evaluation vs. real-time evaluation.
  • 15. Deploy the model. Automation, scaling, SLAs, versioning. Data pipelines: ongoing data acquisition, ongoing data cleaning, ongoing feature encoding. Integration in the application.
  • 16. Monitor the model. Drift. Model degradation. Predictions and their effects.
  • 17. Optimize the model. Feature optimization, retraining, remodeling.
  • 18. Conclusion. What does it mean to be done? Explanation as a result.
  • 20. Links • https://www.dataengineeringpodcast.com/ • https://dataengweekly.com/ • https://www.logicalclocks.com/blog/feature-store-the-missing-data-layer-in-ml-pipelines • https://www.imperva.com/blog/deployment-isnt-the-final-step-monitoring-machine-learning-models-in-production/
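Slide 8 asks you to assess third-party data quality, including value range and adherence. Below is a minimal standard-library Python sketch of a value-range and completeness check. It assumes records arrive as dicts and that the expected bounds come from a data contract with the provider. The field names, bounds, and the `assess_quality` helper are all hypothetical. Representativeness is not covered; that would need a comparison against a reference population.

```python
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    """Per-field counts of missing and out-of-range values."""
    total: int = 0
    missing: dict = field(default_factory=dict)
    out_of_range: dict = field(default_factory=dict)


def assess_quality(records, ranges):
    """Check each record against expected inclusive (low, high) bounds.

    `ranges` is a hypothetical data contract, e.g. {"age": (0, 120)};
    in practice the bounds would be agreed with the data provider.
    """
    report = QualityReport(total=len(records))
    for name in ranges:
        report.missing[name] = 0
        report.out_of_range[name] = 0
    for record in records:
        for name, (low, high) in ranges.items():
            value = record.get(name)
            if value is None:
                report.missing[name] += 1
            elif not low <= value <= high:
                report.out_of_range[name] += 1
    return report


rows = [
    {"age": 34, "income": 52000},
    {"age": -1, "income": None},
    {"age": 130, "income": 48000},
]
report = assess_quality(rows, {"age": (0, 120), "income": (0, 1_000_000)})
print(report.missing)       # {'age': 0, 'income': 1}
print(report.out_of_range)  # {'age': 2, 'income': 0}
```

Counting problems per field, rather than rejecting whole rows, gives you something concrete to take back to the data partner.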
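Slide 9 lists textual exploration by words, stop words, bigrams, and trigrams. The sketch below does this with Python's `collections.Counter`. The tiny stop-word list and the sample documents are illustrative assumptions, not from the talk. Removing stop words before forming n-grams is a design choice; some analyses keep them so that phrases like "out of stock" survive.

```python
import re
from collections import Counter

# Illustrative stop-word list only; real projects usually use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return consecutive n-token tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def profile_text(docs, top=5):
    """Count words, bigrams and trigrams after stop-word removal."""
    words, bigrams, trigrams = Counter(), Counter(), Counter()
    for doc in docs:
        tokens = [t for t in tokenize(doc) if t not in STOP_WORDS]
        words.update(tokens)
        bigrams.update(ngrams(tokens, 2))
        trigrams.update(ngrams(tokens, 3))
    return words.most_common(top), bigrams.most_common(top), trigrams.most_common(top)


docs = ["The model predicts churn", "Churn model predicts churn risk"]
top_words, top_bigrams, top_trigrams = profile_text(docs, top=3)
print(top_words[0])     # ('churn', 3)
print(top_trigrams[0])  # (('model', 'predicts', 'churn'), 2)
```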
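Slide 10 names deduplication, outliers, filtering, and imputation. The following is a minimal sketch of those steps on one numeric field, assuming dict rows. The slide does not specify methods, so the choices here are assumptions:

- Outliers are flagged with Tukey-style IQR fences (`k=1.5`).
- Missing values are filled with the median.

The helper names and the sample data are hypothetical.

```python
import statistics


def deduplicate(rows, key):
    """Keep the first row for each combination of the `key` fields."""
    seen = set()
    unique = []
    for row in rows:
        k = tuple(row[f] for f in key)
        if k not in seen:
            seen.add(k)
            unique.append(row)
    return unique


def impute_median(rows, field):
    """Replace missing (None) values in `field` with the median of the present ones."""
    present = [r[field] for r in rows if r.get(field) is not None]
    median = statistics.median(present)
    return [{**r, field: median} if r.get(field) is None else r for r in rows]


def iqr_bounds(values, k=1.5):
    """Tukey-style fences: values outside [Q1 - k*IQR, Q3 + k*IQR] count as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cleanse(rows, key, field):
    """Deduplicate, impute, then filter outliers on one numeric field."""
    rows = impute_median(deduplicate(rows, key), field)
    low, high = iqr_bounds([r[field] for r in rows])
    return [r for r in rows if low <= r[field] <= high]


raw = [
    {"id": 1, "amount": 10.0},
    {"id": 1, "amount": 10.0},   # duplicate
    {"id": 2, "amount": 12.0},
    {"id": 3, "amount": None},   # missing
    {"id": 4, "amount": 11.0},
    {"id": 5, "amount": 13.0},
    {"id": 6, "amount": 500.0},  # outlier
]
cleaned = cleanse(raw, key=("id",), field="amount")
print([r["id"] for r in cleaned])  # [1, 2, 3, 4, 5]
print(cleaned[2]["amount"])        # 12.0
```

Order matters here. Imputing before computing the fences means the imputed values influence the bounds. Whether that is acceptable, and whether an outlier is an error or a real event, is exactly the kind of question the slides suggest checking with the SME.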
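Slide 11 lists several feature-engineering transformations. The sketch below shows one small, plain-Python instance of each:

- encoding indicator variables (one-hot);
- binning/bucketing;
- grouping sparse classes;
- interaction features;
- extracting time elements;
- normalization (min-max).

The function names, the `min_count` threshold, and the bucket edges are hypothetical. Libraries such as pandas or scikit-learn provide production versions of these steps.

```python
import bisect
from collections import Counter
from datetime import datetime


def group_sparse(values, min_count, other="OTHER"):
    """Collapse categories seen fewer than min_count times into one bucket."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


def one_hot(values):
    """Encode a categorical column as 0/1 indicator variables."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


def bucket(x, edges):
    """Return the index of the bucket x falls into, given ascending cut points."""
    return bisect.bisect_right(edges, x)


def interact(a_values, b_values):
    """Element-wise product of two numeric columns (an interaction feature)."""
    return [a * b for a, b in zip(a_values, b_values)]


def min_max(values):
    """Rescale numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def time_parts(ts):
    """Extract calendar elements from a timestamp (weekday: Monday == 0)."""
    return {"hour": ts.hour, "weekday": ts.weekday(), "month": ts.month}


channels = group_sparse(["web", "web", "store", "phone", "web", "store"], min_count=2)
categories, indicators = one_hot(channels)
print(categories)                    # ['OTHER', 'store', 'web']
print(indicators[0])                 # [0, 0, 1]
print(bucket(37, [18, 35, 65]))      # 2
print(interact([12, 3], [50.0, 20.0]))  # [600.0, 60.0]
print(min_max([10, 20, 30]))         # [0.0, 0.5, 1.0]
print(time_parts(datetime(2020, 4, 1, 9, 30)))  # {'hour': 9, 'weekday': 2, 'month': 4}
```

Whatever encoding is chosen at training time, such as the category list or the min/max values, has to be stored and reapplied unchanged when scoring. That is what the "ongoing feature encoding" item on the deployment slide points at.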
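Slides 13 and 15 raise batch vs. real-time scoring, versioning, ongoing feature encoding, and integration in the application. The following is a minimal in-memory sketch of one way to tie these together. It is an assumption, not a specific product's API; `ModelRegistry`, `ModelVersion`, and the toy models are hypothetical. Each registered version carries its own feature encoder, and the batch path calls the same per-record function as the real-time path, so the two cannot drift apart.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelVersion:
    """A hypothetical registry entry: the feature encoder travels with the model."""
    name: str
    version: int
    encode: Callable[[dict], list]
    predict: Callable[[list], float]


class ModelRegistry:
    """Minimal in-memory registry; a real one would persist artifacts and metadata."""

    def __init__(self):
        self._versions = {}

    def register(self, model):
        key = (model.name, model.version)
        if key in self._versions:
            raise ValueError(f"{model.name} v{model.version} is already registered")
        self._versions[key] = model

    def latest(self, name):
        candidates = [m for (n, _), m in self._versions.items() if n == name]
        if not candidates:
            raise KeyError(name)
        return max(candidates, key=lambda m: m.version)


def score_one(model, record):
    """Real-time path: encode and score a single incoming record."""
    return model.predict(model.encode(record))


def score_batch(model, records):
    """Batch path: reuse exactly the same encoding and prediction per record."""
    return [score_one(model, r) for r in records]


def encode(record):
    """Toy feature encoder shared by both hypothetical versions."""
    return [record["tenure"], record["spend"]]


registry = ModelRegistry()
registry.register(ModelVersion("churn", 1, encode, lambda x: 2 * x[0] + x[1]))
registry.register(ModelVersion("churn", 2, encode, lambda x: 3 * x[0] + x[1]))
model = registry.latest("churn")
records = [{"tenure": 1, "spend": 10}, {"tenure": 4, "spend": 0}]
print(model.version)                 # 2
print(score_batch(model, records))   # [13, 12]
print(score_one(model, records[0]))  # 13
```

Refusing to overwrite an existing version keeps older predictions traceable to the exact model that produced them.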

Editor's Notes

  1. Welcome. Introduction.
  2. Welcome. Introduction.
  3. What is the business intention you are trying to achieve? Minimize cost, maximize return, minimize risk, realize opportunity, engage stakeholders. POC vs. production-ready and valued product.
  4. Decision science. SMART goal. Goal vs. intention. Refine to a question. Data science: predict, explain, evaluate.
  5. General: amount, access, quality, labeled? Third party: assess data quality (value range, adherence, representativeness); data format (automatically vs. hand-generated; similar data from different partners are vastly different); governance (appropriate use, avoiding reidentification, TTL, contractual terms, tracking access, renewals). Internal. API: data size limits, unreliability, costs. Streaming: CDC, device data, standardized?
  6. Data exploration: statistical relationships and correlations, profiling, textual analysis (words, stop words, bigrams, trigrams), clustering. Check in with the SME.
  7. Data profiling, deduplication, outliers, filtering, imputation, source corrections. Data shaping: sort, project, enrichment.
  8. Types of models: supervised, unsupervised, reinforcement learning, neural networks. Feature engineering (transformations and aggregations): encode indicator variables, binning/bucketing, sparse classes, interaction features, extracting elements (e.g., time), normalization. Feature selection. Testing your features. Testing your model. Check in with the SME.
  9. Check in with the business: does what you’ve created address the concerns of the business?
  10. Batch training vs. real-time training; batch evaluation vs. real-time evaluation.
  11. Truth matrix. Mean square error. Evaluation time.
  12. Automation, scaling, SLAs, versioning. Data pipelines: ongoing data acquisition, ongoing data cleaning, ongoing feature encoding. Integration in the application.
  13. Drift. Model degradation. Predictions and their effects.
  14. Feature optimization, retraining, remodeling.
  15. What does it mean to be done? Explanation as a result.
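Note 11 lists a truth matrix, mean square error, and evaluation time. The sketch below computes all three in plain Python. It assumes "truth matrix" means what is usually called a confusion matrix; the notes do not define it. The label set and sample values are made up for illustration.

```python
import time
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def mean_squared_error(actual, predicted):
    """Average of squared differences between targets and predictions."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def timed(fn, *args):
    """Return fn(*args) together with its wall-clock duration in seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


matrix, seconds = timed(confusion_matrix, [1, 0, 1, 1], [1, 0, 0, 1], [0, 1])
print(matrix)                                      # [[1, 0], [1, 2]]
print(mean_squared_error([3.0, 5.0], [2.0, 5.0]))  # 0.5
```

The confusion matrix suits classifiers, while MSE suits regression targets. Timing the evaluation matters when, as note 10 suggests, evaluation may have to run in real time rather than in batch.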
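Note 13 and slide 16 call for monitoring drift, but the talk does not name a metric. As one common choice, swapped in here as an assumption, the sketch below computes the Population Stability Index (PSI) between a training-time sample of a feature and a recent production sample. Bins are cut at quantiles of the baseline. The `eps` floor keeps empty bins from producing an infinite logarithm.

```python
import bisect
import math
import statistics


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """Compare two samples of one feature using bins cut at quantiles of `expected`."""
    edges = statistics.quantiles(expected, n=bins)  # bins - 1 cut points

    def shares(values):
        counts = [0] * bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor empty bins at eps so log(ai / ei) stays finite.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [float(v) for v in range(100)]
same = population_stability_index(baseline, baseline)
shifted = population_stability_index(baseline, [v + 50 for v in baseline])
print(same)  # 0.0
```

A frequently quoted rule of thumb treats PSI above roughly 0.1 as worth investigating and above 0.25 as significant drift. Those thresholds are heuristics rather than part of the talk, and would be tuned per feature before triggering the retraining or remodeling that the optimization slide describes.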