SlideShare a Scribd company logo
1 of 16
Download to read offline
DATA SCIENCE
PROJECT
METHODOLOGY
Sergey Shelpuk
sergey@shelpuk.com
DATA
SCIENCE
IS ALL
ABOUT
BUSINESS
TOP BIG DATA CHALLENGES
0 10 20 30 40 50 60
Determining how to get value from Big Data
Defining our strategy
Obtaining skills and capabilities needed
Integrating multiple data sources
Infrastructure and/or architecture
Risk and governance issues
Funding for Big Data related initiatives
Understanding what is Big Data
Leadership or organizational issues
Other
Top challenge Second Third© Gartner
METHODOLOGY IS A KEY TO
SUCCESS
Cross-Industry Standard Process for Data Mining (CRISP-DM)
BUSINESS
UNDERSTANDING
Determining Business Objectives
1. Gather background information
 Compiling the business background
 Defining business objectives
 Business success criteria
2. Assessing the situation
 Resource Inventory
 Requirements, Assumptions, and Constraints
 Risks and Contingencies
 Cost/Benefit Analysis
4. Determining data science goals
 Data science goals
 Data science success criteria
4. Producing a Project Plan
© IBM
EXAMPLE OF THE
PROJECT PLAN
Phase Time Resources Risks
Business
understanding
1 week All analysts Economic change
Data
understanding
3 weeks All analysts Data problems, technology
problems
Data preparation 5 weeks Data scientists, DB
engineers
Data problems, technology
problems
Modeling 2 weeks Data scientists Technology problems, inability
to build adequate model
Evaluation 1 week All analysts Economic change, inability to
implement results
Deployment 1 week Data scientist, DB
engineers,
implementation
team
Economic change, inability to
implement results
© IBM
READY FOR THE
DATA UNDERSTANDING?
From a business perspective:
 What does your business hope to gain from this project?
 How will you define the successful completion of our efforts?
 Do you have the budget and resources needed to reach our goals?
 Do you have access to all the data needed for this project?
 Have you and your team discussed the risks and contingencies associated with
this project?
 Do the results of your cost/benefit analysis make this project worthwhile?
From a data science perspective:
 How specifically can data mining help you meet your business goals?
 Do you have an idea about which data mining techniques might produce the best
results?
 How will you know when your results are accurate or effective enough? (Have we
set a measurement of data mining success?)
 How will the modeling results be deployed? Have you considered deployment in
your project plan?
 Does the project plan include all phases of CRISP-DM?
 Are risks and dependencies called out in the plan?© IBM
DATA
UNDERSTANDING
© IBM
1. Collect initial data
 Existing data
 Purchased data
 Additional data
2. Describe data
 Amount of data
 Value types
 Coding schemes
3. Explore data
4. Verify data quality
 Missing data
 Data errors
 Coding inconsistencies
 Bad metadata
READY FOR THE
DATA PREPARATION?
 Are all data sources clearly identified and accessed? Are you aware of
any problems or restrictions?
 Have you identified key attributes from the available data?
 Did these attributes help you to formulate hypotheses?
 Have you noted the size of all data sources?
 Are you able to use a subset of data where appropriate?
 Have you computed basic statistics for each attribute of interest? Did
meaningful information emerge?
 Did you use exploratory graphics to gain further insight into key
attributes? Did this insight reshape any of your hypotheses?
 What are the data quality issues for this project? Do you have a plan to
address these issues?
 Are the data preparation steps clear? For instance, do you know which
data sources to merge and which attributes to filter or select?
© IBM
DATA
PREPARATION
© IBM
1. Select right data
 Select training examples
 Select features
2. Clean data
 Fill in missed data
 Correct data errors
 Make coding consistent
2. Extend data
 Extend training examples
 Extend features
2. Format data
 Put data in a format for training the model
READY FOR THE
MODELING?
 Based upon your initial exploration and understanding, were you able to
select relevant subsets of data?
 Have you cleaned the data effectively or removed unsalvageable items?
Document any decisions in the final report.
 Are multiple data sets integrated properly? Were there any merging
problems that should be documented?
 Have you researched the requirements of the modeling tools that you
plan to use?
 Are there any formatting issues you can address before modeling? This
includes both required formatting concerns as well as tasks that may
reduce modeling time.
© IBM
MODELING
© IBM
1. Select modeling techniques
 Select data types available for analysis
 Select an algorithm or a model
Define modeling goals
 State specific modeling requirements
2. Build the model
 Set up hyperparameters
 Train the model
 Describe the result
3. Assess the model
READY FOR THE
EVALUATION?
 Are you able to understand the results of the models?
 Do the model results make sense to you from a purely logical
perspective? Are there apparent inconsistencies that need further
exploration?
 From your initial glance, do the results seem to address your
organization’s business question?
 Have you used analysis nodes and lift or gains charts to compare and
evaluate model accuracy?
 Have you explored more than one type of model and compared the
results?
 Are the results of your model deployable?
© IBM
EVALUATION
© IBM
1. Evaluate the results
 Are results presented clearly?
 Are there any novel findings?
 Can models and findings be applicable to business
goals?
 How well do the models and findings answer business
goals?
 What additional questions the modeling results have
risen?
2. Review the process
 Did the stage contribute to the value of the results?
 What went wrong and how it can be fixed?
 Are there alternative decisions which could have been
executed?
2. Determine the next steps
DEPLOYMENT
© IBM
1. Planning for deployment
 Summarize models and findings
 For each model create a deployment plan
 Identify any deployment problems and plan for
contingencies
2. Plan monitoring and maintenance
 Identify models and findings which require support
 How can the accuracy and validity be evaluated?
 How will you determine that a model has expired?
 What to do with the expired models?
2. Conduct a final project review
THANK YOU

More Related Content

What's hot

Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
Denodo
 

What's hot (20)

Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Data Engineering Basics
Data Engineering BasicsData Engineering Basics
Data Engineering Basics
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Data Mining
Data MiningData Mining
Data Mining
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Modernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data PipelinesModernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data Pipelines
 
Dimensional Modeling
Dimensional ModelingDimensional Modeling
Dimensional Modeling
 
DAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best PracticesDAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best Practices
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
 
Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?
 
Data Mining Technique - SEMMA
Data Mining Technique - SEMMAData Mining Technique - SEMMA
Data Mining Technique - SEMMA
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Introduction to Big Data Analytics and Data Science
Introduction to Big Data Analytics and Data ScienceIntroduction to Big Data Analytics and Data Science
Introduction to Big Data Analytics and Data Science
 
Data modelling 101
Data modelling 101Data modelling 101
Data modelling 101
 
Data Product Architectures
Data Product ArchitecturesData Product Architectures
Data Product Architectures
 

Similar to CRISP-DM: a data science project methodology

DAT 520 Final Project Guidelines and Rubric Overview .docx
DAT 520 Final Project Guidelines and Rubric  Overview .docxDAT 520 Final Project Guidelines and Rubric  Overview .docx
DAT 520 Final Project Guidelines and Rubric Overview .docx
simonithomas47935
 
413451520-8-Steps-Successful-Enterprise-Data-Manag.pdf
413451520-8-Steps-Successful-Enterprise-Data-Manag.pdf413451520-8-Steps-Successful-Enterprise-Data-Manag.pdf
413451520-8-Steps-Successful-Enterprise-Data-Manag.pdf
IsmailCassiem
 
Sfeldman bbworld 07_going_enterprise (1)
Sfeldman bbworld 07_going_enterprise (1)Sfeldman bbworld 07_going_enterprise (1)
Sfeldman bbworld 07_going_enterprise (1)
Steve Feldman
 
Big data and Hadoop Training Brochure
Big data and Hadoop Training Brochure Big data and Hadoop Training Brochure
Big data and Hadoop Training Brochure
MCAL Management Consulting and Advanced Learning
 

Similar to CRISP-DM: a data science project methodology (20)

Big data@work
Big data@workBig data@work
Big data@work
 
Group 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptxGroup 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptx
 
Analytics
AnalyticsAnalytics
Analytics
 
Santander's Data Transformation
Santander's Data TransformationSantander's Data Transformation
Santander's Data Transformation
 
Data integration my_experience
Data integration my_experienceData integration my_experience
Data integration my_experience
 
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
 
DAT 520 Final Project Guidelines and Rubric Overview .docx
DAT 520 Final Project Guidelines and Rubric  Overview .docxDAT 520 Final Project Guidelines and Rubric  Overview .docx
DAT 520 Final Project Guidelines and Rubric Overview .docx
 
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
Agile Mumbai 2022 - Ashwinee Singh | Agile in AI or AI in Agile?
 
Demystifying ML/AI
Demystifying ML/AIDemystifying ML/AI
Demystifying ML/AI
 
Data Science.pdf
Data Science.pdfData Science.pdf
Data Science.pdf
 
Best practice for_agile_ds_projects
Best practice for_agile_ds_projectsBest practice for_agile_ds_projects
Best practice for_agile_ds_projects
 
Data mining
Data miningData mining
Data mining
 
The Value of Predictive Analytics and Decision Modeling
The Value of Predictive Analytics and Decision ModelingThe Value of Predictive Analytics and Decision Modeling
The Value of Predictive Analytics and Decision Modeling
 
413451520-8-Steps-Successful-Enterprise-Data-Manag.pdf
413451520-8-Steps-Successful-Enterprise-Data-Manag.pdf413451520-8-Steps-Successful-Enterprise-Data-Manag.pdf
413451520-8-Steps-Successful-Enterprise-Data-Manag.pdf
 
Doing Analytics Right - Selecting Analytics
Doing Analytics Right - Selecting AnalyticsDoing Analytics Right - Selecting Analytics
Doing Analytics Right - Selecting Analytics
 
Technical Documentation 101 for Data Engineers.pdf
Technical Documentation 101 for Data Engineers.pdfTechnical Documentation 101 for Data Engineers.pdf
Technical Documentation 101 for Data Engineers.pdf
 
Egypt hackathon 2014 analytics & spss session
Egypt hackathon 2014   analytics & spss sessionEgypt hackathon 2014   analytics & spss session
Egypt hackathon 2014 analytics & spss session
 
Sfeldman bbworld 07_going_enterprise (1)
Sfeldman bbworld 07_going_enterprise (1)Sfeldman bbworld 07_going_enterprise (1)
Sfeldman bbworld 07_going_enterprise (1)
 
Big data and Hadoop Training Brochure
Big data and Hadoop Training Brochure Big data and Hadoop Training Brochure
Big data and Hadoop Training Brochure
 
SAP Applications and the Modern Data Scientist - Predictive Analytics for the...
SAP Applications and the Modern Data Scientist - Predictive Analytics for the...SAP Applications and the Modern Data Scientist - Predictive Analytics for the...
SAP Applications and the Modern Data Scientist - Predictive Analytics for the...
 

More from Sergey Shelpuk (8)

Data science: A New Profession in IT
Data science: A New Profession in ITData science: A New Profession in IT
Data science: A New Profession in IT
 
Buzzword scheme
Buzzword schemeBuzzword scheme
Buzzword scheme
 
Machine Learning: Advanced Topics Overview
Machine Learning: Advanced Topics OverviewMachine Learning: Advanced Topics Overview
Machine Learning: Advanced Topics Overview
 
Artificial intelligence 2015: Quo Vadis?
Artificial intelligence 2015: Quo Vadis?Artificial intelligence 2015: Quo Vadis?
Artificial intelligence 2015: Quo Vadis?
 
Machine learning intro
Machine learning introMachine learning intro
Machine learning intro
 
How to take over the world with artificial intelligence final
How to take over the world with artificial intelligence finalHow to take over the world with artificial intelligence final
How to take over the world with artificial intelligence final
 
Object similarity with office laptop
Object similarity with office laptopObject similarity with office laptop
Object similarity with office laptop
 
Data science for HR
Data science for HRData science for HR
Data science for HR
 

Recently uploaded

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 

Recently uploaded (20)

The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 

CRISP-DM: a data science project methodology

  • 3. TOP BIG DATA CHALLENGES 0 10 20 30 40 50 60 Determining how to get value from Big Data Defining our strategy Obtaining skills and capabilities needed Integrating multiple data sources Infrastructure and/or architecture Risk and governance issues Funding for Big Data related initiatives Understanding what is Big Data Leadership or organizational issues Other Top challenge Second Third© Gartner
  • 4. METHODOLOGY IS A KEY TO SUCCESS Cross-Industry Standard Process for Data Mining (CRISP-DM)
  • 5. BUSINESS UNDERSTANDING Determining Business Objectives 1. Gather background information  Compiling the business background  Defining business objectives  Business success criteria 2. Assessing the situation  Resource Inventory  Requirements, Assumptions, and Constraints  Risks and Contingencies  Cost/Benefit Analysis 4. Determining data science goals  Data science goals  Data science success criteria 4. Producing a Project Plan © IBM
  • 6. EXAMPLE OF THE PROJECT PLAN Phase Time Resources Risks Business understanding 1 week All analysts Economic change Data understanding 3 weeks All analysts Data problems, technology problems Data preparation 5 weeks Data scientists, DB engineers Data problems, technology problems Modeling 2 weeks Data scientists Technology problems, inability to build adequate model Evaluation 1 week All analysts Economic change, inability to implement results Deployment 1 week Data scientist, DB engineers, implementation team Economic change, inability to implement results © IBM
  • 7. READY FOR THE DATA UNDERSTANDING? From a business perspective:  What does your business hope to gain from this project?  How will you define the successful completion of our efforts?  Do you have the budget and resources needed to reach our goals?  Do you have access to all the data needed for this project?  Have you and your team discussed the risks and contingencies associated with this project?  Do the results of your cost/benefit analysis make this project worthwhile? From a data science perspective:  How specifically can data mining help you meet your business goals?  Do you have an idea about which data mining techniques might produce the best results?  How will you know when your results are accurate or effective enough? (Have we set a measurement of data mining success?)  How will the modeling results be deployed? Have you considered deployment in your project plan?  Does the project plan include all phases of CRISP-DM?  Are risks and dependencies called out in the plan?© IBM
  • 8. DATA UNDERSTANDING © IBM 1. Collect initial data  Existing data  Purchased data  Additional data 2. Describe data  Amount of data  Value types  Coding schemes 3. Explore data 4. Verify data quality  Missing data  Data errors  Coding inconsistencies  Bad metadata
  • 9. READY FOR THE DATA PREPARATION?  Are all data sources clearly identified and accessed? Are you aware of any problems or restrictions?  Have you identified key attributes from the available data?  Did these attributes help you to formulate hypotheses?  Have you noted the size of all data sources?  Are you able to use a subset of data where appropriate?  Have you computed basic statistics for each attribute of interest? Did meaningful information emerge?  Did you use exploratory graphics to gain further insight into key attributes? Did this insight reshape any of your hypotheses?  What are the data quality issues for this project? Do you have a plan to address these issues?  Are the data preparation steps clear? For instance, do you know which data sources to merge and which attributes to filter or select? © IBM
  • 10. DATA PREPARATION © IBM 1. Select right data  Select training examples  Select features 2. Clean data  Fill in missed data  Correct data errors  Make coding consistent 2. Extend data  Extend training examples  Extend features 2. Format data  Put data in a format for training the model
  • 11. READY FOR THE MODELING?  Based upon your initial exploration and understanding, were you able to select relevant subsets of data?  Have you cleaned the data effectively or removed unsalvageable items? Document any decisions in the final report.  Are multiple data sets integrated properly? Were there any merging problems that should be documented?  Have you researched the requirements of the modeling tools that you plan to use?  Are there any formatting issues you can address before modeling? This includes both required formatting concerns as well as tasks that may reduce modeling time. © IBM
  • 12. MODELING © IBM 1. Select modeling techniques  Select data types available for analysis  Select an algorithm or a model Define modeling goals  State specific modeling requirements 2. Build the model  Set up hyperparameters  Train the model  Describe the result 3. Assess the model
  • 13. READY FOR THE EVALUATION?  Are you able to understand the results of the models?  Do the model results make sense to you from a purely logical perspective? Are there apparent inconsistencies that need further exploration?  From your initial glance, do the results seem to address your organization’s business question?  Have you used analysis nodes and lift or gains charts to compare and evaluate model accuracy?  Have you explored more than one type of model and compared the results?  Are the results of your model deployable? © IBM
  • 14. EVALUATION © IBM 1. Evaluate the results  Are results presented clearly?  Are there any novel findings?  Can models and findings be applicable to business goals?  How well do the models and findings answer business goals?  What additional questions the modeling results have risen? 2. Review the process  Did the stage contribute to the value of the results?  What went wrong and how it can be fixed?  Are there alternative decisions which could have been executed? 2. Determine the next steps
  • 15. DEPLOYMENT © IBM 1. Planning for deployment  Summarize models and findings  For each model create a deployment plan  Identify any deployment problems and plan for contingencies 2. Plan monitoring and maintenance  Identify models and findings which require support  How can the accuracy and validity be evaluated?  How will you determine that a model has expired?  What to do with the expired models? 2. Conduct a final project review