The Practice of Data Science:
People, Processes and Tools
Bob E. Hayes, PhD
bob@businessoverbroadway.com
@bobehayes
Presented at Metis’ Demystifying Data Science: A FREE
Online Conference for Aspiring Data Scientists – Sept 27,
2017
Bob E. Hayes, PhD
Email: bob@businessoverbroadway.com
Web: www.businessoverbroadway.com
Twitter: @bobehayes
• Author of three books on customer experience
management and analytics
• PhD in industrial-organizational psychology
• #6 blogger overall on CustomerThink
(http://customerthink.com/author/bobehayes/)
• #3 blogger on the topic of customer analytics
(http://customerthink.com/top-authors-category/)
• Top expert in Big Data and Data Science
• https://www.maptive.com/the-top-100-big-data-experts/
• http://www.kdnuggets.com/2015/02/top-big-data-influencers-brands.html
Outline
• Why now?
• Definition of Data Science
• The People: Data Science Skills
• The Process: From Data to Insight
• The Tools
• Education Requirements
• Gender Diversity
Data and Our Ability to Process it
Analytics Skills Gap is Huge*
* From PwC: Investing in America’s Data Science and Analytics Talent
Data Science Defined
Data science is a way of extracting
insights from data using the powers of
computer science and statistics applied to
data from a specific field of study.
Data Science Defined
The People
Job Roles in Data Science
*Researcher (e.g., researcher, scientist, statistician); Business Management (e.g., leader, business person, entrepreneur); Creative
(e.g., jack of all trades, artist, hacker); Developer (e.g., developer, engineer)
Three Skill Domains of Data Science
• Domain Knowledge
• Math / Statistics
• Technology / Programming
25 Data Science Skills
Top 10 Data Science Skills
1. Communication
2. Managing structured data
3. Data mining and visualization tools
4. Science / Scientific method
5. Math
6. Project management
7. Data management
8. Statistics and statistical modeling
9. Product design and development
10. Business development
Data are based on responses to AnalyticsWeek and Business Over Broadway Data Science Survey. From September 2015.
Skill Proficiency Varies by Data Science Role
[Chart: proficiency across 25 data science skills, axis ticks 0 to 80. Legend: Domain Expert, Developer, Researcher, Proficiency Standard. Skill groups: Math / Statistics, Tech / Programming, Domain Knowledge. Skills shown: Business development, Budgeting, Governance and Compliance, Optimization, Math, Graphical Models, Algorithms, Bayesian Statistics, Machine Learning, Data Mining and Viz Tools, Statistics and statistical modeling, Science/Scientific Method, Communication, Unstructured data, Structured data, NLP and text mining, Data Management, Big and distributed data, Systems Administration, Database Administration, Cloud Management, Back-end Programming, Front-end Programming, Product Design, Project management.]
Data are based on responses to AnalyticsWeek and Business Over Broadway Data Science Survey. From September 2015.
In Search of the Data Science Unicorn
I wish I knew some Python.
Data are based on responses to AnalyticsWeek and Business Over Broadway Data Science Survey. From 2015.
Analytics, Data Mining and Data Science Methods
S = Start with Strategy
M = Measure Metrics and Data
A = Apply Analytics
R = Report Results
T = Transform your Business
From “CRISP-DM, still the top methodology for analytics, data mining, or data science projects”
http://www.kdnuggets.com/2014/10/crisp-dm-top-methodology-analytics-data-mining-data-science-projects.html
From Data to Insight
• Cross Industry Standard Process for Data Mining (CRISP-DM) (IBM, Teradata, Daimler AG, NCR Corporation and OHRA)
• Knowledge Discovery in Databases (KDD)
• SEMMA (SAS)
For more information on these methods, see: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining;
https://en.wikipedia.org/wiki/SEMMA; https://en.wikipedia.org/wiki/Data_mining
Getting Insight from Data: The Scientific Method
1. Formulate questions
2. Generate hypothesis / hunch
3. Gather / generate data
4. Analyze data / test hypothesis
5. Take action / communicate results
• Start with a problem statement.
• What are your hunches / hypotheses?
• Be sure your hypotheses are testable.
• You can use an experimental or observational approach to analyzing data.
• Integrate your data silos to ask bigger questions; connect the dots and get a 360-degree view of the phenomenon you’re studying.
• Employ predictive analytics / inferential statistics to test hypotheses.
• Employ machine learning to quickly surface insights.
• Implement your findings; inform decision-makers; optimize algorithms.
• Use prescriptive analytics to guide course of action.
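As a rough illustration of step 4 (analyze data / test hypothesis), here is a minimal Python sketch of one inferential technique, a two-sided permutation test for a difference in group means. This is not a method taken from the slides; the function name `permutation_test`, the `control` and `treatment` ratings, and the seed are all hypothetical values chosen for the example.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed difference mean(a) - mean(b), estimated p-value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled values and re-split them at random.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimate from being exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical satisfaction ratings (made-up data, for illustration only).
control = [6, 7, 5, 6, 7, 6, 5, 6]
treatment = [8, 9, 7, 8, 9, 8, 7, 9]

observed, p_value = permutation_test(treatment, control)
print(f"observed difference = {observed:.3f}, p = {p_value:.4f}")
```

A small p-value suggests the observed difference would be unlikely if the two groups really came from the same distribution; whether that justifies action (step 5) still depends on the study design, for example experimental versus observational data.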
Iterative Process of Discovery
Image from Netflix Tech Blog: https://medium.com/netflix-techblog/a-b-testing-and-beyond-improving-the-netflix-streaming-experience-with-experimentation-and-data-5b0ae9295bdf
Scientific Method and Data Science Skills
The Tools
Top Data Science Tools
Rexer Analytics Data Science Survey 2015
For a comprehensive overview of different data science tools,
please see: http://r4stats.com/articles/popularity/
Data Science Ecosystem
Gartner Magic Quadrant (2017); Forrester Wave
Leaders: IBM, SAS, RapidMiner, KNIME
For a good review of data science platforms, please see:
https://thomaswdinsmore.com/2017/02/28/gartner-looks-at-data-science-platforms/
Extra
Important Skills, Role of Formal Education, Gender Diversity
Importance of Data Science Skills by Job Role
What skills are linked to project success?
Highest Level of Education Attained
Education and Data Science Skills
Data are based on responses to AnalyticsWeek and Business Over Broadway Data Science Survey. From 2015.
Lack of Gender Diversity
Job Roles in Data Science by Gender
Gender Diversity – Other Science Roles
Gender Comparison of Proficiency across Skills
Advice for Data Scientists
• Be specific when talking about “data scientists”
  • There are different types – defined by what they do and the skills they possess
• Work with other data professionals who have complementary skills. Teamwork is key to successful data science projects.
• Learn to use data mining and visualization tools
  • R, Python, SPSS, SAS, graphics, mapping, web-based data visualization
• Be an advocate for women in the field of data science



Editor's Notes

  1-6. Involves the collection, analysis and interpretation of data to extract empirically-based insights that augment and enhance human decisions and algorithms.
  7. SEMMA is an acronym that stands for Sample, Explore, Modify, Model, and Assess. It is a list of sequential steps developed by SAS Institute, and it was the only other data mining approach named in these polls. However, SAS Institute clearly states that SEMMA is not a data mining methodology, but rather a "logical organization of the functional tool set of SAS Enterprise Miner." The term Knowledge Discovery in Databases, or KDD for short, refers to the broad process of finding knowledge in data, and emphasizes the "high-level" application of particular data mining methods.