The data science industry is still in its youth. Those of us forging the path face many challenges: What infrastructure should we use? How do we scale to mountains of data? How do we maintain data integrity? However, the center of data science complexity is not a technical challenge; it's a people challenge. How do we educate our business leaders? How do we collaborate with other departments? What is our strategy for building data-driven organizations?
In this talk, Dylan argues that we should use the data science hierarchy of needs to guide data strategy, and he leads a discussion about best practices in data science.
Dylan Gregersen is a data scientist at Teem, which provides intelligent workplace tools and analytics.
Presented at https://www.meetup.com/utah-data-engineering-meetup/events/253208050/
Sponsored by Google Cloud Platform, Pluralsight and Overstock.
4. Data science is...
Data science is the process of
collecting, cleaning, analyzing,
visualizing, and communicating
data in order to solve problems
in the real world.
Rachel Schutt & Cathy O’Neil in Doing Data Science: Straight Talk From the Frontline
5. What people think data science is...
People often think data science
is all about mathematics,
algorithms, and something called
“machine learning”
Rachel Schutt & Cathy O’Neil in Doing Data Science: Straight Talk From the Frontline
6. What most data science is...
Data science actually consists
mostly of data collection,
cleaning, and organization
(often 80% of the work)
Rachel Schutt & Cathy O’Neil in Doing Data Science: Straight Talk From the Frontline
7. What people forget that data science is
People tend to forget the skills
needed in data science to
communicate results so someone
can take an action in the real
world
Rachel Schutt & Cathy O’Neil in Doing Data Science: Straight Talk From the Frontline
8. Data science is a process
When doing data science we...
1. Collect Data: We must first collect and store information about real world phenomena
2. Structure Data: Then we structure that data into conceptual models of the phenomena
3. Extract Insight: We use our data model to understand something about the phenomena
4. Solve Problems: We apply our understanding to solve a problem by taking an action
Rachel Schutt & Cathy O’Neil in Doing Data Science: Straight Talk From the Frontline
9. Data science is successful when you learn
something about the real world which
helps you solve a problem by taking an
action.
13. Identifying the problem
U: What is my conference room utilization?
Me: What problem are you trying to solve?
U: I want to know which rooms are underutilized
Me: Why do you want to know?
U: To improve the efficiency of conference room use
Me: What are you going to do with that information?
A: Repurpose rooms whose meeting usage is less than 50%
14. Problem: Conference rooms should be used efficiently
Action: repurpose rooms with usage less than 50%, also heavily used areas
Metric: room utilization = hours in use / available hours per day
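As a rough illustration of the metric and action on this slide, here is a minimal Python sketch. The `RoomUsage` record, the function names, the room names, and the sample hours are all hypothetical; only the formula (hours in use / available hours per day) and the 50% threshold come from the slide.

```python
from dataclasses import dataclass


@dataclass
class RoomUsage:
    """One day of usage for one room (hypothetical record shape)."""
    room: str
    hours_in_use: float
    available_hours: float


def utilization(usage: RoomUsage) -> float:
    """Room utilization = hours in use / available hours per day."""
    if usage.available_hours <= 0:
        raise ValueError(f"room {usage.room!r} has no available hours")
    return usage.hours_in_use / usage.available_hours


def rooms_to_repurpose(usages, threshold=0.5):
    """Return the rooms whose utilization falls below the threshold."""
    return [u.room for u in usages if utilization(u) < threshold]


# Made-up sample data, for illustration only.
sample = [
    RoomUsage("Alpine", hours_in_use=2.0, available_hours=8.0),
    RoomUsage("Bryce", hours_in_use=6.0, available_hours=8.0),
    RoomUsage("Cedar", hours_in_use=4.0, available_hours=8.0),
]

candidates = rooms_to_repurpose(sample)
print(candidates)
```

The point of the sketch is that the metric only becomes useful once it is tied to the action: the function returns the list of rooms to act on, not just a number.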
16. Identifying the problem
U: What is my conference room utilization?
Me: What problem are you trying to solve?
U: I want to know which departments are using the rooms the most.
Me: Why do you want to know?
U: To adjust the rooms to meet their needs
Me: What are you going to do with that information?
A: Buy new technology or furniture to better meet those needs
17. Problem: Change meeting rooms to fit the needs of each department
Action: make purchasing decisions about technology or furniture
Metrics: room utilization, organizer’s department, occupancy size,
technology or furniture used
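A different problem calls for different metrics. The following Python sketch, under assumed data, groups meeting records by the organizer's department and reports hours booked, meeting count, and average occupancy. The record fields (`room`, `department`, `attendees`, `hours`) and the sample values are hypothetical and stand in for whatever the real booking system provides.

```python
from collections import defaultdict

# Made-up meeting records, for illustration only.
meetings = [
    {"room": "Alpine", "department": "Sales", "attendees": 4, "hours": 1.0},
    {"room": "Alpine", "department": "Sales", "attendees": 6, "hours": 2.0},
    {"room": "Alpine", "department": "Engineering", "attendees": 3, "hours": 1.0},
    {"room": "Bryce", "department": "Engineering", "attendees": 8, "hours": 1.5},
]


def summarize_by_department(records):
    """Aggregate booked hours, meeting count, and mean occupancy per department."""
    totals = defaultdict(lambda: {"hours": 0.0, "meetings": 0, "attendees": 0})
    for record in records:
        entry = totals[record["department"]]
        entry["hours"] += record["hours"]
        entry["meetings"] += 1
        entry["attendees"] += record["attendees"]
    return {
        dept: {
            "hours": entry["hours"],
            "meetings": entry["meetings"],
            "avg_attendees": entry["attendees"] / entry["meetings"],
        }
        for dept, entry in totals.items()
    }


summary = summarize_by_department(meetings)
for dept, stats in sorted(summary.items()):
    print(dept, stats)
```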
18. What problem are you
trying to solve?
What action will you take
with this number?
27. First point of value
Descriptive Analytics are your first
stage where you can actually solve a
problem and take an action.
Especially important for business end
users who want to apply the results
of your analysis.
Your early projects should not try to
extend beyond this stage
29. First point of value
Focus first on counting
These will be...
● Easier to explain to your
stakeholders
● Faster to build and for
stakeholders to realize value
● Easier to focus on good
infrastructure and process,
including tests and alerting
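To make "counting, with tests and alerting" concrete, here is a small Python sketch under assumed data. The event shape, the function names, and the room names are hypothetical; a real pipeline would send the alert messages to whatever monitoring channel the team already uses.

```python
from collections import Counter


def count_meetings_per_room(events):
    """Descriptive analytics at its simplest: count events per room."""
    return Counter(event["room"] for event in events)


def check_counts(counts, expected_rooms, min_total=1):
    """Return alert messages when the counts look wrong (empty list if fine)."""
    alerts = []
    total = sum(counts.values())
    if total < min_total:
        alerts.append(f"only {total} meetings counted; expected at least {min_total}")
    for room in sorted(expected_rooms):
        if counts.get(room, 0) == 0:
            alerts.append(f"no meetings counted for room {room!r}")
    return alerts


# Made-up events, for illustration only.
events = [{"room": "Alpine"}, {"room": "Alpine"}, {"room": "Bryce"}]

counts = count_meetings_per_room(events)
alerts = check_counts(counts, expected_rooms={"Alpine", "Bryce", "Cedar"})
for message in alerts:
    print("ALERT:", message)
```

Because the logic is only a count, the checks are easy to explain to stakeholders: a room that should have meetings but shows none is a data problem worth investigating before anyone acts on the number.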
30. First point of value
Businesses spend 1-3
months to get this into
production the first time
They spend 1-3 years to
really get this right
Descriptive Analytics are your
first stage where you can actually
solve a problem and take an
action.
31. Businesses spend 1-3
months to get this into
production the first time
They spend 1-3 years to
really get this right
1-2 years to do this well
1-2 years to integrate these
1+ years modeling to
integrate optimizations
35. Traditional product development lifecycle
Developing a data product is the same as developing any other product.
Having this process in place will mean more success in your
data endeavours.
Concept: Idea Generation, Research
Assess: Opportunity Analysis, Business Assessment
Develop: Create
Launch: Delivery
36. Understanding the problem to solve
Know the problem to solve and what action will be taken
● Identify the stakeholders
● Document the possible questions your stakeholders have
● Dive deep to find the root problem the stakeholders need to solve
● Identify the action they’re going to take once they have the information
37. Assess what other opportunities there are
Determine the scope of what is needed to answer the question and
figure out who needs to be involved
● What are the short-term and long-term goals for data?
● Who are the supporters and who are the opponents?
● Assuming we do this perfectly, what will we build first?
● What is the most evil thing which can be done?
41. Plan to review the value of your insights
Lifecycle: Concept, Assess, Develop, Launch + Maintain
Data reports can become irrelevant and errors can arise, so it
is important to review the data on an ongoing basis
● Review dashboards: is data still relevant and actionable?
● Metrics meetings: does everyone still understand the data and are there new
definitions which need to be evaluated?
● Domain-specific reviews: meet with stakeholders and see what data is
valuable to them and what actions they take.
42. You win by continuing the product development lifecycle,
starting with data basics, and increasing data complexity
over time.
43. Rinse and Repeat
44. Strategies:
● Know what problem you are trying to solve
● Start simple and mature complexity over time
● Practice good product development and iterate
46. References and Resources
● Rachel Schutt & Cathy O’Neil (2013) Doing Data Science: Straight Talk From the
Frontline, Sebastopol, CA: O’Reilly
● DJ Patil & Hilary Mason (2015) Data Driven. Sebastopol, CA: O’Reilly
● DJ Patil (2011) Building Data Science Teams. Sebastopol, CA: O’Reilly
● Monica Rogati (2017) The AI Hierarchy of Needs
● Nick Crocker (2014) Thirty Things I’ve Learned
● Tavish Srivastava (2015) 13 Tips to make you awesome in Data Science / Analytics Jobs
● Daniel Tunkelang (2017) 10 Things Everyone Should Know About Machine Learning
● DJ Patil - Everything We Wish We'd Known About Building Data Products