What is your data strategy and why is it wrong?

Data science is still a young industry. Those of us forging the path face many challenges: What infrastructure should we use? How do we scale to mountains of data? How do we maintain data integrity? However, the center of data science complexity is not a technical challenge; it is a people challenge. How do we educate our business leaders? How do we collaborate with other departments? What is our strategy for building data-driven organizations?

In this talk, Dylan will argue for using the data science hierarchy of needs to guide data strategy and will lead a discussion about best practices in data science.

Dylan Gregersen is a data scientist at Teem, which provides intelligent workplace tools and analytics.

Presented at https://www.meetup.com/utah-data-engineering-meetup/events/253208050/
Sponsored by Google Cloud Platform, Pluralsight and Overstock.

What is your data strategy and why is it wrong?

  1. 1. What is your data strategy and why is it wrong? Dylan Gregersen Data Engineering Meetup Aug 2018
  2. 2. My name is Dylan Gregersen I like these things... You can find me at… dylangregersen I am the lead data scientist at...
  3. 3. How do you define Data Science?
  4. 4. Data science is... the process of collecting, cleaning, analyzing, visualizing, and communicating data in order to solve problems in the real world. (Rachel Schutt & Cathy O’Neil, Doing Data Science: Straight Talk From the Frontline)
  5. 5. What people think data science is... People often think data science is all about mathematics, algorithms, and something called “machine learning”. (Rachel Schutt & Cathy O’Neil, Doing Data Science: Straight Talk From the Frontline)
  6. 6. What most data science is... Data science actually consists mostly of data collection, cleaning, and organization (often 80% of the work). (Rachel Schutt & Cathy O’Neil, Doing Data Science: Straight Talk From the Frontline)
  7. 7. What people forget data science is... People tend to forget the communication skills data science needs so that someone can act on the results in the real world. (Rachel Schutt & Cathy O’Neil, Doing Data Science: Straight Talk From the Frontline)
  8. 8. Data science is a process. When doing data science we... 1. Collect Data: We must first collect and store information about real-world phenomena. 2. Structure Data: Then we structure that data into conceptual models of the phenomena. 3. Extract Insight: We use our data model to understand something about the phenomena. 4. Solve Problems: We apply our understanding to solve a problem by taking an action. (Rachel Schutt & Cathy O’Neil, Doing Data Science: Straight Talk From the Frontline)
  9. 9. Data science is successful when you learn something about the real world which helps you solve a problem by taking an action.
  10. 10. Data Strategy #1 Know what problem you are trying to solve
  11. 11. “Can I have the number of X for last month?”
  12. 12. Identifying the problem U: What is my conference room utilization?
  13. 13. Identifying the problem U: What is my conference room utilization? Me: What problem are you trying to solve? U: I want to know which rooms are underutilized. Me: Why do you want to know? U: To improve the efficiency of conference room use. Me: What are you going to do with that information? A: Repurpose rooms whose meeting usage is less than 50%.
  14. 14. Identifying the problem U: What is my conference room utilization? Me: What problem are you trying to solve? U: I want to know which rooms are underutilized. Me: Why do you want to know? U: To improve the efficiency of conference room use. Me: What are you going to do with that information? A: Repurpose rooms whose meeting usage is less than 50%. Problem: Conference rooms should be used efficiently. Action: Repurpose rooms with usage less than 50%, also heavily used areas. Metric: room utilization = hours in use / available hours per day.
  15. 15. Identifying the problem U: What is my conference room utilization?
  16. 16. Identifying the problem U: What is my conference room utilization? Me: What problem are you trying to solve? U: I want to know which departments are using the rooms the most. Me: Why do you want to know? U: To adjust the rooms to meet their needs. Me: What are you going to do with that information? A: Buy new technology or furniture to better meet those needs.
  17. 17. Identifying the problem U: What is my conference room utilization? Me: What problem are you trying to solve? U: I want to know which departments are using the rooms the most. Me: Why do you want to know? U: To adjust the rooms to meet their needs. Me: What are you going to do with that information? A: Buy new technology or furniture to better meet those needs. Problem: Change meeting rooms to fit the needs of each department. Action: Make purchasing decisions about technology or furniture. Metrics: room utilization, organizer’s department, occupancy size, technology or furniture used.
  18. 18. What problem are you trying to solve? What action will you take with this number?
  20. 20. Data Strategy #2 Start simple and mature complexity over time
  21. 21. “Can you predict which customers will renew?”
  22. 22. Say hello to... the data science hierarchy of needs, which describes the stages of data complexity and insights.
  23. 23. The Data Science Process
  27. 27. First point of value Descriptive Analytics are your first stage where you can actually solve a problem and take an action. Especially important for business end users who want to apply the results of your analysis.
  28. 28. First point of value Descriptive Analytics are your first stage where you can actually solve a problem and take an action. Especially important for business end users who want to apply the results of your analysis. Your early projects should not try to extend beyond this stage
  29. 29. First point of value: focus first on counting. These will be... ● Easier to explain to your stakeholders ● Faster to build and for stakeholders to realize value ● Easier to focus on good infrastructure and process, including tests and alerting
  30. 30. First point of value Descriptive Analytics are your first stage where you can actually solve a problem and take an action. Businesses spend 1-3 months to get this into production the first time. They spend 1-3 years to really get this right.
  31. 31. Businesses spend 1-3 months to get this into production the first time. They spend 1-3 years to really get this right. 1-2 years to do this well. 1-2 years to integrate these. 1+ years of modeling to integrate optimizations.
  33. 33. Data Strategy #3 Practice good product development and iterate
  34. 34. “Can you also include...?”
  35. 35. Traditional product development lifecycle Developing a data product is the same as developing any product. Having this process in place will mean more success in your data endeavours. Stages: Concept (Idea Generation) → Research (Assess Opportunity) → Analysis (Business Assessment) → Develop (Create) → Launch (Delivery)
  36. 36. Understanding the problem to solve. Know the problem to solve and what action will be taken. ● Identify the stakeholders ● Document the possible questions your stakeholders have ● Dive deep to find the root problem the stakeholders need to solve ● Identify the action they’re going to take once they have the information
  37. 37. Assess what other opportunities there are. Determine the scope of what is needed to answer the question and figure out who needs to be involved. ● What are the short-term and long-term goals for data? ● Who are the supporters and who are the opponents? ● Assuming we do this perfectly, what will we build first? ● What is the most evil thing which can be done?
  38. 38. Figure out a plan for answering the question. Create a requirements document that outlines what you plan to deliver. ● Determine your project’s definition of success: when are you successful? ● Do product, design, and architecture reviews ● Determine team dependencies and business requirements ● Estimate costs, timelines and milestones
  39. 39. Create something magical! As you develop the end deliverables you’re also building the infrastructure, testing & QA, and alerting. ● Stay focused ● Document other questions and possible data sources ● Build good architecture with testing and alerts ● Manage quality, only let clean data in! ● Backup and security
  40. 40. Communicate your insights. Once you’ve completed the work, you need to package and deliver it in a way your stakeholders can use. ● Learn to speak the language of your stakeholders (executives or engineers) ● Review with stakeholders ● Evaluate expectations
  41. 41. Plan to review the value of your insights (Launch + Maintain). Data reports can become irrelevant and errors can arise, so it is important to do ongoing reviews of the data. ● Review dashboards: is data still relevant and actionable? ● Metrics meetings: does everyone still understand the data and are there new definitions which need to be evaluated? ● Domain-specific reviews: meet with stakeholders and see what data is valuable to them and what actions they take.
  42. 42. You win by continuing the product development lifecycle, starting with data basics, and progressing data complexity over time. Concept (Idea Generation) → Research (Assess Opportunity) → Analysis (Business Assessment) → Develop (Create) → Launch + Maintain (Delivery)
  43. 43. Rinse and Repeat
  44. 44. Strategies: Know what problem you are trying to solve. Start simple and mature complexity over time. Practice good product development and iterate.
  45. 45. Strategies: Know what problem you are trying to solve. Start simple and mature complexity over time. Practice good product development and iterate. Data science is successful when you learn something about the real world which helps you solve a problem by taking an action.
  46. 46. References and Resources ● Rachel Schutt & Cathy O’Neil (2013) Doing Data Science: Straight Talk From the Frontline, Sebastopol, CA: O’Reilly ● DJ Patil & Hilary Mason (2015) Data Driven. Sebastopol, CA: O’Reilly ● DJ Patil (2011) Building Data Science Teams. Sebastopol, CA: O’Reilly ● Monica Rogati (2017) The AI Hierarchy of Needs ● Nick Crocker (2014) Thirty Things I’ve Learned ● Tavish Srivastava (2015) 13 Tips to make you awesome in Data Science / Analytics Jobs ● Daniel Tunkelang (2017) 10 Things Everyone Should Know About Machine Learning ● DJ Patil - Everything We Wish We'd Known About Building Data Products
  47. 47. Strategies: Know what problem you are trying to solve. Start simple and mature complexity over time. Practice good product development and iterate. Data science is successful when you learn something about the real world which helps you solve a problem by taking an action.
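
As a rough illustration of the conference room example (slides 12 to 17), the sketch below computes room utilization as hours in use divided by available hours per day, flags rooms under the 50% threshold from the dialogue, and counts booked hours per organizer department. The booking records, room names, the 8-hour day, and every function name are hypothetical and not from the talk. It also assumes all bookings fall on a single day and do not overlap.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical booking records: (room, organizer department, start, end).
BOOKINGS = [
    ("Aspen", "Sales", datetime(2018, 8, 1, 9), datetime(2018, 8, 1, 12)),
    ("Aspen", "Engineering", datetime(2018, 8, 1, 13), datetime(2018, 8, 1, 16)),
    ("Birch", "Sales", datetime(2018, 8, 1, 10), datetime(2018, 8, 1, 11)),
]

AVAILABLE_HOURS_PER_DAY = 8.0  # assumption: rooms are bookable 8 hours a day
UNDERUSED_THRESHOLD = 0.5      # the "less than 50%" cut-off from the dialogue


def booked_hours(start, end):
    """Length of one booking in hours."""
    return (end - start).total_seconds() / 3600


def room_utilization(bookings, available_hours):
    """Utilization per room = hours in use / available hours per day."""
    hours_in_use = defaultdict(float)
    for room, _department, start, end in bookings:
        hours_in_use[room] += booked_hours(start, end)
    return {room: hours / available_hours for room, hours in hours_in_use.items()}


def underused_rooms(utilization, threshold):
    """Rooms whose utilization is below the threshold: repurposing candidates."""
    return sorted(room for room, value in utilization.items() if value < threshold)


def hours_by_department(bookings):
    """Total booked hours per organizer department (a simple counting metric)."""
    totals = defaultdict(float)
    for _room, department, start, end in bookings:
        totals[department] += booked_hours(start, end)
    return dict(totals)


utilization = room_utilization(BOOKINGS, AVAILABLE_HOURS_PER_DAY)
print(utilization)
print(underused_rooms(utilization, UNDERUSED_THRESHOLD))
print(hours_by_department(BOOKINGS))
```

Both metrics are plain counts and ratios, which fits the advice to start with descriptive analytics. Each output maps directly to an action named in the dialogue: repurposing a room, or buying equipment for a department.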
