How to deliver effective data science projects

Aravind Chiruvelli's presentation at IDEAS Dallas Data Science Conference
(Lead Data Scientist at ThoughtWorks)

  1. Un-siloing Data Science Teams. Aravind Chiruvelli, PhD, Lead Data Scientist, ThoughtWorks
  2. Patterns I came across
       ● “We are on the pace of transforming ourselves into a tech company, we must explore some data science PoCs”
       ● “We need to make better use of our data, there must be a good case for data science”
       ● “We can automate many tasks using machine learning, let’s do a PoC”
       ● “Let’s build a cool machine learning model and take it to the business”
  3. Value from Data: Hindsight, Insight, Foresight (Dr. Ken Collier, Director, Agile Analytics, ThoughtWorks)
  4. Data First Businesses
  5. Data First Businesses: Data is mine, Model is mine
  6. The Proof of Concept (PoC) Mode
       ● Data Science projects often start as PoCs
       ● Works great to mitigate the hype
       ● Low cost
  7. Limited Value with PoC
       ● Not always business value focussed
       ● Suffers from ad hoc prioritization
       ● Expectation mismatch
       ● Often unclear roadmap/vision
  8. From PoC to MVP
       ● Business first approach
       ● Empower the Data team with a product mindset
       ● Focus on reusability (code, infrastructure)
       ● Poly-skilled team
       ● Avoids standalone tabletop solutions
       ● Iterate with a vision
       ● Enables building a platform to support multiple solutions
  9. PoC, MVP
  10. DSLC: The Data Science Life Cycle
       ● Data Science projects are not requirement driven (well, for the most part)
       ● Data Science projects are not always test driven (still very important to write tests)
       ● Data Science projects are always hypothesis driven
       ● Diagram: Data Exploration, Feature Analysis, Feature Engineering, Build and Evaluate Model, Domain Knowledge
  11. DSLC: The Data Science Life Cycle
       ● Diagram: Data Exploration, Feature Analysis, Feature Engineering, Build and Evaluate Model, Domain Knowledge
       ● But this is PoC Mode
  12. DPLC: The Data Product Life Cycle
       ● Diagram: Data Exploration, Feature Analysis, Feature Engineering, Build and Evaluate Model, Productionalize Model, Domain Knowledge
  13. DPLC: The Data Product Life Cycle
       ● Diagram: Data Exploration, Feature Analysis, Feature Engineering, Build and Evaluate Model, Productionalize Model, Domain Knowledge
       ● Productionalizing advanced analytics is not an afterthought
       ● Requires Data Scientists to work with Data Engineers
       ● Pushes product thinking for the entire team
       ● Enables production-ready code
  14. Adopt Agile Discipline
       ● Diagram: Data Exploration, Feature Analysis, Feature Engineering, Build and Evaluate Model, Productionalize Model; labeled Iteration 1, Learn, Measure
  15. Adopt Agile Discipline
       ● Diagram: Data Exploration, Feature Analysis, Feature Engineering, Build and Evaluate Model, Productionalize Model; labeled Iteration 1, Iteration 2, Learn, Measure
  16. Agile DPLC in action
       ● Getting data, data quality checks, begin to build the pipeline
       ● Exploration, experimentation, and model building
       ● Now that we have data and certain transformations, time to start on the underlying computation framework
       ● Feature engineering, experimentation, and model building & evaluation
       ● Build CD, model management, integrate
  17. Accelerate with Platform (Product Thinking + Platform Approach)
       ● A machine learning platform allows rapid experimentation
       ● Allows feature sharing between teams
       ● Model management and versioning
       ● Faster path to production
       ● A collaborative and shareable infrastructure
  18. Pay attention to Underlying Math: “I would rather have questions that can't be answered than answers that can't be questioned.” (Richard P. Feynman, Physicist)
  19. “Data Science is a team sport” (DJ Patil)
  20. Thank you. archiru@thoughtworks.com
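
Slides 10 and 16 mention writing tests and running data quality checks when the pipeline is first built. The slides do not show how; the following is only a minimal sketch of one such check, assuming a hypothetical dataset held as a list of dicts. The function name `check_quality`, the column names, and the 10% missing-value threshold are illustrative assumptions, not taken from the presentation.

```python
from typing import Any, Dict, List


def check_quality(
    rows: List[Dict[str, Any]],
    required: List[str],
    max_missing_ratio: float = 0.1,  # assumed threshold, tune per project
) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    if not rows:
        return ["dataset is empty"]

    problems = []
    for col in required:
        # A value counts as missing if the key is absent or set to None.
        missing = sum(1 for row in rows if row.get(col) is None)
        ratio = missing / len(rows)
        if ratio > max_missing_ratio:
            problems.append(
                f"{col}: {ratio:.0%} missing exceeds {max_missing_ratio:.0%}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample = [
        {"age": 30, "income": None},
        {"age": 41, "income": 52000},
    ]
    for problem in check_quality(sample, ["age", "income"]):
        print(problem)
```

A check like this can run as an ordinary unit test in the first iteration, so that later feature engineering and model building start from data whose basic quality has been asserted rather than assumed.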
