
Giovanni Lanzani GoDataDriven

Applied data science: where does it go wrong when developing data products?
Giovanni Lanzani


  1. APPLIED DATA SCIENCE
      Giovanni Lanzani, Chief Science Officer, GoDataDriven (@gglanzani)
  2. WHO AM I
      01 Italy
      02 Leiden University
      03 KPMG
      04 GoDataDriven
  4. LEARNING FROM DATA
      • You have some (or lots of) data
      • You need to generalize
  5. BEST MODEL
      • Which one would you choose here?
      • It’s about making a trade-off
      • Making this trade-off is the most important job of the product owner (PO)
      • A 100% correct answer might not exist!
  7. ULTIMATELY
      • It’s about creating value from data
      • Using Machine Learning, Advanced Analytics, and visualization
  8. WHEN YOU SAY DATA SCIENCE, COMPANIES UNDERSTAND
      • All the things big data
      • Predictive modeling & Advanced Analytics
      • More money
      • Do all the cool things the others are doing
  10. TRADITIONAL DATA WAREHOUSE ARCHITECTURE
      (Diagram labels: EDW; Data consumer; Web app; Dashboard / Reporting; Traditional Business app)
  11. AND NOW?
      (Diagram labels: ?; Data consumer; Web app; Dashboard / Reporting; Traditional Business app; API)
  12. WHAT COMPANIES GOT
      • A lot of POCs
      • A lot of screenshots/presentations/dashboards on a laptop
      • Nice stories to tell their network about those screenshots, and especially those dashboards
      • Headaches, with data and infrastructure even more scattered
  13. BUT…
      • We have a data scientist working on trees and forests
      • Neural networks!
      • Deep learning!!!
  14. WHAT DO COMPANIES ACTUALLY NEED?
      • To put things into production
      • They don’t teach that in any data science course or MOOC (that I know of)
  15. THE THREE HURDLES
      (Credit to Jon Shave)
  16. OVERSIMPLIFYING
      (Diagram labels: Requirements; Data Sources; Exploration; Modeling; Products; Feedback; Data scientist; ML engineer; Data engineer; Data engineer; 🤦; Customers)
  17. KAGGLE CURSE
      • Many data scientists approach the problem at hand with a Kaggle-like mentality: delivering the best model in absolute terms, no matter what the practical implications are.
      • In reality, it's not the best model that we implement, but the one that combines quality and practicality: a continuous balancing act.
      • Netflix competition
  18. SOLVING THEM
  19. BUSINESS CASE
      • Business case for: True Positives, True Negatives
      • Cost of: False Positives, False Negatives
  20. DATA
      Data {insert something here} should be pro grade
  21. SKILLS
      • Can a data scientist participate in actually building production-quality systems, OR are they merely proficient enough in R or Python to hack together a prototype on a very small dataset?
      • Supply of the second group keeps growing while demand is flat or shrinking
      • Especially as executives get burned by “data scientists” who don't know how to help them build things of value
  22. HIRING
      • Companies that are not engineering-driven often have trouble hiring good technical people
      • The “IQ” test is not really representative of applied data science
      • At GoDataDriven we do an “at home, at your convenience” assessment
      • Real dataset, real business question, real product
      • Models are software: treat them as such
  23. TAKEAWAYS
      • POs should know “their stuff”
      • Automate all the data movements
      • Hire data scientists who are good at programming (or hire machine learning engineers)
  24. QUESTIONS?
      • We’re hiring
      • Data & Machine Learning Engineers!
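
The business case on slide 19 (value of true positives and true negatives, cost of false positives and false negatives) can be made concrete with a small worked example. What follows is only a sketch and is not taken from the talk: it attaches a hypothetical monetary amount to each of the four outcomes and picks the decision threshold that maximizes net value rather than accuracy. The names `BusinessCase`, `confusion_counts`, `net_value`, and `best_threshold`, and every number used, are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessCase:
    """Hypothetical monetary amounts per prediction outcome (illustrative only)."""
    value_tp: float  # gain from each true positive
    value_tn: float  # gain from each true negative
    cost_fp: float   # loss from each false positive
    cost_fn: float   # loss from each false negative


def confusion_counts(y_true, scores, threshold):
    """Count (tp, tn, fp, fn) when predicting positive for score >= threshold."""
    tp = tn = fp = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def net_value(case, y_true, scores, threshold):
    """Total gains minus total losses at the given threshold."""
    tp, tn, fp, fn = confusion_counts(y_true, scores, threshold)
    return (tp * case.value_tp + tn * case.value_tn
            - fp * case.cost_fp - fn * case.cost_fn)


def best_threshold(case, y_true, scores, candidates):
    """Return the candidate threshold with the highest net value."""
    return max(candidates, key=lambda t: net_value(case, y_true, scores, t))


if __name__ == "__main__":
    # Made-up labels and model scores, purely for illustration.
    y_true = [1, 0, 1, 0]
    scores = [0.9, 0.6, 0.4, 0.1]
    case = BusinessCase(value_tp=100, value_tn=0, cost_fp=10, cost_fn=50)
    for t in (0.3, 0.5, 0.8):
        print(t, net_value(case, y_true, scores, t))
    print("best:", best_threshold(case, y_true, scores, (0.3, 0.5, 0.8)))
```

With these made-up amounts a missed positive costs five times as much as a false alarm, so the lowest threshold wins even though it produces a false positive. Change the amounts and the preferred threshold changes too, which is why the trade-off belongs to the product owner and not only to the modeler.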