Retooling on the Modern Data and Analytics Tech Stack

Presentation to TDWI STL from Adam Doyle and Susan King on a framework for retooling on the modern data and analytics tech stack.


  1. Retooling on the Modern Data and Analytics Stack, 2/2020. Confidential and Proprietary to Daugherty Business Solutions.
  2. What is the Modern Tech Stack? The tools and technologies needed to solve problems made difficult by their size, speed, and complexity.
  3. Competencies: Information Management, Data Solutions, Modern Data Architectures, Data Science, Data Governance
  4. Information Management: Data loading, Data modeling, Querying
  5. IM Focus: NoSQL
  6. IM Focus: Platforms & Services
  7. IM Focus: Serialization (JSON)
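As a small illustration of the serialization focus on slide 7, the sketch below round-trips a record through JSON with Python's standard `json` module. The `person` record and its field names are invented for illustration and do not come from the slides.

```python
import json

# Hypothetical record; the field names are illustrative only.
person = {"id": 1, "name": "Ada Lovelace", "tags": ["analyst", "engineer"]}

# Serialize to a JSON string (sorted keys give stable, comparable output).
encoded = json.dumps(person, sort_keys=True)

# Deserialize back into Python objects.
decoded = json.loads(encoded)

print(encoded)
print(decoded == person)
```

Sorting keys is a design choice that makes serialized records easier to diff and compare; it is not required by JSON itself.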
  8. Data Solutions: Tell me a story…
  9. Data Solutions Focus: Profiling, Sampling, Aggregation
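The three focus areas on slide 9 can be sketched in a few lines of standard-library Python. This is a minimal sketch under assumptions: the `rows` data, the column names, and the helper functions `profile`, `sample`, and `aggregate` are all hypothetical and only show what each term typically means in practice.

```python
import random
import statistics
from collections import defaultdict

# Hypothetical sales rows; the data is invented for illustration.
rows = [
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 80.0},
    {"region": "east", "amount": None},
    {"region": "west", "amount": 100.0},
    {"region": "east", "amount": 60.0},
]

def profile(rows, column):
    """Profiling: row count, missing values, min, max, and mean of one column."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "count": len(rows),
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }

def sample(rows, k, seed=0):
    """Sampling: a reproducible simple random sample of k rows."""
    return random.Random(seed).sample(rows, k)

def aggregate(rows, key, column):
    """Aggregation: sum of a column per group, skipping missing values."""
    totals = defaultdict(float)
    for r in rows:
        if r[column] is not None:
            totals[r[key]] += r[column]
    return dict(totals)

print(profile(rows, "amount"))
print(sample(rows, 2))
print(aggregate(rows, "region", "amount"))
```

The same three operations map onto SQL or a dataframe library at larger scale; plain Python is used here only to keep the sketch self-contained.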
  10. Data Governance: Governance outputs are eternal.
  11. Focus On… Scale
  12. Data Engineering, Data Science, Decision Science
  13. Data Science
  14. Focus On…
  15. Modern Data Architecture: Programmatic data manipulation
  16. Cloud
  17. Big Data
  18. Streaming: Kafka for publish/subscribe; KSQL (Kafka + SQL); Debezium (change data capture)
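To make the publish/subscribe idea on slide 18 concrete, here is a toy in-memory model of a topic log with per-group read offsets. It is a sketch under assumptions, not Kafka's client API: the `ToyBroker` class, its methods, and the message shape are all hypothetical and exist only to show how independent consumer groups read the same stream.

```python
from collections import defaultdict

class ToyBroker:
    """In-memory stand-in for a publish/subscribe log (not a Kafka client)."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic name -> append-only list of messages
        self.offsets = defaultdict(int)   # (group, topic) -> next offset to read

    def publish(self, topic, message):
        """Append a message to the topic log and return its offset."""
        self.topics[topic].append(message)
        return len(self.topics[topic]) - 1

    def poll(self, group, topic):
        """Return messages the group has not seen yet and advance its offset."""
        start = self.offsets[(group, topic)]
        messages = self.topics[topic][start:]
        self.offsets[(group, topic)] = len(self.topics[topic])
        return messages

broker = ToyBroker()
# Hypothetical change events; the "op" and "id" fields are invented.
broker.publish("people", {"op": "insert", "id": 1})
broker.publish("people", {"op": "update", "id": 1})

first = broker.poll("matcher", "people")    # both messages
second = broker.poll("matcher", "people")   # nothing new for this group
audit = broker.poll("audit", "people")      # a separate group still sees both
```

Because each group tracks its own offset, adding a new consumer does not affect existing ones, which is the property that makes a publish/subscribe log useful for feeding several downstream systems from one stream.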
  19. Data Engineering: https://www.logicalclocks.com/blog/feature-store-the-missing-data-layer-in-ml-pipelines
  20. Focus on…
  21. Five Steps to Retooling: Awareness, Exposure, Guided Practice, Evolving Practice, Growing Expertise
  22. AWARENESS: Aware, -ish
  23. AWARENESS: Podcasts; Data science / advanced tech groups; Major tech companies
  24. EXPOSURE: Use case studies; Blogs; Try-it-for-free
  25. GUIDED PRACTICE: Online courses; Free AWS and Azure accounts; Open-source downloads
  26. EVOLVING PRACTICE: 1. Pick something familiar 2. Make it a little strange 3. Rinse & repeat
  27. Pipeline diagram. Steps: Pick Best Candidate; Ingest People Data; Find Match Candidates; Clean People Data; Save Results. Compute: Python (Local) for each of the five steps. Storage: MySQL (Local), Local File, Local File.
  28. Pipeline diagram. Steps: Pick Best Candidate; Ingest People Data; Find Match Candidates; Clean People Data; Save Results. Compute: Python (Local) for each of the five steps. Storage: MySQL (Local), AWS S3, AWS S3.
  29. Pipeline diagram. Steps: Pick Best Candidate; Ingest People Data; Find Match Candidates; Clean People Data; Save Results. Compute: Python (Local) for each of the five steps. Storage: AWS S3, AWS S3, AWS RDS.
  30. Pipeline diagram. Steps: Pick Best Candidate; Ingest People Data; Find Match Candidates; Clean People Data; Save Results. Compute: Python Lambda for each of the five steps. Storage: AWS S3, AWS S3, AWS RDS.
  31. Pipeline diagram. Orchestration: AWS Step Function. Steps: Pick Best Candidate; Ingest People Data; Find Match Candidates; Clean People Data; Save Results. Compute: Python Lambda for each of the five steps. Storage: AWS S3, AWS S3, AWS RDS.
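The five step names on slides 27 to 31 can be sketched as plain Python functions, which is roughly the "familiar" starting point before pieces are moved to the cloud. This is a minimal sketch under assumptions: the step order (ingest, clean, find candidates, pick best, save), the `name,email` input format, the scoring rule, and every function body are hypothetical, since the slides only give the step labels.

```python
import json

def ingest_people(raw_lines):
    """Ingest: parse 'name,email' lines into records."""
    return [dict(zip(("name", "email"), line.split(","))) for line in raw_lines]

def clean_people(people):
    """Clean: trim whitespace and normalize case."""
    return [
        {"name": p["name"].strip().title(), "email": p["email"].strip().lower()}
        for p in people
    ]

def find_match_candidates(people, target):
    """Find candidates: score each person against the target record."""
    candidates = []
    for p in people:
        score = int(p["email"] == target["email"]) + int(p["name"] == target["name"])
        if score > 0:
            candidates.append({"person": p, "score": score})
    return candidates

def pick_best_candidate(candidates):
    """Pick best: highest score wins; None if there are no candidates."""
    return max(candidates, key=lambda c: c["score"], default=None)

def save_results(best):
    """Save: serialize the result (a real pipeline would write to a file or store)."""
    return json.dumps(best, sort_keys=True)

def run_pipeline(raw_lines, target):
    """Run the steps in sequence, as an orchestrator would."""
    people = clean_people(ingest_people(raw_lines))
    return save_results(pick_best_candidate(find_match_candidates(people, target)))

raw = [" ada lovelace , ADA@example.com ", "Alan Turing,alan@example.com"]
target = {"name": "Ada Lovelace", "email": "ada@example.com"}
result = run_pipeline(raw, target)
print(result)
```

Keeping each step as a separate function with plain inputs and outputs is what makes the later moves plausible: a step can be rehosted (for example as a function-as-a-service handler) and the sequencing in `run_pipeline` handed to an orchestrator without rewriting the step logic.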
  32. GROWING EXPERTISE: 1. Add new features
  33. Add New Features. Orchestration: AWS Step Function. Steps: Pick Best Candidate; Ingest People Data; Find Match Candidates; Clean People Data; Save Results. Compute: Python Lambda for each of the five steps. Storage: AWS S3, AWS S3, AWS RDS, DynamoDB.
  34. GROWING EXPERTISE: 1. Add new features 2. Improve scalability
  35. Improve Scalability. Orchestration: AWS Step Function. Steps: Pick Best Candidate; Ingest People Data; Find Match Candidates; Clean People Data; Save Results. Compute: Python Lambda, ECS Task, ECS Task, ECS Task, Python Lambda. Storage: AWS S3, AWS S3, AWS RDS, DynamoDB.
  36. GROWING EXPERTISE: 1. Add new features 2. Improve scalability 3. Improve performance
  37. Improve Performance. Orchestration: AWS Step Function. Steps: Pick Best Candidate; Ingest People Data; Find Match Candidates; Clean People Data; Save Results. Compute: Python Lambda, ECS Task, EMR, ECS Task, Python Lambda. Storage: AWS S3, AWS S3, DynamoDB. Also labeled: Spark, Scala.
  38. GROWING EXPERTISE: 1. Add new features 2. Improve scalability 3. Improve performance 4. Batch vs. stream
  39. GROWING EXPERTISE: 1. Add new features 2. Improve scalability 3. Improve performance 4. Batch vs. stream 5. Automation
  40. Conclusion: Don’t do too much at once!
  41. Questions?
  42. Resources
      General
        • https://www.analyticsvidhya.com/blog/2018/11/data-engineer-comprehensive-list-resources-get-started/
        • https://towardsdatascience.com/who-is-a-data-engineer-how-to-become-a-data-engineer-1167ddc12811
        • https://www.dataquest.io/path/data-engineer/
        • https://dataengweekly.com/
      Podcasts
        • https://towardsdatascience.com/our-podcast-c5c1129bc5cf
        • https://www.stitcher.com/podcast/httpanalyticshourlibsyncom/the-digital-analytics-power-hour
        • https://www.stitcher.com/podcast/data-stories-podcast/data-stories
        • https://www.stitcher.com/podcast/data-skeptic-podcast/the-data-skeptic-podcast (Data Science focused)
        • https://www.stitcher.com/podcast/oreilly-media-2/the-oreilly-data-show-podcast?refid=stpr
        • https://www.dataengineeringpodcast.com/
      Reference Architectures
        • https://medium.com/refraction-tech-everything/how-netflix-works-the-hugely-simplified-complex-stuff-that-happens-every-time-you-hit-play-3a40c9be254b
        • http://highscalability.com/blog/2015/11/9/a-360-degree-view-of-the-entire-netflix-stack.html (older but interesting)
        • https://medium.com/airbnb-engineering/airbnb-engineering-infrastructure/home
        • https://towardsdatascience.com/how-linkedin-uber-lyft-airbnb-and-netflix-are-solving-data-management-and-discovery-for-machine-9b79ee9184bb
  43. Resources (continued)
      Use Cases
        • https://www.mongodb.com/use-cases
        • https://www.confluent.io/blog/category/use-cases/
        • https://kafka.apache.org/uses
        • https://aws.amazon.com/big-data/use-cases/
        • https://www.dataversity.net/eight-big-data-analytics-options-on-microsoft-azure/
        • https://www.toptal.com/spark/introduction-to-apache-spark
      Try it for Free
        • https://neo4j.com/sandbox/ (Neo4j)
        • https://www.mongodb.com/cloud/atlas/lp/general/try (MongoDB)
        • https://www.postman.com/ + https://www.guru99.com/postman-tutorial.html (trying out APIs)
        • https://databricks.com/try-databricks (Spark)
        • https://jupyter.org/try
  44. Resources (continued)
      Open Source Downloads + Guides
        • https://spark.apache.org/docs/latest/index.html
        • https://kafka.apache.org/documentation/#gettingStarted
        • https://www.mongodb.com/download-center/community
        • https://www.python.org/about/gettingstarted/
      Free Cloud Accounts
        • https://aws.amazon.com/free/
        • https://azure.microsoft.com/en-us/free/
      Online Training
        • www.acloud.guru (recommended)
        • www.coursera.org
        • www.udemy.com
