Just Enough DevOps for Data Scientists
abida@salesforce.com
@anyabida1
Anya Bida, SRE at Salesforce

Let's say you're a data scientist, and you've been asked to build infrastructure. Here I've distilled some best practices as an introduction for people who are new to DevOps.

Published in: Technology

  1. Just Enough DevOps for Data Scientists. Anya Bida, SRE at Salesforce. abida@salesforce.com, @anyabida1
  2. About Anya: Sr. Member of Technical Staff (SRE) • Salesforce Production Engineering • Salesforce Einstein Platform • Co-organizer, SF Big Analytics • Spark Tuning: cheat-sheet, talks • Previously at Alpine Data, SRI • PhD Mayo Clinic, BS Johns Hopkins • @anyabida1
  3. What I am going to talk about: What is DevOps • Salesforce Einstein Scales • Our goal • Top 10 tips • What’s next?
  4. What is DevOps? Software Development • Network & Security • Infrastructure • Build & Release
  5. What is DevOps? Software Development • Network & Security • Infrastructure • Build & Release • Data Science
  6. What is DevOps? Software Development • Network & Security • Infrastructure • Build & Release • Data Science. Examples: • Awesome library on SparkML • Spark clusters • Microservices • Cluster, Containers
  7. Fastest Growing Top 5 Enterprise Software Company. Revenue: FY11 $1.7B • FY12 $2.3B • FY13 $3.1B • FY14 $4.1B • FY15 $5.4B • FY16 $6.7B • FY17 $8.4B • FY18 Q2 $2.56B. Award years shown: 2009 • 2010 • 2011 • 2012 • 2013 • 2014 • 2015 • 2016 • 2017; September 2016; 2011 • 2012 • 2013 • 2014 • 2015 • 2016 • 2017. “The world’s most innovative companies” • “Innovator of the Decade”
  8. Our Goal: Time • Number of Predictions • Infrastructure Costs
  9. Tip 1: Plan for Failure. Take off that Data Scientist hat now.
  10. Tip 1: Plan for Failure. Take off that Data Scientist hat now. Simple Dashboard with KPIs
  11. Tip 1: Plan for Failure. Take off that Data Scientist hat now. Simple Dashboard with KPIs • Request & error rates • Longest response times - upper 95th & 99th percentile • Capacity • Events. Source: Jos Boumans, Salesforce DMP slides, https://www.slideshare.net/jiboumans/how-to-measure-everything-a-million-metrics-per-second-with-minimal-developer-overhead
  12. Tip 1: Plan for Failure. Take off that Data Scientist hat now. Simple Dashboard with KPIs • Request & error rates • Longest response times - upper 95th & 99th percentile • Capacity • Events. Collect metrics from every machine. Troubleshoot with all the metrics at your disposal. Source: https://www.slideshare.net/jiboumans/how-to-measure-everything-a-million-metrics-per-second-with-minimal-developer-overhead
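
To make the dashboard KPIs from Tip 1 concrete, here is a minimal sketch of how request count, error rate, and the 95th and 99th percentile response times could be computed for one time window. The function names, the `(response_time_ms, status_code)` input shape, and the choice to count only 5xx statuses as errors are assumptions made for illustration; they are not taken from the talk.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def kpi_summary(requests):
    """Summarize one window of (response_time_ms, status_code) tuples (hypothetical input shape)."""
    times = [t for t, _ in requests]
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for _, status in requests if status >= 500)
    return {
        "request_count": len(requests),
        "error_rate": errors / len(requests),
        "p95_ms": percentile(times, 95),
        "p99_ms": percentile(times, 99),
    }


# Illustrative data: 100 requests taking 1..100 ms, two of which failed.
window = [(ms, 200) for ms in range(1, 99)] + [(99, 500), (100, 503)]
summary = kpi_summary(window)
```

Percentiles are used rather than the mean because a handful of very slow requests can hide behind a healthy-looking average.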
  13. Tip 2: Blue Green Deployments. Users • Blue Machine (old) • Green Machine (new). https://docs.mobingi.com/official/guide/bg-deploy
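
The blue/green idea in Tip 2 can be sketched in a few lines: users always talk to one "active" environment, the new version is started in the idle one, and traffic is switched only after a health check passes. The `BlueGreenRouter` class and its methods are hypothetical names for this sketch; in practice the switch would usually live in a load balancer or DNS record rather than in application code.

```python
class BlueGreenRouter:
    """Toy router: sends every request to whichever environment is currently active."""

    def __init__(self, blue, green):
        # Each environment is modeled as a callable that handles a request.
        self.environments = {"blue": blue, "green": green}
        self.active = "blue"

    def idle(self):
        """Name of the environment that is not receiving user traffic."""
        return "green" if self.active == "blue" else "blue"

    def handle(self, request):
        return self.environments[self.active](request)

    def cut_over(self, health_check):
        """Switch traffic to the idle environment only if it passes the health check."""
        candidate = self.idle()
        if not health_check(self.environments[candidate]):
            return False  # Keep serving from the current environment.
        self.active = candidate
        return True


# Illustrative usage: blue runs the old version, green the new one.
router = BlueGreenRouter(blue=lambda r: f"v1:{r}", green=lambda r: f"v2:{r}")
before = router.handle("predict")
switched = router.cut_over(lambda env: env("ping") == "v2:ping")
after = router.handle("predict")
```

Because the old environment keeps running, rolling back is just another `cut_over` call in the opposite direction.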
  14. Tip 3: Assume people make mistakes. Technical debt • Every manual change • Duplicate metrics. Scale down resources • Terminate unused machines • Janitor Monkey • Understand the cost per job • Jobs should not accumulate files on disk
  15. Tip 4: Changes should be auditable. Schaper - the tool to compare schemas. Qixiu “Q” Hu, https://www.linkedin.com/in/huqixiu/
  16. Tip 4: Changes should be auditable. Schaper - the tool to compare schemas. Qixiu “Q” Hu, https://www.linkedin.com/in/huqixiu/ Two schemas compared: CREATE TABLE myConferences ( name text, city text, early_bird timeuuid, late_bird timeuuid, PRIMARY KEY ((name, city), early_bird) ) WITH CLUSTERING ORDER BY (early_bird DESC); and CREATE TABLE myConferences ( name text, city text, early_bird timeuuid, late_bird timeuuid, PRIMARY KEY ((name, city), early_bird) ) WITH CLUSTERING ORDER BY (early_bird DESC);
  17. Tip 4: Changes should be auditable. Schaper - the tool to compare schemas. Qixiu “Q” Hu, https://www.linkedin.com/in/huqixiu/ Two schemas compared: CREATE TABLE myConferences ( name text, city text, early_bird timeuuid, late_bird timeuuid, PRIMARY KEY ((name, city), early_bird) ) WITH CLUSTERING ORDER BY (early_bird DESC); and CREATE TABLE myConferences ( name text, city text, early_bird timeuuid, late_bird timeuuid, discount_code text, PRIMARY KEY ((name, city), early_bird) ) WITH CLUSTERING ORDER BY (early_bird DESC);
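
The kind of comparison shown on the Tip 4 slides can be approximated with a short script. This is not Schaper's implementation, which the slides do not show; `parse_columns` and `diff_schemas` are hypothetical helpers that only handle simple `CREATE TABLE` statements like the ones above, where every column is written as `name type` before the `PRIMARY KEY` clause.

```python
def parse_columns(create_stmt):
    """Extract {column: type} from a simple CREATE TABLE statement (illustrative parser only)."""
    start = create_stmt.index("(") + 1
    end = create_stmt.upper().index("PRIMARY KEY")
    columns = {}
    for part in create_stmt[start:end].split(","):
        tokens = part.split()
        if len(tokens) == 2:  # Skip the empty fragment after the last comma.
            columns[tokens[0]] = tokens[1]
    return columns


def diff_schemas(old, new):
    """Report columns that were added, removed, or changed type between two schemas."""
    return {
        "added": {c: t for c, t in new.items() if c not in old},
        "removed": {c: t for c, t in old.items() if c not in new},
        "changed": {c: (old[c], new[c]) for c in old if c in new and old[c] != new[c]},
    }


OLD = """CREATE TABLE myConferences ( name text, city text, early_bird timeuuid,
late_bird timeuuid, PRIMARY KEY ((name, city), early_bird) )
WITH CLUSTERING ORDER BY (early_bird DESC);"""

NEW = """CREATE TABLE myConferences ( name text, city text, early_bird timeuuid,
late_bird timeuuid, discount_code text, PRIMARY KEY ((name, city), early_bird) )
WITH CLUSTERING ORDER BY (early_bird DESC);"""

changes = diff_schemas(parse_columns(OLD), parse_columns(NEW))
```

Storing a diff like this alongside each deployment gives a record of what changed and when, which is the point of making changes auditable.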
  18. Tip 5: Configuration management. Network Connectivity • 20 parameters. User Access • 50 parameters. Deploy cluster (e.g. Mesos) • 20 non-default parameters. Deploy a microservice • 50 parameters. Schedule a job • 3 parameters. SUM × 3 regions × 20 metrics: approx. 6000
  19. Tip 6: Pick a naming convention: <service>.<environment>.<region>.<hostname>.<metric>. Templates for Automation: Service discovery • Creating dashboards (Prod, non-prod, …) • Log queries • Cost analysis
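
One way to keep the Tip 6 convention consistent is to build and parse metric names in a single place, so dashboards, log queries, and cost reports can all be templated from the same fields. The sketch below assumes the five dot-separated fields from the slide; the `MetricName` type, the `parse_metric_name` helper, and the example values are hypothetical. It also assumes that hostnames contain no dots, while the trailing metric part may.

```python
from typing import NamedTuple


class MetricName(NamedTuple):
    """The five fields of the convention <service>.<environment>.<region>.<hostname>.<metric>."""

    service: str
    environment: str
    region: str
    hostname: str
    metric: str

    def __str__(self):
        return ".".join(self)


def parse_metric_name(name):
    """Split a dotted name into its fields; any extra dots stay in the metric part."""
    parts = name.split(".", 4)
    if len(parts) != 5 or not all(parts):
        raise ValueError(f"expected 5 non-empty dot-separated fields, got: {name!r}")
    return MetricName(*parts)


# Illustrative values only.
name = MetricName("scoring", "prod", "us-west-2", "host01", "request_count")
full_name = str(name)
prod_only = [m for m in [name] if m.environment == "prod"]
```

With structured names, a per-environment dashboard becomes a filter on one field instead of a hand-written query.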
  20. Tip 7: Permissions. Every user, service, & job should have specific, auditable permissions. Cluster Manager • Scheduler • IAM. IAM Roles • User has an IAM Role • Job has an IAM Role • IAM Roles determine read / write access to data. IAM In • Logs • IAM Out
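
The role model in Tip 7 can be illustrated without any cloud SDK. The sketch below is plain Python, not the AWS IAM API: every user and job maps to exactly one role, each role lists the datasets it may read or write, and every decision is appended to an audit log. All role names, principals, and dataset names are made up for the example.

```python
# Hypothetical roles: which datasets each role may read or write.
ROLES = {
    "analyst": {"read": {"features"}, "write": set()},
    "training-job": {"read": {"features"}, "write": {"models"}},
}

# Hypothetical principals: every user and job has exactly one role.
PRINCIPAL_ROLES = {
    "user:alice": "analyst",
    "job:nightly-train": "training-job",
}

audit_log = []


def is_allowed(principal, action, dataset):
    """Check a read/write request against the principal's role and record the decision."""
    role = PRINCIPAL_ROLES.get(principal)
    # Unknown principals or roles fall through to an empty permission set.
    allowed = dataset in ROLES.get(role, {}).get(action, set())
    audit_log.append((principal, role, action, dataset, allowed))
    return allowed


job_can_write = is_allowed("job:nightly-train", "write", "models")
user_can_write = is_allowed("user:alice", "write", "models")
stranger_can_read = is_allowed("user:unknown", "read", "features")
```

Denied requests are logged as well as granted ones, so the log answers both "who changed this?" and "who tried to?".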
  21. Tip 8: Understand resource allocation. Node Memory • Container Memory 8 GB • Node Memory • Container Memory 8 GB. Source: Understanding Memory Management in Spark For Fun And Profit, Shivnath Babu (Duke University, Unravel Data Systems), Mayuresh Kunjir (Duke University)
  22. Node Memory • Node Memory • Node Memory. 4 GB used, 8 GB total. Can my 8 GB container launch on this cluster? 8 GB
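
The question on this slide comes down to a per-node check rather than a cluster-wide sum. The sketch below assumes one reading of the diagram: three nodes, each with 8 GB total and 4 GB already used, and a container that must fit entirely on a single node. The `can_launch` helper is a hypothetical name, and real schedulers also reserve memory for overhead, which is ignored here.

```python
def can_launch(container_gb, nodes):
    """A container needs enough free memory on ONE node; free memory cannot be pooled across nodes."""
    return any(total - used >= container_gb for total, used in nodes)


# Assumed cluster from the slide: three nodes, each (total_gb, used_gb) = (8, 4).
nodes = [(8, 4), (8, 4), (8, 4)]

total_free_gb = sum(total - used for total, used in nodes)
fits_8gb = can_launch(8, nodes)
fits_4gb = can_launch(4, nodes)
```

Under these assumptions the cluster has 12 GB free in total, yet the 8 GB container cannot launch because no single node has more than 4 GB free.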
  23. Tip 9: Monitor multiple viewpoints. https://light.co/camera
  24. Tip 9: Monitor multiple viewpoints. Connectivity Viewer. Vaibhav Tandon, https://www.linkedin.com/in/vaibhavt/
  25. Tip 9: Monitor multiple viewpoints. Connectivity Viewer. Vaibhav Tandon, https://www.linkedin.com/in/vaibhavt/
  26. Tip 9: Monitor multiple viewpoints. Connectivity Viewer. Vaibhav Tandon, https://www.linkedin.com/in/vaibhavt/
  27. Getting started tips: 1. Plan for failure; 2. Blue / Green Deployments; 3. Assume people make mistakes; 4. Changes should be auditable; 5. Configuration management; 6. Pick a naming convention; 7. Permissions (user, service, job); 8. Understand resource allocation; 9. Monitor multiple viewpoints
  28. Getting started tips: 1. Plan for failure; 2. Blue / Green Deployments; 3. Assume people make mistakes; 4. Changes should be auditable; 5. Configuration management; 6. Pick a naming convention; 7. Permissions (user, service, job); 8. Understand resource allocation; 9. Monitor multiple viewpoints; 10. Infrastructure as Code
  29. Did we just automate ourselves out of our jobs? Nope. Now we have time to take on new projects and grow…
  30. More info: Jos Boumans, Salesforce DMP slides • SRE: How Google Runs Production Systems (book) • James Ward, Engineering & Open Source Ambassador at Salesforce • High Performance Spark (book)
  31. More info: Real Time ML Pipelines in Multi-Tenant Environments (Director of Engineering Karl Skucha & Lead Engineer Yan Yang) • Introduction to Machine Learning (Engineering & Open Source Ambassador James Ward) • Fantastic ML apps and how to build them (Principal Engineer Matthew Tovbin) • Fireworks - lighting up the sky with millions of Sparks (Director of Engineering Thomas Gerber) • Functional Linear Algebra in Scala (Engineer & Professor Vlad Patryshev) • Panel: Functional Programming for Machine Learning • Saturday @ 2:10pm: Complex Machine Learning Pipelines Made Easy (Machine Learning Engineers Till Bergmann & Chris Rupley)
  32. Anya Bida, SRE at Salesforce. abida@salesforce.com, @anyabida1
  33. Questions?
  34. Extra, unused slides
