Scaling
Operations
at Spotify
Service Manager Dag. April 2015
David Poblador i Garcia - @davidpoblador
About Spotify…
About David…
‣ Joined Spotify in 2011
‣ Infrastructure/Operations background
‣ Led the Site Reliability team at Spotify for 3+ years
‣ Currently leading the Service Availability team
RealTime Monitoring, Security, Network Engineering, Service Capacity, Operating System
Spotify nowadays
Some numbers
Over 15 million
paying subscribers
Paying subscribers
Over 60 million
active users
Active users
Over 30 million
songs
Number of songs
Over 20,000 new
songs per day
Added songs per day
Over 1.5 billion
playlists
Number of playlists
Available in 58
markets
Number of markets
Spotify nowadays…
‣ Over 15 million paying subscribers
‣ Over 60 million active users
‣ Over 30 million songs
More than 20,000 added every day
‣ Over 1.5 billion playlists
‣ Available in 58 markets
Andorra, Argentina, Austria, Australia, Belgium, Bolivia, Brazil, Bulgaria, Canada, Chile, Colombia, Costa Rica, Cyprus,
Czech Republic, Denmark, Dominican Republic, Ecuador, El Salvador, Estonia, Finland, France, Germany, Greece,
Guatemala, Honduras, Hong Kong, Hungary, Iceland, Ireland, Italy, Latvia, Liechtenstein, Lithuania, Luxembourg,
Malaysia, Malta, Mexico, Monaco, New Zealand, Netherlands, Nicaragua, Norway, Panama, Paraguay, Peru,
Philippines, Poland, Portugal, Singapore, Slovakia, Spain, Sweden, Switzerland, Taiwan, Turkey, UK, Uruguay and
USA.
But this talk is about
how to scale an
Operations team…
Let’s have a look

at the past…
Late
2011
Operations team

in 2011
Operations. Now and then
2011
Spread too thin
5 people
Now
?
Now
No team
Timeline (diagram): 2008, Early 2011, Mid 2012, Sep 2013. Labels: Dev, Operations, Feature teams, Backend Infrastructure, SRE, Internal IT, I/O.
How do we operate
our services?
How Spotify works
System Ownership
at Spotify…
Spotify Engineering Culture
Operations in
Squads
Ops in Squads Background
• Impossible to scale a central operations team
• Understaffed
• Difficult to find generalists
• We believe that operations has to sit close to development
• Our bet for autonomy
• Break dependencies
• End to end responsibility
Vicious circle
• Operations does not have enough time to support squads
• Squads invent a non-standard square wheel for their particular problem
• Increasing technical debt due to a lot of differently shaped wheels
• System ownership and operational support is complex
• We need highly skilled systems engineers
• It's difficult to hire skilled engineers
Operations in Squads
Timeline (diagram): 2008, Early 2011, Mid 2012, Sep 2013. Labels: Dev, Operations, Feature teams, Backend Infrastructure, SRE, Internal IT, I/O.
Current status
‣ Incident Managers on Call (IMOC)
Group that coordinates incidents affecting multiple teams.
‣ Increased availability
Our availability keeps improving.
Areas of improvement
‣ The expectations we place on squads are sometimes unclear
Too many things to do.
‣ Communication between feature teams and infrastructure teams
Questions squads have are not fully understood/answered by teams providing infrastructure.
Ops in Squads Expectations
• Capacity Planning
• Alerting
• Graphing
• Define SLA
• Backups
• Restore tests
• Service Operational Quality Checklist
• Recoverability
• Identify high level metrics
• Security Reviews
• Incident Tracking
• Remediate incidents
• Manageability
• Redundancy
• High Availability
• Deprecate
• Deployment
• Manage perimeter
• Graceful Degradation
• System Review
• Upgrade OS
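The expectations above can be read as a per-service checklist. A minimal Python sketch of that idea follows. The item names come from the slide, but the flat grouping, the `Service` class, and the `playlist` service with its completed items are assumptions made for illustration, not a description of Spotify's actual tooling.

```python
from dataclasses import dataclass, field

# Item names are taken from the slide; treating them as one flat
# checklist is an assumption made for illustration.
CHECKLIST = (
    "Capacity Planning",
    "Alerting",
    "Graphing",
    "Define SLA",
    "Backups",
    "Restore tests",
    "Security Reviews",
    "Incident Tracking",
    "Graceful Degradation",
)


@dataclass
class Service:
    """A hypothetical service owned by a squad."""
    name: str
    done: set[str] = field(default_factory=set)

    def missing(self) -> list[str]:
        """Return checklist items not yet covered, in checklist order."""
        return [item for item in CHECKLIST if item not in self.done]


# Hypothetical example: a squad that has set up alerting, graphing and backups.
playlist = Service("playlist", done={"Alerting", "Graphing", "Backups"})
print(playlist.missing())
```

Keeping the list explicit is one way to address the "expectations are sometimes unclear" problem: a squad can see exactly which items remain for each service it owns.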
To summarize
DevOps → Dev == Ops
DevOps > Dev + Ops
It is easier to learn if you
can work on the full stack
How every engineer at
Spotify became a sysadmin
and the Ops team stopped
getting up at night
Thank you.
@davidpoblador
