Scaling Data Infrastructure
@ Spotify
matti@spotify.com
kalvans@spotify.com
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
spotify-data-infrastructure
Presented at QCon San Francisco
www.qconsf.com
Mārtiņš Kalvāns
kalvans@spotify.com
Matti Pehrs
matti@spotify.com
Agenda
1. Data at Spotify
2. Summer of 2015
3. Challenges & Victory
○ Datamon
○ Styx
○ GABO
Spotify big-data context
● Over 100 million monthly active users
● Over 30 million songs
● Over 2 billion playlists
● Active in 60 markets
Data is at the heart of Spotify
In 2007
- Monthly Royalty Report
In 2016
- Monthly Royalty Report
- Weekly Billboard
- Daily reports to partners
- ...
- A/B testing
- Discover Weekly
- Daily Mix
- ...
Our growth in Data
Users
+50 TB/day
+100M Users
Developers
+60 TB/day
+10k M/R jobs
Hadoop
Autonomy & Dependencies
Team A
Team B
Team C
Summer of Incidents
● A string of incidents
● War-room
● Hadoop on its knees
● Event Delivery catch-up
● Reprocessing of data
● Hard-to-debug data issues
Challenges and the path to victory...
1. Early Warning: Datamon - Data monitoring
2. Debuggability & Control: Styx - Scheduling and control
3. Automate Capacity: GABO - Event Delivery
Early Warning - Datamon
● Unified view
○ Alignment between teams
● Ownership
○ Clear ownership of data
● SLA
○ Alert on late data
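As a rough illustration of what "alert on late data" could look like, the sketch below compares each dataset's delivery time against an SLA deadline and reports the owner of anything late. The `Dataset` fields, team names, and the `late_datasets` helper are hypothetical and chosen only for this example; they are not Datamon's actual model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Dataset:
    name: str                      # hypothetical dataset identifier
    owner: str                     # team that owns the data
    sla: timedelta                 # allowed delay after the partition closes
    delivered_at: Optional[datetime] = None  # None until the partition lands


def late_datasets(datasets, partition_end, now):
    """Return the datasets whose partition missed its SLA deadline."""
    late = []
    for ds in datasets:
        deadline = partition_end + ds.sla
        if ds.delivered_at is None:
            # Not delivered yet: late only once the deadline has passed.
            if now > deadline:
                late.append(ds)
        elif ds.delivered_at > deadline:
            late.append(ds)
    return late


partition_end = datetime(2016, 11, 7, 0, 0)
datasets = [
    Dataset("plays", "team-a", timedelta(hours=4), datetime(2016, 11, 7, 3, 0)),
    Dataset("royalties", "team-b", timedelta(hours=6)),  # still missing
]
now = datetime(2016, 11, 7, 8, 0)

for ds in late_datasets(datasets, partition_end, now):
    print(f"ALERT: {ds.name} (owner: {ds.owner}) is late")
```

Keeping the owner next to the SLA in one record is what lets a single check both detect the delay and route the alert to the right team.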
Early Warning - Datamon
● Define terminology
● Provide metadata language
● Implement a Datamon service
Debuggability & Control - Styx
● Execution control
○ Self service for data users
● Execution information
○ Expose debug information
● Execution isolation
○ Docker for data jobs
The river Styx
Debuggability & Control - Styx
● Execution control
○ Centralized execution API
○ Backfilling and reprocessing
● Execution information
○ Timeline
○ Google Cloud Logging
● Execution isolation
○ Docker
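To make "backfilling and reprocessing" concrete, here is a minimal sketch that works out which daily partitions in a date range still need to be run. It assumes daily partitions and a known set of completed ones; the helper names `daily_partitions` and `plan_backfill` are hypothetical and are not part of Styx.

```python
from datetime import date, timedelta


def daily_partitions(start, end):
    """Yield each daily partition in the half-open range [start, end)."""
    day = start
    while day < end:
        yield day
        day += timedelta(days=1)


def plan_backfill(start, end, already_done):
    """Return the partitions that still need to be (re)run, oldest first."""
    return [d for d in daily_partitions(start, end) if d not in already_done]


done = {date(2015, 7, 2)}
todo = plan_backfill(date(2015, 7, 1), date(2015, 7, 5), done)
print([d.isoformat() for d in todo])
```

A scheduler with a centralized execution API could then trigger one execution per returned partition, which is the kind of control a team needs when reprocessing data after an incident.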
Automate Capacity - GABO/Event Delivery
● Complex and manual config
● Pub/Sub & Dataflow streaming
● Pub/Sub at scale
● Dataflow streaming :-(
● 2 microservices + 1 Map/Reduce job
● Autoscaling & The Stuffer
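One simple way to think about autoscaling a queue consumer is to size the worker pool from the current backlog. The sketch below is only an illustration of that idea under assumed inputs (backlog size, per-worker throughput, a target drain time); the function name, parameters, and limits are hypothetical and do not describe how GABO actually scales.

```python
import math


def desired_workers(backlog_msgs, per_worker_rate, target_drain_seconds,
                    min_workers=1, max_workers=100):
    """Workers needed to drain the backlog within the target time, clamped."""
    needed = math.ceil(backlog_msgs / (per_worker_rate * target_drain_seconds))
    return max(min_workers, min(max_workers, needed))


# 1.2M queued messages, 1,000 msgs/s per worker, drain within 60 seconds.
print(desired_workers(1_200_000, 1_000, 60))
```

The clamp matters in both directions: the minimum keeps the pipeline warm when the backlog is empty, and the maximum bounds cost if the backlog estimate is wrong.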
GABO - WIP
● Handles at least 10x our load
● Darkloading
● Autoscale everything
● Self service
Summary
● Make sure you have the right tools to deal with data incidents
○ Make sure you have time to implement the tools you need
● Remember that your capacity model can fail at larger scale
○ Keep track of your scale and
Automate, automate, automate...
Thank you!
kalvans@spotify.com
matti@spotify.com
Want to join the band?
http://spoti.fi/jobs

Scaling the Data Infrastructure @Spotify

  • 1. Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ spotify-data-infrastructure
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  • 5. Agenda 1. Data at Spotify 2. Summer of 2015 3. Challenges & Victory ○ Datamon ○ Styx ○ GABO
  • 6. Spotify big-data context ● Over 100 million monthly active users ● Over 30 million songs ● Over 2 billion playlists ● Active in 60 markets
  • 7. Data is at the heart of Spotify In 2007 - Monthly Royalty Report In 2016 - Monthly Royalty Report - Weekly Billboard - Daily reports to partners - ... - A/B testing - Discover Weekly - Daily Mix - ...
  • 8. Our growth in Data ● Users: +50 TB/day, +100M users ● Developers: +60 TB/day, +10k M/R jobs
  • 14-19. Summer of Incidents ● A strain of incidents ● War-room ● Hadoop on its knees ● Event Delivery Catch up ● Reprocessing of data ● Hard to debug data issues
  • 20-24. Challenges and the path to victory... 1. Early Warning: Datamon - Data monitoring 2. Debuggability & Control: Styx - Scheduling and control 3. Automate Capacity: GABO - Event Delivery
  • 25. Early Warning - Datamon
  • 26. Early Warning - Datamon ● Unified view ○ Alignment between teams ● Ownership ○ Clear ownership of data ● SLA ○ Alert on late data
  • 27. Early Warning - Datamon ● Define terminology ● Provide metadata language ● Implement a Datamon service
  • 28. Challenges and the path to victory... 1. Early Warning: Datamon - Data monitoring 2. Debuggability & Control: Styx - Scheduling and control 3. Automate Capacity: GABO - Event Delivery
  • 29. Debuggability & Control - Styx ● Execution control ○ Self service for data users ● Execution information ○ Expose debug information ● Execution isolation ○ Docker for data jobs (Image: The river Styx)
  • 30-34. Debuggability & Control - Styx ● Execution control ○ Centralized execution API ○ Backfilling and reprocessing ● Execution information ○ Timeline ○ Google Cloud Logging ● Execution isolation ○ Docker
  • 35. Challenges and the path to victory... 1. Early Warning: Datamon - Data monitoring 2. Debuggability & Control: Styx - Scheduling and control 3. Automate Capacity: GABO - Event Delivery
  • 36-41. Automate Capacity - GABO/Event Delivery ● Complex and manual config ● Pubsub & Dataflow streaming ● Pubsubs at scale ● Dataflow streaming :-( ● 2 microservices + 1 Map/Reduce job ● Autoscaling & The Stuffer
  • 42. GABO - WIP ● Handles at least 10x our load ● Darkloading ● Autoscale everything ● Self service
  • 43. Summary ● Make sure you have the right tools to deal with data incidents ○ Make sure you have time to implement the tools you need ● Remember that your capacity model can fail at larger scale ○ Keep track of your scale and automate, automate, automate...
  • 44. Thank you! kalvans@spotify.com matti@spotify.com Want to join the band? http://spoti.fi/jobs
  • 45. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/spotify-data-infrastructure
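
The Datamon slides mention clear ownership of data and an SLA that alerts on late data. The slides do not show how Datamon implements this, so the following is only a minimal sketch of the general idea under stated assumptions: the `DatasetSla` record, the `late_datasets` function, and the dataset and team names are all hypothetical and are not taken from Datamon itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DatasetSla:
    """Hypothetical metadata record: who owns a dataset and when it is due."""
    name: str
    owner: str
    due_after: timedelta  # allowed delay after the data period ends


def late_datasets(
    slas: List[DatasetSla],
    arrivals: Dict[str, Optional[datetime]],
    period_end: datetime,
    now: datetime,
) -> List[str]:
    """Return one alert message per dataset that is still missing past its deadline."""
    alerts = []
    for sla in slas:
        deadline = period_end + sla.due_after
        arrived = arrivals.get(sla.name)  # None means "not delivered yet"
        if arrived is None and now > deadline:
            alerts.append(
                f"{sla.name} (owner: {sla.owner}) is late by {now - deadline}"
            )
    return alerts


# Example with made-up datasets: one is past its deadline, one is not yet due.
slas = [
    DatasetSla("play_events", "team-a", timedelta(hours=4)),
    DatasetSla("royalty_report", "team-b", timedelta(hours=12)),
]
period_end = datetime(2016, 11, 8, 0, 0)
now = datetime(2016, 11, 8, 6, 0)
alerts = late_datasets(
    slas, {"play_events": None, "royalty_report": None}, period_end, now
)
for message in alerts:
    print(message)
```

Keeping the owner in the same record as the deadline means each alert can be routed to a responsible team, which is one way to read the "ownership" and "SLA" bullets together.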