Distributed “Web Scale” Systems



                      Ricardo Vice Santos
                            @ricardovice
Who am I?
•  I’m Ricardo!
•  Lead Engineer at Spotify
•  ricardovice on twitter, spotify, about.me, kiva, slideshare, github,
   bitbucket, delicious…
•  Portuguese
•  Previously working in the video streaming industry
•  (only) Discovered Spotify late 2009
•  Joined in 2010
spotifiera:           to use Spotify;
spo·ti·fie·ra   Verb to provide a service free of cost;
What’s Spotify all about?
•  A big catalogue, tons of music
•  Available everywhere
•  Great user experience
•  More convenient than piracy
•  Reliable, high availability
•  Scalable for many, many users
But what really got me hooked:
•  Free, legal ad-supported service
•  Very fast
The importance of being fast
•  High latency can be a problem, not only in First
   Person Shooters
•  Slow performance is a major user experience killer
•  At Velocity 2009, Eric Schurman (Bing) and Jake
   Brutlag (Google Search) showed that increased
   latency directly hurt usage and revenue per user[1].
•  Latency leads to users leaving, many won't ever
   come back
•  Users will share their experience with friends


          [1] http://radar.oreilly.com/2009/07/velocity-making-your-site-fast.html
So how fast is Spotify?
•  We monitor playback latency on the client side
•  Current median latency to play any track is 265ms
•  On average, the human notion of “instant” is
   anything under 200ms
•  Due to disk lookup, at times it's actually faster to
   start playing a track from network than from disk
•  Below 1% of playbacks experienced stutter
“Spotify is fast due to P2P”
•  This is something I read a lot around the web
•  P2P does play a crucial role in the picture, but…
•  Experience at Spotify showed me that most latency issues are
   directly linked to backend problems
•  It’s a mistake to think that we could be this fast without a smart and
   scalable backend architecture

So let’s give credit where credit is due.
Going web scale!!1




“Scaling Twitter”
Blaine Cook, 2007
http://www.slideshare.net/Blaine/scaling-twitter
Handling growth
Things to keep in mind:
•  Scaling is not an exact science
•  There is no such thing as a magic formula
•  Usage patterns differ
•  There is always a limit to what you can handle
•  Fail gracefully
•  Continuous evolution process
Scaling horizontally
•    You can always add more machines!
•    Stateless services
•    Several processes can share memcached
•    Possible to run in “the cloud” (EC2, Rackspace)
•    Need some kind of load balancer
•    Data sharing/synchronization can be hard
•    Complexity: many pieces, maybe hidden SPOFs
•    Fundamental to the application’s design
Usage patterns
Typically, some services are more demanding than
others. This can be due to:
•  Higher popularity
•  Higher complexity
•  Low latency expectation
•  All combined
Decoupling
•    Divide and conquer!
•    The Unix way
•    Resources assigned individually
•    Using the right tools to address each problem
•    Organization and delegation
•    Problems are isolated
•    Easier to handle growth
Read only services
•    The easiest to scale
•    Stateless
•    Use indices, large read-optimized data containers
•    Each node has its local copy
•    Data structured according to service
•    Updated periodically, during off-peak hours
•    Take advantage of OS page cache
Read-write services
•  User generated content, e.g. playlists
•  Hard to ensure consistency of data across instances

Solutions:
•  Eventual consistency:
   •  Reads of just written data not guaranteed to be up-to-date
•  Locking, atomic operations
    •  Creating globally unique keys, e.g. usernames
    •  Transactions, e.g. billing
Decoupling at Spotify
Finding a service via DNS
Each service has an SRV DNS record:
•  One record with same name for each service instance
•  Clients (AP) resolve to find servers providing that service
•  Among the records with the lowest priority value, one is chosen by weighted shuffle
•  Clients retry other instances in case of failures

Example SRV record
_frobnicator._http.example.com. 3600 SRV  10   50     8081 frob1.example.com.
name                            TTL  type prio weight port host
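The selection and retry rules above can be sketched as follows. This is a minimal illustration, not Spotify's client code: the `SrvRecord` type, `order_candidates`, `call_service` and the `send` callback are hypothetical names, and the records are passed in as plain data rather than resolved from DNS.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    host: str


def order_candidates(records, rng=random):
    """Return records in the order a client would try them.

    Lower priority values come first; within one priority level the
    order is a weighted shuffle, so heavier records tend to come earlier.
    """
    ordered = []
    for prio in sorted({r.priority for r in records}):
        pool = [r for r in records if r.priority == prio]
        while pool:
            # Pick one record with probability proportional to its weight.
            # The +1 keeps zero-weight records selectable (a simplification).
            weights = [r.weight + 1 for r in pool]
            pick = rng.choices(pool, weights=weights, k=1)[0]
            pool.remove(pick)
            ordered.append(pick)
    return ordered


def call_service(records, send, rng=random):
    """Try each candidate in turn; move on to the next one on failure."""
    last_error = None
    for rec in order_candidates(records, rng):
        try:
            return send(rec.host, rec.port)
        except OSError as err:
            last_error = err
    raise ConnectionError("no instance reachable") from last_error
```

Here `send(host, port)` stands for whatever actually performs the request; any `OSError` it raises is treated as "this instance is down, try the next one".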
Request assignment
•    Hardware load balancers
•    Round-robin DNS
•    Proxy servers
•    Sharding:
      •  Each server/instance responsible for subset of data
      •  Directs client to instance that has its data
      •  Easy if nothing is shared
      •  Hard if you require replication
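A minimal sketch of the simplest form of sharding, assuming a fixed list of instances and a hash-modulo rule. The host names and the `shard_for` helper are made up for illustration; the slides do not say which rule any particular service uses.

```python
import hashlib


def shard_for(key: str, shards: list[str]) -> str:
    """Map a key to one shard by hashing it and taking the remainder."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest, "big") % len(shards)
    return shards[index]


# Hypothetical instances, each responsible for a subset of the data.
SHARDS = [
    "playlist-a.example.com",
    "playlist-b.example.com",
    "playlist-c.example.com",
]

print(shard_for("user:alice", SHARDS))
```

The same key always lands on the same instance, which is what lets a client be directed to the server holding its data. With this rule, changing the number of shards remaps most keys.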
Sharding using a DHT
Some Spotify services use Dynamo inspired DHTs[1]:
•  Each request has a key
•  Each service node is responsible for a range of hash keys
•  Data is distributed among service nodes
•  Redundancy is ensured by re-hashing and writing to replica node
•  Data must be transitioned when ring changes




         [1] http://dl.acm.org/citation.cfm?id=1294281
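The bullets above can be sketched as a small hash ring. This is one possible reading, not the actual Spotify implementation: it assumes 128-bit MD5 hashes (as in the Dynamo paper), that each node owns the range ending at its token, and that "re-hashing" means hashing the previous hash again until it lands on a different node. The `Ring` class and its method names are hypothetical.

```python
import bisect
import hashlib


def key_hash(data: bytes) -> int:
    """128-bit hash of the given bytes (MD5 is an assumption)."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


class Ring:
    def __init__(self, tokens: dict[str, int]):
        # tokens maps node name -> last hash key of that node's segment.
        self._entries = sorted((tok, node) for node, tok in tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def owner(self, h: int) -> str:
        # First segment whose last key is >= h; wrap around to the first node.
        i = bisect.bisect_left(self._tokens, h) % len(self._entries)
        return self._entries[i][1]

    def nodes_for(self, key: str, replicas: int) -> list[str]:
        """Primary node for the key, followed by up to `replicas` others."""
        h = key_hash(key.encode("utf-8"))
        chosen = [self.owner(h)]
        wanted = min(replicas + 1, len(self._entries))
        attempts = 0
        while len(chosen) < wanted and attempts < 1000:
            # Re-hash the previous hash to find a candidate replica node.
            h = key_hash(h.to_bytes(16, "big"))
            node = self.owner(h)
            if node not in chosen:
                chosen.append(node)
            attempts += 1
        return chosen
```

When a node joins or leaves, the segment boundaries move, so the keys in the affected range have to be copied to their new owner. That is the data transition the last bullet refers to.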
DHT example
Spotify’s DNS powered DHT
Configuration of DHT
config._frobnicator._http.example.com.   3600   TXT   "slaves=0"
      config.srv_name.                   TTL    type  no replication

config._frobnicator._http.example.com.   3600   TXT   "slaves=2 redundancy=host"
      config.srv_name.                   TTL    type  three replicas on separate hosts

Ring segment, one per node
tokens.8081.frob1.example.com.   3600   TXT   "00112233445566778899aabbccddeeff"
      tokens.port.host.          TTL    type  last key
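A minimal sketch of how a client might read these TXT payloads once they have been fetched. The function names are hypothetical, and the field format is inferred only from the two examples above (space-separated `name=value` pairs, and a 32-digit hex token read as a 128-bit number).

```python
def parse_config(txt: str) -> dict[str, str]:
    """Parse a TXT payload such as 'slaves=2 redundancy=host'."""
    options = {}
    for field in txt.split():
        name, _, value = field.partition("=")
        options[name] = value
    return options


def parse_token(txt: str) -> int:
    """Interpret a 32-digit hex token as a 128-bit integer."""
    return int(txt, 16)


config = parse_config("slaves=2 redundancy=host")
copies = int(config["slaves"]) + 1  # primary plus slaves
last_key = parse_token("00112233445566778899aabbccddeeff")
print(copies, hex(last_key))
```

The slide annotates `slaves=2` as three replicas, which suggests the count is the primary plus its slaves; the `copies` line follows that reading.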
And if none of this works for you
Remember
/dev/null is
web scale!!




          http://www.xtranormal.com/watch/6995033/
Questions?
                     get in touch!
                    @ricardovice
             ricardo@spotify.com
Thank you.

                    @ricardovice
             ricardo@spotify.com

Distributed "Web Scale" Systems

  • 1. Distributed “Web Scale” Systems Ricardo Vice Santos @ricardovice
  • 2. Who am I? •  I’m Ricardo! •  Lead Engineer at Spotify •  ricardovice on twitter, spotify, about.me, kiva, slideshare, github, bitbucket, delicious… •  Portuguese •  Previously working in the video streaming industry •  (only) Discovered Spotify late 2009 •  Joined in 2010
  • 3. spotifiera: to use Spotify; spo·ti·fie·ra Verb to provide a service free of cost;
  • 4. What’s Spotify all about? •  A big catalogue, tons of music •  Available everywhere •  Great user experience •  More convenient than piracy •  Reliable, high availability •  Scalable for many, many users But what really got me hooked: •  Free, legal ad-supported service •  Very fast
  • 5. The importance of being fast •  High latency can be a problem, not only in First Person Shooters •  Slow performance is a major user experience killer •  At Velocity 2009, Eric Schurman (Bing) and Jake Brutlag (Google Search) showed that increased latency directly hurt usage and revenue per user[1]. •  Latency drives users away, and many won’t ever come back •  Users will share their experience with friends [1] http://radar.oreilly.com/2009/07/velocity-making-your-site-fast.html
  • 6. So how fast is Spotify? •  We monitor playback latency on the client side •  Current median latency to play any track is 265ms •  On average, the human notion of “instant” is anything under 200ms •  Due to disk lookup, at times it's actually faster to start playing a track from network than from disk •  Fewer than 1% of playbacks experience stutter
  • 7. “Spotify is fast due to P2P” •  This is something I read a lot around the web •  P2P does play a crucial role in the picture, but… •  Experience at Spotify showed me that most latency issues are directly linked to backend problems •  It’s a mistake to think that we could be this fast without a smart and scalable backend architecture So let’s give credit where credit is due.
  • 8. Going web scale!!1 “Scaling Twitter” Blaine Cook, 2007 http://www.slideshare.net/Blaine/scaling-twitter
  • 9. Handling growth Things to keep in mind: •  Scaling is not an exact science •  There is no such thing as a magic formula •  Usage patterns differ •  There is always a limit to what you can handle •  Fail gracefully •  Continuous evolution process
  • 10. Scaling horizontally •  You can always add more machines! •  Stateless services •  Several processes can share memcached •  Possible to run in “the cloud” (EC2, Rackspace) •  Need some kind of load balancer •  Data sharing/synchronization can be hard •  Complexity: many pieces, maybe hidden SPOFs •  Fundamental to the application’s design
  • 11. Usage patterns Typically, some services are more demanding than others. This can be due to: •  Higher popularity •  Higher complexity •  Low latency expectation •  All combined
  • 12. Decoupling •  Divide and conquer! •  The Unix way •  Resources assigned individually •  Using the right tools to address each problem •  Organization and delegation •  Problems are isolated •  Easier to handle growth
  • 13. Read only services •  The easiest to scale •  Stateless •  Use indices, large read-optimized data containers •  Each node has its local copy •  Data structured according to service •  Updated periodically, during off-peak hours •  Take advantage of OS page cache
  • 14. Read-write services •  User generated content, e.g. playlists •  Hard to ensure consistency of data across instances Solutions: •  Eventual consistency: •  Reads of just written data not guaranteed to be up-to-date •  Locking, atomic operations •  Creating globally unique keys, e.g. usernames •  Transactions, e.g. billing
  • 16. Finding a service via DNS Each service has an SRV DNS record: •  One record with same name for each service instance •  Clients (AP) resolve to find servers providing that service •  Lowest priority record is chosen with weighted shuffle •  Clients retry other instances in case of failures Example SRV record:
      _frobnicator._http.example.com.  3600  SRV   10    50      8081  frob1.example.com.
      name                             TTL   type  prio  weight  port  host
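
The selection rule on this slide (lowest priority first, weighted shuffle within a priority, retry the next instance on failure) can be sketched roughly as follows. This is only an illustration, not Spotify's client code: `SrvRecord` and `order_candidates` are hypothetical names, and the records are assumed to have been resolved already by some DNS library.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    """One resolved SRV record (hypothetical container, not a DNS library type)."""
    priority: int
    weight: int
    port: int
    host: str


def order_candidates(records, rng=None):
    """Return the records in the order a client would try them.

    Lower priority values come first; inside one priority group the order
    is a weighted shuffle, so heavier records tend to be tried earlier.
    """
    rng = rng or random.Random()
    ordered = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        while group:
            weights = [r.weight for r in group]
            if sum(weights) == 0:
                # Assumption: all-zero weights fall back to a uniform pick.
                pick = rng.choice(group)
            else:
                pick = rng.choices(group, weights=weights, k=1)[0]
            ordered.append(pick)
            group.remove(pick)
    return ordered


records = [
    SrvRecord(10, 50, 8081, "frob1.example.com."),
    SrvRecord(10, 50, 8081, "frob2.example.com."),
    SrvRecord(20, 100, 8081, "frob3.example.com."),
]
# A client would connect to the first entry and move down the list on failure.
candidates = order_candidates(records)
```

Returning the full ordering rather than a single record keeps the retry behaviour simple: the caller just walks the list until one instance answers.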
  • 17. Request assignment •  Hardware load balancers •  Round-robin DNS •  Proxy servers •  Sharding: •  Each server/instance responsible for subset of data •  Directs client to instance that has its data •  Easy if nothing is shared •  Hard if you require replication
  • 18. Sharding using a DHT Some Spotify services use Dynamo-inspired DHTs[1]: •  Each request has a key •  Each service node is responsible for a range of hash keys •  Data is distributed among service nodes •  Redundancy is ensured by re-hashing and writing to replica node •  Data must be transitioned when ring changes [1] http://dl.acm.org/citation.cfm?id=1294281
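
A minimal sketch of the lookup this slide describes, under several assumptions that are not stated in the deck: keys are hashed with MD5 into a 128-bit space, each node holds exactly one token that marks the last key of its ring segment, and a replica is found by hashing the previous digest again until it lands on a node not yet chosen. The `Ring` class and its method names are hypothetical.

```python
import bisect
import hashlib


class Ring:
    """Hash ring where each node owns the keys up to and including its token."""

    def __init__(self, tokens):
        # tokens: mapping of node name -> hex string for the last key it owns
        entries = sorted((int(tok, 16), node) for node, tok in tokens.items())
        self._keys = [k for k, _ in entries]
        self._nodes = [n for _, n in entries]

    def owner_of_hash(self, h):
        # First token >= h owns the key; past the largest token, wrap to the start.
        i = bisect.bisect_left(self._keys, h)
        return self._nodes[i % len(self._nodes)]

    def nodes_for(self, key, replicas=0, max_rehashes=64):
        """Return the primary node followed by up to `replicas` replica nodes."""
        digest = hashlib.md5(key.encode("utf-8")).digest()
        chosen = [self.owner_of_hash(int.from_bytes(digest, "big"))]
        wanted = min(replicas + 1, len(self._nodes))
        for _ in range(max_rehashes):
            if len(chosen) >= wanted:
                break
            # Re-hash the previous digest to pick the next replica candidate.
            digest = hashlib.md5(digest).digest()
            node = self.owner_of_hash(int.from_bytes(digest, "big"))
            if node not in chosen:
                chosen.append(node)
        # Assumed fallback: walk the ring if re-hashing kept hitting chosen nodes.
        i = self._nodes.index(chosen[0])
        while len(chosen) < wanted:
            i = (i + 1) % len(self._nodes)
            if self._nodes[i] not in chosen:
                chosen.append(self._nodes[i])
        return chosen


ring = Ring({
    "frob1": "55555555555555555555555555555555",
    "frob2": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
    "frob3": "ffffffffffffffffffffffffffffffff",
})
primary_and_replicas = ring.nodes_for("spotify:user:example", replicas=2)
```

When a node joins or leaves, the token list changes and some keys map to a different owner, which is why the slide notes that data must be transitioned when the ring changes.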
  • 20. Spotify’s DNS powered DHT Configuration of DHT:
      config._frobnicator._http.example.com.  3600  TXT   "slaves=0"
      config.srv_name.                        TTL   type  no replication
      config._frobnicator._http.example.com.  3600  TXT   "slaves=2 redundancy=host"
      config.srv_name.                        TTL   type  three replicas on separate hosts
    Ring segment, one per node:
      tokens.8081.frob1.example.com.  3600  TXT   "00112233445566778899aabbccddeeff"
      tokens.port.host.               TTL   type  last key
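
To make the TXT values above concrete, here is a small sketch of how a client might parse them once they have been fetched. Only the `slaves` and `redundancy` fields and the hex token come from the slide; the `DhtConfig` type, the function names, and the default of `slaves=0` when the field is missing are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DhtConfig:
    """Parsed form of the config TXT record (hypothetical type)."""
    slaves: int = 0
    redundancy: Optional[str] = None

    @property
    def copies(self):
        # One primary plus `slaves` replicas, e.g. slaves=2 gives three copies.
        return self.slaves + 1


def parse_dht_config(txt):
    """Parse a value such as 'slaves=2 redundancy=host' into a DhtConfig."""
    fields = dict(item.split("=", 1) for item in txt.split())
    return DhtConfig(
        slaves=int(fields.get("slaves", 0)),  # assumed default: no replication
        redundancy=fields.get("redundancy"),
    )


def parse_token(txt):
    """Convert a hex token TXT value into an integer position on the ring."""
    return int(txt, 16)


config = parse_dht_config("slaves=2 redundancy=host")
last_key = parse_token("00112233445566778899aabbccddeeff")
```

Keeping both the replication settings and the ring tokens in DNS means clients need no separate coordination service to learn the ring layout, at the cost of changes only becoming visible once the record's TTL expires.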
  • 21. And if none of this works for you Remember /dev/null is web scale!! http://www.xtranormal.com/watch/6995033/
  • 22. Questions? get in touch! @ricardovice ricardo@spotify.com
  • 23. Thank you. @ricardovice ricardo@spotify.com