4. Spotify has more than a hundred backend services. They handle enormous amounts of data.
They should always be available. How are they built?
Today
Monday 27 May 2013
6. A small code base is simpler to understand and reason about
Doing one thing and one thing only means no compromises
In praise of small services
7. “Rule of Modularity: Developers should build a program out of simple parts connected by
well-defined interfaces, so problems are local, and parts of the program can be replaced in future
versions to support new features. This rule aims to save time on debugging code that
is complex, long, and unreadable.”
Eric S. Raymond, The Art of Unix Programming
8. “Decouple until it breaks, and then back off just a little”
Strive to make services autonomous
Watch your latency, although the added latency is usually not significant
Decouple
9. Use scaffolding to quickly get the basic service structure
Reuse in libraries
Don’t overuse patterns. Don’t use layers upon layers. Keep it simple
Simple codebases
10. We build services in Python and Java
Python is awesome for quick development and beautiful code
The JVM is stable, performant and transparent
Languages and runtimes
12. Care about your performance. Set clear goals. Measure, measure, measure.
Have an architecture that allows for scale. Build out as needed. Measure, measure, measure.
Performance at scale
http://www.bbc.co.uk/programmes/b01qzdc1
13. Prefer stateless services when possible
Scales out linearly
Isolate mutating operations
Prefer stateless services
14. Fast, efficient, RESTful protocols
Connection pools are hard. Overloaded TCP servers are complicated
Use queues. Proper pushback. Naturally asynchronous.
Efficient protocols
15. Small payloads, fast marshaling
gzip
http://qconsf.com/dl/qcon-sanfran-2011/slides/SastryMalladi_DealingWithPerformanceChallengesOptimizedSerializationTechniques.pdf
Efficient payloads
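A minimal sketch of the "small payloads, gzip" idea, using only the Python standard library. JSON stands in for the serialization format here purely for illustration; the function names and the `compress_threshold` value are assumptions, not something taken from the slides.

```python
import gzip
import json


def encode_payload(obj, compress_threshold=256):
    """Serialize to compact JSON and gzip it only when it is large enough to benefit.

    Returns (body_bytes, was_compressed). The threshold is an illustrative guess.
    """
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    if len(raw) < compress_threshold:
        # Tiny bodies tend to grow under gzip because of its header overhead.
        return raw, False
    return gzip.compress(raw), True


def decode_payload(data, compressed):
    """Reverse encode_payload."""
    raw = gzip.decompress(data) if compressed else data
    return json.loads(raw.decode("utf-8"))


# Hypothetical sample data: a repetitive structure compresses well.
playlist = {"tracks": [{"id": i, "title": "track %d" % i} for i in range(200)]}
body, compressed = encode_payload(playlist)
```

In a real service the receiver would learn whether the body is compressed from a header or a frame flag rather than from a separate return value.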
16. ZeroMQ. Light-weight, fast as hell, queue based
Protobuf. Small, fast, schema-based, simple binary format
Request-reply and pub/sub
Hermes
17. Don’t be afraid to drop requests (and replies) when overloaded
Use shallow queues
Use short timeouts
Use small thread pools
Use small connection pools
Drop requests
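One way to picture shallow queues and short timeouts is a bounded queue that rejects work immediately when it is full, instead of letting a backlog build up. This is a sketch under assumptions: the class name, the depth of 4 and the 50 ms timeout are hypothetical values chosen for illustration.

```python
import queue


class SheddingQueue:
    """A shallow request queue that drops new work when it is full."""

    def __init__(self, max_depth=4):
        self._queue = queue.Queue(maxsize=max_depth)
        self.dropped = 0  # count of rejected requests, worth exporting as a metric

    def offer(self, request):
        """Try to enqueue without blocking; return False if the request was dropped."""
        try:
            self._queue.put_nowait(request)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def take(self, timeout=0.05):
        """Wait briefly for work; return None if nothing arrived in time."""
        try:
            return self._queue.get(timeout=timeout)
        except queue.Empty:
            return None
```

A caller that gets `False` from `offer` can answer the client with an overload error right away, which keeps latency bounded for the requests that were accepted.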
19. We use the best tool for each case from a small, carefully selected set of options
PostgreSQL as the default mutable storage
Cassandra for large scale (heavy writes) or multi-site services
Various read-only key-value stores
http://labs.spotify.com/2013/02/25/in-praise-of-boring-technology/
Scaling storage
21. Stuff is always broken. Deal with it.
Always design for redundancy
Always keep an eye on your world
Don’t DDoS yourself
Always fail, never fail
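The slide does not say how to avoid a self-inflicted DDoS. One common approach, shown here as an assumption rather than as the speaker's method, is to space out retries with capped exponential backoff and random jitter so that many clients do not retry in lockstep. The function name and default values are hypothetical.

```python
import random


def backoff_delays(attempts, base=0.1, cap=5.0, rng=random.random):
    """Yield one randomized delay in seconds per retry attempt.

    The upper bound doubles with each attempt up to `cap`; the actual delay is
    drawn uniformly between 0 and that bound ("full jitter").
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling
```

A retry loop would sleep for each yielded delay before trying again, and give up once the generator is exhausted.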
22. Build your system to run on multiple servers
Use service discovery everywhere. We use DNS SRV records.
Make deployment and configuration automated and repeatable
Make sure your service is actually running
Many commodity servers
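To illustrate how a client might use DNS SRV records for service discovery, the sketch below picks a target from an already-resolved list of records. It is a simplified version of the priority and weight selection described in RFC 2782, and it leaves the DNS lookup itself out because that needs a resolver library. The host names, ports and function name are hypothetical.

```python
import random


def pick_srv_target(records, rng=random.random):
    """Pick (target, port) from SRV records given as (priority, weight, port, target).

    Only records with the lowest priority value are considered; among those,
    the choice is random in proportion to weight. `rng` must return a float
    in [0, 1).
    """
    if not records:
        raise LookupError("no SRV records")
    lowest = min(r[0] for r in records)
    candidates = [r for r in records if r[0] == lowest]
    total = sum(r[1] for r in candidates)
    if total == 0:
        # All weights are zero: choose uniformly.
        chosen = candidates[int(rng() * len(candidates))]
    else:
        point = rng() * total
        running = 0
        chosen = candidates[-1]
        for record in candidates:
            running += record[1]
            if point < running:
                chosen = record
                break
    return chosen[3], chosen[2]


# Hypothetical records: two primaries sharing load 60/40, plus a backup.
records = [
    (10, 60, 8080, "a.example.net"),
    (10, 40, 8081, "b.example.net"),
    (20, 100, 8080, "backup.example.net"),
]
```

The backup record is only chosen by a caller that first removes the unreachable primaries from the list and calls the function again.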
23. Instrument your code with metrics everywhere
We use our own library for Python and http://metrics.codahale.com for Java
Monitor your infrastructure. JVMs, OS, network, storage
Measure everything
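As a rough picture of what instrumenting code with metrics can look like, here is a tiny in-process registry with counters and timers. It is not the in-house Python library mentioned above and does not mirror the Coda Hale Metrics API; the class and method names are assumptions made for this sketch.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Metrics:
    """A minimal in-memory metrics registry: named counters and timing samples."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)

    def incr(self, name, amount=1):
        """Add to a named counter."""
        self.counters[name] += amount

    @contextmanager
    def timer(self, name):
        """Record how long the wrapped block took, in seconds, even if it raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name].append(time.perf_counter() - start)

    def percentile(self, name, fraction):
        """Return the sample at the given fraction (0.0 to 1.0), or None if empty."""
        samples = sorted(self.timings[name])
        if not samples:
            return None
        index = min(len(samples) - 1, int(fraction * len(samples)))
        return samples[index]
```

Typical use is `with metrics.timer("lookup"): ...` around a call plus `metrics.incr("requests")` per request, with a background task shipping the values to the graphing system.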
24. Graph your important metrics, and strive for a delay of only seconds before data shows up
We use a heavily extended derivative of Munin
Graph
25. It is hard to know beforehand what you will need, so err on the side of logging too much (within reason)
Use a structured format
Use syslog
Collect your logs in a central place
Store your logs and make them analyzable
Log what’s important
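A structured log format can be as simple as one JSON object per line. The sketch below uses Python's standard `logging` module; the field names, the `fields` key passed through `extra`, and the helper names are assumptions for illustration. In production the handler could be `logging.handlers.SysLogHandler`, which ships with the standard library; a stream handler works the same way for local testing.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra structured fields arrive via logger.info(..., extra={"fields": {...}}).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, sort_keys=True)


def build_logger(name, handler):
    """Attach a JSON-formatting handler to a named logger and return the logger."""
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output through the root logger
    return logger
```

A call such as `log.info("request served", extra={"fields": {"status": 200}})` then produces a line that a central log store can parse without regular expressions.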
26. Consistently build to some form of package. Keep track of dependencies
We build everything* to Debian packages and use package dependencies
Debian is awesome. Use it.
Automate deployment
* Except Maven dependencies
27. Keep everything under version control
Use a provisioning tool
We use Puppet and store every configuration in Git. Everything*.
250 modules, 880 classes
Automate configuration
* Everything
28. Trust your developers and ops. Let your teams be autonomous
Long-term ownership
Minimize interruptions (aka meetings)
Favor asynchronous communication. We coordinate over IRC and use mail
Ship.
Development