API Caching, why your server needs some rest

@lfcipriani
2013-08-30
APIs Caching
Why your server needs some rest
Rubyconf Brazil 2013

Who?
@lfcipriani

What?

Scope of this presentation
• Caching in a Distributed System
• The flows of HTTP Cache and how to control them
• Good and Bad Practices

Scope of this presentation
A friendly way to understand the caching part of RFC 2616
Source: http://www.slideshare.net/lfcipriani/fearless-http-requests-abuse

Definitions and Motivations

Analogy
Memorizing phone numbers versus checking the phonebook every time

Network Effect
Welcome to the first year of Software Engineering...
...where every request delivers a response without
failure and every network is reliable and fast.
Source: First day on Internet Kid (know your meme)

What problems does a cache help to solve?
• redundant and unnecessary data traffic
• network bottlenecks
• heavy load (or spikes) on the origin server
• long network latency

HTTP Archive
Motivations
[Charts: HTTP Archive trends, All sites vs. Top 1000]
Source: http://httparchive.org/trends.php?s=All&minlabel=Jan+20+2011&maxlabel=Aug+15+2013

HTTP Archive Cache lifetime: All Sites vs Top 100
Motivations
[Charts: cache lifetime, All Sites vs. Top 100]
Source: http://httparchive.org/interesting.php?a=All&l=Aug%2015%202013&s=Top100

HTTP Caching Protocol

HTTP Caching flows

https://vine.co/v/hOuAXTOetuz
https://vine.co/v/hOuMHbTzp6h
https://vine.co/v/hOu5g9FVDa5
https://vine.co/v/hOuvzinwrt6
bit.ly/vinecaching

The Cache headers zoo
Source: http://www.slideshare.net/lfcipriani/fearless-http-requests-abuse

Cache Coherency

What’s cache coherency?
In a distributed system
Since only the origin server knows the state of a
resource with certainty, caches and other components must
ensure that a cached response is still fresh before
returning it to the client.
Because of this complexity, keeping caches coherent in
distributed systems has a high cost.

Better safe than sorry
Strong consistency
Maintain coherency by revalidating every request with the origin
server.

Living dangerously
Weak consistency
The cache has autonomy to use a heuristic to decide whether
the cached response is still fresh, without consulting the
origin server.
Basically, there are 2 types of weak consistency: invalidation and TTL.

Weak consistency - Invalidation

Weak consistency - Invalidation is bad!
• the approach does not scale
• the server needs to coordinate with an unknown network of caches
• choose 2: immediacy, scalability, reliability
• “There are only two hard things in Computer Science:
cache invalidation and naming things” - Phil Karlton
• Two Generals Problem
http://www.subbu.org/blog/2010/01/cache-invalidation
http://en.wikipedia.org/wiki/Two_Generals'_Problem

Weak consistency - When to do Invalidation
When your network is similar to the one below ;-)

Weak consistency - TTL approach

Taming Cache

Topology considerations

Controlling cacheability
Protocol Specific Considerations

cache-control   may I cache   may I cache   should revalidate,
directive       locally?      anywhere?     even being fresh?
no-store        no            no            n/a
private         yes           no            no
no-cache        yes           yes           yes
public          yes           yes           no

1. locally means a cache that serves only one consumer
2. these directives override any configuration of the cache
3. by default, responses to non-authenticated, safe requests (GET and
HEAD) with status code 200, 203, 206, 300, 301 or 410 are cacheable
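The table above can be read as a lookup. Here is a minimal sketch in Python, assuming a hypothetical helper named may_store and a simplified parser that only looks at these four directives. The fallback for responses without any of them is an assumption, not something the slide states.

```python
from typing import NamedTuple, Optional

class Policy(NamedTuple):
    cache_locally: bool
    cache_anywhere: bool
    revalidate_when_fresh: Optional[bool]  # None means "not applicable"

# Direct transcription of the table above
POLICIES = {
    "no-store": Policy(False, False, None),
    "private": Policy(True, False, False),
    "no-cache": Policy(True, True, True),
    "public": Policy(True, True, False),
}

def may_store(cache_control: str, shared_cache: bool) -> bool:
    """Return True if a cache may store a response with this Cache-Control value.

    shared_cache=False means a local cache that serves only one consumer.
    """
    directives = {d.strip().split("=")[0].lower() for d in cache_control.split(",")}
    # Check the most restrictive directive first
    for name in ("no-store", "private", "no-cache", "public"):
        if name in directives:
            policy = POLICIES[name]
            return policy.cache_anywhere if shared_cache else policy.cache_locally
    # Assumption: with none of the four directives, defer to the cache's defaults
    return True

print(may_store("private, max-age=60", shared_cache=True))   # False
print(may_store("private, max-age=60", shared_cache=False))  # True
```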

Controlling cacheability
Be aware of the Vary header: if it names a request header
whose values are highly diverse (User-Agent, for example), the
cache stores one variant per value and can fill its storage too fast.
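To see why a diverse Vary header is costly, here is a rough Python sketch of how a cache might build its lookup key. The function cache_key and the sample traffic are hypothetical, and the special case "Vary: *" is not handled.

```python
def cache_key(method, url, request_headers, vary):
    """Build a cache key from the request plus every header named in Vary."""
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    variant = tuple((n, lowered.get(n, "")) for n in names)
    return (method, url, variant)

# Hypothetical traffic: 100 clients ask for the same URL,
# all with the same Accept-Encoding but each with its own User-Agent.
requests_seen = [
    {"Accept-Encoding": "gzip", "User-Agent": f"client/{i}"} for i in range(100)
]

by_encoding = {cache_key("GET", "/users/1", h, "Accept-Encoding") for h in requests_seen}
by_agent = {cache_key("GET", "/users/1", h, "User-Agent") for h in requests_seen}

print(len(by_encoding))  # 1 stored variant
print(len(by_agent))     # 100 stored variants
```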

Controlling revalidation
Revalidation is done with conditional requests.
If-Modified-Since != Last-Modified -> 200 (full response)
If-Modified-Since == Last-Modified -> 304 (Not Modified, no body)
If-None-Match != ETag -> 200 (full response)
If-None-Match == ETag -> 304 (Not Modified, no body)
You can even decide how revalidation is done.
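The four rules above could be implemented on the origin server roughly as follows. This is a simplified Python sketch: the function name revalidate is hypothetical, weak ETags (the W/ prefix) are not treated specially, and giving If-None-Match precedence over If-Modified-Since is a choice of this sketch.

```python
from email.utils import parsedate_to_datetime

def revalidate(request_headers, etag, last_modified):
    """Return 304 if the client's copy is still valid, else 200.

    etag: current entity tag of the resource, e.g. '"v2"'
    last_modified: current Last-Modified value as an HTTP date string
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # ETag comparison is used whenever the client sent one
        candidates = [t.strip() for t in if_none_match.split(",")]
        return 304 if etag in candidates or "*" in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        changed = parsedate_to_datetime(last_modified) > parsedate_to_datetime(if_modified_since)
        return 200 if changed else 304

    return 200  # unconditional request: always send the full response

current = "Fri, 30 Aug 2013 12:00:00 GMT"  # hypothetical Last-Modified value
print(revalidate({"If-None-Match": '"v2"'}, '"v2"', current))       # 304
print(revalidate({"If-None-Match": '"v1"'}, '"v2"', current))       # 200
print(revalidate({"If-Modified-Since": current}, '"v2"', current))  # 304
```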

Content specific considerations
Careful with cookies
Be aware of how privacy policy influences what’s
cacheable

Content life cycle considerations
TL;DR:
Know the rates of change of your resources and establish
a time to live (TTL) for them.
Expires: [date]
Cache-Control: max-age=[seconds]
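A cache could turn these two headers into a freshness decision along these lines. This is a sketch in Python with hypothetical function names; it ignores the Age header and s-maxage, and treating a response without an explicit TTL as stale is an assumption of the sketch.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def freshness_lifetime(headers, response_time):
    """Seconds the response may be served without revalidation, or None if no TTL was set."""
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        return int(match.group(1))  # max-age takes precedence over Expires
    if "Expires" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        return max(0, int((expires - response_time).total_seconds()))
    return None

def is_fresh(headers, response_time, now):
    """True while the cached response is younger than its freshness lifetime."""
    lifetime = freshness_lifetime(headers, response_time)
    if lifetime is None:
        return False  # assumption: no explicit TTL means "revalidate"
    return (now - response_time).total_seconds() < lifetime

received = datetime(2013, 8, 30, 12, 0, tzinfo=timezone.utc)  # hypothetical response time
headers = {"Cache-Control": "public, max-age=300"}
print(is_fresh(headers, received, received + timedelta(minutes=4)))  # True
print(is_fresh(headers, received, received + timedelta(minutes=6)))  # False
```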

Content life cycle considerations
• TTLs that are too short (seconds) or too long (days) smell bad
• a TTL can vary; don’t treat it as a constant value
• don’t be afraid to get sophisticated, if needed:
  • LM-Factor heuristic: (date - last modified) * factor
  • Prediction Models http://www.slideshare.net/jseidman/real-world-machine-learning-at-orbitz-strata-2011
• Control your cache strategy!
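The heuristic "(date - last modified) * factor" can be sketched in a few lines of Python. The function name, the default factor of 0.1 and the one-day ceiling are assumptions chosen for illustration, not values from the talk.

```python
from datetime import datetime, timedelta, timezone

def heuristic_ttl(date, last_modified, factor=0.1, ceiling=timedelta(days=1)):
    """TTL = (date - last modified) * factor, capped at `ceiling`.

    A resource that has not changed for a long time gets a longer TTL;
    one that changed recently gets a short one.
    """
    unchanged_for = max(date - last_modified, timedelta(0))
    return min(unchanged_for * factor, ceiling)

now = datetime(2013, 8, 30, tzinfo=timezone.utc)  # hypothetical Date header
print(heuristic_ttl(now, now - timedelta(hours=10), factor=0.5))  # 5:00:00
print(heuristic_ttl(now, now - timedelta(days=100)))              # 1 day, 0:00:00 (capped)
```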

General considerations
Deciding to have NO cache is part of the strategy.
Your cache strategy might not be honored by an
intermediary cache. No hard feelings about it; this is more
common than you think.

Measuring efficiency

Measuring Cache efficiency
Hit Rate = Cache hits / Total requests
This will depend on:
• how big your cache is
• how similar the interests of the cache users are
• the rate of change of the data
• how caches are configured

Byte Hit Rate = Bytes transferred from cache hits / Bytes transferred by all requests
• the same metrics can be applied to revalidations
• measure per resource
• measure continuously and monitor to improve the strategy
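Both ratios can be computed per resource from an access log. Here is a minimal Python sketch, assuming a hypothetical log format of (resource, was_hit, bytes_sent) tuples and made-up sample data.

```python
from collections import defaultdict

def cache_metrics(log):
    """log: iterable of (resource, was_hit, bytes_sent) tuples, one per request."""
    totals = defaultdict(lambda: {"requests": 0, "hits": 0, "bytes": 0, "hit_bytes": 0})
    for resource, was_hit, size in log:
        t = totals[resource]
        t["requests"] += 1
        t["bytes"] += size
        if was_hit:
            t["hits"] += 1
            t["hit_bytes"] += size
    return {
        resource: {
            "hit_rate": t["hits"] / t["requests"],
            "byte_hit_rate": t["hit_bytes"] / t["bytes"] if t["bytes"] else 0.0,
        }
        for resource, t in totals.items()
    }

sample_log = [  # hypothetical data
    ("/users", True, 200), ("/users", True, 200), ("/users", False, 200), ("/users", True, 200),
    ("/report", False, 9000), ("/report", True, 1000),
]
metrics = cache_metrics(sample_log)
print(metrics["/users"])   # {'hit_rate': 0.75, 'byte_hit_rate': 0.75}
print(metrics["/report"])  # {'hit_rate': 0.5, 'byte_hit_rate': 0.1}
```

In this sample, /report has a decent hit rate but a low byte hit rate, because the one miss carried most of the bytes. That is the kind of difference the two metrics are meant to expose.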

Validate your strategy at redbot.org

Final considerations

• It is important to have a good knowledge of the application’s topology and of the
constraints of distributed systems.
• Think and build a good strategy; don’t rely on default heuristics.
• Measure, monitor and improve. Strategies are dynamic, and changing them is part of the
process.
• All of this can be done incrementally; focus on the relevant resources.
• Be careful not to turn the cache into overhead.

References
Web Protocols and Practice: HTTP/1.1, Networking Protocols, Caching, and Traffic Measurement (Balachander Krishnamurthy and Jennifer Rexford)
HTTP: The Definitive Guide (David Gourley, Brian Totty, Marjorie Sayer and Anshu Aggarwal)
http://www.w3.org/Protocols/rfc2616/rfc2616.html (HTTP RFC)
http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13 (Caching in HTTP)
http://stevesouders.com/
http://talleye.com/
https://dev.twitter.com/
bit.ly/vinecaching

Thank you!
