Cache Concepts and Varnish-Cache
Playing with Varnish
Marc Cortinas – Production Service - Webops - September 2014
Agenda
Part 1: Cache Concepts (10min)
1. What is Caching?
2. Cache levels and types
3. The Rules
4. Header Pragma
5. Header Last-Modified
6. Header Etag
7. Header Expires
8. Header Cache Control
9. Cache key
10. Methods and cacheability
Part 2: Varnish-Cache (30min)
1. What is Varnish-cache?
2. Process Architecture
3. Installation and Basic Configuration
4. VCL Backends, Probes, Directors
5. VCL functions or subroutines
6. VCL Reference
7. VCL Variables Availability
8. VCL Subroutines Graph
9. VCL Stale Content
10. Our VCL configuration in cdn-own.edreams.com
11. VMOD Directory
12. Tuning and best practices
13. Proof of concept with siege (HTTP only)
Cache Levels and Types
Cache levels, from the client to the origin:
Browser
ISP
Proxies
CDNs / Varnish / Nginx / Apache (mod_cache)
Data center (CPD): LTM / Varnish / Nginx / Apache (mod_cache)
Code cache: APC (PHP)
Data cache: Redis, etc.
Disk cache
Cache types:
- Local cache
- Reverse proxy caches
- Data cache
- Code cache
- Disk cache
What is Caching?
Caching is a great example of the ubiquitous time-space tradeoff in
programming. You can save time by using space to store results.
In the case of websites, the browser can save a copy of images, stylesheets,
javascript or the entire page. The next time the user needs that resource (such
as a script or logo that appears on every page), the browser doesn’t have to
download it again. Fewer downloads means a faster, happier site.
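To make the time-space tradeoff concrete, here is a minimal sketch in Python. `SimpleCache` and `fetch_from_server` are hypothetical names, and the "download" is simulated: the dictionary is the space we spend, and the skipped calls to the slow path are the time we save.

```python
from typing import Callable, Dict


class SimpleCache:
    """Trade space (a dict) for time (skipping repeated slow fetches)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # slow path, e.g. a network download
        self._store: Dict[str, bytes] = {}   # the space we spend
        self.downloads = 0                   # how often the slow path ran

    def get(self, url: str) -> bytes:
        if url not in self._store:           # miss: pay the cost once
            self._store[url] = self._fetch(url)
            self.downloads += 1
        return self._store[url]              # hit: served from memory


def fetch_from_server(url: str) -> bytes:
    # Hypothetical stand-in for a real HTTP download.
    return f"content of {url}".encode()


cache = SimpleCache(fetch_from_server)
for _ in range(3):
    cache.get("/logo.png")
print(cache.downloads)  # 1
```

Three requests for the same logo cost a single download; a real browser cache adds the freshness rules described in the following slides.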
Here’s a quick refresher on how a web browser gets a page from the server:
The Rules
The main HTTP headers that help us distribute and cache content
efficiently.
RFCs:
1. HTTP/1.0
   1. http://www.rfc-base.org/rfc-1945.html
2. HTTP/1.1
   1. http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
HTTP headers are our friends!
Header Pragma
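The Pragma header is effectively deprecated; its only standard directive is "no-cache".
RFCs:
1. HTTP/1.0
   1. Pragma: no-cache is a request header used to revalidate any cached response before using it.
2. HTTP/1.1
   1. Pragma: no-cache only has an effect on HTTP/1.1 caches when Cache-Control is missing.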
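Header Last-Modified
How does the browser know whether its cached copy is still current? One fix is for the server
to tell the browser which version of the file it is sending. A server can return a Last-Modified
date along with the file (let's call it logo.png), like this:
Last-Modified: Fri, 16 Mar 2007 04:00:25 GMT
On the next request the browser sends that date back in an If-Modified-Since header. If the file
has not changed, the server answers 304 Not Modified with no body.
A 304 is much cheaper than sending the whole object!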
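The revalidation step can be sketched in Python as follows. It assumes the browser echoes the stored date back in If-Modified-Since, which is the standard HTTP/1.1 pairing for Last-Modified. The function name `respond` is hypothetical, and the sketch only covers this one header, not a full server.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def respond(last_modified: datetime, body: bytes,
            if_modified_since: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET for one file, honouring If-Modified-Since.

    `last_modified` must be a UTC-aware datetime (HTTP dates are always GMT).
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None                     # unparseable date: ignore the condition
        if since is not None:
            if since.tzinfo is None:         # "-0000" style dates parse as naive
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers, b""     # unchanged: headers only, no body
    return 200, headers, body                # changed (or first visit): full object


lm = datetime(2007, 3, 16, 4, 0, 25, tzinfo=timezone.utc)
print(respond(lm, b"<png bytes>", None)[0])                             # 200
print(respond(lm, b"<png bytes>", "Fri, 16 Mar 2007 04:00:25 GMT")[0])  # 304
```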
Header ETag
What if the server's clock was originally wrong and then got fixed? What if
daylight saving time comes early and the server isn't updated? The cached dates
could be inaccurate.
ETags to the rescue. An ETag is an identifier the server assigns to a specific
version of a file. It's like a hash or fingerprint: every version gets a unique
fingerprint, and if you change the file (even by one byte), the fingerprint changes
as well. The browser sends it back in an If-None-Match header, and the server
answers 304 Not Modified while it still matches.
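A minimal Python sketch of ETag validation, assuming the server derives the ETag from a SHA-256 hash of the content (servers are free to compute ETags however they like; hashing is just one common choice). `make_etag` and `respond` are hypothetical names.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Strong ETag: a quoted fingerprint of the exact bytes (SHA-256, shortened)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET, returning 304 when the client already has this version."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match:
        # If-None-Match may list several tags; comparison ignores the weak "W/" prefix.
        tags = [t.strip() for t in if_none_match.split(",")]
        tags = [t[2:] if t.startswith("W/") else t for t in tags]
        if "*" in tags or etag in tags:
            return 304, headers, b""
    return 200, headers, body


v1 = make_etag(b"logo version 1")
print(respond(b"logo version 1", v1)[0])  # 304: same bytes, same fingerprint
print(respond(b"logo version 2", v1)[0])  # 200: one byte changed, new fingerprint
```

Unlike Last-Modified, nothing here depends on a clock, which is exactly the problem this slide raises.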
Header Expires
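Caching a file and checking with the server is nice, except for one thing: we are
still making a request to the server every time. Expires removes that round trip by
telling caches until when the copy can be reused without asking.
Example: Expires: Fri, 30 Oct 1998 14:19:41 GMT (a date in the past means the response is already stale, so in practice it is not cached)
- It is an absolute time, so it depends entirely on the clocks being in sync.
- It is usually computed from either:
  - the last time the client retrieved the document (last access time), or
  - the last time the document changed on your server (last modification time).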
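Both sides of Expires can be sketched in Python: building the header from a base time (access time or modification time) plus a lifetime, and a cache deciding whether its copy is still fresh. The function names are hypothetical. Treating an invalid date such as "0" as already expired follows the HTTP/1.1 caching rules.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def expires_header(base: datetime, lifetime: timedelta) -> str:
    """Build an Expires value from a UTC base time: the access time or the mtime."""
    return format_datetime(base + lifetime, usegmt=True)


def is_fresh(expires: str, now: datetime) -> bool:
    """True if a cache may reuse the copy without contacting the server."""
    try:
        expiry = parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        return False            # invalid dates such as "0" count as already expired
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)
    return now < expiry         # depends entirely on `now`, i.e. on the local clock


now = datetime(2014, 9, 1, tzinfo=timezone.utc)
print(expires_header(now, timedelta(weeks=1)))         # Mon, 08 Sep 2014 00:00:00 GMT
print(is_fresh("Fri, 30 Oct 1998 14:19:41 GMT", now))  # False: date in the past
```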
Header Cache-Control
Expires is great, but the server has to compute an explicit date for every response. The max-age
directive lets us say “This file expires 1 week from today”, which is simpler than setting
an explicit date.
max-age=[seconds] — specifies the maximum amount of time that a representation will be considered
fresh. It serves the same purpose as Expires, but it is relative to the time of the request rather than absolute.
[seconds] is the number of seconds from the time of the request you wish the representation to be fresh
for.
s-maxage=[seconds] — similar to max-age, except that it only applies to shared (e.g., proxy) caches.
public — marks authenticated responses as cacheable; normally, if HTTP authentication is required,
responses are automatically private.
private — allows caches that are specific to one user (e.g., in a browser) to store the response; shared
caches (e.g., in a proxy) may not.
no-cache — forces caches to submit the request to the origin server for validation before releasing a
cached copy, every time. This is useful to assure that authentication is respected (in combination with
public), or to maintain rigid freshness, without sacrificing all of the benefits of caching.
no-store — instructs caches not to keep a copy of the representation under any conditions.
must-revalidate — tells caches that they must obey any freshness information you give them about a
representation. HTTP allows caches to serve stale representations under special conditions; by specifying
this header, you’re telling the cache that you want it to strictly follow your rules.
proxy-revalidate — similar to must-revalidate, except that it only applies to proxy caches.
Examples:
Never store or reuse the response: Cache-Control: public, max-age=0, s-maxage=0, no-cache, no-store, must-revalidate, proxy-revalidate
Cache for one hour in browsers and shared caches: Cache-Control: public, max-age=3600
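The directive rules above can be condensed into a small Python sketch that turns a Cache-Control value into a TTL. It is simplified: `cache_ttl` is a hypothetical name, it ignores must-revalidate and proxy-revalidate, and it folds no-cache into "do not reuse without revalidation".

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split 'public, max-age=3600' into {'public': None, 'max-age': '3600'}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def cache_ttl(cache_control: str, shared: bool) -> Optional[int]:
    """Seconds a response may be reused without revalidation.

    Returns 0 when it must not be reused from this cache, None when
    Cache-Control gives no lifetime (fall back to Expires or heuristics).
    `shared` is True for proxies, CDNs and Varnish; False for a browser cache.
    """
    d = parse_cache_control(cache_control)
    if "no-store" in d or "no-cache" in d or (shared and "private" in d):
        return 0
    names = ("s-maxage", "max-age") if shared else ("max-age",)
    for name in names:                      # s-maxage wins in shared caches
        arg = d.get(name)
        if arg is not None and arg.isdigit():
            return int(arg)
    return None


print(cache_ttl("public, max-age=3600", shared=True))  # 3600
```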
The Cache Key
URI split:
<protocol>://<user>:<passwd>@<host>:<port>/<path>?<query>
What's the cache key? (the hash in Varnish)
The key used to find the stored object again!
- Akamai cache key
$ akacurl "http://cdn-aka.edreams.com/?a=1&b=2&c=3" 2>&1 | grep -i "X-Cache-key"
X-Cache-Key: /L/728/323898/1h/cdn-aka.edreams.com/?a=1&b=2&c=3
- Varnish hash
# Default (built-in) vcl_hash in Varnish 4.0
sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
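The same idea in Python: the sketch hashes the same inputs the built-in vcl_hash feeds to hash_data (the URL, then the Host header or the server IP) with SHA-256. The "#" separator and exact byte layout are assumptions, so these digests will not match Varnish's own; `cache_key` is a hypothetical name.

```python
import hashlib
from typing import Optional


def cache_key(url: str, host: Optional[str], server_ip: str) -> str:
    """Digest over the same inputs as the built-in vcl_hash: URL, then Host or server IP."""
    h = hashlib.sha256()
    for item in (url, host if host else server_ip):
        h.update(item.encode())
        h.update(b"#")  # separator so "/ab"+"c" and "/a"+"bc" stay distinct
    return h.hexdigest()


a = cache_key("/?a=1&b=2&c=3", "cdn-own.edreams.com", "10.0.0.1")
b = cache_key("/?c=3&b=2&a=1", "cdn-own.edreams.com", "10.0.0.1")
print(a == b)  # False: same parameters, different order, different object
```

The last line shows why query-string order matters for the hit ratio: two URLs that mean the same thing still produce two cache entries.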
Methods and Cacheability
HTTP method   Cacheable
GET           Yes
HEAD          Yes
POST          No
PUT           No
DELETE        No
OPTIONS       No
TRACE         No
CONNECT       No
PATCH         No
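How a cache applies this table can be sketched in Python, loosely modelled on the decisions in Varnish 4.0's built-in vcl_recv (simplified): only GET and HEAD are looked up, requests carrying Authorization or Cookie go straight to the backend, and methods the cache does not understand are piped. `recv_action` and `KNOWN_METHODS` are hypothetical names.

```python
from typing import Mapping

# Methods the sketch treats as known; anything else (e.g. CONNECT, PATCH) is piped.
KNOWN_METHODS = {"GET", "HEAD", "PUT", "POST", "TRACE", "OPTIONS", "DELETE"}


def recv_action(method: str, headers: Mapping[str, str]) -> str:
    """Return 'hash' (cache lookup), 'pass' (go to backend) or 'pipe'."""
    names = {name.lower() for name in headers}
    if method not in KNOWN_METHODS:
        return "pipe"        # not understood: shovel bytes straight through
    if method not in ("GET", "HEAD"):
        return "pass"        # only GET and HEAD are looked up in the cache
    if "authorization" in names or "cookie" in names:
        return "pass"        # probably user-specific content
    return "hash"


print(recv_action("GET", {}))                  # hash
print(recv_action("GET", {"Cookie": "id=1"}))  # pass
```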
Different ways to define cache TTL
• Application: PHP, Java, etc. (the most flexible option, decided case by case)
• HTTP server (e.g. Apache):
By FilesMatch, e.g. <FilesMatch "\.(html|htm|php|cgi|pl)$">
By Location, e.g. <Location "/deal">
By LocationMatch (regular expression), e.g. <LocationMatch "/offers/.*/today/.*">
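The application-level option can be sketched with a minimal Python WSGI app, standing in for PHP or Java. The path rules and TTLs in `cache_control_for` are hypothetical. The point is that the application, which knows its own data, sets Cache-Control on every response.

```python
from typing import Callable, Iterable, List, Tuple


def cache_control_for(path: str) -> str:
    """Hypothetical per-path TTL policy decided by the application itself."""
    if path.startswith("/deal"):
        return "public, max-age=300"        # deals change often: 5 minutes
    if path.endswith((".css", ".js", ".png")):
        return "public, max-age=604800"     # static assets: 1 week
    return "no-cache"                       # everything else: always revalidate


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI application that sets Cache-Control on every response."""
    path = environ.get("PATH_INFO", "/")
    body = f"page {path}".encode()
    headers: List[Tuple[str, str]] = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
        ("Cache-Control", cache_control_for(path)),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app)` can serve it.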
Part 2: What is Varnish Cache?
Project website: https://www.varnish-cache.org/
Documentation: https://www.varnish-cache.org/docs/4.0/reference/index.html
GitHub: https://github.com/varnish/Varnish-Cache
Varnish is an HTTP accelerator designed for content-heavy
dynamic web sites.
In contrast to other web accelerators, such as Squid, which
began life as a client-side cache, or Apache and nginx,
which are primarily origin servers, Varnish was designed
as an HTTP accelerator. Varnish is focused exclusively
on HTTP, unlike other proxy servers that often support
FTP, SMTP and other network protocols.
Version 1.0 of Varnish was released in 2006, Varnish 2.0 in 2008,
Varnish 3.0 in 2011 and Varnish 4.0 in 2014.
Process Architecture
1. The management process applies configuration changes (VCL and parameters),
compiles VCL, monitors Varnish, initializes Varnish and provides a command-line
interface, accessible either directly on the terminal or through a management
interface.
2. The child process consists of several different types of threads, including,
but not limited to:
• Acceptor thread, to accept new connections and delegate them.
• Worker threads, one per session. It's common to use hundreds of worker threads.
• Expiry thread, to evict old content from the cache.
Installation and Basic Configuration
Installation:
• Debian/Ubuntu: apt from the repo.varnish-cache.org repository
• FreeBSD: build from the FreeBSD ports collection
• RedHat/CentOS/Fedora: yum from the EPEL repository
• Solaris 10 and 11: compile with gmake
• Mac OS X: compile with automake from MacPorts
Configuration:
• /etc/default/varnish or /etc/sysconfig/varnish: command-line options passed to the varnishd binary, e.g.:
-P /var/run/varnishd.pid -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,3g -p thread_pools=4 -p thread_pool_min=100 -p thread_pool_max=1000 -p thread_pool_add_delay=2
• /etc/varnish/default.vcl: the initial configuration, written in VCL (Varnish Configuration Language).
Varnish Configuration Language – VCL Backends, Probes, Directors
backend default {
    .host = "cdn1.edreams.com";
    .port = "80";
    .probe = {
        .url = "/engine/static-content/unversioned/html/blank.html";
        .timeout = 1s;
        .interval = 10s;
        .window = 5;      # look at the last 5 probes...
        .threshold = 2;   # ...healthy if at least 2 of them succeeded
    }
    # …
}
backend web1 {
    .host = "cdn1.edreams.com";
    .port = "80";
}
# Below is an example director based on round-robin requests (Varnish 4.0)
import directors;
sub vcl_init {
    new cluster1 = directors.round_robin();
    cluster1.add_backend(web1); # Backend web1 defined above
}
sub vcl_recv {
    # Send requests to the round-robin director
    set req.backend_hint = cluster1.backend();
}
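The probe and director semantics can be modelled in Python. A backend is healthy when at least `.threshold` of the last `.window` probes succeeded, and the round-robin director hands out healthy backends in turn, skipping sick ones. This is a hypothetical model, not Varnish code: `.timeout`, `.interval` and the probe's `.initial` setting are not modelled, and `web2` is an invented second backend.

```python
from collections import deque
from typing import Deque, List


class Backend:
    """A backend whose health follows probe-style .window/.threshold rules."""

    def __init__(self, name: str, window: int = 5, threshold: int = 2) -> None:
        self.name = name
        self.threshold = threshold
        self.results: Deque[bool] = deque(maxlen=window)  # only the last `window` probes

    def record_probe(self, ok: bool) -> None:
        self.results.append(ok)

    @property
    def healthy(self) -> bool:
        # Healthy if at least `threshold` of the last `window` probes succeeded.
        return sum(self.results) >= self.threshold


class RoundRobin:
    """Hand out healthy backends in turn, skipping sick ones."""

    def __init__(self, backends: List[Backend]) -> None:
        self.backends = backends
        self._next = 0

    def pick(self) -> Backend:
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend.healthy:
                return backend
        raise RuntimeError("no healthy backend")
```

With `.window = 5` and `.threshold = 2`, a backend can fail three of its last five probes and still receive traffic, which smooths over short hiccups.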
Varnish Configuration Language – VCL functions or subroutines
• vcl_recv is the first VCL function executed, right after Varnish has decoded the request into its basic data structure. Typical uses:
  - Modifying the client data to reduce cache diversity, e.g. removing any leading "www." from the host name.
  - Deciding caching policy based on client data, e.g. not caching POST requests, only caching specific URLs, etc.
  - Executing rewrite rules needed for specific web applications.
  - Deciding which web server to use.
• vcl_fetch (vcl_backend_response in Varnish 4.0): by default it avoids caching anything with a Set-Cookie header. There are very few situations where caching content with a Set-Cookie header is desirable.
• vcl_hash
  - Defines what is unique about a request.
  - Executed directly after vcl_recv.
• vcl_hit
  - Runs right after an object has been found (hit) in the cache.
  - You can change the TTL or issue purge;
  - Often used to throw out an old object.
• vcl_miss
  - Runs right after an object was looked up and not found in the cache.
  - Mostly used to issue purge;
  - Can also be used to modify backend request headers.
• vcl_pass
  - Runs after a pass in vcl_recv, or after a lookup that returned a hit-for-pass object.
  - Not run after vcl_fetch.
• vcl_deliver
  - Common last exit point for all code paths (except vcl_pipe).
  - Often used to add and remove debug headers.
• vcl_error (vcl_synth and vcl_backend_error in Varnish 4.0)
  - Used to generate content from within Varnish, without talking to a web server.
  - Error messages go here by default.
  - Other use cases: redirecting users (301/302 redirects).
Varnish Configuration Language – VCL Reference
https://www.varnish-cache.org/docs/4.0/reference/vcl.html
• Built-in functions: ban(expr), call(subroutine), hash_data(input), new(), return(), rollback(), synthetic(STRING), regsub(str, regex, sub), regsuball(str, regex, sub)
• Perl-compatible regular expressions (PCRE)
• Own subroutines: sub own_subroutine { … }
• ACLs to group IP addresses or subnets
• Probes (health checks)
• Backend definitions
• Import modules (VMODs)
• Include statements: load/add another VCL configuration file
• Data types: integers, real numbers, strings
• Operators: =, ==, ~, !, &&, ||
• Conditionals: if / else / elseif
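The difference between regsub() and regsuball() can be shown with Python stand-ins: regsub replaces only the first match, regsuball replaces every match. Python's `re` module is not PCRE, but it agrees with it for simple patterns like these. The function names mirror VCL's but are plain Python, not a Varnish API.

```python
import re


def regsub(s: str, regex: str, sub: str) -> str:
    """Python stand-in for VCL regsub(): replace the first match only."""
    return re.sub(regex, sub, s, count=1)


def regsuball(s: str, regex: str, sub: str) -> str:
    """Python stand-in for VCL regsuball(): replace every match."""
    return re.sub(regex, sub, s)


print(regsub("www.edreams.com", r"^www\.", ""))  # edreams.com
print(regsub("a-b-c", "-", "_"))                 # a_b-c
print(regsuball("a-b-c", "-", "_"))              # a_b_c
```

The first call is the "remove a leading www." normalization mentioned for vcl_recv.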
Varnish Configuration Language – VCL Stale content
Stale while revalidate
Varnish serves stale content while fresh
content is fetched.
Code (Varnish 3 syntax):
sub vcl_recv {
    set req.grace = 300s;
}
sub vcl_fetch {
    set beresp.grace = 300s;
}
Stale with backend down
Another e-commerce company calls this
behaviour "Nightmare mode":
Varnish serves stale (outdated) content
even when the backend is unreachable.
Code (Varnish 3 syntax):
sub vcl_recv {
    if (req.backend.healthy) {
        set req.grace = 10s;
    } else {
        set req.grace = 2h;
    }
}
sub vcl_fetch {
    set beresp.grace = 2h;
}
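The grace decision can be illustrated with a simplified Python model of the "Nightmare mode" settings above. It uses a short request grace (10s) while the backend is healthy, 2h when it is sick, and a 2h object grace. The model assumes stale content is used only while both the request's and the object's grace allow it. `CachedObject` and `lookup` are hypothetical names, and the model ignores request coalescing and other Varnish internals.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    ttl: float    # seconds the object is considered fresh
    grace: float  # extra seconds it may be kept and served stale (beresp.grace)
    age: float    # seconds since it was fetched from the backend


def lookup(obj: CachedObject, backend_healthy: bool,
           healthy_grace: float = 10, sick_grace: float = 2 * 3600) -> str:
    """Decide between a fresh hit, a stale (grace) hit and a miss."""
    # Mirrors the "Nightmare mode" vcl_recv: short grace if healthy, 2h if not.
    req_grace = healthy_grace if backend_healthy else sick_grace
    if obj.age <= obj.ttl:
        return "fresh hit"
    overdue = obj.age - obj.ttl
    # Simplification: stale content is used only if both grace values allow it.
    if overdue <= min(req_grace, obj.grace):
        return "stale hit"
    return "miss"


obj = CachedObject(ttl=60, grace=2 * 3600, age=100)  # expired 40 seconds ago
print(lookup(obj, backend_healthy=True))   # miss: 40s is beyond the 10s grace
print(lookup(obj, backend_healthy=False))  # stale hit: within the 2h grace
```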
Tools
https://www.varnish-software.com/static/book/Appendix_A__Varnish_Programs.html
• varnishtop – groups tags and the content of the tag together to generate
a sorted list of the most frequently appearing tag/tag-content pairs.
• varnishncsa – prints the shared memory log (shmlog) as NCSA-style log lines (similar to Apache access logs)
• varnishstat – displays statistics from a running Varnish instance
• varnishhist – displays a histogram of request processing times, with hits and misses marked separately; very useful
• varnishreplay – parses Varnish logs and attempts to reproduce the
traffic.
• varnishtest – script-driven program used to test Varnish Cache
• varnishadm
– load a different VCL configuration on the fly
– ban/purge (invalidate) cached content
VMOD Directory
Community Directory with varnish modules.
https://www.varnish-cache.org/vmods
• Useful module: QueryString
This module aims to become your Swiss Army knife to increase your hit
ratio by tweaking the query string of your incoming requests. The plugin is
still under development but it can already:
– remove or clean the query string
– filter specific query parameters based on a name list or a regexp
– sort the query parameters
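The sort and filter operations can be sketched in Python to show why they raise the hit ratio. This is not the vmod's API, only the same idea expressed with the standard library. The `utm_` pattern for dropped parameters is a hypothetical example, and `urlencode` may re-encode values (e.g. spaces become "+").

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str, drop_pattern: str = r"^utm_") -> str:
    """Sort query parameters and drop those whose name matches `drop_pattern`."""
    parts = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if not re.search(drop_pattern, k)]
    params.sort()                                   # a=1&b=2 and b=2&a=1 become one key
    return urlunsplit(parts._replace(query=urlencode(params)))


print(normalize_url("/?c=3&a=1&b=2"))                # /?a=1&b=2&c=3
print(normalize_url("/search?utm_source=x&q=rome"))  # /search?q=rome
```

Normalizing the URL before it reaches vcl_hash means reordered or tracking-only variants of a URL share one cached object instead of creating new ones.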
Tuning and best practices
• Data size
Be aware that every stored object also carries overhead that is kept outside the actual storage
area. So, even if you specify -s malloc,16G, Varnish might actually use double that. Varnish has an overhead
of about 1KB per object, so if you have lots of small objects in your cache the overhead might be
significant.
• Check system parameters
Review all the parameters (e.g. shortlived, sess_workspace)
• Storage backend in RAM (-s malloc)
• Shared memory log (shmlog) mounted on a RAM-backed tmpfs
• Custom timers: (connect_timeout, first_byte_timeout, between_bytes_timeout, send_timeout,
sess_timeout, cli_timeout)
• Timing of thread growth (thread_pool_add_delay, thread_pool_timeout, thread_pool_fail_delay)
• Number of threads (thread_pool_min, thread_pool_max)
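The data-size warning can be checked with a small worked calculation in Python. It uses the rough 1KB-per-object figure from the text, while the object counts and sizes are hypothetical and real overhead varies between versions. Many small objects inflate memory use far beyond the object data, while for large objects the overhead is negligible.

```python
def estimated_memory(num_objects: int, avg_object_bytes: int,
                     overhead_per_object: int = 1024) -> int:
    """Bytes used: object data plus ~1 KB of bookkeeping kept outside the storage."""
    return num_objects * (avg_object_bytes + overhead_per_object)


GiB = 1024 ** 3
small = estimated_memory(10_000_000, 2 * 1024)   # 10 million 2 KB objects
big = estimated_memory(10_000, 1024 * 1024)      # 10 thousand 1 MB objects
print(round(small / GiB, 1))  # 28.6  (only 19.1 GiB of it is object data)
print(round(big / GiB, 1))    # 9.8   (overhead is about 0.1%)
```

With 2 KB objects the overhead adds 50% on top of the object data, which is how a malloc storage size can end up well below the process's real memory use.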
Proof of concept without HTTPS
Without HTTPS, so the conditions are the same as for the other CDNs.
Stress benchmark with 1 instance: this instance could take more load, but we
would need more client resources to run the "siege" command.