Over the past year, the data team at Riot Games has been using Chef to both configure instances in Amazon Elastic Compute Cloud (EC2) and build AMIs. With Chef as an integral part of the workflow, we've autoscaled thousands of instances in support of the data pipeline for League of Legends and have found that Chef doesn't always play perfectly in the world of autoscaling groups and ephemeral instances. In this talk, we cover what's worked and what's failed and explain how to best utilize Chef in the world of Amazon Web Services.
4. WE WORK WITH BIG DATA
7+ BILLION EVENTS PER DAY (TESTED @ 70+ BILLION EVENTS PER DAY)
100+ TABLES
10+ TABLES: 100MM TO 1B ROWS/DAY
7+ PETABYTE GAME EVENT DATASET
SEMI-GLOBAL DEPLOYMENT
0 DOWNTIME
RUNS IN CLOUD (AWS) + DATACENTER
9. WHAT PLAYERS WANT
OVERVIEW
TO PLAY LOL
‣ Minimize Unplanned Downtime
‣ Rolling Deploys
‣ Expand Capacity
NEW FEATURES
‣ Low Maintenance
‣ Low Barrier for New Services
35. DATA CENTER VS. CLOUD
CORE CONCEPTS
DATA CENTER MODEL
• Min. 2W & 3G
• 85%
• MANUAL: manual scaling, no Chef required
AUTOSCALING CLOUD
• AUTO: automatic scaling, automated provisioning needed
[Diagram: rows of W and G boxes]
43. IMMUTABLE SERVERS
CORE CONCEPTS
LIMITED TO 2 ACTIONS:
1) Start & provision server
2) Kill it
WITH IMMUTABLE SERVERS, YOU CAN’T SSH INTO BOX & CHANGE THINGS
PREVENT UNEXPECTED CHANGE TO SERVERS
49. IMMUTABLE SERVERS
CORE CONCEPTS
LIMITED TO 2 ACTIONS:
1) Start & provision server
2) Kill it
WITH IMMUTABLE SERVERS, YOU CAN’T SSH INTO BOX & CHANGE THINGS
PREVENT UNEXPECTED CHANGES TO SERVERS
MUTABLE SERVER:
CHANGE EACH BOX TO NEW VERSION
[Diagram: five W1 boxes changed in place to W2, one box at a time]
52. IMMUTABLE SERVERS
CORE CONCEPTS
LIMITED TO 2 ACTIONS:
1) Start & provision server
2) Kill it
WITH IMMUTABLE SERVERS, YOU PREVENT UNEXPECTED CHANGES TO SERVERS
IMMUTABLE SERVER:
FLIP ON BOXES WITH NEW VERSION,
SHUT OFF EXISTING BOXES
[Diagram: five new W2 boxes running alongside the five existing W1 boxes]
53. IMMUTABLE SERVERS
CORE CONCEPTS
LIMITED TO 2 ACTIONS:
1) Start & provision server
2) Kill it
WITH IMMUTABLE SERVERS, YOU CAN’T SSH INTO BOX & CHANGE THINGS
BENEFITS:
No snowflakes
Easy rollbacks
Cleaner deploys
58. CHEF SOLO
WHAT’S REQUIRED
• All cookbooks in one tarball
• Somewhere to put the tarball
• Tell instances how to provision with the tarball
PACKAGING
• Use Berkshelf
• `berks package COOKBOOK_NAME`
• `tar czvf cookbooks-VERSION.tgz ./cookbooks`
STORAGE OPTIONS
• S3
• `s3cmd` makes it easy to upload to S3
• Internal asset server
PROVISIONING
• userdata.sh:
• Get cookbooks off S3 and untar to /var/chef/cookbooks
• Get first-boot.json off S3 -> /etc/chef/firstboot.json
• Get solo.rb off S3 -> /etc/chef/solo.rb
• Run `chef-solo`
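The provisioning steps on this slide could be sketched as a userdata.sh roughly like the one below. This is a minimal sketch, not the script used in the talk: the bucket name, the version, and the S3 key layout are hypothetical placeholders, and the `chef-solo` flags (`-c` for the config file, `-j` for the JSON attributes) are an assumed invocation. The script only prints each command by default; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch of a chef-solo userdata.sh. BUCKET and VERSION are hypothetical
# placeholders, not values from the talk.
set -eu

BUCKET="${BUCKET:-s3://example-chef-assets}"
VERSION="${VERSION:-1.0.0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = actually run them

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

provision() {
  run mkdir -p /var/chef /etc/chef
  # Get cookbooks off S3; the tarball holds ./cookbooks, so extracting
  # in /var/chef yields /var/chef/cookbooks
  run s3cmd get --force "$BUCKET/cookbooks-$VERSION.tgz" /tmp/cookbooks.tgz
  run tar xzf /tmp/cookbooks.tgz -C /var/chef
  # Get the first-boot attributes and the chef-solo config
  run s3cmd get --force "$BUCKET/first-boot.json" /etc/chef/firstboot.json
  run s3cmd get --force "$BUCKET/solo.rb" /etc/chef/solo.rb
  # Converge the instance from the local cookbooks
  run chef-solo -c /etc/chef/solo.rb -j /etc/chef/firstboot.json
}

provision
```

Because every input is a versioned file in storage, rolling back amounts to launching instances with an older `VERSION`.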
59. CHEF SOLO: PROs VS CONs
WHERE IT WORKS
• Good when your Chef run is fast
• Easy to set up
• No single point of failure
WHERE IT FAILS
• When Chef runs occasionally fail
• Configuration changes
• Service discovery
60. WHAT PLAYERS WANT
CHEF SOLO
TO PLAY LOL
✓ Minimize Unplanned Downtime
✓ Rolling Deploys
✓ Expand Capacity
NEW FEATURES
☐ Low Maintenance
☐ Low Barrier for New Services
66. CHEF SERVER
WHAT’S REQUIRED
• Chef Server
• Cookbooks uploaded to Chef Server
• Tell instances how to provision with Chef Server
RUNNING CHEF SERVER
• Run your own
• Hosted Chef
GETTING COOKBOOKS TO CHEF SERVER
• Use Berkshelf
• `berks upload COOKBOOK_NAME`
PROVISIONING
• userdata.sh:
• Fetch validation.pem
• Fetch first-boot.json
• Fetch client.rb
• Run `chef-client …`
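A Chef Server userdata.sh might look something like the sketch below. The slide does not say where the three files are fetched from or which arguments `chef-client` is given, so the S3 bucket and the `-c`/`-j` flags here are assumptions for illustration only. As written, the script prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch of a chef-client userdata.sh for a Chef Server setup.
# BUCKET is a hypothetical location; the talk does not say where the files live.
set -eu

BUCKET="${BUCKET:-s3://example-chef-assets}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = actually run them

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

bootstrap() {
  run mkdir -p /etc/chef
  # validation.pem lets a brand-new instance register itself with the Chef Server
  for f in validation.pem first-boot.json client.rb; do
    run s3cmd get --force "$BUCKET/$f" "/etc/chef/$f"
  done
  # Assumed flags: -c names the config file, -j the first-boot attributes
  run chef-client -c /etc/chef/client.rb -j /etc/chef/first-boot.json
}

bootstrap
```

Unlike the chef-solo flow, the cookbooks are not downloaded here: `chef-client` pulls them from the Chef Server during the run, which is why the server becomes a dependency of every instance boot.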
67. CHEF SERVER: PROs VS CONs
WHERE IT WORKS
• Updating feature flag configuration
• Warning: Can affect rolling deploys!
WHERE IT FAILS
• Single point of failure in Chef Server
• Long Chef runs cause problems
• Can cause problems in large organizations where multiple teams depend on the same cookbook
68. WHAT PLAYERS WANT
CHEF SERVER
TO PLAY LOL
☐ Minimize Unplanned Downtime
☐ Rolling Deploys
☐ Expand Capacity
NEW FEATURES
✓ Low Maintenance
☐ Low Barrier for New Services
69. CHEF SERVER
RECOMMENDATION:
• Use Chef Server when managing feature flags
• Not bad when you have multiple applications to deploy
• Pro tips:
• Use a shutdown script to remove instances from the Chef Server
• Only use Chef in daemon mode if you’re certain only feature flags will change
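The shutdown-script tip could be sketched as follows. This is an illustration under assumptions, not the script from the talk: it assumes the node name matches the hostname, that `knife` is installed on the instance, and that the node's own client key is allowed to delete its node and client objects (which depends on the Chef Server's permissions). By default it only prints the commands; set `DRY_RUN=0` to run them.

```shell
#!/bin/sh
# Sketch of a shutdown hook that removes this instance from the Chef Server.
set -eu

NODE_NAME="${NODE_NAME:-$(uname -n)}"   # assumes node name == hostname
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands only, 0 = run them

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

deregister() {
  # Delete the node first and the client last: the client key is what
  # authenticates both requests, so it has to go away after the node.
  for kind in node client; do
    run knife "$kind" delete "$NODE_NAME" -y \
      -c /etc/chef/client.rb -u "$NODE_NAME" -k /etc/chef/client.pem
  done
}

deregister
```

Without a hook like this, every instance an autoscaling group terminates leaves a stale node behind on the Chef Server.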
73. GOLDEN IMAGES
WHAT’S REQUIRED
• Means to create Golden Image
• Configuration management
Creating Image
• We use Chef-Solo
• Build an RPM
• Cookbook installs RPMs
Configuration Management
• Archaius (on disk config)
• Chef
74. GOLDEN IMAGES: PROs VS CONs
WHERE IT WORKS
• Rollbacks are perfect and instances always boot
• Boot times are very fast
WHERE IT FAILS
• Need another method for configuration management
• Need another method for service discovery
• Releasing patches requires entire new build
75. WHAT PLAYERS WANT
GOLDEN IMAGES
TO PLAY LOL
✓ Minimize Unplanned Downtime
✓ Rolling Deploys
✓ Expand Capacity
NEW FEATURES
✓ Low Maintenance
☐ Low Barrier for New Services
76. GOLDEN IMAGES
RECOMMENDATION:
• Use when autoscaling is a must
• Requires an up-front effort to get going
• Make sure you can afford it
• Pro tip:
• Use the Netflix stack: Asgard, Aminator, and Archaius