9. What?
• Chef at Etsy
• Familiarity and Understanding
• Critical Approach and Experimentation
• Use The Source
• A liberal sprinkling of screwups
• Open Sourced Goodness - We’re all here!
• [x] = http://tiny.cc/velocity2012
10. Opscode is Orange,
Velocity is Blue.
In Soviet Russia,
Cookbook writes you.
19. Our Setup
• Open Source chef server 0.10.4
• Backup to Opscode Platform
• ~800 self-hosted nodes (mainly CentOS, some RHEL & Mac)
• KVM & LXC virts, self-hosted too.
• Never test in production!
• Many chefs don’t spoil our soup
37. Chef Dashboard
• Chef handler sends metrics to graphite [4]
• git clone git://github.com/etsy/chef-handlers.git
• Set your Graphite server’s URL in graphite.rb
• Add the following to client.rb:
    require "<clonedir>/graphite.rb"
    graphite_handler = GraphiteReporting.new
    report_handlers << graphite_handler
    exception_handlers << graphite_handler
• Etsy dashboards framework [5]
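As a rough illustration of what a Graphite handler has to produce, the sketch below formats run metrics as Graphite plaintext-protocol lines (`<path> <value> <timestamp>`). The `chef.<host>` prefix, the metric names, and the helper name are assumptions for illustration, not the naming used by etsy/chef-handlers.

```ruby
# Sketch only: builds Graphite plaintext-protocol lines from a hash of metrics.
# The "chef.<host>" prefix and metric names are hypothetical.

# Graphite uses dots as path separators, so dots in the hostname are
# replaced with underscores to keep the host as a single path element.
def graphite_lines(hostname, metrics, timestamp)
  prefix = "chef.#{hostname.tr('.', '_')}"
  metrics.map { |name, value| "#{prefix}.#{name} #{value} #{timestamp}\n" }
end

lines = graphite_lines(
  "web01.example.com",
  { "elapsed_time" => 4.63, "updated_resources" => 3 },
  1340013988
)
puts lines
```

Each line would then be written to the Graphite server's plaintext listener; the handler in [4] takes care of that wiring.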
38. [12:54:01] <irccat> Chef run failed on gfernandez.vm.ny4dev.etsy.com
[12:54:02] <irccat> https://github.etsycorp.com/gist/384228
[12:54:02] <irccat>
[12:54:07] <irccat> Chef run failed on buildtest11.ny4dev.etsy.com
[12:54:07] <irccat> https://github.etsycorp.com/gist/384227
[12:54:07] <irccat>
44. Chef irccat Alerts
• Chef handler sends failures to IRC [4] via irccat [6]
• git clone git://github.com/etsy/chef-handlers.git
• Set your irccat [6] server’s URL in logtoirc.rb
• Add the following to client.rb:
    require "<clonedir>/logtoirc.rb"
    exception_handlers << Etsy::LogToIRC.new
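To show the shape of the alert itself, here is a minimal sketch that builds the two-line message seen in the IRC log and writes it to irccat's TCP listener. The helper names, host, and port are hypothetical; the real handler in [4] also creates the gist, which is left out here.

```ruby
require "socket"

# Sketch only: helper names, host and port are hypothetical.
# The message mirrors the irccat output shown in the IRC log.
def failure_message(fqdn, gist_url)
  "Chef run failed on #{fqdn}\n#{gist_url}\n"
end

# irccat relays text received on its TCP listener to IRC, so sending an
# alert is a plain socket write.
def send_to_irccat(host, port, message)
  socket = TCPSocket.new(host, port)
  socket.write(message)
ensure
  socket.close if socket
end

puts failure_message("buildtest11.example.com", "https://gist.example.com/1")
```

Inside a real handler, `send_to_irccat` would be called from the handler's `report` method when the run has failed.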
45. ~ > knife node lastrun buildtest11.ny4dev.etsy.com
Status failed
Elapsed Time 4.628171438
Start Time 2012-06-18 10:06:28 +0000
End Time 2012-06-18 10:06:32 +0000
Recipe Action Resource Type Resource
Backtrace
<snip>
Exception
Chef::Exceptions::Package: package[php] (php::buildtest line 20) had
an error: Version 5.3.10-1.el5 of php not found. Did you specify both
version and release? (version-release, e.g. 1.84-10.fc6)
46. ~ > knife search node 'lastrun_debug_formatted_exception:Chef::Exceptions::Package*' -a lastrun.debug.formatted_exception
5 items found
id: masterrestore.ny4.etsy.com
lastrun.debug.formatted_exception: Chef::Exceptions::Package:
package[postgresql-server] (postgresql::server-8.3 line 1) had an
error: Installed package postgresql-server-8.3.16-1PGDG_id is newer
than candidate package postgresql-server-8.3.11-1PGDG_id.rhel5
id: buildtest11.ny4dev.etsy.com
lastrun.debug.formatted_exception: Chef::Exceptions::Package:
package[php] (php::buildtest line 20) had an error: Version
5.3.10-1.el5 of php not found. Did you specify both version and
release? (version-release, e.g. 1.84-10.fc6)
<snip>
52. Chef lastrun Info
• Chef handler and knife plugin [7]
• gem install knife-lastrun
• Add the following to client.rb:
    require "lastrun_update"
    handler = LastRunUpdateHandler.new
    report_handlers << handler
    exception_handlers << handler
• knife node lastrun <nodename>
57. Simplicity
• Think of yourself at 3AM!
• Please, won’t you think of the new guy?
• Minimize the logics!
• As few logical steps from start to finish as possible.
58. Simplicity - Not!
Date: Mon Dec 05 2011 23:07:18 GMT+0000 (GMT)
Subject: so close to death
# Don't install v2 on search or Cent 5.6 nodes
-if node[:fqdn] !~ /\b(^(preprod-)?search[0-9]{2}|ny4dev.etsy.com|^(preprod-)?
giftsweb[0-9]{2}|^db(shard|spare|data)[0-9]{2}|^qa-web01|^devsearch[0-9]{2}|^nagios01|
^webnest[0-9]{2}|^prodking[0-9]{2}|^sandboxweb[0-9]{2}|^virt((0[5-9])|(1[0-9]))|
^msysmgr[0-9]{2}|^msysmta[0-9]{2}|^dbconvo[0-9]{2}|^dbshowcase01|atlasweb[0-9]{2}|
devnagios[0-9]{2}|cimaster02|worker[0-9]{2}|^ganglia[0-9]{2}|^imgcache[0-9]{2}|
imgconvert[0-9]{2}|^imgwriter[0-9]{2}|dev-img02|^datacache04|^graphite01|^graphite03|
^webutil03|^webutil04|^statsd01|^maintweb[0-9]{2}|^(dev-|preprod-)?convosearch[0-9]{2}|
deployinator[0-9]{2}|^wpadmin01|^(preprod-)?dbtasks[0-9]{2})\b/ and node.role?("Web56") ==
false and node.role?("Preprodweb56") == false and node.role?("Princess53") == false
+if node[:fqdn] !~ /\b(^(preprod-)?search[0-9]{2}|ny4dev.etsy.com|^(preprod-)?
giftsweb[0-9]{2}|^db(shard|spare|data)[0-9]{2}|^qa-web01|^devsearch[0-9]{2}|^nagios01|
^webnest[0-9]{2}|^prodking[0-9]{2}|^sandboxweb[0-9]{2}|^virt((0[5-9])|(1[0-9]))|
^msysmgr[0-9]{2}|^msysmta[0-9]{2}|^dbconvo[0-9]{2}|^dbshowcase01|atlasweb[0-9]{2}|
devnagios[0-9]{2}|cimaster02|worker[0-9]{2}|^ganglia[0-9]{2}|^imgcache[0-9]{2}|
imgconvert[0-9]{2}|^imgwriter[0-9]{2}|dev-img02|^datacache04|^graphite01|^graphite03|
^webutil03|^webutil04|^statsd01|^maintweb[0-9]{2}|^(dev-|preprod-)?convosearch[0-9]{2}|
deployinator[0-9]{2}|^wpadmin01|^(preprod-)?dbtasks[0-9]{2})\b/ and node.role?("Web56") ==
false and node.role?("Preprodweb56") == false and node.role?("Princess53") == false and
node.role?("Auth") == false
59. Simplicity - Better!
if node.chef_environment == "libmemcached_upgrade"
  package "libmemcached" do
    version "1.0.4-1"
    action :install
  end
  <snip>
else
  package "libmemcached" do
    version "0.53-1.1"
    action :install
  end
  <snip>
end
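One way to read the "better" version: the decision collapses to a single lookup keyed on the environment name. The plain-Ruby sketch below shows that idea outside of Chef; the constant and method names are hypothetical, while the environment name and versions come from the recipe above.

```ruby
# Sketch only: constant and method names are hypothetical.
# One environment opts in to the new version; everything else gets the default.
LIBMEMCACHED_VERSIONS = {
  "libmemcached_upgrade" => "1.0.4-1"
}.freeze
DEFAULT_LIBMEMCACHED_VERSION = "0.53-1.1"

def libmemcached_version(environment)
  LIBMEMCACHED_VERSIONS.fetch(environment, DEFAULT_LIBMEMCACHED_VERSION)
end

puts libmemcached_version("libmemcached_upgrade")
puts libmemcached_version("production")
```

Compared with the hostname regex, adding or removing a node from the rollout means changing its environment, not editing the recipe.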
68. Case Study: Syslog-ng
• 36 recipes
• 30 versions of syslog-ng.conf
• 27 manually configured files in /etc/syslog-ng.d on the central server
• Edge cases and exceptions galore
74. Case Study: Syslog-ng
• Down to:
• 2 recipes (one client, one server)
• 2 templates (one for syslog-ng.conf, one for stuff in /etc/syslog-ng.d)
• Attributes in roles
• Not open sourced yet, sorry :(
81. Remember, No Panacea!
• A new package hits the repo.
• Are you in control of when it goes out?
• Memcached Outage
• Do you know what services are going to restart and when?
• Image Service Outage
91. Standards - No Time!
• I won’t say “Make Time”, but you should...
• For a quick win, try Foodcritic
• Good out-of-the-box rules
• Jenkins integration in seconds
• Supports custom rules
• Plays well with others
106. Standards at Etsy
• “style” not “correctness”[9]
• ETSY001 - Package or yum_package resource used with :upgrade action
• ETSY002 - Execute resource used to run git commands
• ETSY003 - Execute resource used to run curl or wget commands
• ETSY004 - Execute resource defined without conditional or action :nothing
• ETSY005 - Action :restart sent to a core service
• ETSY006 - Execute resource used to run chef-provided command
• ETSY007 - Package or yum_package resource used to install core package without specific version number
110. Standards at Etsy
• ETSY001 - Written after Memcached Outage
• Package or yum_package resource used with :upgrade action
    package "memcached" do
      action :upgrade
    end
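To make the rule concrete, here is a plain-Ruby approximation of what ETSY001 looks for. It is deliberately not Foodcritic's rule DSL and does not parse Ruby; it scans package blocks line by line, so treat it as an illustration of the check only. The method name is hypothetical.

```ruby
# Sketch only: a text-scanning approximation of ETSY001, NOT Foodcritic's API.
# Reports package / yum_package blocks that use `action :upgrade`.
def upgrade_action_offences(recipe_source)
  offences = []
  current = nil # name of the package block we are inside, if any
  recipe_source.each_line.with_index(1) do |line, number|
    if line =~ /^\s*(?:yum_)?package\s+["']([^"']+)["']\s+do\b/
      current = Regexp.last_match(1)
    elsif current && line =~ /^\s*action\s+:upgrade\b/
      offences << { :package => current, :line => number }
    elsif line =~ /^\s*end\b/
      current = nil
    end
  end
  offences
end

recipe = <<~RECIPE
  package "memcached" do
    action :upgrade
  end

  package "libmemcached" do
    version "1.0.4-1"
    action :install
  end
RECIPE

p upgrade_action_offences(recipe)
```

The second block, which pins a version and uses `:install`, passes; that is the pattern the rule nudges people toward.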
114. Standards at Etsy
• ETSY005 - Written after a total Image service outage
• Action :restart sent to a core service
    cookbook_file "/etc/httpd/conf.d/myvhost.conf" do
      source "myvhost.conf"
      notifies :restart, resources(:service => "httpd")
    end
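The same idea can be sketched for ETSY005: look for `notifies :restart` aimed at a service on a "core" list. This is again a plain-Ruby text scan, not Foodcritic's rule DSL, and `CORE_SERVICES` is a hypothetical list; the slides do not say which services Etsy treats as core.

```ruby
# Sketch only: text-scanning approximation of ETSY005, NOT Foodcritic's API.
# CORE_SERVICES is a hypothetical list chosen for illustration.
CORE_SERVICES = %w[httpd memcached].freeze

# Returns the names of core services that the recipe would restart
# through a `notifies :restart, resources(:service => "...")` line.
def core_restart_offences(recipe_source)
  pattern = /notifies\s+:restart,\s*resources\(:service\s*=>\s*["']([^"']+)["']\)/
  recipe_source.scan(pattern).flatten.select { |name| CORE_SERVICES.include?(name) }
end

vhost_recipe = <<~RECIPE
  cookbook_file "/etc/httpd/conf.d/myvhost.conf" do
    source "myvhost.conf"
    notifies :restart, resources(:service => "httpd")
  end
RECIPE

p core_restart_offences(vhost_recipe)
```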
120. CA&E (Critical Approach and Experimentation)
• Chef is by necessity generic
• ...so don’t take Opscode’s word for it.
• If it doesn’t work well for you, change it!
• Case Study - Etsy Environments rollout
131. Env: Standard Workflow
• knife cookbook show php
• Change version number in metadata.rb
• Change version constraint in foo.json
• Commit and push changes to git
• knife cookbook upload php --freeze
• knife environment from file foo.json
140. Env: Solutions?
• Go with it and hope for the best?
• Don’t use environments?
• Write a totally new workflow?
• Tweak the existing one with some tooling?
149. Spork: Workflow
• Wrapper around standard environments workflow
• check - cookbook versioning
• bump - increment version component
• upload - upload and freeze
• promote - set env constraints
150. Spork: Check
$> knife spork check apache2
Checking versions for cookbook apache2...
Current local version: 1.0.6
Remote versions (Max. 5 most recent only):
*1.0.6, frozen
1.0.5, frozen
<snip>
DANGER: Your local cookbook has same version number as
the starred version above!
Please bump your local version or you won't be able to
upload.
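The core of the check above is a comparison between the local version and what the server already holds. A minimal sketch, assuming the remote versions arrive as a list of hashes (the data shape and method name are hypothetical, not knife-spork's internals):

```ruby
# Sketch only: the data shape and method name are hypothetical.
# Returns the remote entry that clashes with the local version, or nil.
def clashing_remote(local_version, remote_versions)
  remote_versions.find { |remote| remote[:version] == local_version }
end

remotes = [
  { :version => "1.0.6", :frozen => true },
  { :version => "1.0.5", :frozen => true }
]

clash = clashing_remote("1.0.6", remotes)
if clash
  puts "DANGER: local version #{clash[:version]} already exists on the server" \
       "#{clash[:frozen] ? ' (frozen)' : ''}; bump before uploading."
end
```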
151. Spork: Bump
$> knife spork bump apache2 <major|minor|patch|manual>
Bumping patch level of the apache2 cookbook from 1.0.6
to 1.0.7
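The bump itself is simple version arithmetic. The sketch below handles the major, minor, and patch cases; resetting the lower components to zero on a major or minor bump is an assumption based on common versioning practice, and the `manual` option is not covered.

```ruby
# Sketch only: not knife-spork's implementation. Resetting lower components
# on major/minor bumps is an assumption.
def bump_version(version, component)
  major, minor, patch = version.split(".").map { |part| Integer(part, 10) }
  case component
  when :major then "#{major + 1}.0.0"
  when :minor then "#{major}.#{minor + 1}.0"
  when :patch then "#{major}.#{minor}.#{patch + 1}"
  else raise ArgumentError, "unknown component: #{component}"
  end
end

puts bump_version("1.0.6", :patch)
```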
153. Spork: Promote
$> knife spork promote foo php
Adding version constraint php = 1.0.6
Saving changes into foo.json
Promotion complete! Please remember to upload your
changed Environment file to the Chef Server.
---
$> knife spork promote foo php --remote
Adding version constraint php = 0.1.0
Saving changes into foo.json
Uploading foo to server
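Promotion boils down to writing an exact-version constraint into the environment file. A minimal sketch of that step, using the `cookbook_versions` key of a Chef environment JSON document; the method name is hypothetical and the upload step is left out:

```ruby
require "json"

# Sketch only: the method name is hypothetical. Sets an exact-version
# constraint for one cookbook in an environment JSON document.
def promote(environment_json, cookbook, version)
  environment = JSON.parse(environment_json)
  environment["cookbook_versions"] ||= {}
  environment["cookbook_versions"][cookbook] = "= #{version}"
  JSON.pretty_generate(environment)
end

foo = '{"name": "foo", "cookbook_versions": {"apache2": "= 1.0.6"}}'
puts promote(foo, "php", "1.0.6")
```

Existing constraints for other cookbooks are left untouched, which is what keeps a promotion scoped to the cookbook you named.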
159. Spork
• Worked well, avoided the issues it was designed for
• Subsequently evolved
• A lot of input & work came from Devs
• Organic evolution
• Open sourced, of course
165. Spork: Safety Checks
$> knife spork promote php --remote
<snip>
WARNING: It looks like you have multiple cookbook paths
defined so I can't tell if you're running inside a git
repo.
Checking that php version 0.1.93 exists on the server
before promoting (any error means it hasn't been
uploaded yet)...
ERROR: The object you are looking for could not be found
Response: Cannot find a cookbook named php with version
0.1.93
166. Spork: Safety Checks
• Before promoting, check that the version is uploaded...
• Check whether you’re promoting changes to more than you thought...
167. Spork: Safety Checks
WARNING: You're about to promote changes to several
cookbooks:
WARNING:
ganglia: = 0.1.26 changed to = 0.1.25
installerz: = 0.1.66 changed to = 0.1.65
php: = 0.1.92 changed to = 0.1.93
Are you sure you want to continue? (Y/N) N
You said no, so I'm done here.
Would you like to reset your local development.json to
match the server?? (Y/N) Y
<snip>
development.json reset.
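The warning above comes from comparing the constraints in the local environment file with those on the server. A minimal sketch of that comparison, with hypothetical method names and the constraint data taken from the output above:

```ruby
# Sketch only: method names are hypothetical, not knife-spork's internals.
# Returns { cookbook => [server_constraint, local_constraint] } for every
# cookbook whose constraint differs between server and local file.
def constraint_changes(server, local)
  (server.keys | local.keys).sort.each_with_object({}) do |cookbook, changes|
    next if server[cookbook] == local[cookbook]
    changes[cookbook] = [server[cookbook], local[cookbook]]
  end
end

# Promoting one cookbook should change one constraint; more than that
# suggests the local file is out of date, so ask before continuing.
def needs_confirmation?(changes)
  changes.size > 1
end

server = { "ganglia" => "= 0.1.26", "installerz" => "= 0.1.66", "php" => "= 0.1.92" }
local  = { "ganglia" => "= 0.1.25", "installerz" => "= 0.1.65", "php" => "= 0.1.93" }

changes = constraint_changes(server, local)
changes.each { |name, (old, new)| puts "#{name}: #{old} changed to #{new}" }
puts "Confirmation needed" if needs_confirmation?(changes)
```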