Chef Patterns From Building Clusters
Biju Nair
Boston DevOps Meetup
08-July-2015

Background
•  Automate build & management of clusters
  – Hadoop
  – Kafka, etc.
•  Patterns which can be used elsewhere

Movies On Demand

Service On Demand
•  Common services which can be requested
  – Copy logs from applications to a centralized location
  – Service available on all the nodes
  – Applications can request the service dynamically

Service On Demand
•  Node attribute to store service requests
default['bcpc']['hadoop']['copylog'] = {}
•  Data structure to make service requests
{
  'app_id' => { 'logfile' => "/path/file_name_of_log_file",
                'docopy' => true (or false)
  }, ...
}

Service On Demand
•  Application recipes make service requests
#
# Updating node attributes to copy HBase master log file to HDFS
#
node.default['bcpc']['hadoop']['copylog']['hbase_master'] = {
  'logfile' => "/var/log/hbase/hbase-master-#{node.hostname}.log",
  'docopy' => true
}
node.default['bcpc']['hadoop']['copylog']['hbase_master_out'] = {
  'logfile' => "/var/log/hbase/hbase-master-#{node.hostname}.out",
  'docopy' => true
}

Service On Demand
•  Service recipe
node['bcpc']['hadoop']['copylog'].each do |id, f|
  if f['docopy']
    template "/etc/flume/conf/flume-#{id}.conf" do
      source "flume_flume-conf.erb"
      action :create
      ...
      variables(:agent_name => "#{id}",
                :log_location => "#{f['logfile']}")
      notifies :restart, "service[flume-agent-multi-#{id}]", :delayed
    end
    service "flume-agent-multi-#{id}" do
      supports :status => true, :restart => true, :reload => false
      service_name "flume-agent-multi"
      action :start
      start_command "service flume-agent-multi start #{id}"
      restart_command "service flume-agent-multi restart #{id}"
      status_command "service flume-agent-multi status #{id}"
    end
  end
end
•  Separate role at the end of run list
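
The reason for placing the service role last can be sketched in plain Ruby. This is only a stand-in for a Chef run: `node` is an ordinary hash, each "recipe" is a lambda, and the recipe names and log paths are made up for illustration. Requests are written as recipes are evaluated in run-list order, so a service recipe evaluated earlier would miss requests made after it.

```ruby
# Plain-Ruby stand-in for node attributes (not the Chef node object).
node = { 'copylog' => {} }

# Hypothetical application recipe: registers a copy-log request.
hbase_recipe = lambda do |n|
  n['copylog']['hbase_master'] = {
    'logfile' => '/var/log/hbase/hbase-master.log', # illustrative path
    'docopy'  => true
  }
end

# Hypothetical second application recipe whose request is switched off.
kafka_recipe = lambda do |n|
  n['copylog']['kafka'] = {
    'logfile' => '/var/log/kafka/server.log',       # illustrative path
    'docopy'  => false
  }
end

# Stand-in for the service recipe: records the ids it would set up
# a Flume agent for, instead of declaring template/service resources.
served = []
copylog_recipe = lambda do |n|
  n['copylog'].each do |id, f|
    served << id if f['docopy']
  end
end

# The service recipe comes last, so it sees every request made before it.
run_list = [hbase_recipe, kafka_recipe, copylog_recipe]
run_list.each { |recipe| recipe.call(node) }

puts served.inspect   # prints ["hbase_master"]
```

Moving `copylog_recipe` to the front of `run_list` in this sketch leaves `served` empty, which is the failure the "end of run list" rule avoids.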
  	
  

Choices

Pluggable Alerts
•  Single source for monitored stats
  – Allows users to visualize stats across different parameters
  – Didn’t want to duplicate the stats collection by alerting system
  – Need to feed data to the alerting system to generate alerts

Pluggable Alerts
•  Attribute where users can define alerts
default["bcpc"]["hadoop"]["graphite"]["queries"] = {
  'hbase_master' => [
    { 'type' => "jmx",
      'query' => "memory.NonHeapMemoryUsage_committed",
      'key' => "hbasenonheapmem",
      'trigger_val' => "max(61,0)",
      'trigger_cond' => "=0",
      'trigger_name' => "HBaseMasterAvailability",
      'trigger_dep' => ["NameNodeAvailability"],
      'trigger_desc' => "HBase master seems to be down",
      'severity' => 1
    }, {
      'type' => "jmx",
      'query' => "memory.HeapMemoryUsage_committed",
      'key' => "hbaseheapmem",
      ...
    }, ...], 'namenode' => [...] ... }
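
One way a recipe could walk this structure is sketched below in plain Ruby. The helper names `requested_triggers` and `missing_dependencies` are hypothetical, and the `namenode` and stat-only entries are invented for illustration because the slide elides them. The sketch collects the triggers that were requested and checks that every `trigger_dep` names a trigger defined somewhere in the structure.

```ruby
# Trimmed copy of the queries attribute. Only the first entry's fields come
# from the slide; the other two entries are assumed for illustration.
queries = {
  'hbase_master' => [
    { 'key' => 'hbasenonheapmem',
      'trigger_name' => 'HBaseMasterAvailability',
      'trigger_dep' => ['NameNodeAvailability'] },
    { 'key' => 'stat_only_example' }            # collected, no alert requested
  ],
  'namenode' => [
    { 'key' => 'nn_example',
      'trigger_name' => 'NameNodeAvailability',
      'trigger_dep' => [] }
  ]
}

# Hypothetical helper: list every requested trigger with its dependencies.
def requested_triggers(queries)
  queries.flat_map do |service, defs|
    defs.select { |q| q.key?('trigger_name') }.map do |q|
      { 'service' => service,
        'name'    => q['trigger_name'],
        'deps'    => q.fetch('trigger_dep', []) }
    end
  end
end

# Hypothetical helper: dependencies that no query defines as a trigger.
def missing_dependencies(triggers)
  defined = triggers.map { |t| t['name'] }
  triggers.flat_map { |t| t['deps'] }.uniq - defined
end

triggers = requested_triggers(queries)
puts triggers.map { |t| t['name'] }.inspect
puts missing_dependencies(triggers).inspect
```

A check like `missing_dependencies` is a design choice rather than something the slides describe; it would catch a dependency on a trigger that nobody defined before the alerting system is asked to create it.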

Pluggable Alerts
•  Recipes and templates use the data structure
  – To generate queries to pull data from statistics store and send
    •  https://github.com/bloomberg/chef-bach/blob/master/cookbooks/bcpc-hadoop/templates/default/graphite.query_graphite.config.erb
  – To create requested trigger-related objects in alarming system
    •  https://github.com/bloomberg/chef-bach/blob/master/cookbooks/bcpc-hadoop/recipes/graphite_to_zabbix.rb

Pluggable Alerts
•  Servers defined in the role are used by recipes
"default_attributes" : {
  "jmxtrans": {
    "servers": [
      {
        "type": "hbase_master",
        "service": "hbase-master",
        "service_cmd": "org.apache.hadoop.hbase.master.HMaster"
      }, {
        "type": "hbase_rs",
        "service": "hbase-regionserver",
        "service_cmd": "org.apache.hadoop.hbase.regionserver.HRegionServer"
      }
    ]
  } ...

Dependency

Service Restart
•  We use jmxtrans to monitor JMX stats
  – Services to be monitored vary with node
  – There can be more than one service to be monitored
  – Monitored service restart requires jmxtrans to be restarted**

Service Restart
•  Data structure in roles to define the services
"default_attributes" : {
  "jmxtrans": {
    "servers": [
      {
        "type": "datanode",
        "service": "hadoop-hdfs-datanode",
        "service_cmd": "org.apache.hadoop.hdfs.server.datanode.DataNode"
      }, {
        "type": "hbase_rs",
        "service": "hbase-regionserver",
        "service_cmd": "org.apache.hadoop.hbase.regionserver.HRegionServer"
      }
    ]
  } ...

Service Restart
•  Jmxtrans service restart logic built dynamically
jmx_services = Array.new
jmx_srvc_cmds = Hash.new
node['jmxtrans']['servers'].each do |server|
  jmx_services.push(server['service'])
  jmx_srvc_cmds[server['service']] = server['service_cmd']
end
service "restart jmxtrans on dependent service" do
  service_name "jmxtrans"
  supports :restart => true, :status => true, :reload => true
  action :restart
  jmx_services.each do |jmx_dep_service|
    subscribes :restart, "service[#{jmx_dep_service}]", :delayed
  end
  only_if { process_require_restart?("jmxtrans", "jmxtrans-all.jar",
                                     jmx_srvc_cmds) }
end

Service Restart
def process_require_restart?(process_name, process_cmd, dep_cmds)
  tgt_process_pid = `pgrep -f #{process_cmd}`
  ...
  tgt_process_stime = `ps --no-header -o start_time #{tgt_process_pid}`
  ...
  ret = false
  restarted_processes = Array.new
  dep_cmds.each do |dep_process, dep_cmd|
    dep_pids = `pgrep -f #{dep_cmd}`
    if dep_pids != ""
      dep_pids_arr = dep_pids.split("\n")
      dep_pids_arr.each do |dep_pid|
        dep_process_stime = `ps --no-header -o start_time #{dep_pid}`
        if DateTime.parse(tgt_process_stime) <
           DateTime.parse(dep_process_stime)
          restarted_processes.push(dep_process)
          ret = true
        end ...
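
The comparison at the heart of this helper can be tried out on its own. The sketch below is an assumption-laden simplification, not the cookbook's code: `restarted_since` is a hypothetical name, and start times are passed in as `Time` objects so the logic runs without calling `pgrep` or `ps`.

```ruby
require 'time'

# Hypothetical helper: names of dependent processes that started after the
# target process. A non-empty result means the target needs a restart.
def restarted_since(target_start, dep_starts)
  dep_starts.select { |_name, started| started > target_start }.keys
end

# Illustrative start times; a real run would read them from ps.
jmxtrans_start = Time.parse('2015-07-08 10:00:00 UTC')
dep_starts = {
  'hadoop-hdfs-datanode' => Time.parse('2015-07-08 09:30:00 UTC'),
  'hbase-regionserver'   => Time.parse('2015-07-08 10:15:00 UTC')
}

restarted = restarted_since(jmxtrans_start, dep_starts)
puts restarted.inspect   # prints ["hbase-regionserver"]
puts restarted.any?      # prints true, so jmxtrans would be restarted
```

One thing to keep in mind when reading times from procps `ps`: the `start_time` column shows only `HH:MM` for processes started the same day and a date for older ones, so parsed values are coarse. The `lstart` column prints a full timestamp and may be a better fit if finer comparisons are needed.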

External Dependency

Rolling Restart
•  Changes to configuration
•  Availability
  – Toxic Configuration
•  Contention
  – Poll & Wait
  – Fail the Run
  – Simply Skip Service Restart and Go On
•  Store the state and need for restart
•  Breaks assumptions of Procedural Chef Runs

Rolling Restart
•  ZooKeeper
  – Service-specific znode as lock
•  Node attribute to flag restart failures
https://github.com/bloomberg/chef-bach/blob/rolling_restart/cookbooks/bcpc-hadoop/definitions/hadoop_service.rb
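
The contention choices listed earlier (poll and wait, fail, or skip) can be sketched in plain Ruby. Everything here is a stand-in: `FakeLock` is an in-memory substitute for a service-specific znode, and `rolling_restart` is a hypothetical helper, not the definition in the linked cookbook. The sketch polls for the lock a few times, restarts under it, and otherwise skips so the caller can flag the node for a later run.

```ruby
# In-memory stand-in for a per-service ZooKeeper lock znode (hypothetical).
class FakeLock
  def initialize
    @holder = nil
  end

  # Returns true when the lock was free and is now held by owner.
  def acquire(owner)
    return false if @holder
    @holder = owner
    true
  end

  # Only the current holder can release the lock.
  def release(owner)
    @holder = nil if @holder == owner
  end
end

# Hypothetical helper: poll for the lock, run the restart block under it,
# and release it afterwards. Returns :restarted, or :skipped when the lock
# stayed busy for every attempt.
def rolling_restart(lock, host, attempts: 3, wait: 0)
  attempts.times do
    if lock.acquire(host)
      begin
        yield                      # the actual service restart goes here
      ensure
        lock.release(host)
      end
      return :restarted
    end
    sleep wait                     # poll & wait before trying again
  end
  :skipped                         # caller can record this in a node attribute
end

lock = FakeLock.new
log = []
first = rolling_restart(lock, 'node1') { log << 'node1 restarted' }

lock.acquire('node2')              # another node now holds the lock
second = rolling_restart(lock, 'node1') { log << 'should not run' }

puts first, second
puts log.inspect
```

Returning `:skipped` instead of raising corresponds to the "skip and go on" option; raising an exception at that point would give the "fail the run" behaviour instead.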
  

Change Course

Logic Injection
•  We use Community cookbooks
  – Takes care of standard install, enable and starting of services
•  Need to add logic to cookbook recipes
  – Take action on a service only when conditions are satisfied
  – Take action on a service based on dependent service state

Logic Injection
kafka_install node.kafka.version_install_dir do
  from kafka_target_path
  not_if { kafka_installed? }
end
template ::File.join(node.kafka.config_dir, 'server.properties') do
  source 'server.properties.erb'
  ...
  helpers(Kafka::Configuration)
  if restart_on_configuration_change?
    notifies :restart, 'service[kafka]', :delayed
  end
end
service 'kafka' do
  provider kafka_init_opts[:provider]
  supports start: true, stop: true, restart: true, status: true
  action kafka_service_actions
end

Logic Injection
•  Changes to standard cookbook
  – Create a new recipe to perform service action
    •  Resource to intercept notifications to service resource
    •  Original service resource
•  Add node attribute which stores name of new recipe
•  Update original recipe
  – Remove the service resource from the original recipe
  – Replace it with include_recipe new_attribute

Logic Injection
•  New recipe to perform service actions
  – First step is the ruby_block to intercept notifications
ruby_block 'coordinate-kafka-start' do
  block do
    Chef::Log.debug 'Default recipe to coordinate Kafka start is used'
  end
  action :nothing
  notifies :restart, 'service[kafka]', :delayed
end
service 'kafka' do
  provider kafka_init_opts[:provider]
  supports start: true, stop: true, restart: true, status: true
  action kafka_service_actions
end

Logic Injection
•  Attribute to set the recipe for service actions
#
# Attribute to set the recipe used to coordinate Kafka service start;
# if nothing is set, the default recipe "_coordinate" will be used
#
default.kafka.start_coordination.recipe = 'kafka::_coordinate'

Logic Injection
•  Changes to the original recipe
kafka_install node.kafka.version_install_dir do
  from kafka_target_path
  not_if { kafka_installed? }
end
template ::File.join(node.kafka.config_dir, 'server.properties') do
  source 'server.properties.erb'
  ...
  helpers(Kafka::Configuration)
  if restart_on_configuration_change?
    notifies :create, 'ruby_block[coordinate-kafka-start]', :immediately
  end
end
include_recipe node.kafka.start_coordination.recipe

Logic Injection
•  Changes in wrapper cookbook
  – Create custom recipe in wrapper cookbook
    •  Notification interceptor ruby_block should be first
    •  Logic to determine service restart action
    •  service resource
    •  Any clean-up logic
  – Overwrite attribute with custom recipe name

Logic Injection
ruby_block 'coordinate-kafka-start' do
  block do
    Chef::Log.info 'Custom recipe to coordinate Kafka start/restart'
  end ...
ruby_block 'restart-coordination' do
  block do
    Chef::Log.info 'Implement the process to coordinate the restart'
  end ...
service 'kafka' do
  provider kafka_init_opts[:provider]
  supports start: true, stop: true, restart: true, status: true
  ...
ruby_block 'restart-coordination-cleanup' do
  block do
    Chef::Log.info 'Implement any cleanup logic required'
  end
end

Logic Injection
•  Overwrite attribute to set the custom recipe
#
# Overwrite the community cookbook attribute with custom recipe name
#
default[:kafka][:start_coordination][:recipe] = 'kafka-bcpc::coordinate'
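
The indirection this pattern relies on can be shown without Chef. In the plain-Ruby sketch below, `recipes`, `attributes` and the `include_recipe` lambda are stand-ins invented for illustration, and the strings returned by each "recipe" are placeholders. The point is only that the original recipe never names the coordination recipe directly; it looks the name up in an attribute, so a wrapper cookbook can swap the behaviour by changing one value.

```ruby
# Stand-ins: "recipes" keyed by name, each returning a description of
# what it would do (placeholder strings, not real cookbook behaviour).
recipes = {
  'kafka::_coordinate'     => -> { 'restart service directly' },
  'kafka-bcpc::coordinate' => -> { 'coordinate, restart, clean up' }
}

# Community cookbook default for the coordination recipe name.
attributes = { 'start_coordination_recipe' => 'kafka::_coordinate' }

# Stand-in for include_recipe: look the recipe up by name and run it.
include_recipe = ->(name) { recipes.fetch(name).call }

default_behaviour = include_recipe.call(attributes['start_coordination_recipe'])

# The wrapper cookbook overwrites the attribute; the original recipe's
# include_recipe line stays unchanged.
attributes['start_coordination_recipe'] = 'kafka-bcpc::coordinate'
custom_behaviour = include_recipe.call(attributes['start_coordination_recipe'])

puts default_behaviour
puts custom_behaviour
```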

Questions?

References
•  https://github.com/bloomberg/chef-bach
•  http://blog.asquareb.com/blog/categories/chef-patterns/

Thank You!!
bnair@asquareb.com

More Related Content

What's hot

Fortify aws aurora_proxy_2019_pleu
Fortify aws aurora_proxy_2019_pleuFortify aws aurora_proxy_2019_pleu
Fortify aws aurora_proxy_2019_pleuMarco Tusa
 
Oracle Open World Thursday 230 ashmasters
Oracle Open World Thursday 230 ashmastersOracle Open World Thursday 230 ashmasters
Oracle Open World Thursday 230 ashmastersKyle Hailey
 
Processing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and SparkProcessing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and SparkBen Slater
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsCommand Prompt., Inc
 
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQLHBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQLCloudera, Inc.
 
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...DataStax
 
VirtaThon 2011 - Mining the AWR
VirtaThon 2011 - Mining the AWRVirtaThon 2011 - Mining the AWR
VirtaThon 2011 - Mining the AWRKristofferson A
 
Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016
Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016
Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016DataStax
 
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016DataStax
 
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax
 
Ash masters : advanced ash analytics on Oracle
Ash masters : advanced ash analytics on Oracle Ash masters : advanced ash analytics on Oracle
Ash masters : advanced ash analytics on Oracle Kyle Hailey
 
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOxInfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOxInfluxData
 
PostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data TablesPostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data TablesSperasoft
 
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax
 
PostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / ShardingPostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / ShardingAmir Reza Hashemi
 

What's hot (20)

Fortify aws aurora_proxy_2019_pleu
Fortify aws aurora_proxy_2019_pleuFortify aws aurora_proxy_2019_pleu
Fortify aws aurora_proxy_2019_pleu
 
Oracle Open World Thursday 230 ashmasters
Oracle Open World Thursday 230 ashmastersOracle Open World Thursday 230 ashmasters
Oracle Open World Thursday 230 ashmasters
 
AWR reports-Measuring CPU
AWR reports-Measuring CPUAWR reports-Measuring CPU
AWR reports-Measuring CPU
 
Processing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and SparkProcessing 50,000 events per second with Cassandra and Spark
Processing 50,000 events per second with Cassandra and Spark
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System Administrators
 
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQLHBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
 
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
 
PostgreSQL Replication Tutorial
PostgreSQL Replication TutorialPostgreSQL Replication Tutorial
PostgreSQL Replication Tutorial
 
VirtaThon 2011 - Mining the AWR
VirtaThon 2011 - Mining the AWRVirtaThon 2011 - Mining the AWR
VirtaThon 2011 - Mining the AWR
 
Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016
Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016
Monitoring Cassandra at Scale (Jason Cacciatore, Netflix) | C* Summit 2016
 
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
 
Intro to ASH
Intro to ASHIntro to ASH
Intro to ASH
 
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
 
Ash masters : advanced ash analytics on Oracle
Ash masters : advanced ash analytics on Oracle Ash masters : advanced ash analytics on Oracle
Ash masters : advanced ash analytics on Oracle
 
AWR Sample Report
AWR Sample ReportAWR Sample Report
AWR Sample Report
 
Learning postgresql
Learning postgresqlLearning postgresql
Learning postgresql
 
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOxInfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
 
PostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data TablesPostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
 
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
 
PostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / ShardingPostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / Sharding
 

Viewers also liked

Managing Websphere Application Server certificates
Managing Websphere Application Server certificatesManaging Websphere Application Server certificates
Managing Websphere Application Server certificatesPiyush Chordia
 
Project Risk Management
Project Risk ManagementProject Risk Management
Project Risk ManagementBiju Nair
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User ReferenceBiju Nair
 
NENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaNENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaBiju Nair
 
Hadoop security
Hadoop securityHadoop security
Hadoop securityBiju Nair
 
Websphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsWebsphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsBiju Nair
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance ImprovementBiju Nair
 
Netezza workload management
Netezza workload managementNetezza workload management
Netezza workload managementBiju Nair
 
Row or Columnar Database
Row or Columnar DatabaseRow or Columnar Database
Row or Columnar DatabaseBiju Nair
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developersBiju Nair
 

Viewers also liked (11)

Managing Websphere Application Server certificates
Managing Websphere Application Server certificatesManaging Websphere Application Server certificates
Managing Websphere Application Server certificates
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
Project Risk Management
Project Risk ManagementProject Risk Management
Project Risk Management
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User Reference
 
NENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaNENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezza
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Websphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsWebsphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentals
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 
Netezza workload management
Netezza workload managementNetezza workload management
Netezza workload management
 
Row or Columnar Database
Row or Columnar DatabaseRow or Columnar Database
Row or Columnar Database
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developers
 

Similar to Chef patterns

fog or: How I Learned to Stop Worrying and Love the Cloud
fog or: How I Learned to Stop Worrying and Love the Cloudfog or: How I Learned to Stop Worrying and Love the Cloud
fog or: How I Learned to Stop Worrying and Love the CloudWesley Beary
 
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)Wesley Beary
 
Learning Puppet basic thing
Learning Puppet basic thing Learning Puppet basic thing
Learning Puppet basic thing DaeHyung Lee
 
Railsconf2011 deployment tips_for_slideshare
Railsconf2011 deployment tips_for_slideshareRailsconf2011 deployment tips_for_slideshare
Railsconf2011 deployment tips_for_slidesharetomcopeland
 
What Makes a Good Chef Cookbook? (May 2014 Edition)
What Makes a Good Chef Cookbook? (May 2014 Edition)What Makes a Good Chef Cookbook? (May 2014 Edition)
What Makes a Good Chef Cookbook? (May 2014 Edition)Julian Dunn
 
Creating Reusable Puppet Profiles
Creating Reusable Puppet ProfilesCreating Reusable Puppet Profiles
Creating Reusable Puppet ProfilesBram Vogelaar
 
Future Decoded - Node.js per sviluppatori .NET
Future Decoded - Node.js per sviluppatori .NETFuture Decoded - Node.js per sviluppatori .NET
Future Decoded - Node.js per sviluppatori .NETGianluca Carucci
 
Rhebok, High Performance Rack Handler / Rubykaigi 2015
Rhebok, High Performance Rack Handler / Rubykaigi 2015Rhebok, High Performance Rack Handler / Rubykaigi 2015
Rhebok, High Performance Rack Handler / Rubykaigi 2015Masahiro Nagano
 
Nagios Conference 2014 - Rob Hassing - How To Maintain Over 20 Monitoring App...
Nagios Conference 2014 - Rob Hassing - How To Maintain Over 20 Monitoring App...Nagios Conference 2014 - Rob Hassing - How To Maintain Over 20 Monitoring App...
Nagios Conference 2014 - Rob Hassing - How To Maintain Over 20 Monitoring App...Nagios
 
Cutting through the fog of cloud
Cutting through the fog of cloudCutting through the fog of cloud
Cutting through the fog of cloudKyle Rames
 
Understanding OpenStack Deployments - PuppetConf 2014
Understanding OpenStack Deployments - PuppetConf 2014Understanding OpenStack Deployments - PuppetConf 2014
Understanding OpenStack Deployments - PuppetConf 2014Puppet
 
Streaming Way to Webscale: How We Scale Bitly via Streaming
Streaming Way to Webscale: How We Scale Bitly via StreamingStreaming Way to Webscale: How We Scale Bitly via Streaming
Streaming Way to Webscale: How We Scale Bitly via StreamingAll Things Open
 
Building Scalable Websites with Perl
Building Scalable Websites with PerlBuilding Scalable Websites with Perl
Building Scalable Websites with PerlPerrin Harkins
 

Similar to Chef patterns (20)

fog or: How I Learned to Stop Worrying and Love the Cloud
fog or: How I Learned to Stop Worrying and Love the Cloudfog or: How I Learned to Stop Worrying and Love the Cloud
fog or: How I Learned to Stop Worrying and Love the Cloud
 
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
 
Learning Puppet basic thing
Learning Puppet basic thing Learning Puppet basic thing
Learning Puppet basic thing
 
Railsconf2011 deployment tips_for_slideshare
Railsconf2011 deployment tips_for_slideshareRailsconf2011 deployment tips_for_slideshare
Railsconf2011 deployment tips_for_slideshare
 
What Makes a Good Chef Cookbook? (May 2014 Edition)
What Makes a Good Chef Cookbook? (May 2014 Edition)What Makes a Good Chef Cookbook? (May 2014 Edition)
What Makes a Good Chef Cookbook? (May 2014 Edition)
 
Creating Reusable Puppet Profiles
Creating Reusable Puppet ProfilesCreating Reusable Puppet Profiles
Creating Reusable Puppet Profiles
 
Future Decoded - Node.js per sviluppatori .NET
Future Decoded - Node.js per sviluppatori .NETFuture Decoded - Node.js per sviluppatori .NET
Future Decoded - Node.js per sviluppatori .NET
 
The Monitoring Playground
The Monitoring PlaygroundThe Monitoring Playground
The Monitoring Playground
 
Chef solo the beginning
Chef solo the beginning Chef solo the beginning
Chef solo the beginning
 
Cooking with Chef
Cooking with ChefCooking with Chef
Cooking with Chef
 
Rhebok, High Performance Rack Handler / Rubykaigi 2015
Rhebok, High Performance Rack Handler / Rubykaigi 2015Rhebok, High Performance Rack Handler / Rubykaigi 2015
Rhebok, High Performance Rack Handler / Rubykaigi 2015
 
Memcached Study
Memcached StudyMemcached Study
Memcached Study
 
Celery with python
Celery with pythonCelery with python
Celery with python
 
Puppet Camp 2012
Puppet Camp 2012Puppet Camp 2012
Puppet Camp 2012
 
Nagios Conference 2014 - Rob Hassing - How To Maintain Over 20 Monitoring App...
Nagios Conference 2014 - Rob Hassing - How To Maintain Over 20 Monitoring App...Nagios Conference 2014 - Rob Hassing - How To Maintain Over 20 Monitoring App...
Nagios Conference 2014 - Rob Hassing - How To Maintain Over 20 Monitoring App...
 
Cutting through the fog of cloud
Cutting through the fog of cloudCutting through the fog of cloud
Cutting through the fog of cloud
 
Understanding OpenStack Deployments - PuppetConf 2014
Understanding OpenStack Deployments - PuppetConf 2014Understanding OpenStack Deployments - PuppetConf 2014
Understanding OpenStack Deployments - PuppetConf 2014
 
Streaming Way to Webscale: How We Scale Bitly via Streaming
Streaming Way to Webscale: How We Scale Bitly via StreamingStreaming Way to Webscale: How We Scale Bitly via Streaming
Streaming Way to Webscale: How We Scale Bitly via Streaming
 
Building Scalable Websites with Perl
Building Scalable Websites with PerlBuilding Scalable Websites with Perl
Building Scalable Websites with Perl
 
Django Celery
Django Celery Django Celery
Django Celery
 

More from Biju Nair

Chef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleChef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleBiju Nair
 
HBase Internals And Operations
HBase Internals And OperationsHBase Internals And Operations
HBase Internals And OperationsBiju Nair
 
Apache Kafka Reference
Apache Kafka ReferenceApache Kafka Reference
Apache Kafka ReferenceBiju Nair
 
Serving queries at low latency using HBase
Serving queries at low latency using HBaseServing queries at low latency using HBase
Serving queries at low latency using HBaseBiju Nair
 
Multi-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-finalMulti-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-finalBiju Nair
 
Cursor Implementation in Apache Phoenix
Cursor Implementation in Apache PhoenixCursor Implementation in Apache Phoenix
Cursor Implementation in Apache PhoenixBiju Nair
 

More from Biju Nair (6)

Chef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleChef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scale
 
HBase Internals And Operations
HBase Internals And OperationsHBase Internals And Operations
HBase Internals And Operations
 
Apache Kafka Reference
Apache Kafka ReferenceApache Kafka Reference
Apache Kafka Reference
 
Serving queries at low latency using HBase
Serving queries at low latency using HBaseServing queries at low latency using HBase
Serving queries at low latency using HBase
 
Multi-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-finalMulti-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-final
 
Cursor Implementation in Apache Phoenix
Cursor Implementation in Apache PhoenixCursor Implementation in Apache Phoenix
Cursor Implementation in Apache Phoenix
 

Recently uploaded

Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
Chef patterns

  • 1. Chef Patterns From Building Clusters
      Biju Nair
      Boston DevOps Meetup
      08-July-2015
  • 2. Background
      • Automate build & management of clusters
          – Hadoop
          – Kafka… etc.
      • Patterns which can be used elsewhere
  • 4. Service On Demand
      • Common services which can be requested
          – Copy logs from applications to a centralized location
          – Service available on all the nodes
          – Applications can request the service dynamically
  • 5. Service On Demand
      • Node attribute to store service requests
          default['bcpc']['hadoop']['copylog'] = {}
      • Data structure to make service requests
          {
            'app_id' => { 'logfile' => "/path/file_name_of_log_file",
                          'docopy' => true (or false)
                        },...
          }
  • 6. Service On Demand
      • Application recipes make service requests
          #
          # Updating node attributes to copy HBase master log file to HDFS
          #
          node.default['bcpc']['hadoop']['copylog']['hbase_master'] = {
            'logfile' => "/var/log/hbase/hbase-master-#{node.hostname}.log",
            'docopy' => true
          }
          node.default['bcpc']['hadoop']['copylog']['hbase_master_out'] = {
            'logfile' => "/var/log/hbase/hbase-master-#{node.hostname}.out",
            'docopy' => true
          }
  • 7. Service On Demand
      • Service recipe
          node['bcpc']['hadoop']['copylog'].each do |id, f|
            if f['docopy']
              template "/etc/flume/conf/flume-#{id}.conf" do
                source "flume_flume-conf.erb"
                action :create
                ...
                variables(:agent_name => "#{id}",
                          :log_location => "#{f['logfile']}")
                notifies :restart, "service[flume-agent-multi-#{id}]", :delayed
              end
              service "flume-agent-multi-#{id}" do
                supports :status => true, :restart => true, :reload => false
                service_name "flume-agent-multi"
                action :start
                start_command "service flume-agent-multi start #{id}"
                restart_command "service flume-agent-multi restart #{id}"
                status_command "service flume-agent-multi status #{id}"
              end
            end
          end
      • Separate role at the end of the run list
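
The request/serve split on slides 5 to 7 can be tried outside Chef. The following is a minimal sketch in plain Ruby, assuming an ordinary hash in place of node['bcpc']['hadoop']['copylog']; the helper name flume_agents, the 'legacy_app' entry, and the host name in the log paths are made up for illustration. It only computes which config files and services the service recipe would create; it does not create them.

```ruby
# Plain-Ruby stand-in for node['bcpc']['hadoop']['copylog'] (no Chef needed).
# 'legacy_app' is a hypothetical request with copying switched off.
copylog = {
  'hbase_master'     => { 'logfile' => '/var/log/hbase/hbase-master-host1.log', 'docopy' => true },
  'hbase_master_out' => { 'logfile' => '/var/log/hbase/hbase-master-host1.out', 'docopy' => true },
  'legacy_app'       => { 'logfile' => '/var/log/legacy/app.log', 'docopy' => false }
}

# Hypothetical helper: one plan entry per request whose 'docopy' flag is true,
# using the same naming scheme as the service recipe on slide 7.
def flume_agents(requests)
  requests.select { |_id, f| f['docopy'] }.map do |id, f|
    {
      conf_path: "/etc/flume/conf/flume-#{id}.conf",
      service: "flume-agent-multi-#{id}",
      log: f['logfile']
    }
  end
end

agents = flume_agents(copylog)
agents.each { |a| puts "#{a[:service]} reads #{a[:log]} (config #{a[:conf_path]})" }
```

Because the service recipe reads whatever requests exist when it runs, it has to run after every application recipe has written its request, which is what placing it in a separate role at the end of the run list achieves.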
  • 9. Pluggable Alerts
      • Single source for monitored stats
          – Allows users to visualize stats across different parameters
          – Didn't want to duplicate the stats collection by the alerting system
          – Need to feed data to the alerting system to generate alerts
  • 10. Pluggable Alerts
      • Attribute where users can define alerts
          default["bcpc"]["hadoop"]["graphite"]["queries"] = {
            'hbase_master' => [
              {
                'type' => "jmx",
                'query' => "memory.NonHeapMemoryUsage_committed",
                'key' => "hbasenonheapmem",
                'trigger_val' => "max(61,0)",
                'trigger_cond' => "=0",
                'trigger_name' => "HBaseMasterAvailability",
                'trigger_dep' => ["NameNodeAvailability"],
                'trigger_desc' => "HBase master seems to be down",
                'severity' => 1
              },{
                'type' => "jmx",
                'query' => "memory.HeapMemoryUsage_committed",
                'key' => "hbaseheapmem",
                ...
              },...],
            'namenode' => [...]
            ...}
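
To illustrate how a recipe could walk this attribute, here is a minimal sketch in plain Ruby. The helper trigger_specs, the host name, the heap-memory entry without trigger fields, and the expression layout "{host:key.function}condition" are all assumptions made for this example; the actual recipe (graphite_to_zabbix.rb in chef-bach) may build its triggers differently.

```ruby
# Subset of the 'queries' attribute from slide 10, as a plain Ruby hash.
queries = {
  'hbase_master' => [
    {
      'type' => 'jmx',
      'query' => 'memory.NonHeapMemoryUsage_committed',
      'key' => 'hbasenonheapmem',
      'trigger_val' => 'max(61,0)',
      'trigger_cond' => '=0',
      'trigger_name' => 'HBaseMasterAvailability',
      'trigger_dep' => ['NameNodeAvailability'],
      'trigger_desc' => 'HBase master seems to be down',
      'severity' => 1
    },
    # Hypothetical entry with no trigger fields: collected, but never alerted on.
    { 'type' => 'jmx', 'query' => 'memory.HeapMemoryUsage_committed', 'key' => 'hbaseheapmem' }
  ]
}

# Hypothetical helper: keep only entries that name a trigger and flatten them
# into one list. The expression layout is an assumption, not a documented API.
def trigger_specs(queries, host)
  queries.flat_map do |_service, entries|
    entries.select { |e| e['trigger_name'] }.map do |e|
      {
        name: e['trigger_name'],
        expression: "{#{host}:#{e['key']}.#{e['trigger_val']}}#{e['trigger_cond']}",
        depends_on: e.fetch('trigger_dep', []),
        description: e['trigger_desc'],
        severity: e['severity']
      }
    end
  end
end

specs = trigger_specs(queries, 'example-host')
specs.each { |s| puts "#{s[:name]}: #{s[:expression]} (severity #{s[:severity]})" }
```

The point of the pattern is that the statistics are collected once, and alert definitions are just extra fields on the same entries.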
  • 11. Pluggable Alerts
      • Recipes and templates use the data structure
          – To generate queries to pull data from statistics store and send
              • https://github.com/bloomberg/chef-bach/blob/master/cookbooks/bcpc-hadoop/templates/default/graphite.query_graphite.config.erb
          – To create requested trigger related objects in alarming system
              • https://github.com/bloomberg/chef-bach/blob/master/cookbooks/bcpc-hadoop/recipes/graphite_to_zabbix.rb
  • 12. Pluggable Alerts
      • Servers defined in the role are used by recipes
          "default_attributes" : {
            "jmxtrans": {
              "servers": [
                {
                  "type": "hbase_master",
                  "service": "hbase-master",
                  "service_cmd": "org.apache.hadoop.hbase.master.HMaster"
                },
                {
                  "type": "hbase_rs",
                  "service": "hbase-regionserver",
                  "service_cmd": "org.apache.hadoop.hbase.regionserver.HRegionServer"
                }
              ]
            }
            ...
  • 14. Service Restart
      • We use jmxtrans to monitor JMX stats
          – Services to be monitored vary by node
          – There can be more than one service to be monitored
          – Monitored service restart requires jmxtrans to be restarted**
  • 15. Service Restart
      • Data structure in roles to define the services
          "default_attributes" : {
            "jmxtrans": {
              "servers": [
                {
                  "type": "datanode",
                  "service": "hadoop-hdfs-datanode",
                  "service_cmd": "org.apache.hadoop.hdfs.server.datanode.DataNode"
                },
                {
                  "type": "hbase_rs",
                  "service": "hbase-regionserver",
                  "service_cmd": "org.apache.hadoop.hbase.regionserver.HRegionServer"
                }
              ]
            }
            ...
  • 16. Service Restart
      • Jmxtrans service restart logic built dynamically
          jmx_services = Array.new
          jmx_srvc_cmds = Hash.new
          node['jmxtrans']['servers'].each do |server|
            jmx_services.push(server['service'])
            jmx_srvc_cmds[server['service']] = server['service_cmd']
          end
          service "restart jmxtrans on dependent service" do
            service_name "jmxtrans"
            supports :restart => true, :status => true, :reload => true
            action :restart
            jmx_services.each do |jmx_dep_service|
              subscribes :restart, "service[#{jmx_dep_service}]", :delayed
            end
            only_if { process_require_restart?("jmxtrans", "jmxtrans-all.jar", jmx_srvc_cmds) }
          end
  • 17. Service Restart
          def process_require_restart?(process_name, process_cmd, dep_cmds)
            tgt_process_pid = `pgrep -f #{process_cmd}`
            ...
            tgt_process_stime = `ps --no-header -o start_time #{tgt_process_pid}`
            ...
            ret = false
            restarted_processes = Array.new
            dep_cmds.each do |dep_process, dep_cmd|
              dep_pids = `pgrep -f #{dep_cmd}`
              if dep_pids != ""
                dep_pids_arr = dep_pids.split("\n")
                dep_pids_arr.each do |dep_pid|
                  dep_process_stime = `ps --no-header -o start_time #{dep_pid}`
                  # A dependency that started after the target means the target is stale
                  if DateTime.parse(tgt_process_stime) < DateTime.parse(dep_process_stime)
                    restarted_processes.push(dep_process)
                    ret = true
                  end
                  ...
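
The decision at the core of process_require_restart? is a start-time comparison: jmxtrans needs a restart if any monitored process started after jmxtrans did. A minimal sketch of just that comparison, assuming the start times have already been collected and parsed; the helper name restarted_since and the sample timestamps are made up for illustration.

```ruby
require 'time'

# Hypothetical pure helper: given the monitor's start time and a map of
# dependency name => list of process start times, return the names of the
# dependencies with at least one process started after the monitor.
def restarted_since(target_start, dep_starts)
  dep_starts.select { |_name, starts| starts.any? { |t| t > target_start } }.keys
end

# Sample data (assumed): jmxtrans started at 10:00, the region server at 10:15.
jmxtrans_start = Time.parse('2015-07-08 10:00:00 UTC')
deps = {
  'hadoop-hdfs-datanode' => [Time.parse('2015-07-08 09:30:00 UTC')],
  'hbase-regionserver'   => [Time.parse('2015-07-08 10:15:00 UTC')]
}

restarted = restarted_since(jmxtrans_start, deps)
puts "restart needed: #{!restarted.empty?}"
puts "restarted dependencies: #{restarted.join(', ')}"
```

Keeping the comparison separate from the pgrep and ps calls makes it testable without live processes. One thing to check when adapting the slide's code: the start_time column of procps ps prints only HH:MM for processes started the same day and a date for older ones, so a full-timestamp column such as lstart may give a more reliable comparison.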
  • 19. Rolling Restart
      • Changes to configuration
      • Availability
          – Toxic Configuration
      • Contention
          – Poll & Wait
          – Fail the Run
          – Simply Skip Service Restart and Go On
      • Store the state and the need for restart
      • Breaks assumptions of Procedural Chef Runs
  • 20. Rolling Restart
      • ZooKeeper
          – Service specific znode as lock
      • Node attribute to flag restart failures
      https://github.com/bloomberg/chef-bach/blob/rolling_restart/cookbooks/bcpc-hadoop/definitions/hadoop_service.rb
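
The lock-then-restart flow on slides 19 and 20 can be sketched without a ZooKeeper cluster. In the example below, FakeLockStore is an in-memory stand-in whose create call fails if the path already exists, which is the property of znode creation the lock relies on. The class, the helper rolling_restart, the lock path, and the 'restart_failed' attribute name are all hypothetical; the linked hadoop_service.rb definition is the authoritative version.

```ruby
# In-memory stand-in for ZooKeeper: create fails if the path already exists.
class FakeLockStore
  def initialize
    @znodes = {}
  end

  # Returns true only if the caller created the znode, i.e. acquired the lock.
  def create(path, owner)
    return false if @znodes.key?(path)
    @znodes[path] = owner
    true
  end

  # Only the owner may remove its own lock.
  def delete(path, owner)
    @znodes.delete(path) if @znodes[path] == owner
  end
end

# Hypothetical helper: poll for the service-specific lock, run the restart
# block while holding it, and return :skipped if the lock never frees up.
def rolling_restart(store, service, host, attempts: 3, wait: 0)
  path = "/restart-locks/#{service}" # hypothetical znode path
  attempts.times do
    if store.create(path, host)
      begin
        yield
        return :restarted
      ensure
        store.delete(path, host) # always release, even if the restart raised
      end
    end
    sleep(wait)
  end
  :skipped
end

store = FakeLockStore.new
node_attrs = { 'restart_failed' => {} } # stand-in for a node attribute

# Another node holds the lock, so this node polls, gives up, and flags it.
store.create('/restart-locks/hbase-regionserver', 'node-b')
result = rolling_restart(store, 'hbase-regionserver', 'node-a') { puts 'restarting' }
node_attrs['restart_failed']['hbase-regionserver'] = (result == :skipped)
puts "result: #{result}"
```

Recording the skipped restart in a node attribute lets a later Chef run retry it, which is the "store the state and the need for restart" point from slide 19.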
  • 22. Logic Injection
      • We use Community cookbooks
          – Takes care of standard install, enable and starting of services
      • Need to add logic to cookbook recipes
          – Take action on a service only when conditions are satisfied
          – Take action on a service based on dependent service state
  • 23. Logic Injection
          kafka_install node.kafka.version_install_dir do
            from kafka_target_path
            not_if { kafka_installed? }
          end
          template ::File.join(node.kafka.config_dir, 'server.properties') do
            source 'server.properties.erb'
            ...
            helpers(Kafka::Configuration)
            if restart_on_configuration_change?
              notifies :restart, 'service[kafka]', :delayed
            end
          end
          service 'kafka' do
            provider kafka_init_opts[:provider]
            supports start: true, stop: true, restart: true, status: true
            action kafka_service_actions
          end
  • 24. Logic Injection
      • Changes to standard cookbook
          – Create a new recipe to perform service action
              • Resource to intercept notifications to service resource
              • Original service resource
      • Add node attribute which stores name of new recipe
      • Update original recipe
          – Remove the service resource from the original recipe
          – Replace it with include_recipe new_attribute
  • 25. Logic Injection
      • New recipe to perform service actions
          – First step is the ruby_block to intercept notifications
          ruby_block 'coordinate-kafka-start' do
            block do
              Chef::Log.debug 'Default recipe to coordinate Kafka start is used'
            end
            action :nothing
            notifies :restart, 'service[kafka]', :delayed
          end
          service 'kafka' do
            provider kafka_init_opts[:provider]
            supports start: true, stop: true, restart: true, status: true
            action kafka_service_actions
          end
  • 26. Logic Injection
      • Attribute to set the recipe for service actions
          #
          # Attribute to set the recipe used to coordinate Kafka service start
          # if nothing is set the default recipe "_coordinate" will be used
          #
          default.kafka.start_coordination.recipe = 'kafka::_coordinate'
  • 27. Logic Injection
      • Changes to the original recipe
          kafka_install node.kafka.version_install_dir do
            from kafka_target_path
            not_if { kafka_installed? }
          end
          template ::File.join(node.kafka.config_dir, 'server.properties') do
            source 'server.properties.erb'
            ...
            helpers(Kafka::Configuration)
            if restart_on_configuration_change?
              # Notify the interceptor ruby_block instead of the service itself
              notifies :create, 'ruby_block[coordinate-kafka-start]', :immediately
            end
          end
          include_recipe node.kafka.start_coordination.recipe
  • 28. Logic Injection
      • Changes in wrapper cookbook
          – Create custom recipe in wrapper cookbook
              • Notification interceptor ruby_block should be first
              • Logic to determine service restart action
              • service resource
              • Any clean-up logic
          – Overwrite attribute with custom recipe name
  • 29. Logic Injection
          ruby_block 'coordinate-kafka-start' do
            block do
              Chef::Log.info 'Custom recipe to coordinate Kafka start/restart'
            end
            ...
          ruby_block 'restart-coordination' do
            block do
              Chef::Log.info 'Implement the process to coordinate the restart'
            end
            ...
          service 'kafka' do
            provider kafka_init_opts[:provider]
            supports start: true, stop: true, restart: true, status: true
            ...
          ruby_block 'restart-coordination-cleanup' do
            block do
              Chef::Log.info 'Implement any cleanup logic required'
            end
  • 30. Logic Injection
      • Overwrite attribute to set the custom recipe
          #
          # Overwrite the community cookbook attribute with custom recipe name
          #
          default[:kafka][:start_coordination][:recipe] = 'kafka-bcpc::coordinate'
  • 32. References
      • https://github.com/bloomberg/chef-bach
      • http://blog.asquareb.com/blog/categories/chef-patterns/