Why you need automation, configuration management and remote execution in your life. An intro to Ansible and how it can make your life in Ops infinitely easier.
3. Why We Sleep on the Couch
We’re the ones they call when
• “The website is down”
• The customer is having a “weird” issue
• A critical exploit just entered the wild
• “I can’t login to the domain”
• New deployment off hours
• Offshore needs 6 new VMs for testing
• “I’m doing a customer demo at 8AM”
4. Why We Own DNKN
We’re seriously overloaded
• Building VM templates
• Configuring clusters
• Storage management
• Networking
• Software deployment
• User/Group Authentication/Authorization
• Performance Testing
• OS Upgrades
• Software Upgrades
• Security Patching
• Troubleshooting
• Customer Support
• Development Labs
• Alerting/Monitoring
5. Guys What R U Doin
• Building VM templates
  • ISO install and configuration
  • Network setup
  • Set up users/groups, security, authentication/authorization
  • Software install and configuration
• Building out clusters
  • Cloning N VMs from X templates
  • Hostname/network configuration
  • Firewalling
• Software deployments
  • Turn off monitoring/alerting
  • Pull nodes out of the load-balanced group
  • Run DB migrations
  • Deploy application code
  • Restart web server
  • Put nodes back in/turn monitoring back on
• Server maintenance
  • SSH into every server and restart a service
  • Write complex scripts to log in to every server and update openssl
6. Guys STAHP
• Ad hoc is bad hoc
• Complex shell scripts to account for every eventuality
• DRY: Don’t Repeat Yourself (yeah, I used to be a Ruby dev)
• Any manual task can introduce human error
• They shouldn’t have to call you on your vacation
7. Put an M on It
• Configuration Management (CM)
  • authoritative centralization of configuration data and actions
  • history of updates and changes for auditing purposes
  • define the exact state a system should be in
• Infrastructure CM
  • define the state a system should be in with respect to its configuration, and use tools that achieve that state
  • enforce consistency across an entire environment
  • automate to increase efficiency and repeatability
  • easier to make changes (cloud provider, OS, etc.)
  • remove the human factor
  • disaster recovery
8. Tool Time
• Puppet
  • great with Windows (as long as they’re not XP)
  • amazing Enterprise support
  • cryptic DSL (IMO)
• Chef
  • easy to learn if you’re a Ruby developer!
  • amazing wealth of cookbooks
  • almost too verbose
• SaltStack
9. Ansible
• Agentless!
• Uses SSH (the only requirement on managed hosts is Python)
• Everything is a YAML file
• Structure is flexible (ad hoc, playbooks, roles, orchestration)
• Easily extensible via modules
• Encryption and security built in
• Full power at the CLI (open source!)
• Even more features available in enterprise (Tower)
• No Windows
• Idempotent
10. Idempo-what?
“Operations in mathematics and computer science that can be applied multiple times without changing the result beyond the initial application.” – Wikipedia
11. Idempotent: Example
• You need all your application servers’ Tomcat setenv.sh files to look like:
JAVA_HOME=/usr/java/latest
JAVA_OPTS="-Xms512m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m"
CATALINA_HOME=/usr/local/tomcat
• You could get the job done with classic echo commands
echo 'JAVA_HOME=/usr/java/latest' >> /usr/local/tomcat/bin/setenv.sh
echo 'JAVA_OPTS="-Xms512m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m"' >> /usr/local/tomcat/bin/setenv.sh
echo 'CATALINA_HOME=/usr/local/tomcat' >> /usr/local/tomcat/bin/setenv.sh
• But…
• what if these lines already exist?
• what if the file doesn’t exist on a few of the servers?
• what if you needed to run your script again to update/restore another setting?
• If you don’t know the beginning state of the system, the end state is unpredictable
12. Idempotent: Example
• You could write a more complex script
for i in $tomcat_env_file
do
  echo "Processing file $i"
  # I hate this trick, but grep exits with status 1 when nothing matches,
  # which kills a script running under set -e, so force a zero with || true
  has_java_home=$(grep -cE "^JAVA_HOME=" "$i" || true)
  if [ "$has_java_home" -eq 0 ]; then
    sed -f setenv-without-java-home.sed -i "$i"
  elif [ "$has_java_home" -eq 1 ]; then
    sed -f setenv-with-java-home.sed -i "$i"
  else
    echo "Something went very wrong. Please review $i, make sure there is only a single line containing 'JAVA_HOME=' and run the script again"
  fi
done
13. Ain’t Nobody Got Time For That
• Use an idempotent CM tool
• Tell the tool what state you need the system to be in
• Tool will get you from A->Z or B->Z or even C->Z
• Won’t get mangled configs
• Won’t get conflicting packages
• Won’t get mismatched versions
• Won’t get error messages you have to handle for each unique case
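The “tell the tool the end state” idea can be sketched in a few lines of Python. `ensure_settings` is a hypothetical helper, not part of Ansible or any other CM tool: it takes the current contents of a setenv.sh-style file plus a dict of desired `KEY=value` settings and returns the converged text and a `changed` flag. Whatever the start state (empty file, wrong value, duplicate lines), the result is the same, and running it again on its own output reports no change.

```python
import re

# Matches a shell-style assignment such as JAVA_HOME=... at the start of a line.
ASSIGNMENT = re.compile(r"^([A-Z_][A-Z0-9_]*)=")


def ensure_settings(text, desired):
    """Converge file text so each key in `desired` appears exactly once.

    Returns (new_text, changed). Lines that are not managed keys are kept as-is.
    """
    out, seen = [], set()
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        key = match.group(1) if match else None
        if key in desired:
            if key in seen:
                continue  # drop duplicate definitions of a managed key
            seen.add(key)
            out.append(f"{key}={desired[key]}")  # overwrite with the desired value
        else:
            out.append(line)
    for key, value in desired.items():
        if key not in seen:
            out.append(f"{key}={value}")  # add settings that are missing
    new_text = "\n".join(out) + "\n"
    return new_text, new_text != text


DESIRED = {
    "JAVA_HOME": "/usr/java/latest",
    "JAVA_OPTS": '"-Xms512m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m"',
    "CATALINA_HOME": "/usr/local/tomcat",
}

if __name__ == "__main__":
    # Three different start states (A, B, C) all converge to the same end state Z.
    for start in ["", "JAVA_HOME=/opt/jdk\n", "JAVA_HOME=/usr/java/latest\nJAVA_HOME=/old\n"]:
        end, changed = ensure_settings(start, DESIRED)
        print(changed, repr(end))
```

The design choice worth noticing is that the function never asks “what commands should I run?”, only “what should the file say?”, which is why the A->Z, B->Z and C->Z paths need no special cases.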
14. Ansible: Example
• template the setenv.sh file we want
# {{ ansible_managed }}
JAVA_HOME={{java_home}}latest
JAVA_OPTS="-Xms512m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m"
CATALINA_HOME={{tomcat_home}}
• provide defaults (in yml!)
---
java_home: /usr/java/
tomcat_home: /usr/local/tomcat
tomcat_user: {name: 'tomcat', group: 'tomcat'}
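What the template step buys you can be sketched in plain Python. This is an illustration under stated assumptions, not Ansible's implementation: Python's `string.Template` (with `${var}` placeholders) stands in for Jinja2, the `ansible_managed` value is made up, and `render_to_file` is a hypothetical helper. The point is that the whole file is rendered from variables every time, but it is only rewritten, and only reported as changed, when the rendered content differs from what is on disk.

```python
import os
import tempfile
from string import Template

# Stand-in for the Jinja2 setenv.sh template; ${var} replaces {{ var }}.
SETENV = Template(
    "# ${ansible_managed}\n"
    "JAVA_HOME=${java_home}latest\n"
    'JAVA_OPTS="-Xms512m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m"\n'
    "CATALINA_HOME=${tomcat_home}\n"
)

# Mirrors the defaults above; the ansible_managed text is hypothetical.
DEFAULTS = {
    "ansible_managed": "Ansible managed",
    "java_home": "/usr/java/",
    "tomcat_home": "/usr/local/tomcat",
}


def render_to_file(template, variables, dest):
    """Render `template` and replace `dest` only if the content differs.

    Returns True when the file was changed.
    """
    content = template.substitute(variables)
    try:
        with open(dest) as fh:
            if fh.read() == content:
                return False  # already in the desired state
    except FileNotFoundError:
        pass  # missing file: create it
    # Write a temp file in the same directory, then atomically replace dest.
    # Note: mkstemp creates the file with mode 0600, so a real tool would
    # also set owner, group and mode explicitly.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dest) or ".")
    with os.fdopen(fd, "w") as fh:
        fh.write(content)
    os.replace(tmp, dest)
    return True
```

Hand-edited drift (an extra `JAVA_HOME=` line, say) is simply overwritten on the next run, which is the “replace a file completely every time” behaviour the Templates slide argues for.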
18. Dynamic vSphere Inventory
• Any script that can output JSON can be used to generate dynamic inventory
• Use pysphere (Python) or rbvmomi (Ruby) to communicate with vSphere/vCenter
• Organize your VMs by folder or resource pool to translate into groups
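A minimal dynamic inventory script might look like the Python sketch below. It assumes the JSON contract Ansible uses for inventory scripts (`--list` prints groups plus an optional `_meta.hostvars` section, `--host <name>` prints one host's variables). The VM list is hard-coded and hypothetical, standing in for a pysphere or rbvmomi query, and each folder name is used directly as a group name.

```python
#!/usr/bin/env python3
"""Sketch of a dynamic inventory script (hypothetical data, no vCenter calls)."""
import json
import sys

# Hypothetical data: (VM name, vSphere folder, IP address)
VMS = [
    ("web01", "webservers", "192.168.32.117"),
    ("web02", "webservers", "192.168.32.39"),
    ("db01", "dbservers", "192.168.34.176"),
]


def build_inventory(vms):
    """Group VMs by folder, in the JSON shape expected from --list."""
    inventory = {"_meta": {"hostvars": {}}}
    for name, folder, ip in vms:
        inventory.setdefault(folder, {"hosts": []})["hosts"].append(name)
        # ansible_host tells Ansible which address to connect to
        # (ansible_ssh_host on Ansible 1.x).
        inventory["_meta"]["hostvars"][name] = {"ansible_host": ip}
    return inventory


def main(argv):
    inventory = build_inventory(VMS)
    if len(argv) == 2 and argv[1] == "--list":
        print(json.dumps(inventory, indent=2))
    elif len(argv) == 3 and argv[1] == "--host":
        print(json.dumps(inventory["_meta"]["hostvars"].get(argv[2], {})))
    else:
        sys.stderr.write("usage: query.py --list | --host <hostname>\n")
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Make the file executable and point `ANSIBLE_HOSTS` (or `-i`) at it. Returning `_meta.hostvars` in the `--list` output lets Ansible skip calling `--host` once per VM, which matters with a few hundred VMs.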
20. Dynamic vSphere Inventory
• Preface each command with the script
$ ANSIBLE_HOSTS="/src/vmware-ansible/query.py" ansible all -m ping
• Export an environmental variable
$ export ANSIBLE_HOSTS="/src/vmware-ansible/query.py"
$ ansible no_group -m ping
21. Modules
• can be written in any language as long as they output JSON
• take parameters and conditions to define desired state
• handle processing of system resources, services, packages, files, etc. in an idempotent fashion
• “seek to avoid changes to the system unless a change needs to be made”
• Ansible comes preloaded with a plethora of modules
• tons of community pull requests
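The module contract described above (take parameters, change the system only when needed, report JSON) can be sketched as a tiny standalone script. Assumptions, stated loudly: arguments arrive as `key=value` pairs in a file whose path is the first command-line argument, which is the convention for old-style modules that do not use `AnsibleModule`; a real Python module would normally use `AnsibleModule` instead. `ensure_directory` is a hypothetical example, not one of Ansible's bundled modules.

```python
#!/usr/bin/env python3
import json
import os
import shlex
import sys


def parse_args(raw):
    """Turn 'path=/srv/app mode=0750' into {'path': '/srv/app', 'mode': '0750'}."""
    return dict(item.split("=", 1) for item in shlex.split(raw))


def ensure_directory(path, mode="0755"):
    """Make `path` a directory with `mode`, changing nothing if it already is."""
    wanted = int(mode, 8)
    if os.path.isdir(path):
        if (os.stat(path).st_mode & 0o7777) == wanted:
            return {"changed": False, "path": path}
        os.chmod(path, wanted)  # exists, but with the wrong permissions
        return {"changed": True, "path": path}
    os.makedirs(path)
    os.chmod(path, wanted)  # makedirs' mode is masked by umask, so set it explicitly
    return {"changed": True, "path": path}


def main():
    try:
        with open(sys.argv[1]) as fh:  # assumed: path to a key=value args file
            args = parse_args(fh.read())
        result = ensure_directory(args["path"], args.get("mode", "0755"))
    except (IndexError, KeyError, ValueError, OSError) as exc:
        result = {"failed": True, "msg": repr(exc)}
    print(json.dumps(result))  # the module's result is read from stdout


if __name__ == "__main__":
    main()
```

The `changed` key is what lets a play report `ok` versus `changed`, and what decides whether a `notify` fires.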
22. Ad Hoc Commands
• run a single, one-off command
• run on a full or partial inventory
• run on a single host
• no need to save for later
$ ansible webservers -m command -a "dpkg-query -W openssl" -u joe -k
SSH password:
foo.example.com | success | rc=0 >>
openssl 1.0.1e-2+deb7u10
bar.example.com | success | rc=0 >>
openssl 1.0.1e-2+deb7u10
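How a pattern such as `webservers` picks a full inventory, part of one, or a single host can be sketched in Python. This is a simplified model and `select_hosts` is a hypothetical helper, not Ansible's code: it assumes Ansible's documented pattern operators (`:` union, `:&` intersection, `:!` exclusion) and the documented rule that unions are applied first, then intersections, then exclusions, regardless of term order. Wildcards, regex patterns and ranges are left out, and the `staging` group is made up.

```python
def select_hosts(pattern, groups):
    """Resolve a simplified Ansible-style host pattern.

    `groups` maps group name -> list of hosts. Supported terms: 'all',
    a group name, or a single host name, joined by ':' (union),
    ':&' (intersection) and ':!' (exclusion).
    """
    all_hosts = {host for hosts in groups.values() for host in hosts}

    def expand(name):
        if name == "all":
            return set(all_hosts)
        if name in groups:
            return set(groups[name])
        return {name} if name in all_hosts else set()  # single host

    unions, intersections, exclusions = [], [], []
    for term in pattern.split(":"):
        if term.startswith("&"):
            intersections.append(term[1:])
        elif term.startswith("!"):
            exclusions.append(term[1:])
        else:
            unions.append(term)

    # Unions first, then intersections, then exclusions.
    selected = set()
    for name in unions:
        selected |= expand(name)
    for name in intersections:
        selected &= expand(name)
    for name in exclusions:
        selected -= expand(name)
    return sorted(selected)


GROUPS = {
    "webservers": ["foo.example.com", "bar.example.com"],
    "dbservers": ["db.example.com"],
    "staging": ["bar.example.com", "db.example.com"],
}
```

With this data, `select_hosts("webservers:&staging", GROUPS)` narrows the ad hoc command to the one staging web server, and a plain host name targets just that host.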
23. Playbooks
• More powerful configuration management
• Kept in source control, developed, validated
• Declare configurations of more complex multi-system environments
• Arrange and run tasks synchronously or asynchronously
24. Playbooks: Example
---
- hosts: all
  remote_user: vagrant
  sudo: true
  sudo_user: root
  vars_files:
    - roles/vars/webserver.encrypt
  vars:
    lifecycle: dev
  roles:
    - roles/debian
    - roles/vmware-tools
    - roles/local-users
    - roles/sudoers
    - roles/iptables
    - roles/clamav
    - roles/java-jdk-7
    - roles/postgres
    - roles/apache
    - roles/tomcat-7
    - { role: roles/tc-native, when: native == 'true' }
    - roles/ansible
    - roles/git
    - roles/liquibase
    - roles/cleanup
  post_tasks:
    - name: Reboot the Server
      command: '/sbin/reboot'
    - name: Wait for Server to come back
      wait_for: host='{{inventory_hostname}}' port='22'
      sudo: no
      delegate_to: localhost
    - name: Wait for Services to start fully
      wait_for: port='{{item}}' delay='5' timeout='600'
      with_items:
        - '8009' #ajp
        - '8080' #tomcat
        - '80' #httpd
25. Playbooks: Example
$ ansible-playbook -i production webserver.yml -k -K
$ ansible-playbook -i production webserver.yml -f 10 -k -K
$ ansible-playbook -i production webserver.yml --list-hosts -k -K
$ ansible-playbook -i production webserver.yml --check -k -K
27. Variables
• Simple YAML format
• Can create arrays and hashes
• Can substitute vars into vars
• Vars can be defined at many levels (default, role, playbook)
• Can test conditionals on vars and require them
• Can be filtered and manipulated with jinja2
• Can be matched to regex!
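Three of those ideas (layered definitions where the more specific level wins, substituting one variable into another, and matching a variable against a regex) are sketched below in plain Python. This is a simplified model, not Ansible's resolver: the three layers used here (role defaults, play vars, extra vars passed with `-e`) reflect only the relative order of those three in Ansible's much longer precedence list, `catalina_base` is a hypothetical variable, and a regex stands in for Jinja2's substitution.

```python
import re
from collections import ChainMap

# Hypothetical variable layers, listed from lowest to highest precedence.
# Ansible has many more levels; this only shows "more specific wins".
role_defaults = {
    "java_home": "/usr/java/",
    "tomcat_home": "/usr/local/tomcat",
    "lifecycle": "dev",
}
play_vars = {"lifecycle": "production", "catalina_base": "{{ tomcat_home }}"}
extra_vars = {"java_home": "/opt/java/"}  # e.g. passed with -e on the command line

VAR_REF = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def merge_layers(*layers):
    """Later layers override earlier ones (ChainMap searches left to right)."""
    return dict(ChainMap(*reversed(layers)))


def substitute(value, variables):
    """Replace each {{ name }} in a string with that variable's value (one pass)."""
    return VAR_REF.sub(lambda m: str(variables[m.group(1)]), value)


variables = merge_layers(role_defaults, play_vars, extra_vars)
catalina_base = substitute(variables["catalina_base"], variables)
# Regex test on a variable, e.g. to gate production-only tasks.
is_production = re.match(r"^prod", variables["lifecycle"]) is not None
```

Keeping sane values in role defaults and overriding only what differs per play or per run is what lets one role serve dev, demo and production.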
29. Templates
• Templates are interpreted by jinja2
• stub out files
• fill variables in differently depending on conditions
• Powerful conditionals
• Loops and iterators
• Replace a file completely every time?
• Yes. We configure for an end state.
30. Templates: Example
# {{ ansible_managed }}
Defaults env_reset
Defaults mail_badpass
Defaults secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# User privilege specification
root ALL=(ALL:ALL) ALL
# Allow members of group sudo to execute any command
%sudo ALL=(ALL:ALL) ALL
{% for item in admins %}
{% if item.nopasswd is true %}
{{item.name}} ALL= NOPASSWD: ALL
Defaults:{{item.name}} !requiretty
{% else %}
{{item.name}} ALL=(ALL) ALL
{% endif %}
{% endfor %}
{% if ad is defined %}
{% for item in ad.sudoers_groups %}
%{{item}} ALL=(ALL) ALL
{% endfor %}
{% endif %}
31. Templates: Example
# Ansible managed: /tmp/packer-provisioner-ansible-local/roles/roles/sudoers/templates/sudoers-debian.j2 modified on 2014-06-09 10:08:44 by vagrant on vagrant
Defaults env_reset
Defaults mail_badpass
Defaults secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# User privilege specification
root ALL=(ALL:ALL) ALL
# Allow members of group sudo to execute any command
%sudo ALL=(ALL:ALL) ALL
yoda ALL= NOPASSWD: ALL
Defaults:yoda !requiretty
luke ALL=(ALL) ALL
anakin ALL=(ALL) ALL
%jedi ALL=(ALL) ALL
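To make the template-to-output mapping explicit, here is a plain-Python mirror of the template's loop and conditional logic (Jinja2 itself is not used). The `ADMINS` and `AD` values are hypothetical, reverse-engineered from the rendered file above: yoda with `nopasswd`, luke and anakin without, and one AD group named jedi.

```python
def sudoers_lines(admins, ad=None):
    """Plain-Python mirror of the template's loop and conditionals.

    Returns only the generated lines (the static header is omitted).
    """
    lines = []
    for item in admins:
        if item.get("nopasswd") is True:  # mirrors {% if item.nopasswd is true %}
            lines.append(f"{item['name']} ALL= NOPASSWD: ALL")
            lines.append(f"Defaults:{item['name']} !requiretty")
        else:
            lines.append(f"{item['name']} ALL=(ALL) ALL")
    if ad is not None:  # mirrors {% if ad is defined %}
        for group in ad["sudoers_groups"]:
            lines.append(f"%{group} ALL=(ALL) ALL")
    return lines


# Hypothetical inputs, reverse-engineered from the rendered file above.
ADMINS = [{"name": "yoda", "nopasswd": True}, {"name": "luke"}, {"name": "anakin"}]
AD = {"sudoers_groups": ["jedi"]}

if __name__ == "__main__":
    print("\n".join(sudoers_lines(ADMINS, AD)))
```

Dropping `AD` removes the `%jedi` line entirely, which is how the same template serves machines that are and are not joined to a domain.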
32. Handlers
• Written just like a regular task
• Only run when a task notifies them (notify directive) and that task actually changed the system state
• Any module can be used for the handler action
Handler
- name: Restart Tomcat
  service: name=tomcat state=restarted
Task
- name: Apache Tomcat | Configure | Overlay configuration
  template: src='{{item.file}}' dest='{{item.target}}'
  with_items: '{{ tomcat.config_files }}'
  notify: Restart Tomcat
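The notify/handler behaviour can be modelled in a few lines of Python. This is a simplified sketch, not Ansible's executor, and `run_play` is a hypothetical helper: each task is a `(name, action, handler_to_notify)` tuple whose action returns `True` when it changed something. Notified handlers run once, after the tasks, in the order the handlers are defined rather than the order they were notified, which is how Ansible documents handler ordering.

```python
def run_play(tasks, handlers):
    """Simplified model of notify/handler semantics.

    tasks:    list of (name, action, handler_name_or_None); action() returns
              True if it changed the system.
    handlers: list of (name, action) in the order they are defined.
    Returns a log of what ran.
    """
    notified = set()
    log = []
    for name, action, notify in tasks:
        changed = action()
        log.append(("changed" if changed else "ok") + ": " + name)
        if changed and notify:
            notified.add(notify)  # a set, so repeated notifications collapse
    # Handlers run after the tasks, once each, in definition order.
    for name, action in handlers:
        if name in notified:
            action()
            log.append("handler: " + name)
    return log
```

Overlaying five Tomcat config files therefore costs one restart, not five, and a run where nothing changed costs none.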
33. Roles
• Break up configuration into repeatable chunks
• Reduce, reuse, recycle
• Clean, understandable structure
• Stack on top of each other
• Ansible Galaxy
36. Roles
• Dependencies
• Always run before the roles that depend on them
• If dependencies are duplicated amongst roles, they will only be run once by default (as long as they are given the same parameters)
• Set allow_duplicates: yes in a role’s meta/main.yml to let it run more than once with the same parameters
---
dependencies:
- { role: liquibase }
- { role: apache, port: 80 }
- { role: postgres, dbname: appdb, bind_nice: eth1 }
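The ordering rules above can be sketched as a depth-first walk in Python. This is a simplified model, not Ansible's role loader: `expand_roles` is hypothetical, it treats a role as a duplicate only when both its name and its parameters match (Ansible also considers tags and `when`), and it does not detect dependency cycles. The `webapp` and `reporting` roles are made up; the dependency list reuses the meta/main.yml above.

```python
def expand_roles(requested, meta):
    """Order roles so that dependencies run before the roles needing them.

    requested: list of role dicts such as {"role": "webapp"}.
    meta:      role name -> {"dependencies": [...], "allow_duplicates": bool}.
    A role with the same name and parameters runs only once unless its
    meta sets allow_duplicates. Returns a list of (name, params) tuples.
    """
    order, seen = [], set()

    def visit(role):
        name = role["role"]
        params = tuple(sorted((k, v) for k, v in role.items() if k != "role"))
        info = meta.get(name, {})
        key = (name, params)
        if key in seen and not info.get("allow_duplicates", False):
            return  # identical role already ran
        for dep in info.get("dependencies", []):
            visit(dep)  # depth-first: dependencies first
        seen.add(key)
        order.append((name, dict(params)))

    for role in requested:
        visit(role)
    return order


META = {
    "webapp": {"dependencies": [
        {"role": "liquibase"},
        {"role": "apache", "port": 80},
        {"role": "postgres", "dbname": "appdb", "bind_nice": "eth1"},
    ]},
    "reporting": {"dependencies": [{"role": "apache", "port": 80}]},
}
```

Two roles that both need `apache` on port 80 get it once; asking for port 8080 in one of them would count as a different role invocation and run it again.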
37. Orchestration
• “Rolling Updates”
• Performing very complex infrastructure or cluster operations
• Run plays in serial instead of parallel
• Wait for certain conditions before moving forward
• Abort if a certain percentage of hosts fail
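“Wait for certain conditions” is usually a port check, as in the `wait_for` tasks in the playbooks here. A rough Python equivalent of waiting until a TCP port accepts connections is sketched below; `wait_for_port` and its `interval` parameter are hypothetical, and the real `wait_for` module supports more states and conditions than this.

```python
import socket
import time


def wait_for_port(host, port, delay=0.0, timeout=300.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or `timeout` expires.

    Returns True once the port accepts a connection, False on timeout.
    """
    time.sleep(delay)  # e.g. give a rebooting host time to actually go down
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable or timed out: try again
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

The initial `delay` matters after a reboot: without it, the first poll can succeed against the old, still-running SSH daemon.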
38. Orchestration: Example
• turn off monitoring and alerting
• remove application server from load balanced group
• stop services
• wait for services to stop fully
• checkout new code from git
• deploy webapp
• restart services
• wait for services to start fully
• return to load-balanced group
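The steps above, run a few hosts at a time, are what `serial` and `max_fail_percentage` give you in a play. The Python sketch below models only the batching and abort logic, under simplifying assumptions: `rolling_update` is hypothetical, `update_host` is a caller-supplied stand-in for the whole list of steps that returns `True` on success, and the failure check happens once per batch, whereas Ansible evaluates it as tasks fail. As in Ansible, the failure percentage has to be exceeded, not merely reached, to abort.

```python
def rolling_update(hosts, serial, max_fail_percentage, update_host):
    """Update hosts in batches of `serial`, aborting on too many failures.

    update_host(host) stands in for the per-host steps (monitoring off,
    out of the pool, deploy, restart, back in) and returns True on success.
    Returns (updated_hosts, failed_hosts, aborted).
    """
    updated, failed = [], []
    for start in range(0, len(hosts), serial):
        batch = hosts[start:start + serial]
        batch_failed = [host for host in batch if not update_host(host)]
        failed.extend(batch_failed)
        updated.extend(host for host in batch if host not in batch_failed)
        # The failure percentage must be exceeded, not merely reached, to abort.
        if 100.0 * len(batch_failed) / len(batch) > max_fail_percentage:
            return updated, failed, True
    return updated, failed, False
```

With `serial` at 2, a bad release takes down at most one batch's worth of the pool before the run stops, and the rest keep serving the old code.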
39. Example: Simple Service Restart
• Problem
• 50ish production customer VMs
• Older CentOS 5 mixed with CentOS 6
• May or may not have python installed
• Domain authentication
• Need to restart livevault service
40. Example: Simple Service Restart
• Create inventory
• Dump IP addresses of customer VMs into a simple Ansible inventory
[customer_vms:vars]
[customer_vms]
192.168.32.117
192.168.32.39
192.168.34.176
192.168.34.28
192.168.33.100
192.168.32.197
192.168.34.181
192.168.34.158
...
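Dumping the IP list into an inventory is easy to script. A minimal sketch, assuming the addresses come from some export of your VM list: the `write_inventory` helper and the `lifecycle` group variable are hypothetical. It validates each address with Python's `ipaddress` module, drops duplicates while keeping order, and emits the INI format shown above.

```python
import ipaddress


def write_inventory(group, addresses, group_vars=None):
    """Render a simple INI-style inventory for one group.

    Addresses are validated as IPs and de-duplicated (order preserved).
    """
    hosts = []
    for address in addresses:
        ipaddress.ip_address(address)  # raises ValueError on a typo
        if address not in hosts:
            hosts.append(address)
    lines = [f"[{group}]"] + hosts
    if group_vars:
        lines += ["", f"[{group}:vars]"]
        lines += [f"{key}={value}" for key, value in group_vars.items()]
    return "\n".join(lines) + "\n"
```

Validating up front means a mistyped address fails while the file is being generated, not halfway through a run against 50ish customer VMs.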
41. Example: Simple Service Restart
• Use an ad hoc command to make sure VMs are bootstrapped for Ansible
$ ansible customer_vms -i oldvms -u domainjoe -s -U root -m raw -a "sudo yum install -y python-simplejson" -k -K
• Restart the LiveVault service
$ ansible customer_vms -i oldvms -u domainjoe -s -U root -m service -a "name=livevault state=restarted" -k -K
42. Example: Heartbleed
• openssl exploit
• good news: patched for your OS
• other packages updated along with openssl
• 6 different environments (production, test, demo, etc.)
• may require service restarts
• need to verify the final installed version
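Verifying the final state across six environments boils down to comparing what `dpkg-query -W` reports on each host against the versions you expect. A hedged sketch: it assumes each host's raw `dpkg-query -W` output (package name, a tab, then the version, one package per line) has been collected into a dict, and it checks for an exact version match rather than implementing Debian's version ordering. Note that, as the run log shows, multiarch packages are reported as `libssl1.0.0:amd64`, so expected keys must use that name. The helper names and the `demo` host are hypothetical.

```python
def parse_dpkg_query(output):
    """Parse `dpkg-query -W` output: one 'package<TAB>version' per line."""
    versions = {}
    for line in output.splitlines():
        if line.strip():
            package, _, version = line.partition("\t")
            versions[package] = version
    return versions


def hosts_not_at(expected, results):
    """Return hosts whose packages do not exactly match `expected`.

    expected: package -> version string.
    results:  hostname -> raw dpkg-query output collected from that host.
    """
    return sorted(
        host
        for host, output in results.items()
        if any(parse_dpkg_query(output).get(pkg) != ver for pkg, ver in expected.items())
    )


if __name__ == "__main__":
    expected = {"openssl": "1.0.1e-2+deb7u11", "libssl1.0.0:amd64": "1.0.1e-2+deb7u11"}
    results = {
        "cloud-daily": "openssl\t1.0.1e-2+deb7u11\nlibssl1.0.0:amd64\t1.0.1e-2+deb7u11\n",
        "demo": "openssl\t1.0.1e-2+deb7u9\nlibssl1.0.0:amd64\t1.0.1e-2+deb7u9\n",
    }
    print(hosts_not_at(expected, results))
```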
43. Example: Heartbleed
---
- hosts: all
  sudo: yes
  sudo_user: root
  tasks:
    - name: OpenSSL | Get current version
      shell: 'dpkg-query -W openssl'
      register: openssl_version
    - name: OpenSSL | Get current version
      shell: 'dpkg-query -W libssl1.0.0'
      register: libssl_version
    - name: OpenSSL | Confirm new version
      debug: msg="OpenSSL version installed is {{openssl_version.stdout}}, libssl version installed is {{libssl_version.stdout}}"
    - name: OpenSSL | Apt | Install debconf-utils
      apt: pkg='debconf-utils' state='latest'
44. Example: Heartbleed
    - name: OpenSSL | Apt | Prevent restart services dialog
      debconf: name='libssl1.0.0' question='libssl1.0.0/restart-services' vtype='string' value='ntp'
    - name: OpenSSL | Apt | Prevent restart services dialog
      debconf: name='libssl1.0.0:amd64' question='libssl1.0.0/restart-services' vtype='string' value='ntp'
    - name: OpenSSL | Apt | Upgrade Openssl
      apt: pkg='{{item}}' state='latest' update_cache='yes' install_recommends='yes' force='yes'
      with_items:
        - 'openssl'
        - 'libssl1.0.0'
    - name: OpenSSL | Get new version
      shell: 'dpkg-query -W openssl'
      register: openssl_version
    - name: OpenSSL | Get new version
      shell: 'dpkg-query -W libssl1.0.0'
      register: libssl_version
    - name: OpenSSL | Confirm new version
      debug: msg="OpenSSL version installed is {{openssl_version.stdout}}, libssl version installed is {{libssl_version.stdout}}"
45. Example: Heartbleed
$ ansible-playbook -i cloud-daily, openssl.yml -u joe -k -K
SSH password:
sudo password [defaults to SSH password]:
PLAY [all] ********************************************************************
GATHERING FACTS ***************************************************************
ok: [cloud-daily]
TASK: [OpenSSL | Get current version] *****************************************
changed: [cloud-daily]
TASK: [OpenSSL | Get current version] *****************************************
changed: [cloud-daily]
TASK: [OpenSSL | Confirm new version] *****************************************
ok: [cloud-daily] => {
"msg": "OpenSSL version installed is openssl\t1.0.1e-2+deb7u9, libssl version installed is libssl1.0.0:amd64\t1.0.1e-2+deb7u9"
}
46. Example: Heartbleed
TASK: [OpenSSL | Apt | Install debconf-utils] *********************************
ok: [cloud-daily]
TASK: [OpenSSL | Apt | Prevent restart services dialog] ***********************
ok: [cloud-daily]
TASK: [OpenSSL | Apt | Prevent restart services dialog] ***********************
ok: [cloud-daily]
TASK: [OpenSSL | Apt | Upgrade Openssl] ***************************************
changed: [cloud-daily] => (item=openssl,libssl1.0.0)
TASK: [OpenSSL | Get new version] *********************************************
changed: [cloud-daily]
TASK: [OpenSSL | Get new version] *********************************************
changed: [cloud-daily]
TASK: [OpenSSL | Confirm new version] *****************************************
ok: [cloud-daily] => {
"msg": "OpenSSL version installed is openssl\t1.0.1e-2+deb7u11, libssl version installed is libssl1.0.0:amd64\t1.0.1e-2+deb7u11"
}
PLAY RECAP ********************************************************************
cloud-daily : ok=11 changed=5 unreachable=0 failed=0
47. Example: Join Domain
• every new VM needs to be added to a domain
• packages needed (winbind/samba)
• domain could depend on environment
• samba/winbind configuration different per machine
• sudoers will be different per machine
• domain admin must authenticate
• this happens a lot
• reusable playbook and roles
55. Example: Join Domain
• sudoers
sudoers/
  tasks/
    main.yml
  templates/
    sudoers-debian.j2
• tasks (main.yml)
---
- name: User | sudo Configure | Don't always set home and Preserve env home
  template: src='sudoers-debian.j2' dest='/tmp/sudoers' owner='root' group='root' mode='0600' validate='visudo -cf %s'
- name: User | sudo Configure | Place new config
  shell: 'cp -vf /tmp/sudoers /etc/sudoers'
- name: User | sudo Configure | Clean up temporary files
  file: path='/tmp/sudoers' state='absent'
56. Example: Join Domain
• templates (sudoers-debian.j2)
# {{ ansible_managed }}
Defaults env_reset
Defaults mail_badpass
Defaults secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
root ALL=(ALL:ALL) ALL
%sudo ALL=(ALL:ALL) ALL
{% for item in admins %}
{% if item.nopasswd == 'true' %}
{{item.name}} ALL= NOPASSWD: ALL
Defaults:{{item.name}} !requiretty
{% else %}
{{item.name}} ALL=(ALL) ALL
{% endif %}
{% endfor %}
{% if ad_sudoers_groups is defined %}
{% for item in ad_sudoers_groups %}
%{{item}} ALL=(ALL) ALL
{% endfor %}
{% endif %}
57. Example: Join Domain
• Put it all together in a playbook
---
- hosts: all
  sudo: True
  sudo_user: root
  vars_prompt:
    - name: "ad_domain"
      prompt: "Domain to join (e.g. office.lan)"
      private: no
    - name: "ad_domain_admin_username"
      prompt: "Domain Admin username"
      private: no
    - name: "ad_domain_admin_password"
      prompt: "Domain Admin password"
      private: yes
  vars_files:
    - ../roles/active-directory-join/vars/{{ad_domain}}.encrypt
  roles:
    - ../roles/active-directory
    - ../roles/active-directory-join
    - ../roles/sudoers
  tasks:
    - name: Reboot the Server
      command: '/sbin/reboot'
    - name: Wait for Server to come back
      wait_for: host='{{inventory_hostname}}' port='22' delay='5' timeout='300'
      sudo: no
      delegate_to: localhost
58. Example: Join Domain
$ ansible-playbook -i new-vm-clone, ad-join.yml -u joe -k -K
SSH password:
sudo password [defaults to SSH password]:
Domain to join (e.g. office.lan): office.lan
Domain Admin username: admin
Domain Admin password:
PLAY [all] ********************************************************************
GATHERING FACTS ***************************************************************
ok: [new-vm-clone]
TASK: [../roles/active-directory | AD Authentication| Install | Install dependencies for AD authentication] ***
ok: [new-vm-clone] => (item=krb5-user,libpam-krb5,winbind,samba)
TASK: [../roles/active-directory | AD Authentication | Configure | Allow for authentication using winbind] ***
changed: [new-vm-clone]
TASK: [../roles/active-directory-join | AD Authentication | Configure | Place kerberos config for domain authentication] ***
changed: [new-vm-clone]
TASK: [../roles/active-directory-join | AD Authentication | Configure | Place samba config for domain authentication] ***
changed: [new-vm-clone]
TASK: [../roles/active-directory-join | AD Authentication | Configure | Start services and enable on boot] ***
changed: [new-vm-clone] => (item=winbind)
59. Example: Join Domain
TASK: [../roles/active-directory-join | AD Authentication | Configure | Start services and do not enable on boot] ***
changed: [new-vm-clone] => (item=samba)
TASK: [../roles/active-directory-join | AD Authentication | Configure | kinit] ***
changed: [new-vm-clone]
TASK: [../roles/active-directory-join | AD Authentication | Configure | Join Active Directory] ***
changed: [new-vm-clone]
TASK: [../roles/active-directory-join | AD Authentication | Configure | Enable pam authentication via winbind] ***
changed: [new-vm-clone] => (item={'name': 'common-session-interactive.j2', 'target': 'common-session-interactive'})
changed: [new-vm-clone] => (item={'name': 'common-password.j2', 'target': 'common-password'})
changed: [new-vm-clone] => (item={'name': 'common-account.j2', 'target': 'common-account'})
changed: [new-vm-clone] => (item={'name': 'common-auth.j2', 'target': 'common-auth'})
changed: [new-vm-clone] => (item={'name': 'sudo.j2', 'target': 'sudo'})
TASK: [../roles/active-directory-join | AD Authentication | Configure | Set domain controllers to be ntp servers] ***
ok: [new-vm-clone] => (item=ad1.office.lan)
ok: [new-vm-clone] => (item=ad2.office.lan)
TASK: [../roles/active-directory-join | AD Authentication | Configure | Restart services] ***
changed: [new-vm-clone] => (item=winbind)
changed: [new-vm-clone] => (item=samba)
60. Example: Join Domain
TASK: [../roles/sudoers | User | sudo Configure | Don't always set home and Preserve env home] ***
changed: [new-vm-clone]
TASK: [../roles/sudoers | User | sudo Configure | Place new config] ***********
changed: [new-vm-clone]
TASK: [../roles/sudoers | User | sudo Configure | Clean up temporary files] ***
changed: [new-vm-clone]
TASK: [Reboot the Server] *****************************************************
changed: [new-vm-clone]
TASK: [Wait for Server to come back] ******************************************
ok: [new-vm-clone]
PLAY RECAP ********************************************************************
new-vm-clone : ok=18 changed=13 unreachable=0 failed=0
61. Example: Server provisioner
• Build and configure webserver
---
# Packer provisioning only
- hosts: all
  connection: local
  remote_user: vagrant
  sudo: True
  sudo_user: root
  vars_files:
    - roles/vars/cloud.encrypt
  vars:
    lifecycle: 'production'
    build_flavor: 'cloud'
    app_flavor: 'app'
  roles:
    - roles/debian
    - roles/vmware-tools
    - roles/local-users
    - roles/active-directory
    - roles/cloud-baseline
    - roles/sudoers
    - roles/iptables
    - roles/java-jdk-7
    - roles/tomcat-7
    - { role: roles/tomcat-native, when: native == 'true' }
    - roles/ansible
    - roles/app-dynamics
    - roles/opsview
    - roles/cleanup
    - roles/git
  tasks:
    - name: Reboot the Server
      command: '/sbin/reboot'
    - name: Wait for Server to come back
      wait_for: host='{{inventory_hostname}}' port='22'
      sudo: no
      delegate_to: localhost
    - name: Wait for Services to start fully
      wait_for: port='{{item}}' delay='5' timeout='600'
      with_items:
        - '8009' #ajp
        - '8080' #tomcat
62. Where do I go from here?
• Stop doing everything by hand!
• If you find yourself logging in to more than one VM to do the same task...
• If you have been meaning to get around to patching or updating a bunch of VMs...
• If you know all of the prompts of the OS installer by heart...
• If scp and vi are your favorite tools...
• If you dread the next release of your application
• If you wince every time your phone rings