AUTO-SCALE A SELF-HEALING
CLUSTER IN OPENSTACK
2018 Việt Nam OpenInfraDay
Rico Lin, irc: ricolin <rico.lin@easystack.cn> @ EasyStack
Hello everyone, my name is Rico Lin and I come from Taiwan. This is my
first time in Vietnam, and I am really enjoying it. Today I will share
with you the topic: AUTO-SCALE A SELF-HEALING CLUSTER IN OPENSTACK
October 2018
CLUSTER IN OPENSTACK
A Unit in Application cluster
Pool
Network
Subnet
Loadbalancer
Floating IP
Health monitor
Pool Member
Nova
Nginx
Unit with Heat
Software Deploy
Nova Server
What you can install with
● heat-config-ansible
● heat-config-apply-config
● heat-config-cfn-init
● heat-config-chef
● heat-config-docker-cmd
● heat-config-docker-compose
● heat-config-hiera
● heat-config-json-file
● heat-config-kubelet
● heat-config-puppet
● heat-config-salt
● heat-config-script
And you can customize your own hook
os-collect-config
os-refresh-config
os-apply-config
kubelet-hook: $ kubelet
Webserver
done
config-notify
Signal
● CFN_SIGNAL
● TEMP_URL_SIGNAL
● NO_SIGNAL
● HEAT_SIGNAL
● ZAQAR_SIGNAL
Software Config
Pool
Network
Subnet
Loadbalancer
Floating IP
Health monitor
Pool Member
Nginx
Heat Container Agent
Heat container agents [sample in repo]
Software Deploy
Nova Server
Dockers (the hooks, os-collect-config, os-refresh-config and os-apply-config listed under "Unit with Heat" run in containers)
config-notify
Signal
Software Config
Pool
Network
Subnet
Loadbalancer
Floating IP
Health monitor
Pool Member
Heat container agents [sample in repo]
config:
  type: OS::Heat::SoftwareConfig
  properties:
    group: script
    outputs:
    - name: result
    config: { get_file: example-script.sh }

deployment:
  type: OS::Heat::SoftwareDeployment
  properties:
    config: { get_resource: config }
    server: { get_resource: server }

start_container_agent:
  type: OS::Heat::SoftwareConfig
  properties:
    group: ungrouped
    config: {get_file: ./start-container-agent.sh}

server:
  type: OS::Nova::Server
  properties:
    image: {get_param: image}
    flavor: {get_param: flavor}
    key_name: {get_param: key_name}
    networks:
    - network: {get_param: private_net}
    security_groups:
    - {get_resource: the_sg}
    user_data_format: SOFTWARE_CONFIG
    user_data: {get_attr: [start_container_agent, config]}
#!/bin/bash
set -ux

# heat-docker-agent service
# Inside this unquoted heredoc, "\\" writes a literal "\" into the unit
# file, which systemd reads as a line continuation.
cat <<EOF > /etc/systemd/system/heat-container-agent.service
[Unit]
Description=Heat Container Agent
After=docker.service
Requires=docker.service

[Service]
TimeoutSec=5min
RestartSec=5min
User=root
Restart=on-failure
ExecStartPre=-/usr/bin/docker rm -f heat-container-agent
ExecStartPre=-/usr/bin/docker pull docker.io/rico/heat-container-agent
ExecStart=/usr/bin/docker run --name heat-container-agent \\
--privileged \\
--net=host \\
-v /run/systemd:/run/systemd \\
-v /etc/sysconfig:/etc/sysconfig \\
-v /etc/systemd/system:/etc/systemd/system \\
-v /var/lib/heat-cfntools:/var/lib/heat-cfntools \\
-v /var/lib/cloud:/var/lib/cloud \\
-v /tmp:/tmp \\
-v /etc/hosts:/etc/hosts \\
docker.io/rico/heat-container-agent
ExecStop=/usr/bin/docker stop heat-container-agent

[Install]
WantedBy=multi-user.target
EOF

# enable and start heat-container-agent
chmod 0640 /etc/systemd/system/heat-container-agent.service
/usr/bin/systemctl enable heat-container-agent.service
/usr/bin/systemctl start --no-block heat-container-agent.service
Demo
A SELF-HEALING
CLUSTER IN OPENSTACK
Self Healing
XXX::Server
XXX::Signal XXX::Alarm
XXX::Workflow
XXX::AutoScaling
Signal
Meter
Trigger
Fix
● What does a meter mean to you?
● How do you meter?
● How do you handle the signal?
● How do you trigger a fix job?
Self Healing
server:
  type: OS::Nova::Server
  properties:
    ...

alarm_queue:
  type: OS::Zaqar::Queue

error_event_alarm:
  type: OS::Aodh::EventAlarm
  properties:
    event_type: compute.instance.update
    query:
    - field: traits.instance_id
      value: {get_resource: server}
      op: eq
    - field: traits.state
      value: error
      op: eq
    alarm_queues:
    - {get_resource: alarm_queue}

alarm_subscription:
  type: OS::Zaqar::MistralTrigger
  properties:
    queue_name: {get_resource: alarm_queue}
    workflow_id: {get_resource: autoheal}
    input:
      stack_id: {get_param: "OS::stack_id"}
      root_stack_id:
        if:
        - is_standalone
        - {get_param: "OS::stack_id"}
        - {get_param: "root_stack_id"}

autoheal:
  type: OS::Mistral::Workflow
  properties:
    description: >
      Mark a server as unhealthy and commence a stack update
      to replace it.
    input:
      stack_id:
      root_stack_id:
    type: direct
    tasks:
    - name: resources_mark_unhealthy
      action:
        list_join:
        - ' '
        - - heat.resources_mark_unhealthy
          - stack_id=<% $.stack_id %>
          - resource_name=<% env().notification.body.reason_data.event.traits.where($[0] = 'instance_id').select($[2]).first() %>
          - mark_unhealthy=true
          - resource_status_reason='Marked by alarm'
      on_success:
      - stacks_update
    - name: stacks_update
      action: heat.stacks_update stack_id=<% $.root_stack_id %> existing=true
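The YAQL expression in resource_name above picks one value out of the alarm notification's event traits. A minimal Python sketch of the same selection, assuming each trait is a [name, type, value] triple (which is what the $[0] and $[2] indexing implies). The notification contents and the helper name trait_value are hypothetical, for illustration only:

```python
from typing import Any, List, Optional


def trait_value(traits: List[List[Any]], name: str) -> Optional[Any]:
    """Return the value of the first trait called `name`, or None if absent.

    Rough equivalent of: traits.where($[0] = name).select($[2]).first()
    """
    for trait in traits:
        # trait[0] is the trait name, trait[2] is the trait value.
        if trait[0] == name:
            return trait[2]
    return None


# Hypothetical notification, shaped like the path used in the workflow:
# env().notification.body.reason_data.event.traits
notification = {
    "body": {
        "reason_data": {
            "event": {
                "traits": [
                    ["state", 1, "error"],
                    ["instance_id", 1, "hypothetical-server-id"],
                ]
            }
        }
    }
}

traits = notification["body"]["reason_data"]["event"]["traits"]
server_id = trait_value(traits, "instance_id")
print(server_id)  # hypothetical-server-id
```

One difference from the workflow: this sketch returns None when no trait matches, which is a choice made here for readability and not a statement about how YAQL's first() behaves.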
OpenStack Self Healing SIG [link]
Demo
AUTO-SCALE A SELF-HEALING
CLUSTER IN OPENSTACK
Auto Scaling
AutoScalingGroup
ScalingPolicy XXX::Alarm
Signal
Meter
Trigger
Scale
● What to alarm on?
● What to scale?
Auto Scaling https://github.com/openstack/heat-templates/tree/master/hot/autoscaling.yaml
resources:
  asg:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 3
      resource:
        type: lb_server.yaml
        properties:
          flavor: {get_param: flavor}
          image: {get_param: image}

  web_server_scaleup_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: asg}
      cooldown: 60
      scaling_adjustment: 1
      # min_adjustment_step:

  web_server_scaledown_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: asg}
      cooldown: 60
      scaling_adjustment: -1

  cpu_alarm_high:
    type: OS::Aodh::GnocchiAggregationByResourcesAlarm
    properties:
      description: Scale up if CPU > 80%
      metric: cpu_util
      aggregation_method: mean
      granularity: 300
      evaluation_periods: 1
      threshold: 80
      resource_type: instance
      comparison_operator: gt
      alarm_actions:
        - str_replace:
            template: trust+url
            params:
              url: {get_attr: [web_server_scaleup_policy, signal_url]}
      query:
        list_join:
          - ''
          - - {'=': {server_group: {get_param: "OS::stack_id"}}}

  cpu_alarm_low:
    type: OS::Aodh::GnocchiAggregationByResourcesAlarm
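To make the cpu_alarm_high properties concrete, here is a rough Python sketch of what a single evaluation period amounts to: aggregate cpu_util across the group's instances with mean, then compare against the threshold with the chosen operator. This is only an illustration of the semantics and not Aodh's implementation. The function name evaluate_alarm and the sample values are hypothetical:

```python
import operator
from statistics import mean
from typing import Sequence

# Comparison operator names as used in the alarm definition.
COMPARATORS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
}


def evaluate_alarm(cpu_util_per_instance: Sequence[float],
                   threshold: float,
                   comparison_operator: str = "gt") -> str:
    """Return the alarm state for one evaluation period.

    cpu_util_per_instance holds one cpu_util value per instance in the
    group for the period (assumed already sampled at the granularity).
    """
    if not cpu_util_per_instance:
        return "insufficient data"
    aggregated = mean(cpu_util_per_instance)  # aggregation_method: mean
    compare = COMPARATORS[comparison_operator]
    return "alarm" if compare(aggregated, threshold) else "ok"


# Three instances averaging above 80% trip the scale-up alarm.
print(evaluate_alarm([90.0, 85.0, 70.0], threshold=80))
```

When the state becomes "alarm", the alarm_actions entry is what gets called, which here is the scale-up policy's signal_url.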
Choose your own structure
resources:
  asg:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 3
      resource:
        type: lb_server.yaml
        properties:
          flavor: {get_param: flavor}
          image: {get_param: image}

  web_server_scaleup_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: asg}
      cooldown: 60
      scaling_adjustment: 1

  monitoring:
    type: monitor.yaml
    properties:
      url: {get_attr: [web_server_scaleup_policy, signal_url]}
Diagram: Stack, Monitor service, ScalingPolicy, AutoScalingGroup (Instance 1, 2 ... N)
1. Metering
2. Alarm
3. Scale
Choose your own structure
resources:
  asg:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 3
      resource:
        type: lb_server.yaml
        properties:
          flavor: {get_param: flavor}
          image: {get_param: image}

  web_server_scaleup_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: asg}
      cooldown: 60
      scaling_adjustment: 1

outputs:
  signal_url:
    value: {get_attr: [web_server_scaleup_policy, signal_url]}
Diagram: Stack, Monitor service, ScalingPolicy, AutoScalingGroup (Instance 1, 2 ... N)
1. Metering
2. Alarm
3. Scale
# Signal the scaling policy with an existing token
curl -i -H "X-Auth-Token: $TOKEN" -X POST "$Signal_url"

# Request a token from Keystone
curl -i -H "Content-Type: application/json" -d '
{ "auth": {
    "identity": { "methods": ["password"],
      "password": { "user": { "name": "admin", "domain": { "id": "default" }, "password": "password" } } },
    "scope": { "project": { "name": "admin", "domain": { "id": "default" } } } } }' \
  "http://$KEYSTONE/identity/v3/auth/tokens" ; echo
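The two curl calls above can also be issued from a monitoring script. A minimal Python sketch using only the standard library, assuming the same Keystone path and credentials layout as the curl example. The function names and the hostnames in the usage note are hypothetical. Keystone v3 returns the token in the X-Subject-Token response header:

```python
import json
import urllib.request


def build_token_request(keystone_host: str, user: str, password: str,
                        project: str, domain_id: str = "default"):
    """Build the Keystone v3 password-auth request (same body as the curl)."""
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": user,
                        "domain": {"id": domain_id},
                        "password": password,
                    }
                },
            },
            "scope": {
                "project": {"name": project, "domain": {"id": domain_id}}
            },
        }
    }
    return urllib.request.Request(
        f"http://{keystone_host}/identity/v3/auth/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_signal_request(signal_url: str, token: str):
    """Build the POST that signals the scaling policy."""
    return urllib.request.Request(
        signal_url, headers={"X-Auth-Token": token}, method="POST"
    )


def scale_once(keystone_host: str, signal_url: str, user: str,
               password: str, project: str) -> int:
    """Fetch a token, then signal the policy; returns the HTTP status."""
    token_req = build_token_request(keystone_host, user, password, project)
    with urllib.request.urlopen(token_req) as resp:
        token = resp.headers["X-Subject-Token"]
    with urllib.request.urlopen(build_signal_request(signal_url, token)) as resp:
        return resp.status
```

Calling scale_once("keystone.example", signal_url, "admin", "password", "admin") against a real cloud would trigger one scaling step, with signal_url taken from the stack's signal_url output.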
Look into options for auto-scaling
OS::Heat::AutoScalingGroup
● Properties
○ resource:
■ type: web_server.yaml
■ properties
○ min_size: 10
○ max_size: 100
○ cooldown: 30
○ desired_capacity: 30
○ rolling_updates
■ min_in_service: 5
■ max_batch_size: 10
■ pause_time: 15
● Attributes
○ outputs
○ outputs_list
○ current_size
○ refs [IDs]
○ refs_map {[names: IDs]}
Look into options for auto-scaling
OS::Heat::ScalingPolicy
● Properties
○ adjustment_type: change_in_capacity
■ exact_capacity
■ change_in_capacity
■ percent_change_in_capacity
○ auto_scaling_group_id: asg_id
○ cooldown: 60
○ scaling_adjustment: 5
○ # min_adjustment_step:
● Attributes
○ alarm_url
○ signal_url
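The three adjustment_type values differ only in how scaling_adjustment is turned into a target size. A simplified Python sketch, assuming the result is always clamped to the group's min_size and max_size. The function name new_capacity is hypothetical, the rounding of the percent case (truncation toward zero) is an assumption of this sketch, and cooldown and min_adjustment_step are left out:

```python
def new_capacity(current: int, adjustment_type: str, scaling_adjustment: int,
                 min_size: int, max_size: int) -> int:
    """Return the group size after applying one scaling policy signal."""
    if adjustment_type == "exact_capacity":
        # The adjustment is the target size itself.
        target = scaling_adjustment
    elif adjustment_type == "change_in_capacity":
        # The adjustment is added to (or subtracted from) the current size.
        target = current + scaling_adjustment
    elif adjustment_type == "percent_change_in_capacity":
        # The adjustment is a percentage of the current size.
        # int() truncates toward zero (assumed rounding for this sketch).
        target = current + int(current * scaling_adjustment / 100)
    else:
        raise ValueError(f"unknown adjustment_type: {adjustment_type}")
    # Never leave the [min_size, max_size] range of the group.
    return max(min_size, min(max_size, target))


# With the values from the slides: desired_capacity 30, min 10, max 100,
# scaling_adjustment 5, change_in_capacity.
print(new_capacity(30, "change_in_capacity", 5, min_size=10, max_size=100))  # 35
```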
Demo
Join Heat
• Review https://goo.gl/4KL1gN
• StoryBoard (Bugs/BP) https://storyboard.openstack.org/#!/project_group/82
• StoryBoard guide https://etherpad.openstack.org/p/Heat-StoryBoard-Migration-Info
• Documents https://docs.openstack.org/heat/latest/
• Release Notes https://docs.openstack.org/releasenotes/heat/
• Feedback or ideas: irc #heat
• Share your use cases https://etherpad.openstack.org/p/heat-usecases
• Team meeting: Wednesday 14:00 UTC in #heat (meeting wiki and archive)
➔ Boston Summit
◆ Heat project update [ slide & video ]
◆ Heat Onboarding [ slide & video ]
➔ Sydney Summit
◆ Heat project update [ slide & video ]
◆ Heat Onboarding [ slide & video ]
➔ Vancouver Summit
◆ Heat project update [ slide & video ]
◆ Heat Onboarding [ slide & video ]
➔ Heat templates
➔ PTG Etherpad
Q & A
Links: demo video
If you are wondering how you or your product can interact with the
open source cloud community: Embrace community! Embrace Life!

Autoscale a self-healing cluster in OpenStack with Heat

  • 1. AUTO-SCALE A SELF-HEALING CLUSTER IN OPENSTACK 2018 Việt Nam OpenInfraDay Rico Lin, irc: ricolin <rico.lin@easystack.cn> @ EasyStack Xin chào các bạn, Mình tên là Rico Lin, đến từ Đài Loan, lần đầu tiên sang Việt Nam, cảm thấy rất thích và vui. Hôm nay Mình sẽ chia sẽ cho các bạn, chủ đề là AUTO-SCALE A SELF-HEALING CLUSTER IN OPENSTACK October 2018
  • 4. A Unit in Application cluster Pool Network Subnet Loadbalancer Floating IP Heal monitor Pool Member Nova Nginx
  • 5. Unit with Heat Software Deploy Nova Server What you can install with ● heat-config-ansible ● heat-config-apply-config ● heat-config-cfn-init ● heat-config-chef ● heat-config-docker-cmd ● heat-config-docker-compose ● heat-config-hiera ● heat-config-json-file ● heat-config-kubelet ● heat-config-puppet ● heat-config-salt ● heat-config-script And you can customize your own hook os-collect-config os-refresh-config os-apply-config kubelet-hook$ kubelet Webserver done config-notify Signal ● CCFN_SIGNAL ● TEMP_URL_SIGNAL ● NO_SIGNAL ● HEAT_SIGNAL ● ZAQAR_SIGNAL Software Config Pool Network Subnet Loadbalancer Floating IP Heal monitor Pool Member Nginx
  • 7. Heat container agents [sample in repo] Software Deploy Nova Server What you can install with ● heat-config-ansible ● heat-config-apply-config ● heat-config-cfn-init ● heat-config-chef ● heat-config-docker-cmd ● heat-config-docker-compose ● heat-config-hiera ● heat-config-json-file ● heat-config-kubelet ● heat-config-puppet ● heat-config-salt ● heat-config-script And you can customize your own hook os-collect-config os-refresh-config os-apply-config kubelet-hook$ kubelet Webserver done config-notify Signal ● CCFN_SIGNAL ● TEMP_URL_SIGNAL ● NO_SIGNAL ● HEAT_SIGNAL ● ZAQAR_SIGNAL Dockers Software Config Pool Network Subnet Loadbalancer Floating IP Heal monitor Pool Member
  • 8. Heat container agents [sample in repo] config: type: OS::Heat::SoftwareConfig properties: group: script outputs: - name: result config: { get_file: example-script.sh } deployment: type: OS::Heat::SoftwareDeployment properties: config: { get_resource: config } server: { get_resource: server } start_container_agent: type: OS::Heat::SoftwareConfig properties: group: ungrouped config: {get_file: ./start-container-agent.sh} server: type: OS::Nova::Server properties: image: {get_param: image} flavor: {get_param: flavor} key_name: {get_param: key_name} networks: - network: {get_param: private_net} security_groups: - {get_resource: the_sg} user_data_format: SOFTWARE_CONFIG user_data: {get_attr: [start_container_agent, config]} #!/bin/bash set -ux # heat-docker-agent service cat <<EOF > /etc/systemd/system/heat-container-agent.service [Unit] Description=Heat Container Agent After=docker.service Requires=docker.service [Service] TimeoutSec=5min RestartSec=5min User=root Restart=on-failure ExecStartPre=-/usr/bin/docker rm -f heat-container-agent ExecStartPre=-/usr/bin/docker pull docker.io/rico/heat-container-agent ExecStart=/usr/bin/docker run --name heat-container-agent --privileged --net=host -v /run/systemd:/run/systemd -v /etc/sysconfig:/etc/sysconfig -v /etc/systemd/system:/etc/systemd/system -v /var/lib/heat-cfntools:/var/lib/heat-cfntools -v /var/lib/cloud:/var/lib/cloud -v /tmp:/tmp -v /etc/hosts:/etc/hosts docker.io/rico/heat-container-agent ExecStop=/usr/bin/docker stop heat-container-agent [Install] WantedBy=multi-user.target EOF # enable and start heat-container-agent chmod 0640 /etc/systemd/system/heat-container-agent.service /usr/bin/systemctl enable heat-container-agent.service /usr/bin/systemctl start --no-block heat-container-agent.service
  • 12. Self Healing XXX::Server XXX::Signal XXX::Alarm XXX::Workflow Signal Meter Trigger XXX::AutoScaling How you metering? How you handle signal? How you trigger a fix job What's meter to you? Fix
  • 13. Self Healing server: type: OS::Nova::Server properties: ... alarm_queue: type: OS::Zaqar::Queue error_event_alarm: type: OS::Aodh::EventAlarm properties: event_type: compute.instance.update query: - field: traits.instance_id value: {get_resource: server} op: eq - field: traits.state value: error op: eq alarm_queues: - {get_resource: alarm_queue} alarm_subscription: type: OS::Zaqar::MistralTrigger properties: queue_name: {get_resource: alarm_queue} workflow_id: {get_resource: autoheal} input: stack_id: {get_param: "OS::stack_id"} root_stack_id: if: - is_standalone - {get_param: "OS::stack_id"} - {get_param: "root_stack_id"} autoheal: type: OS::Mistral::Workflow properties: description: > Mark a server as unhealthy and commence a stack update to replace it. input: stack_id: root_stack_id: type: direct tasks: - name: resources_mark_unhealthy action: list_join: - ' ' - - heat.resources_mark_unhealthy - stack_id=<% $.stack_id %> - resource_name=<% env().notification.body.reason_data.event.traits.where($[0] = 'instance_id').select($[2]).first() %> - mark_unhealthy=true - resource_status_reason='Marked by alarm' on_success: - stacks_update - name: stacks_update action: heat.stacks_update stack_id=<% $.root_stack_id %> existing=true
  • 15. Demo
  • 19. Auto Scaling https://github.com/openstack/heat-templates/tree/master/hot/autoscaling.yaml resources: asg: type: OS::Heat::AutoScalingGroup properties: min_size: 1 max_size: 3 resource: type: lb_server.yaml properties: flavor: {get_param: flavor} image: {get_param: image} web_server_scaleup_policy: type: OS::Heat::ScalingPolicy properties: adjustment_type: change_in_capacity auto_scaling_group_id: {get_resource: asg} cooldown: 60 scaling_adjustment: 1 # min_adjustment_step: web_server_scaledown_policy: type: OS::Heat::ScalingPolicy properties: adjustment_type: change_in_capacity auto_scaling_group_id: {get_resource: asg} cooldown: 60 scaling_adjustment: -1 cpu_alarm_high: type: OS::Aodh::GnocchiAggregationByResourcesAlarm properties: description: Scale up if CPU > 80% metric: cpu_util aggregation_method: mean granularity: 300 evaluation_periods: 1 threshold: 80 resource_type: instance comparison_operator: gt alarm_actions: - str_replace: template: trust+url params: url: {get_attr: [web_server_scaleup_policy, signal_url]} query: list_join: - '' - - {'=': {server_group: {get_param: "OS::stack_id"}}} cpu_alarm_low: type: OS::Aodh::GnocchiAggregationByResourcesAlarm
  • 20. Choose your own structure
      resources:
        asg:
          type: OS::Heat::AutoScalingGroup
          properties:
            min_size: 1
            max_size: 3
            resource:
              type: lb_server.yaml
              properties:
                flavor: {get_param: flavor}
                image: {get_param: image}
        web_server_scaleup_policy:
          type: OS::Heat::ScalingPolicy
          properties:
            adjustment_type: change_in_capacity
            auto_scaling_group_id: {get_resource: asg}
            cooldown: 60
            scaling_adjustment: 1
        monitoring:
          type: monitor.yaml
          properties:
            url:
              get_attr: [web_server_scaleup_policy, signal_url]
      [Diagram labels: Stack, ScalingPolicy, AutoScalingGroup (Instance 1, 2, ... N), Monitor service. Steps: 1. Metering, 2. Alarm, 3. Scale]
  • 21. Choose your own structure
      resources:
        asg:
          type: OS::Heat::AutoScalingGroup
          properties:
            min_size: 1
            max_size: 3
            resource:
              type: lb_server.yaml
              properties:
                flavor: {get_param: flavor}
                image: {get_param: image}
        web_server_scaleup_policy:
          type: OS::Heat::ScalingPolicy
          properties:
            adjustment_type: change_in_capacity
            auto_scaling_group_id: {get_resource: asg}
            cooldown: 60
            scaling_adjustment: 1
      outputs:
        signal_url:
          value: {get_attr: [web_server_scaleup_policy, signal_url]}
      [Diagram labels: Stack, ScalingPolicy, AutoScalingGroup (Instance 1, 2, ... N), Monitor service. Steps: 1. Metering, 2. Alarm, 3. Scale]
  • 22. Choose your own structure
      resources:
        asg:
          type: OS::Heat::AutoScalingGroup
          properties:
            min_size: 1
            max_size: 3
            resource:
              type: lb_server.yaml
              properties:
                flavor: {get_param: flavor}
                image: {get_param: image}
        web_server_scaleup_policy:
          type: OS::Heat::ScalingPolicy
          properties:
            adjustment_type: change_in_capacity
            auto_scaling_group_id: {get_resource: asg}
            cooldown: 60
            scaling_adjustment: 1
      outputs:
        signal_url:
          value: {get_attr: [web_server_scaleup_policy, signal_url]}
      [Diagram labels: Stack, ScalingPolicy, AutoScalingGroup (Instance 1, 2, ... N), Monitor service. Steps: 1. Metering, 2. Alarm, 3. Scale]
      curl -i -H "X-Auth-Token: $TOKEN" -X POST $Signal_url
      curl -i \
        -H "Content-Type: application/json" \
        -d '{ "auth": {
              "identity": {
                "methods": ["password"],
                "password": {
                  "user": {
                    "name": "admin",
                    "domain": { "id": "default" },
                    "password": "password" } } },
              "scope": {
                "project": {
                  "name": "admin",
                  "domain": { "id": "default" } } } }}' \
        http://$KEYSTONE/identity/v3/auth/tokens ; echo
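The two curl calls above can also be sketched in Python with only the standard library. This is an illustration under stated assumptions rather than a client Heat ships: the function names and the example host are hypothetical, and the requests are only constructed here, not sent. Keystone returns the token in the `X-Subject-Token` response header, which is why the curl call uses `-i` to print headers.

```python
import json
import urllib.request


def build_token_request(keystone: str, user: str, password: str,
                        project: str) -> urllib.request.Request:
    """Build the Keystone v3 password-auth request from the slide."""
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": user,
                        "domain": {"id": "default"},
                        "password": password,
                    }
                },
            },
            "scope": {
                "project": {"name": project, "domain": {"id": "default"}}
            },
        }
    }
    return urllib.request.Request(
        url=f"http://{keystone}/identity/v3/auth/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_signal_request(signal_url: str, token: str) -> urllib.request.Request:
    """Build the POST that signals the scaling policy."""
    return urllib.request.Request(
        url=signal_url,
        headers={"X-Auth-Token": token},
        method="POST",
    )


# Hypothetical values; nothing is sent over the network here.
token_req = build_token_request("keystone.example", "admin", "password", "admin")
signal_req = build_signal_request("http://heat.example/signal", "hypothetical-token")
print(token_req.get_method(), token_req.full_url)
print(signal_req.get_method(), signal_req.full_url)
```

To actually send either request, pass it to `urllib.request.urlopen`; the token request has to run first, since its response supplies the value for `X-Auth-Token`.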
  • 23. Look into options for auto-scaling
      OS::Heat::AutoScalingGroup
      ● Properties
        ○ resource:
          ■ type: web_server.yaml
          ■ properties
        ○ min_size: 10
        ○ max_size: 100
        ○ cooldown: 30
        ○ desired_capacity: 30
        ○ rolling_updates
          ■ min_in_service: 5
          ■ max_batch_size: 10
          ■ pause_time: 15
      ● Attributes
        ○ outputs
        ○ outputs_list
        ○ current_size
        ○ refs [IDs]
        ○ refs_map {[names: IDs]}
  • 24. Look into options for auto-scaling
      OS::Heat::ScalingPolicy
      ● Properties
        ○ adjustment_type: change_in_capacity
          ■ exact_capacity
          ■ change_in_capacity
          ■ percent_change_in_capacity
        ○ auto_scaling_group_id: asg_id
        ○ cooldown: 60
        ○ scaling_adjustment: 5
        ○ # min_adjustment_step:
      ● Attributes
        ○ alarm_url
        ○ signal_url
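The three `adjustment_type` values listed above differ in how `scaling_adjustment` is turned into a new group size. The following is a minimal sketch of that arithmetic, not Heat's implementation. The `new_capacity` function is hypothetical, and the rounding rules for `percent_change_in_capacity` and the handling of `min_adjustment_step` are assumptions made for illustration. The result is clamped to the group's `min_size` and `max_size`.

```python
import math
from typing import Optional


def new_capacity(current: int, adjustment: int, adjustment_type: str,
                 min_size: int, max_size: int,
                 min_adjustment_step: Optional[int] = None) -> int:
    """Compute the group size after one scaling signal."""
    if adjustment_type == "exact_capacity":
        target = adjustment
    elif adjustment_type == "change_in_capacity":
        target = current + adjustment
    elif adjustment_type == "percent_change_in_capacity":
        delta = current * adjustment / 100.0
        if abs(delta) < 1.0:
            # Assumption: small changes round away from zero, so a
            # nonzero percentage moves the group by at least one unit.
            step = math.ceil(delta) if delta > 0 else math.floor(delta)
        else:
            # Assumption: larger changes truncate toward zero.
            step = math.trunc(delta)
        if min_adjustment_step and step != 0 and abs(step) < min_adjustment_step:
            step = min_adjustment_step if step > 0 else -min_adjustment_step
        target = current + step
    else:
        raise ValueError(f"unknown adjustment_type: {adjustment_type}")
    # Never leave the [min_size, max_size] range of the group.
    return max(min_size, min(max_size, target))


print(new_capacity(1, 1, "change_in_capacity", 1, 3))
print(new_capacity(10, 25, "percent_change_in_capacity", 10, 100))
```

With the `min_size: 1`, `max_size: 3` group from the earlier slides, repeated scale-up signals stop having any effect once the group holds three members.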
  • 25. Demo
  • 26. Join Heat
      • Review: https://goo.gl/4KL1gN
      • StoryBoard (Bugs/BP): https://storyboard.openstack.org/#!/project_group/82
      • StoryBoard guide: https://etherpad.openstack.org/p/Heat-StoryBoard-Migration-Info
      • Documents: https://docs.openstack.org/heat/latest/
      • Release Notes: https://docs.openstack.org/releasenotes/heat/
      • Feedback or ideas: irc #heat
      • Share your use cases: https://etherpad.openstack.org/p/heat-usecases
      • Team meeting time: Wednesday 14:00 UTC, #heat (meeting wiki and archive)
      ➔ Boston Summit
        ◆ Heat project update [ slide & video ]
        ◆ Heat Onboarding [ slide & video ]
      ➔ Sydney Summit
        ◆ Heat project update [ slide & video ]
        ◆ Heat Onboarding [ slide & video ]
      ➔ Vancouver Summit
        ◆ Heat project update [ slide & video ]
        ◆ Heat Onboarding [ slide & video ]
      ➔ Heat templates
      ➔ PTG Etherpad
  • 27. Q & A
      Links: demo video
      If you are wondering how you or your product can interact with the open source cloud community: Embrace community! Embrace life!