OpenStack HA - Theory to Reality
Gerd Prüßmann, Cloud Architect, Deutsche Telekom AG (@2digitsleft)
Shamail Tahir, Cloud Architect, EMC Office of the CTO (@ShamailXD)
Sriram Subramanian, Founder & Cloud Specialist, CloudDon (@sriramhere)
Kalin Nikolov, Cloud Engineer, PayPal
Agenda
OpenStack HA - Introduction
Active/Active
Active/Passive
DT Implementation
eBay/PayPal Implementation
Summary
OpenStack HA - Introduction
What does it mean?
Why is it not the default?
Stateless vs. stateful
Challenges
More than one way
Active/Passive
Active/Active
Is this?
Or this?
Active/Active
API Service Endpoints
Database
Networking
Active/Active
● The OpenStack (OS) high availability (HA) concept depends on the components used,
e.g. network virtualization, storage backend, database system etc.
● Various technologies are available to realize HA:
vendors use combinations, e.g. Pacemaker, Corosync, Galera, Keepalived,
HAProxy, VRRP, DRBD … or their own tools
The following description is derived from the generic proposal in the
OpenStack HA guide:
http://docs.openstack.org/high-availability-guide/content/index.html
Active/Active
● Target: make all services of the platform highly available
o redundancy and resiliency against single service / node failure
● stateless services are load balanced (HAProxy + Keepalived)
o e.g. API endpoints / nova-scheduler
● stateful services use individual HA technologies
o e.g. RabbitMQ, MySQL DB etc.
o might be load balanced as well
● some services/agents have no built-in HA feature
Active/Active - API service endpoints
API endpoints
● deploy on multiple nodes
● configure load balancing with virtual IPs (VIPs) in HAProxy
● use HAProxy's VIPs to configure the respective Identity service endpoints
● all service configuration files refer to these VIPs only
Schedulers
● nova-scheduler, nova-conductor, cinder-scheduler, neutron-server,
ceilometer-collector, heat-engine
● schedulers are configured with the clustered RabbitMQ nodes
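A minimal Python sketch of rendering one HAProxy `listen` section that publishes an API on a VIP and balances it across controllers. The VIP, server names, addresses and check timings are hypothetical values loosely modeled on the examples in the OpenStack HA guide; adapt them to your own environment.

```python
from typing import Sequence, Tuple


def haproxy_listen(name: str, vip: str, port: int,
                   backends: Sequence[Tuple[str, str]],
                   balance: str = "roundrobin") -> str:
    """Render an HAProxy 'listen' section exposing one API endpoint on a VIP.

    backends: (server_name, ip) pairs; each server is assumed to serve the
    API on the same port the VIP listens on.
    """
    lines = [
        f"listen {name}",
        f"  bind {vip}:{port}",      # clients and service configs only ever see the VIP
        f"  balance {balance}",      # e.g. roundrobin or leastconn
        "  option tcpka",
        "  option httpchk",          # HTTP health check instead of a bare TCP connect
    ]
    for server, ip in backends:
        # 'check' enables health checks: every 2000 ms, 2 successes to rise, 5 failures to fall
        lines.append(f"  server {server} {ip}:{port} check inter 2000 rise 2 fall 5")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    controllers = [("controller1", "10.0.0.11"),
                   ("controller2", "10.0.0.12"),
                   ("controller3", "10.0.0.13")]
    print(haproxy_listen("nova_compute_api", "10.0.0.10", 8774, controllers))
```

If Keepalived moves the VIP between HAProxy nodes, the node that does not currently hold the VIP usually needs `net.ipv4.ip_nonlocal_bind = 1` so HAProxy can still bind to that address.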
Active/Active - Databases
● MySQL or MariaDB with the Galera cluster (wsrep) library extension
o transaction commit level replication
● synchronous multi-master setup
o min. 3 nodes to keep quorum in case of a network partition
● write to and read from any node
● other database options possible:
Percona XtraDB, PostgreSQL etc.
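Why at least 3 nodes: a partition keeps serving only if it still holds a strict majority of the previous cluster membership. The following Python sketch assumes equal node weights and only unexpected failures (nodes that leave gracefully shrink the membership instead); Galera's per-node `pc.weight` setting is ignored here.

```python
def has_quorum(surviving: int, cluster_size: int) -> bool:
    """True if a partition keeps quorum (stays the primary component).

    Assumes equal node weights: the partition must hold strictly more than
    half of the previous cluster membership.
    """
    return 2 * surviving > cluster_size


def tolerated_failures(cluster_size: int) -> int:
    """Largest number of simultaneously lost nodes the survivors can absorb."""
    return max(f for f in range(cluster_size)
               if has_quorum(cluster_size - f, cluster_size))


if __name__ == "__main__":
    for size in (2, 3, 4, 5):
        print(f"{size} nodes: survives losing {tolerated_failures(size)} node(s)")
```

Under these assumptions a 2-node cluster survives no unexpected node loss, and 4 nodes tolerate no more failures than 3, which is why odd cluster sizes of at least 3 are the usual recommendation.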
Active/Active - RabbitMQ
● RabbitMQ nodes clustered
● mirrored queues configured via policy (e.g. ha-mode all)
● all services use the RabbitMQ nodes
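The mirroring policy can be set with `rabbitmqctl set_policy ha-all '^(?!amq\.).*' '{"ha-mode": "all"}'` or through the management plugin's HTTP API. The Python sketch below only builds the HTTP request; the host name is hypothetical, the management plugin is assumed to listen on its default port 15672, and actually sending the request would also need credentials (e.g. via `urllib.request.HTTPBasicAuthHandler`). Note that classic queue mirroring was deprecated in later RabbitMQ releases (and removed in 4.0) in favor of quorum queues.

```python
import json
import urllib.parse
import urllib.request


def mirror_all_policy_request(api_base: str, vhost: str = "/",
                              name: str = "ha-all") -> urllib.request.Request:
    """Build (without sending) a management-API PUT that mirrors every queue
    except the built-in amq.* ones to all cluster nodes (ha-mode: all)."""
    path = "/api/policies/{}/{}".format(
        urllib.parse.quote(vhost, safe=""),  # the default vhost "/" becomes %2F
        urllib.parse.quote(name, safe=""))
    body = {
        "pattern": r"^(?!amq\.).*",
        "definition": {"ha-mode": "all"},
        "apply-to": "queues",
        "priority": 0,
    }
    return urllib.request.Request(
        api_base.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = mirror_all_policy_request("http://rabbit1.example.com:15672")
    print(req.get_method(), req.full_url)
```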
Active/Active - Networking
Network
● deploy multiple network nodes
● Neutron DHCP agent – configure multiple DHCP agents
(dhcp_agents_per_network)
● Neutron L3 agent
o automatic L3 agent failover (allow_automatic_l3agent_failover)
o VRRP (l3_ha, max_l3_agents_per_router, min_l3_agents_per_router)
● Neutron L2 agent – no HA available
● Neutron metadata agent – no HA available
● Neutron LBaaS agent – no HA available
● where no HA feature is available: use an active/passive Pacemaker/Corosync solution
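The Neutron options named above live in the `[DEFAULT]` section of `neutron.conf`. A Python sketch that renders such a fragment, choosing between VRRP-based HA routers and automatic rescheduling of legacy routers; the default values are illustrative assumptions, and option availability varies by release (e.g. `min_l3_agents_per_router` was later deprecated), so check your release's configuration reference.

```python
import configparser
import io


def neutron_ha_options(dhcp_agents: int = 2, use_vrrp: bool = True,
                       max_l3: int = 3, min_l3: int = 2) -> str:
    """Render a [DEFAULT] fragment for neutron.conf with DHCP and L3 HA options."""
    options = {
        # schedule every network onto several DHCP agents
        "dhcp_agents_per_network": str(dhcp_agents),
    }
    if use_vrrp:
        # HA routers: each router is hosted by several L3 agents, keepalived/VRRP elects the active one
        options.update({
            "l3_ha": "True",
            "max_l3_agents_per_router": str(max_l3),
            "min_l3_agents_per_router": str(min_l3),
        })
    else:
        # legacy routers: reschedule them away from L3 agents that stop reporting
        options["allow_automatic_l3agent_failover"] = "True"
    cfg = configparser.ConfigParser()
    cfg["DEFAULT"] = options
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(neutron_ha_options())
```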
Active/Active - Example
[Diagram: deployment example]
Active/Passive
General
Tools Overview
Controllers Overview
Active/Passive: General
● Components should use a virtual IP (VIP)
● The primary tools used for Active/Passive OpenStack configurations are
general-purpose (not OpenStack-specific): Pacemaker + Corosync, and DRBD
Corosync
● Messaging layer used by the cluster
● Responsibilities include cluster membership and messaging
● Uses RRP (Redundant Ring Protocol)
o rings can be set up as A/A or A/P
o UDP only
o mcastport specifies the receive port; mcastport minus 1 is the send port
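Because each ring uses both `mcastport` and `mcastport - 1`, rings (or neighbouring clusters) that share a multicast group need `mcastport` values at least 2 apart. A small Python sketch of that check; the `Ring` class and `port_conflicts` helper are hypothetical, and their field names merely mirror the `ringnumber`, `mcastaddr` and `mcastport` keys of `corosync.conf`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Ring:
    ringnumber: int
    mcastaddr: str
    mcastport: int

    def ports(self) -> Tuple[int, int]:
        """(receive, send) UDP ports: receives on mcastport, sends from mcastport - 1."""
        return self.mcastport, self.mcastport - 1


def port_conflicts(rings: List[Ring]) -> List[Tuple[int, int]]:
    """Ring-number pairs that share a multicast group and overlap in UDP ports."""
    conflicts = []
    for i, a in enumerate(rings):
        for b in rings[i + 1:]:
            if a.mcastaddr == b.mcastaddr and set(a.ports()) & set(b.ports()):
                conflicts.append((a.ringnumber, b.ringnumber))
    return conflicts


if __name__ == "__main__":
    rings = [Ring(0, "239.255.42.1", 5405), Ring(1, "239.255.42.1", 5406)]
    print(port_conflicts(rings))  # [(0, 1)]: ports 5405/5404 overlap with 5406/5405
```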
Pacemaker
● Cluster Resource Manager
● Cluster Information Base (CIB)
o represents the current state of resources and the cluster configuration (XML)
● Cluster Resource Management Daemon (CRMd)
o acts as decision maker (one master)
● Policy Engine (PEngine)
o computes the instructions that CRMd carries out via LRMd
● STONITHd
o fencing mechanism
[Diagram: Pacemaker components (CRMd, CIB, PEngine, LRMd, STONITHd)]
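Since the CIB is plain XML (on a live cluster, `cibadmin --query` prints it), it is easy to inspect programmatically. A Python sketch that lists the configured primitive resources and their agents; the sample CIB is a hypothetical, heavily trimmed document (a real one carries more attributes such as epoch counters and a populated status section).

```python
import xml.etree.ElementTree as ET
from typing import Dict

CIB_SAMPLE = """
<cib>
  <configuration>
    <crm_config/>
    <nodes/>
    <resources>
      <primitive id="p_ip_mysql" class="ocf" provider="heartbeat" type="IPaddr2">
        <instance_attributes id="p_ip_mysql-attrs">
          <nvpair id="p_ip_mysql-ip" name="ip" value="192.168.42.101"/>
        </instance_attributes>
      </primitive>
      <primitive id="p_mysql" class="ocf" provider="heartbeat" type="mysql"/>
    </resources>
    <constraints/>
  </configuration>
  <status/>
</cib>
"""


def list_primitives(cib_xml: str) -> Dict[str, str]:
    """Map each primitive resource id to its agent as class:provider:type."""
    root = ET.fromstring(cib_xml.strip())
    agents = {}
    # iter() also finds primitives nested inside groups, clones and master/slave sets
    for prim in root.iter("primitive"):
        parts = [prim.get("class"), prim.get("provider"), prim.get("type")]
        agents[prim.get("id")] = ":".join(p for p in parts if p)
    return agents


if __name__ == "__main__":
    print(list_primitives(CIB_SAMPLE))
```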
DRBD
● Distributed Replicated Block Device
● Creates logical block devices (e.g. /dev/drbdX) that have local backing volumes
● Reads are serviced locally
● Writes on the primary node are sent to the secondary node
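To make the read/write path concrete, here is a toy Python model of one DRBD resource with synchronous replication (DRBD's protocol C). It is an illustration of the semantics only, not of DRBD's implementation; the class and its methods are hypothetical. While disconnected, the Primary keeps writing locally and remembers which blocks the peer is missing, so only those need to be resynced later.

```python
from typing import Dict, Optional, Set


class DrbdPairModel:
    """Toy model of one DRBD resource using synchronous replication (protocol C).

    A write on the Primary completes only once both backing devices hold it;
    reads are served from the Primary's local device.
    """

    def __init__(self) -> None:
        self.local: Dict[int, bytes] = {}   # backing device on the Primary
        self.peer: Dict[int, bytes] = {}    # backing device on the Secondary
        self.connected = True
        self.out_of_sync: Set[int] = set()

    def write(self, block: int, data: bytes) -> None:
        self.local[block] = data
        if self.connected:
            self.peer[block] = data          # replicated before the write is acknowledged
        else:
            self.out_of_sync.add(block)      # remember what the peer is missing

    def read(self, block: int) -> Optional[bytes]:
        return self.local.get(block)         # reads never cross the network

    def reconnect(self) -> None:
        self.connected = True
        for block in sorted(self.out_of_sync):
            self.peer[block] = self.local[block]  # resync only the changed blocks
        self.out_of_sync.clear()

    def promote_peer(self) -> None:
        """Fail over: the Secondary becomes Primary (refused while it is outdated)."""
        if self.out_of_sync:
            raise RuntimeError("peer is missing blocks; promoting it would lose writes")
        self.local, self.peer = self.peer, self.local
```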
Active/Passive: Database
[Diagram: Host1 and Host2, each running MySQL on DRBD, with Pacemaker and Corosync]
● Use DRBD to back MySQL
● Use a VIP that can float between hosts
● Manage all resources (including the MySQL daemon) with Pacemaker
● MySQL/Galera is an alternative, but the current version of the HA Guide
does not recommend it
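The pieces above map onto a handful of Pacemaker resources and constraints. A Python sketch that renders crm shell configuration modeled on the MySQL/DRBD example in the OpenStack HA guide; the VIP, resource names, DRBD resource name, mount point and file system type are assumptions, several operation timeouts are omitted for brevity, and newer Pacemaker releases use Promoted/Unpromoted in place of Master/Slave.

```python
def mysql_ap_crm_config(vip: str, cidr: int = 24, drbd_resource: str = "mysql",
                        fstype: str = "xfs") -> str:
    """Render crm shell lines for MySQL on DRBD behind a floating VIP."""
    return "\n".join([
        # floating VIP that clients and OpenStack services connect to
        f'primitive p_ip_mysql ocf:heartbeat:IPaddr2 params ip="{vip}" '
        f'cidr_netmask="{cidr}" op monitor interval="30s"',
        # DRBD device; the two roles need different monitor intervals
        f'primitive p_drbd_mysql ocf:linbit:drbd params drbd_resource="{drbd_resource}" '
        'op monitor interval="29s" role="Master" op monitor interval="30s" role="Slave"',
        # file system on the DRBD device, mounted where mysqld keeps its data
        'primitive p_fs_mysql ocf:heartbeat:Filesystem params '
        f'device="/dev/drbd/by-res/{drbd_resource}" directory="/var/lib/mysql" fstype="{fstype}"',
        'primitive p_mysql ocf:heartbeat:mysql op monitor interval="20s" timeout="10s"',
        # VIP, file system and mysqld always move together, started in this order
        "group g_mysql p_ip_mysql p_fs_mysql p_mysql",
        # DRBD runs on both nodes; exactly one instance is promoted
        'ms ms_drbd_mysql p_drbd_mysql meta notify="true" clone-max="2"',
        # run the group only where DRBD is promoted, and only after the promotion
        "colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master",
        "order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start",
    ]) + "\n"


if __name__ == "__main__":
    print(mysql_ap_crm_config("192.168.42.101"))
```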
Active/Passive: RabbitMQ
[Diagram: Host1 and Host2, each running RabbitMQ on DRBD, with Pacemaker and Corosync]
● Use DRBD to back RabbitMQ
● Use a VIP that can float between hosts
● Ensure the .erlang.cookie file is identical on all nodes
o this enables the nodes to communicate with each other
● RabbitMQ clustering does not tolerate network partitions well
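A quick way to verify the cookie requirement is to collect the `.erlang.cookie` contents from every node (commonly `/var/lib/rabbitmq/.erlang.cookie` on Debian/Ubuntu packages, but check your distribution) and compare fingerprints rather than printing the secret. A Python sketch; how the contents are gathered (SSH, configuration management) is left out, and the node names are hypothetical.

```python
import hashlib
from typing import Dict, Tuple


def cookies_match(cookies: Dict[str, str]) -> Tuple[bool, Dict[str, str]]:
    """Compare the .erlang.cookie contents collected from each node.

    Returns (all_identical, node -> short SHA-256 fingerprint). Fingerprints
    can be logged or compared by eye without exposing the secret itself.
    Contents are compared exactly, so stray whitespace counts as a difference.
    """
    prints = {node: hashlib.sha256(secret.encode("utf-8")).hexdigest()[:12]
              for node, secret in cookies.items()}
    return len(set(prints.values())) <= 1, prints


if __name__ == "__main__":
    ok, fingerprints = cookies_match({"rabbit1": "SECRETCOOKIE", "rabbit2": "SECRETCOOKIE"})
    print(ok, fingerprints)
```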
Active/Passive: Overview (from the Guide)
● Use the DB and RabbitMQ VIPs in configuration files
● Configure Pacemaker resources for OpenStack services
o Image API
o Identity
o Block Storage API
o Telemetry Central Agent
o Networking
o L3-Agent
o DHCP
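Using the VIPs in configuration files means every service's database and messaging settings name the VIP, never a single controller. A Python sketch that renders both settings for one service; the VIPs, credentials, PyMySQL driver and "DB name = user = service name" convention are assumptions, and `transport_url` is the oslo.messaging option of later releases (older releases used `rabbit_host`/`rabbit_hosts`).

```python
from typing import Dict
from urllib.parse import quote


def service_endpoints(service: str, db_vip: str, db_password: str,
                      mq_vip: str, mq_user: str, mq_password: str) -> Dict[str, str]:
    """Connection settings for one OpenStack service that reference only the
    DB and RabbitMQ VIPs, never an individual controller."""
    return {
        # [database] connection: SQLAlchemy URL (DB name and user = service name, an assumption)
        "database/connection": "mysql+pymysql://{}:{}@{}/{}".format(
            service, quote(db_password, safe=""), db_vip, service),
        # [DEFAULT] transport_url: oslo.messaging URL for RabbitMQ on its default port
        "DEFAULT/transport_url": "rabbit://{}:{}@{}:5672/".format(
            quote(mq_user, safe=""), quote(mq_password, safe=""), mq_vip),
    }


if __name__ == "__main__":
    for key, value in service_endpoints("glance", "192.168.42.101", "dbpass",
                                        "192.168.42.102", "openstack", "mqpass").items():
        print(key, "=", value)
```

Passwords are percent-encoded because characters such as `@` or `/` would otherwise break the URL.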
DT Implementation - Overview
● Business Market Place (BMP)
● SaaS offering
● https://portal.telekomcloud.com/
● SaaS applications from software partners (ISVs) and DT offered to SME customers
● Platform based on open source technologies only (OpenStack, CEPH, Linux)
● Project started in 2012 with OpenStack Essex and CEPH
● In production since March 2013
DT Implementation
DTAG scale-out project (ongoing)
Target: migrate production to a new DC and scale out
Requirements:
● scale out compute by 30%, storage by 40%
● eliminate all SPOFs
● set up in two fire protection areas / physically separated DC rooms
DT Implementation
● single-region HA OpenStack installation
● all services distributed over two DC rooms
o compute and storage distributed equally
o all OpenStack services HA (as far as possible)
o OSS (DNS, NTP, Puppet master, mirror etc., redundant perimeter firewall)
● instance distribution: 4 availability zones, multiple host aggregates and scheduler filters
DT Implementation
● Load balancing
o HAProxy for MySQL, services, RabbitMQ, APIs (nginx under test)
● MySQL
o Galera multi-master replication (3 nodes)
● RabbitMQ
o 2-node cluster / mirrored queues
● Neutron
o DHCP: multiple agents started; Pacemaker/Corosync
● API endpoints
o load balancing with round-robin distribution
● Storage
o 2 shared, distributed CEPH clusters (RBD/S3)
DT Implementation
Tests/Experiences so far
● Load balancing works well
● Database: issues with OpenStack writing to multiple nodes
o 1 node for writes / 2 nodes as backup: diminishes Galera HA efficiency (monitoring)
● Specific issues with deployment in 2 DC rooms / uneven distribution of services (Galera)
o if the "wrong" room fails:
▪ Galera: quorum requires a majority! If the room with 2 nodes goes down, the 3rd node deactivates itself → DB outage
▪ Storage specific: CEPH may lose 2/3 of the replicas → heavy replication load on the CEPH cluster
▪ danger of losing data (OSD/disk failure) → raise the replica level / adapt the CRUSH map
▪ Network: recovering from a Neutron/L3 failure takes <15 minutes
o pet applications are vulnerable and may suffer hiccups during disasters anyway
● DHCP agent failures
DT Implementation
Plans for the future
● use DVR / VRRP in the future
o make the network more resilient and elastic
● a third DC room would be desirable :-)
o for CEPH replicas / MONs and MySQL Galera
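Why a third room helps: with 3 quorum members spread 2/1 over two rooms, losing the "wrong" room removes the majority, while a 1/1/1 spread survives the loss of any single room. A Python sketch of that check, assuming equal member weights and one room failing at a time; the room names are hypothetical. (For Galera, the third member can also be a lightweight Galera Arbitrator, `garbd`.)

```python
from typing import Dict, List


def rooms_that_cause_outage(placement: Dict[str, int]) -> List[str]:
    """Rooms whose loss leaves the survivors without a strict majority.

    placement maps a room name to the number of quorum members (e.g. Galera
    nodes or CEPH MONs) located there; equal member weights are assumed.
    """
    total = sum(placement.values())
    return [room for room, members in placement.items()
            if 2 * (total - members) <= total]


if __name__ == "__main__":
    print(rooms_that_cause_outage({"room1": 2, "room2": 1}))              # ['room1']
    print(rooms_that_cause_outage({"room1": 1, "room2": 1, "room3": 1}))  # []
```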
eBay/PayPal Implementation
The scope of the eBay/PayPal OpenStack clouds
● 100% of the PayPal web/mid tier
● Most of Dev/QA
● Number of HVs: 8,500
● Number of virtual machines: 70,000
● Number of users: several thousand
● Availability zones: 10
eBay/PayPal Implementation
● Database
MySQL MMM replication, VIP with Failover Persistence / Galera
● RabbitMQ
VIP with Single Node Failover Persistence or 3 nodes with mirrored queues
● Neutron DHCP / LBaaS
Corosync/Pacemaker
● API endpoints
LB VIPs for every service with either round robin (RR) or least connections
● Storage
Shared storage with NFS/iSCSI
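The two balancing methods differ in what they look at: round robin hands requests out in a fixed rotation, while least connections tracks how many connections each backend currently holds. A minimal Python sketch of both; the backend names are hypothetical and real load balancers add weights, health state and more.

```python
import itertools
from typing import Dict, Sequence


class RoundRobin:
    """Hand out backends in a fixed rotation, ignoring their current load."""

    def __init__(self, backends: Sequence[str]) -> None:
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Hand out the backend with the fewest open connections (ties: list order)."""

    def __init__(self, backends: Sequence[str]) -> None:
        self.active: Dict[str, int] = {b: 0 for b in backends}

    def pick(self) -> str:
        backend = min(self.active, key=self.active.__getitem__)
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        # call when a connection to `backend` closes
        self.active[backend] -= 1
```

Round robin suits many short, similar API requests; least connections fits long-lived connections such as AMQP sessions to RabbitMQ, where one backend can otherwise accumulate most of the load.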
eBay/PayPal Implementation
Successful HA Implementations
● Load-balanced HA - VIPs for every service
● LB Single Node Failover Persistence Profile
● Galera/Percona for Identity Service
● Global Identity Service using GLB
eBay/PayPal Implementation
HA Failures
● Corosync/Pacemaker
Neutron DHCP and LBaaS - missing advanced health checks
● RabbitMQ
Single Node Failover Persistence
● MySQL replication
Single Node Failover Persistence sometimes doesn't work well;
implemented external monitoring and disabling of the failed member
● VIPs without ECV (extended content verification) health checks
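A content-verifying health check goes beyond "the port accepts connections": it sends a real request and checks the response body. Combined with rise/fall thresholds it gives the "disable the failed member" behaviour described above without flapping on a single bad probe. A Python sketch; the thresholds and any URL or expected marker you pass in are assumptions (e.g. probing a hypothetical Identity endpoint such as `http://10.0.0.11:5000/v3` for `"version"`).

```python
import urllib.request


def ecv_ok(status: int, body: str, expect: str) -> bool:
    """Content verification: healthy only if the member answers 200 AND the
    body contains the expected marker (a TCP connect alone would pass a hung
    or half-started API process)."""
    return status == 200 and expect in body


def ecv_probe(url: str, expect: str, timeout: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return ecv_ok(resp.status, body, expect)
    except OSError:
        # URLError, HTTPError (error status codes) and timeouts are all OSError subclasses
        return False


class Member:
    """Pool member that is disabled after `fall` consecutive failed checks and
    re-enabled after `rise` consecutive good ones (HAProxy-style rise/fall)."""

    def __init__(self, name: str, rise: int = 2, fall: int = 3) -> None:
        self.name, self.rise, self.fall = name, rise, fall
        self.enabled = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, healthy: bool) -> bool:
        if healthy == self.enabled:
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= (self.fall if self.enabled else self.rise):
                self.enabled = not self.enabled
                self._streak = 0
        return self.enabled
```

A monitor would call `member.record(ecv_probe(url, expect))` on each interval and remove the member from the VIP pool whenever `record` returns `False`.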
eBay/PayPal Implementation
Future direction
● HA on Global or Regional Services
One leg in each availability zone
(Keystone, LBaaS, Swift)
● RabbitMQ with 3 nodes / mirrored queues
LB VIP with least connections
● No shared NFS for Glance
eBay/PayPal Global Identity Service
eBay/PayPal Implementation
Lessons Learned
● Try not to overcomplicate
● Simulate failures
Before going into production, make sure HA actually works
● Place your services in different availability zones,
or at least in different fault zones
● Always make backups,
no matter how robust your HA solution is
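When simulating failures, it helps to measure how long a service behind a VIP is actually unreachable while a node is stopped. A Python sketch using plain TCP probes; the VIP and port you pass in are assumptions (e.g. `longest_outage(watch("10.0.0.10", 8774, duration=120))` while you power off a controller), and the measured gap is only as precise as the probe interval and timeout allow.

```python
import socket
import time
from typing import List, Optional, Tuple


def tcp_up(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(host: str, port: int, duration: float = 60.0,
          interval: float = 0.5) -> List[Tuple[float, bool]]:
    """Probe host:port every `interval` seconds for `duration` seconds."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), tcp_up(host, port)))
        time.sleep(interval)
    return samples


def longest_outage(samples: List[Tuple[float, bool]]) -> float:
    """Seconds from the first failed probe of an outage to the next good probe
    (or to the last sample if the service never came back)."""
    worst: float = 0.0
    down_since: Optional[float] = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts
        elif ok and down_since is not None:
            worst = max(worst, ts - down_since)
            down_since = None
    if down_since is not None:
        worst = max(worst, samples[-1][0] - down_since)
    return worst
```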
Call to Action
● OpenStack HA Guide update efforts
● WTE Work Group (now known as 'Enterprise')
● Share best practices
Reference
OpenStack HA guide:
http://docs.openstack.org/high-availability-guide/content/index.html
Percona Resources:
https://www.percona.com/resources/mysql-webinars/high-availability-using-mysql-cloud-today-tomorrow-and-keys-your-success
HAProxy Documentation:
http://www.haproxy.org/

 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

OpenStack HA - Theory to Reality

  • 1. OpenStack HA - Theory to Reality GERD PRÜßMANN SHAMAIL TAHIR SRIRAM SUBRAMANIAN KALIN NIKOLOV
  • 2. Presenters
    ● Gerd Prüßmann, Cloud Architect, Deutsche Telekom AG (@2digitsleft)
    ● Shamail Tahir, Cloud Architect, EMC Office of the CTO (@ShamailXD)
    ● Sriram Subramanian, Founder & Cloud Specialist, CloudDon (@sriramhere)
    ● Kalin Nikolov, Cloud Engineer, PayPal
  • 3. Agenda
    ● OpenStack HA - Introduction
    ● Active/Active
    ● Active/Passive
    ● DT Implementation
    ● eBay/PayPal Implementation
    ● Summary
  • 4. OpenStack HA - Introduction
    ● What does it mean?
    ● Why is it not the default?
    ● Stateless vs. stateful
    ● Challenges
    ● More than one way: Active/Passive, Active/Active
  • 7. Active/Active
    ● API Service Endpoints
    ● Database
    ● Networking
  • 8. Active/Active
    ● The OpenStack High Availability (HA) concept depends on the components used, e.g. for network virtualization, the storage backend, the database system, etc.
    ● Various technologies are available to realize HA; vendors use combinations, e.g. Pacemaker, Corosync, Galera, Keepalived, HAProxy, VRRP, DRBD … or their own tools.
    The following description is derived from the generic proposal in the OpenStack HA guide: http://docs.openstack.org/high-availability-guide/content/index.html
  • 9. Active/Active
    ● Target: make all services of the platform highly available (redundancy and resiliency against the failure of a single service or node)
    ● Stateless services are load balanced (HAProxy + Keepalived)
      o e.g. API endpoints, nova-scheduler
    ● Stateful services use individual HA technologies
      o e.g. RabbitMQ, MySQL DB, etc.
      o might be load balanced as well
    ● Some services/agents have no built-in HA feature available
  • 10. Active/Active - API service endpoints
    API endpoints
    ● deploy on multiple nodes
    ● configure load balancing with virtual IPs (VIPs) in HAProxy
    ● use HAProxy's VIPs to configure the respective Identity service endpoints
    ● all service configuration files refer to these VIPs only
    Schedulers
    ● nova-scheduler, nova-conductor, cinder-scheduler, neutron-server, ceilometer-collector, heat-engine
    ● schedulers are configured with the clustered RabbitMQ nodes
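A minimal sketch of the API endpoint pattern above, assuming HAProxy as the load balancer: the Python helper below renders one HAProxy `listen` section that binds an API port on a VIP and balances it across controller nodes. The VIP `192.168.1.10`, the controller addresses and the helper name `render_listen` are hypothetical, and the health-check timings are illustrative rather than a recommendation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    name: str  # hypothetical controller hostname
    addr: str  # hypothetical controller IP address


def render_listen(service: str, vip: str, port: int,
                  backends: List[Backend], balance: str = "roundrobin") -> str:
    """Render an HAProxy 'listen' section that exposes one API on a VIP."""
    lines = [
        f"listen {service}",
        f"  bind {vip}:{port}",   # clients and config files only see the VIP
        f"  balance {balance}",   # e.g. roundrobin or leastconn
        "  option tcpka",
        "  option httpchk",       # HTTP health check against each backend
    ]
    for b in backends:
        # probe every 2000 ms; 2 good probes mark a server up, 5 failures mark it down
        lines.append(f"  server {b.name} {b.addr}:{port} check inter 2000 rise 2 fall 5")
    return "\n".join(lines)


if __name__ == "__main__":
    controllers = [Backend("controller1", "10.0.0.11"),
                   Backend("controller2", "10.0.0.12"),
                   Backend("controller3", "10.0.0.13")]
    print(render_listen("keystone_public", "192.168.1.10", 5000, controllers))
```

Registering `http://192.168.1.10:5000` (rather than an individual controller) as the Identity endpoint, and using the same VIP in every service configuration file, is what lets a controller fail without clients noticing. Keepalived would typically move the VIP itself between the load balancer nodes.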
  • 11. Active/Active - Databases
    ● MySQL or MariaDB with the Galera cluster (wsrep) library extension
      o transaction commit level replication
    ● synchronous multi-master setup
      o min. 3 nodes to keep quorum in case of a network partition
    ● write to and read from any node
    ● other database options possible: Percona XtraDB, PostgreSQL, etc.
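The "min. 3 nodes" rule follows from majority quorum. A small sketch, assuming every node carries one vote (Galera's default weight) and that nodes are lost unexpectedly rather than shut down gracefully (a graceful leave shrinks the membership instead of counting against quorum):

```python
def has_quorum(reachable: int, cluster_size: int) -> bool:
    """A partition keeps serving only with a strict majority of the votes."""
    return 2 * reachable > cluster_size


def tolerated_failures(cluster_size: int) -> int:
    """Largest number of unexpectedly lost nodes the cluster survives."""
    return (cluster_size - 1) // 2


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"{n} nodes: survives {tolerated_failures(n)} failure(s)")
```

With 2 nodes, a network partition leaves each side with exactly half of the votes, so neither side may continue; 3 nodes is the smallest size that survives one failure, and an even fourth node adds no extra tolerance.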
  • 12. Active/Active - RabbitMQ
    ● RabbitMQ nodes clustered
    ● mirrored queues configured via policy (e.g. ha-mode all)
    ● all services use the RabbitMQ nodes
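As an illustration of the "ha-mode all" policy, the sketch below only builds (and does not run) the `rabbitmqctl set_policy` command that mirrors every queue in the default vhost across all cluster nodes. The policy name `ha-all` and the match-everything pattern `^` are conventional choices, not requirements. Later RabbitMQ releases deprecate classic queue mirroring in favour of quorum queues, so check the version in use.

```python
import json
import shlex
from typing import List


def ha_all_policy_cmd(vhost: str = "/", name: str = "ha-all",
                      pattern: str = "^") -> List[str]:
    """Build the rabbitmqctl call that mirrors matching queues to all nodes."""
    definition = {"ha-mode": "all"}  # mirror to every node in the cluster
    return ["rabbitmqctl", "set_policy", "-p", vhost,
            name, pattern,           # the regex "^" matches every queue name
            json.dumps(definition)]


if __name__ == "__main__":
    # Print a shell-quoted command line; pass the list to subprocess.run to execute it.
    print(shlex.join(ha_all_policy_cmd()))
```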
  • 13. Active/Active - Networking
    Network
    ● deploy multiple network nodes
    ● Neutron DHCP agent: configure multiple DHCP agents (dhcp_agents_per_network)
    ● Neutron L3 agent
      o automatic L3 agent HA (allow_automatic_l3agent_failover)
      o VRRP (l3_ha, max_l3_agents_per_router, min_l3_agents_per_router)
    ● Neutron L2 agent: no HA available
    ● Neutron metadata agent: no HA available
    ● Neutron LBaaS agent: no HA available
    ● where no HA feature is available: active/passive Pacemaker/Corosync solution
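The Neutron options named on this slide live in the `[DEFAULT]` section of `neutron.conf` in the releases this talk targets. The sketch below writes them with Python's `configparser`; the values (2 DHCP agents per network, 2 to 3 L3 agents per router) are assumptions for illustration, and option availability varies by release.

```python
import configparser
import io

# Illustrative values only; tune them to the number of network nodes.
HA_OPTIONS = {
    "dhcp_agents_per_network": "2",              # each network served by 2 DHCP agents
    "allow_automatic_l3agent_failover": "True",  # reschedule routers off dead L3 agents
    "l3_ha": "True",                             # create new routers as VRRP (HA) routers
    "max_l3_agents_per_router": "3",
    "min_l3_agents_per_router": "2",
}


def render_neutron_ha(options: dict) -> str:
    """Return the options as an INI fragment under [DEFAULT]."""
    conf = configparser.ConfigParser()
    conf["DEFAULT"] = options
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_neutron_ha(HA_OPTIONS))
```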
  • 14. Active/Active - Example: Deployment example
  • 16. Active/Passive: General
    ● Components should leverage a virtual IP (VIP)
    ● The primary tools used for Active/Passive OpenStack configurations are general (non-OpenStack-specific): Pacemaker + Corosync, and DRBD
  • 17. Corosync
    ● Messaging layer used by the cluster
    ● Responsibilities include cluster membership and messaging
    ● Leverages RRP (Redundant Ring Protocol)
      o rings can be set up as A/A or A/P
      o UDP only
      o mcastport specifies the receive port; mcastport minus 1 is the send port
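To make the port rule concrete, here is a sketch that renders the `totem` section of a Corosync 2.x-style `corosync.conf` with two redundant rings. Because Corosync receives on `mcastport` and sends from `mcastport - 1`, each ring occupies two ports; the helper refuses rings whose port pairs overlap, a conservative choice that matches the usual advice to leave a gap between mcastport values. The network addresses and helper names are hypothetical.

```python
from typing import List, Set, Tuple

Ring = Tuple[str, str, int]  # (bindnetaddr, mcastaddr, mcastport)


def ports_used(mcastport: int) -> Set[int]:
    """Corosync receives on mcastport and sends from mcastport - 1."""
    return {mcastport - 1, mcastport}


def render_totem(rings: List[Ring], rrp_mode: str = "active") -> str:
    """Render a totem block; rrp_mode is 'active' or 'passive' (A/A or A/P rings)."""
    seen: Set[int] = set()
    out = ["totem {", "  version: 2", f"  rrp_mode: {rrp_mode}"]
    for number, (bindnet, mcastaddr, port) in enumerate(rings):
        if seen & ports_used(port):
            raise ValueError(f"ring {number}: port pair overlaps another ring")
        seen |= ports_used(port)
        out += ["  interface {",
                f"    ringnumber: {number}",
                f"    bindnetaddr: {bindnet}",
                f"    mcastaddr: {mcastaddr}",
                f"    mcastport: {port}",
                "  }"]
    out.append("}")
    return "\n".join(out)


if __name__ == "__main__":
    print(render_totem([("10.0.0.0", "239.255.1.1", 5405),
                        ("10.0.1.0", "239.255.2.1", 5407)]))
```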
  • 18. Pacemaker
    ● Cluster Resource Manager
    ● Cluster Information Base (CIB)
      o represents the current state of resources and the cluster configuration (XML)
    ● Cluster Resource Management Daemon (CRMd)
      o acts as decision maker (one master)
    ● Policy Engine (PEngine)
      o sends instructions to LRMd and CRMd
    ● STONITHd
      o fencing mechanism
    (Diagram: CRMd, STONITHd, CIB, PEngine, LRMd)
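The division of labour can be illustrated with a deliberately tiny toy model (not Pacemaker's code or API): given a simplified view of cluster state, a policy-engine-like function returns the ordered actions the cluster would execute. It encodes one rule the slide implies: a node that stopped responding is fenced (STONITH) before its resource is started elsewhere, so the resource never runs twice. The class and function names are hypothetical; real Pacemaker decisions are driven by constraints and scores stored in the CIB.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Action = Tuple[str, str]  # (verb, node)


@dataclass
class ClusterView:
    """Hypothetical, much-simplified stand-in for what the CIB records."""
    online: Set[str]              # members that answer
    unclean: Set[str]             # members that stopped answering
    resource_node: Optional[str]  # where the single managed resource runs now


def plan_transition(view: ClusterView, preference: List[str]) -> List[Action]:
    """Toy policy engine: ordered actions for the designated controller."""
    actions: List[Action] = []
    # Fence first: starting the resource elsewhere while an unreachable node
    # might still run it could corrupt shared data.
    for node in sorted(view.unclean):
        actions.append(("fence", node))
    healthy = view.online - view.unclean
    if view.resource_node in healthy:
        return actions  # resource is fine where it is
    for node in preference:
        if node in healthy:
            actions.append(("start", node))
            break
    return actions


if __name__ == "__main__":
    view = ClusterView(online={"node2", "node3"}, unclean={"node1"}, resource_node="node1")
    print(plan_transition(view, ["node1", "node2", "node3"]))
```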
  • 19. DRBD
    ● Distributed Replicated Block Device
    ● Creates logical block devices (e.g. /dev/drbdX) that have backing volumes
    ● Reads are serviced locally
    ● Writes on the primary node are sent to the secondary node
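A toy model of the read/write path described above, assuming synchronous replication in the style of DRBD's protocol C (a write completes only once both nodes have it). While the peer is disconnected, the sketch records dirty regions and copies them on reconnect, loosely mirroring DRBD's resync bitmap. It is an in-memory illustration with hypothetical names, not DRBD's interface.

```python
from typing import List, Tuple


class ToyDrbdResource:
    """In-memory stand-in for a replicated block device (primary + secondary)."""

    def __init__(self, size: int) -> None:
        self.primary = bytearray(size)
        self.secondary = bytearray(size)
        self.connected = True
        self.out_of_sync: List[Tuple[int, int]] = []  # (offset, length) to resync

    def write(self, offset: int, data: bytes) -> None:
        end = offset + len(data)
        if offset < 0 or end > len(self.primary):
            raise ValueError("write outside device")
        self.primary[offset:end] = data
        if self.connected:
            self.secondary[offset:end] = data             # replicate before completing
        else:
            self.out_of_sync.append((offset, len(data)))  # remember for resync

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.primary[offset:offset + length])  # served locally

    def disconnect(self) -> None:
        self.connected = False

    def reconnect(self) -> None:
        # Copy every region written while disconnected, then resume replication.
        for offset, length in self.out_of_sync:
            self.secondary[offset:offset + length] = self.primary[offset:offset + length]
        self.out_of_sync.clear()
        self.connected = True
```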
  • 20. Active/Passive: Database
    (Diagram: Host1 and Host2, each running MySQL on DRBD, with Pacemaker and Corosync)
    ● Use DRBD to back MySQL
    ● Leverage a VIP that can float between hosts
    ● Manage all resources (including the MySQL daemon) with Pacemaker
    ● MySQL/Galera is an alternative, but the current version of the HA Guide does not recommend it
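The bullets imply a strict ordering on failover. In Pacemaker this is typically expressed as a master/slave DRBD resource plus a group of filesystem, VIP and MySQL resources tied to the DRBD primary with colocation and order constraints. The sketch below only models the resulting sequence (a group starts its members in order and stops them in reverse) for a controlled switchover; after a crash the old host would be fenced instead of stopped. Resource and host names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical resource names in the group, in start order.
GROUP = ["filesystem", "vip", "mysqld"]


def start_steps() -> List[str]:
    # DRBD must be Primary before its block device can be mounted.
    return ["promote drbd"] + [f"start {r}" for r in GROUP]


def stop_steps() -> List[str]:
    # Group members stop in reverse order, then DRBD steps down.
    return [f"stop {r}" for r in reversed(GROUP)] + ["demote drbd"]


def failover_plan(old: str, new: str) -> List[Tuple[str, str]]:
    """Ordered (host, step) pairs for a controlled move from old to new."""
    return [(old, s) for s in stop_steps()] + [(new, s) for s in start_steps()]


if __name__ == "__main__":
    for host, step in failover_plan("host1", "host2"):
        print(f"{host}: {step}")
```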
  • 21. Active/Passive: RabbitMQ
    (Diagram: Host1 and Host2, each running RabbitMQ on DRBD, with Pacemaker and Corosync)
    ● Use DRBD to back RabbitMQ
    ● Leverage a VIP that can float between hosts
    ● Ensure the Erlang cookie (.erlang.cookie) is identical on all nodes
      o this lets the nodes communicate with each other
    ● RabbitMQ clustering does not tolerate network partitions well
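A small sketch of a pre-flight check for the cookie requirement, assuming each node's cookie file (commonly `/var/lib/rabbitmq/.erlang.cookie` for packaged installs) has already been collected into a dict. It compares SHA-256 digests so the secret itself is never printed. The function names are hypothetical.

```python
import hashlib
from typing import Dict, List


def group_by_cookie(cookies: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Map a short digest of each distinct cookie to the nodes that use it."""
    groups: Dict[str, List[str]] = {}
    for node, cookie in sorted(cookies.items()):
        digest = hashlib.sha256(cookie).hexdigest()[:12]  # never print the secret
        groups.setdefault(digest, []).append(node)
    return groups


def cookies_consistent(cookies: Dict[str, bytes]) -> bool:
    """True if every node uses the same cookie."""
    return len(group_by_cookie(cookies)) <= 1


if __name__ == "__main__":
    collected = {"rabbit1": b"SECRETCOOKIE", "rabbit2": b"SECRETCOOKIE",
                 "rabbit3": b"STALECOOKIE"}
    if not cookies_consistent(collected):
        for digest, nodes in group_by_cookie(collected).items():
            print(digest, nodes)
```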
  • 22. Active/Passive: Overview (from the Guide)
    ● Use the DB and RabbitMQ VIPs in configuration files
    ● Configure Pacemaker resources for OpenStack services
      o Image API
      o Identity
      o Block Storage API
      o Telemetry Central Agent
      o Networking
      o L3 agent
      o DHCP
  • 23. DT Implementation - Overview
    ● Business Market Place (BMP)
    ● SaaS offering
    ● https://portal.telekomcloud.com/
    ● SaaS applications from software partners (ISVs) and DT, offered to SME customers
    ● Platform based on open source technologies only (OpenStack, CEPH, Linux)
    ● Project started in 2012 with OpenStack Essex and CEPH
    ● In production since March 2013
  • 24. DT Implementation
    DTAG scale-out project (ongoing)
    Target: migrate production to a new DC and scale out
    Requirements:
    ● scale out compute by 30% and storage by 40%
    ● eliminate all SPOFs
    ● set up in two fire protection areas / physically separated DC rooms
  • 25. DT Implementation
    ● single-region HA OpenStack instance
    ● all services distributed over two DC rooms
      o compute and storage distributed equally
      o all OpenStack services HA (as far as possible)
      o OSS (DNS, NTP, Puppet master, mirror, etc.; redundant perimeter firewall)
    ● instance distribution: 4 availability zones, multiple host aggregates and scheduler filters
  • 26. DT Implementation
    ● Load balancing
      o HAProxy for MySQL, services, RabbitMQ, APIs (nginx under test)
    ● MySQL
      o Galera multi-master replication (3 nodes)
    ● RabbitMQ
      o 2-node cluster / mirrored queues
    ● Neutron
      o multiple DHCP agents started; Pacemaker/Corosync
    ● API endpoints
      o load balancing with round-robin distribution
    ● Storage
      o 2 shared, distributed CEPH clusters (RBD/S3)
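Round-robin distribution, as used for the API endpoints here, hands requests to the members in turn. A minimal sketch, assuming a hypothetical set of healthy members supplied by health checks; members that are down are simply skipped:

```python
from typing import Iterable, List, Optional, Set


class RoundRobin:
    def __init__(self, members: Iterable[str]) -> None:
        self.members: List[str] = list(members)
        self._next = 0  # index of the member to try first

    def pick(self, healthy: Optional[Set[str]] = None) -> str:
        """Return the next member in turn, skipping any not in `healthy`."""
        for _ in range(len(self.members)):
            member = self.members[self._next]
            self._next = (self._next + 1) % len(self.members)
            if healthy is None or member in healthy:
                return member
        raise RuntimeError("no healthy member")
```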
  • 27. DT Implementation
    Tests/experiences so far
    ● Load balancing works well
    ● Database: OpenStack multi-node write issues
      o 1 node for writes / 2 nodes as backup: diminishes Galera HA efficiency (monitoring)
    ● Specific issues with the deployment in 2 DC rooms / uneven distribution of services (Galera)
      o if the “wrong” room fails:
        - Galera: quorum requires a majority! If the room with 2 nodes goes down → the 3rd node deactivates itself → DB outage
        - Storage: CEPH may lose 2/3 of the replicas → heavy replication load on the CEPH cluster and danger of losing data (OSD/disk failure) → raise the replica level / adapt the CRUSH map
        - Network: recovering from a Neutron / L3 failure takes <15 minutes
      o pet applications are vulnerable; they may suffer from hiccups during disasters anyway
    ● DHCP agent failures
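The "wrong room" effect is plain majority arithmetic. The sketch below reports what survives the loss of each room for a given placement. The boolean is the quorum test for Galera nodes (or CEPH MONs), assuming one vote each; for data replicas only the count of remaining copies matters. The room names and placements are illustrative.

```python
from typing import Dict, Tuple


def room_failure_report(placement: Dict[str, int]) -> Dict[str, Tuple[int, bool]]:
    """For each room, (members left, majority kept) if that whole room fails."""
    total = sum(placement.values())
    report = {}
    for room, count in placement.items():
        left = total - count
        report[room] = (left, 2 * left > total)
    return report


if __name__ == "__main__":
    # 3 Galera nodes (or 3 CEPH replicas) spread over two rooms, 2 + 1
    print(room_failure_report({"room_a": 2, "room_b": 1}))
    # the same 3 members with a third room, 1 + 1 + 1
    print(room_failure_report({"room_a": 1, "room_b": 1, "room_c": 1}))
```

With 2 + 1, losing `room_a` leaves 1 of 3 members: the remaining Galera node loses quorum and only one of three replicas survives. Spreading 1 + 1 + 1 over a third room keeps a majority whichever room fails.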
  • 28. DT Implementation
    Plans for the future
    ● use DVR / VRRP
      o make the network more resilient and elastic
    ● a third DC room would be desirable :-)
      o CEPH replicas / MONs, MySQL Galera
  • 29. eBay/PayPal Implementation
    The scope of eBay/PayPal OpenStack clouds
    ● 100% of PayPal web/mid tier
    ● most of Dev/QA
    ● number of hypervisors (HVs): 8,500
    ● number of virtual machines: 70,000
    ● number of users: several thousand
    ● availability zones: 10
  • 30. eBay/PayPal Implementation
    ● Database: MySQL MMM replication, VIP with Failover Persistence / Galera
    ● RabbitMQ: VIP with Single Node Failover Persistence, or 3 nodes with mirrored queues
    ● Neutron DHCP / LBaaS: Corosync/Pacemaker
    ● API endpoints: LB VIPs for every service with either round robin (RR) or least connections
    ● Storage: shared storage with NFS/iSCSI
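For the least-connections option, a load balancer sends each new connection to the member with the fewest active ones. A sketch with hypothetical member names; ties go to the alphabetically first member so the result is deterministic:

```python
from typing import Dict, Iterable


class LeastConnections:
    def __init__(self, members: Iterable[str]) -> None:
        self.active: Dict[str, int] = {m: 0 for m in members}

    def acquire(self) -> str:
        """Pick the member with the fewest open connections and count one more."""
        member = min(self.active, key=lambda m: (self.active[m], m))
        self.active[member] += 1
        return member

    def release(self, member: str) -> None:
        """Call when a connection to `member` closes."""
        self.active[member] -= 1
```

Compared with round robin, this adapts when some requests (for example long-running API calls) hold connections much longer than others.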
  • 31. eBay/PayPal Implementation
    Successful HA implementations
    ● load-balanced HA: VIPs for every service
    ● LB Single Node Failover Persistence profile
    ● Galera/Percona for the Identity service
    ● global Identity service using GLB
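As we understand the term, a single node failover persistence profile sends all traffic to one member until that member fails, then moves everyone to the next member and stays there even after the first one recovers. This suits a replicated MySQL or RabbitMQ backend that should receive writes on only one node. A sketch of that behaviour, with hypothetical names and health reported from outside:

```python
from typing import List, Optional, Set


class SingleNodeFailover:
    def __init__(self, members: List[str]) -> None:
        self.members = members  # ordered by preference
        self.current: Optional[str] = None

    def route(self, healthy: Set[str]) -> str:
        """Return the one member that should receive all traffic right now."""
        if self.current in healthy:
            return self.current  # sticky: no fail-back to a recovered node
        for member in self.members:
            if member in healthy:
                self.current = member  # everyone moves together
                return member
        self.current = None
        raise RuntimeError("no healthy member left")
```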
  • 32. eBay/PayPal Implementation
    HA failures
    ● Corosync/Pacemaker for Neutron DHCP and LBaaS: missing advanced health checks
    ● RabbitMQ with Single Node Failover Persistence
    ● MySQL replication with Single Node Failover Persistence sometimes doesn't work well
      o implemented external monitoring and disabling of the failed member
    ● VIPs without ECV health checks
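The last bullets point at the same gap: a VIP whose health check only verifies that a port answers keeps sending traffic to a member that is up but broken. An ECV-style (extended content verification) check instead requires the response to contain expected content. A hedged sketch using only the standard library; the URLs, the expected marker and the injectable `fetch` parameter are assumptions, and filtering failed members out of the pool stands in for the external monitoring described above.

```python
import urllib.request
from typing import Callable, Iterable, List, Optional, Tuple

Fetch = Callable[[str], Tuple[int, bytes]]


def http_fetch(url: str, timeout: float = 2.0) -> Tuple[int, bytes]:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def ecv_check(url: str, expect: bytes, fetch: Optional[Fetch] = None) -> bool:
    """Healthy only if the member returns HTTP 200 and the expected content."""
    try:
        status, body = (fetch or http_fetch)(url)
    except OSError:  # connection refused, timeout, or HTTP error status
        return False
    return status == 200 and expect in body


def enabled_members(pool: Iterable[str], expect: bytes,
                    fetch: Optional[Fetch] = None) -> List[str]:
    """Members to keep behind the VIP; the rest would be disabled by monitoring."""
    return [url for url in pool if ecv_check(url, expect, fetch)]
```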
  • 33. eBay/PayPal Implementation
    Future direction
    ● HA for global or regional services
      o one leg in each availability zone (Keystone, LBaaS, Swift)
    ● RabbitMQ with 3 nodes/mirrored queues
      o LB VIP with least connections
    ● no shared NFS for Glance
  • 35. eBay/PayPal Implementation
    Lessons learned
    ● Try not to overcomplicate
    ● Simulate failures
      o before placing in production, make sure HA works
    ● Place your services in different availability zones, or at least different fault zones
    ● Always make backups
      o no matter how robust your HA solution is
  • 36. Call to Action
    ● OpenStack HA Guide update efforts
    ● WTE Work Group (now known as ‘Enterprise’)
    ● Share best practices
  • 37. References
    ● OpenStack HA guide: http://docs.openstack.org/high-availability-guide/content/index.html
    ● Percona resources: https://www.percona.com/resources/mysql-webinars/high-availability-using-mysql-cloud-today-tomorrow-and-keys-your-success
    ● HAProxy documentation: http://www.haproxy.org/

Editor's Notes

  1. Explain the notion of High Availability in the context of OpenStack: ensuring high availability of OpenStack services, API services, and the supporting infrastructure, including databases and message queues. HA means different things in different contexts: is it guest availability? Is it the DB? Is it storage? Or is it application availability? If there is a failure, should the application fail over, or should the underlying infrastructure? Broadly, HA protects against system downtime and prevents accidental data loss. There can be multiple SPOFs: services, API endpoints, network components, storage components, and infrastructure components such as power and cooling. Provide redundancy at the appropriate levels.
     OpenStack is a collection of services sharing some common infrastructure. It is not a monolithic application that can be made highly available by slapping in a load balancer. These are independent, self-contained services with some shared infrastructure among them, and each has its own configuration and settings. Some of the components are stateless, such as nova-api, keystone-api and glance-api; others are stateful, such as databases and message queues. The OpenStack architecture is very complex.
     Active/Passive: one instance is ‘active’, and on failure the redundant service/system is brought into action. For stateless services, very minimal configuration is needed. For stateful services, additional applications such as Pacemaker and Corosync are needed.
     Active/Active: both the active and the redundant systems are maintained in the same state concurrently. For stateless services, the active and redundant instances are load balanced using an LB such as HAProxy. Stateful services additionally need to be kept in the same state, and again need an LB.