TRUSTED CLOUD SOLUTIONS
OpenStack Summit Austin
WE HAVE OUR RIGHT TO SLEEP
Sadique Puthen & Dustin Black
Cloud Success Architect
26th April 2016
How To Troubleshoot OpenStack Without Losing Sleep
sputhenp@redhat.com
@sadiquepp
dustin@redhat.com
@dustinlblack
Manifestation of a Problem
“Our compute service on the compute node is stuck in a state of activating.”
“Most OpenStack Overcloud neutron services inactive and disabled”
No valid host was found. Exceeded max scheduling attempts 3 for instance
PortLimitExceeded: Maximum number of ports exceeded
“User unable to launch new instances”
Instance failed to spawn
Over-Working RabbitMQ
Problem Description: Our compute service on the compute node is stuck in a state of activating
The initial evidence is a set of non-descriptive timeouts:
# journalctl --all --this-boot --no-pager | grep nova
May 27 16:20:50 host.example.com systemd[1]: openstack-nova-compute.service operation timed out. Terminating.
May 27 16:20:50 host.example.com systemd[1]: Unit openstack-nova-compute.service entered failed state.
May 27 16:20:50 host.example.com systemd[1]: openstack-nova-compute.service holdoff time over, scheduling restart.
Rebooting the compute node doesn’t help.
An strace of the nova-compute service reveals our trouble communicating with rabbit:
# grep :5672 compute.strace
12938 03:29:28.320069 write(3, "2015-05-28 03:29:28.319 12938 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.100.47:5672 is unreachable: Socket closed. Trying again in 1 seconds.\n", 169) = 169 <0.000019>
12938 03:29:29.321779 write(3, "2015-05-28 03:29:29.321 12938 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on 192.168.100.48:5672\n", 126) = 126 <0.000061>
12938 03:29:30.333894 write(3, "2015-05-28 03:29:30.333 12938 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 192.168.100.48:5672\n", 123) = 123 <0.000013>
Over-Working RabbitMQ
The strace leads to more logs...
The logs lead to an existing bug report...
The bug report leads to an upstream discussion...
Yadda Yadda Yadda
The rabbitmq-server process is out of file descriptors!
https://github.com/puppetlabs/puppetlabs-rabbitmq/pull/215#discussion_r24977957
Now you Know!
Too few RabbitMQ file descriptors is a recipe for sleepless nights.
Set the rabbitmq-server NOFILE limit to 65436*
*Be careful if you’re using pacemaker -- limits are set by the resource agent.
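One way to spot this condition before it bites is to compare a process's open file descriptors against its NOFILE limit. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/limits and /proc/<pid>/fd are readable (inspecting another user's process usually requires root). The helper names and the SAMPLE values are hypothetical and only illustrate the parsing.

```python
import os
import re


def parse_max_open_files(limits_text):
    """Return (soft, hard) 'Max open files' limits from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        match = re.match(r"Max open files\s+(\S+)\s+(\S+)", line)
        if match:
            soft, hard = match.groups()
            return int(soft), int(hard)
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (open_fds, soft_limit) for a live process on Linux."""
    with open(f"/proc/{pid}/limits") as handle:
        soft, _hard = parse_max_open_files(handle.read())
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, soft


# Hypothetical excerpt of /proc/<pid>/limits, used only to exercise the parser.
SAMPLE = """Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            1024                 4096                 files
"""

print(parse_max_open_files(SAMPLE))
```

Calling fd_usage() with the rabbitmq-server PID and alerting when the first value approaches the second would flag the problem described above before services start timing out.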
Knowledge-Centered Support
● Continuous improvement of the knowledgebase simplifies troubleshooting of future issues
● Knowledge automatically captured as a by-product of the problem solving process
● Search and reuse as core disciplines of the support team
● Fast track to publication means easier self-resolution
https://access.redhat.com/solutions/1465753
Issue #2: Random failure while spawning a large number of instances
$ nova list
ERROR (ConnectionRefused): Unable to establish connection to http://192.168.1.1:35357/v2.0/tokens
● Connections to various OpenStack service APIs (nova-api, cinder-api, neutron-api, etc.) time out randomly.
● Not reproducible in most environments. When it happens, the failure is random, without any pattern: sometimes 1 in 100, sometimes 1 in 500, etc.
● Obviously, keystone is up and running perfectly fine.
[Diagram: neutron-api, cinder-api, nova-api, and Keystone, annotated "connection refused!!"]
Issue #2: The symptom is the same as issue #1
Result: Random failures in spawning instances, creating volumes, networks, etc.
The first suspect is Keystone, but it is innocent.
Where can one go wrong?
Looking at the error message, it's natural to point fingers at keystone.
● Looked at the keystone API logs. No clue!!
● Saw an abnormal number of keystone connections in CLOSE_WAIT status. Focused on that and wasted a lot of time investigating in that direction.
● It's time to understand how connections flow from the end user to the API and on to keystone, by focusing on how the dots are connected.
How does it work under the Hood?
[Diagram: requests arrive at the VIP. Each of controller-1, controller-2, and controller-3 runs haproxy, nova-api, keystone, and mariadb-galera, which form the nova-api, keystone, and database tiers. Annotated "connection refused!!"]
Possibilities?
Keystone is already ruled out.
● Intermittent network packet drop? No, ruled out by network troubleshooting.
● Haproxy (load balancer) drops the connection? Likely. But on which hop?
○ end user -> nova: Highly unlikely, as the error occurs when nova connects to keystone.
○ nova -> keystone: Slightly likely.
○ keystone -> database: Highly likely. Enabled logging and found heavy client termination messages.
haproxy[22346]: 10.243.232.62:48999 [10/Jul/2015:01:41:34.706] galera galera/pcmk-hovsh0800sdc-06 1/0/8734961 37181 cD 1369/1337/1337/1337/0 0/0
haproxy[22346]: 10.243.232.14:53092 [10/Jul/2015:02:37:43.666] galera galera/pcmk-hovsh0800sdc-06 1/0/5400007 2875 cD 1375/1337/1337/1337/0 0/0
haproxy[22346]: 10.243.232.62:41742 [10/Jul/2015:01:47:44.819] galera galera/pcmk-hovsh0800sdc-06 1/0/8400246 38448 cD 1376/1336/1336/1336/0 0/0
haproxy[22346]: 10.243.232.14:53318 [10/Jul/2015:02:37:47.499] galera galera/pcmk-hovsh0800sdc-06 1/0/5400005 3414 cD 1384/1335/1335/1335/0 0/0
haproxy[22346]: 10.243.232.62:42507 [10/Jul/2015:02:37:47.529] galera galera/pcmk-hovsh0800sdc-06 1/0/5400006 2875 cD 1383/1334/1334/1334/0 0/0
haproxy[22346]: 10.243.232.62:42609 [10/Jul/2015:02:37:49.103] galera galera/pcmk-hovsh0800sdc-06 1/0/5400315 35783 cD 1384/1334/1334/1334/0 0/0
haproxy[22346]: 10.243.232.62:42684 [10/Jul/2015:02:37:50.598] galera galera/pcmk-hovsh0800sdc-06 1/0/5400259 28994 cD 1384/1334/1334/1334/0 0/0
haproxy[22346]: 10.243.232.14:53493 [10/Jul/2015:02:37:50.885] galera galera/pcmk-hovsh0800sdc-06 1/0/5400007 2875 cD 1383/1333/1333/1333/0 0/0
haproxy[22346]: 10.243.232.14:53674 [10/Jul/2015:02:37:53.874] galera galera/pcmk-hovsh0800sdc-06 1/0/5400007 3498 cD 1404/1335/1335/1335/0 0/0
haproxy[22346]: 10.243.232.14:54625 [10/Jul/2015:02:38:11.399] galera galera/pcmk-hovsh0800sdc-06 1/0/5400008 12461 cD 1407/1335/1335/1335/0 0/0
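Reading these lines by eye is error-prone, so here is a minimal sketch that parses them, assuming they follow HAProxy's TCP log layout (option tcplog) exactly as shown above. In HAProxy's termination-state codes, the first character "c" means the client-side timeout expired and the second character "D" means the session was in its data phase. The field names in the regular expression and the summarize() helper are chosen for illustration.

```python
import re
from collections import Counter

# Field layout of the tcplog lines shown above.
LOG_RE = re.compile(
    r"haproxy\[\d+\]: (?P<client>\S+) \[(?P<accepted>[^\]]+)\] "
    r"(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>\+?\d+) (?P<bytes>\+?\d+) "
    r"(?P<state>\S\S) "
    r"(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/(?P<srv_conn>\d+)/(?P<retries>\+?\d+) "
    r"(?P<srv_queue>\d+)/(?P<backend_queue>\d+)"
)


def summarize(lines):
    """Count termination states and collect session durations in minutes."""
    states = Counter()
    durations_min = []
    for line in lines:
        match = LOG_RE.search(line)
        if not match:
            continue  # not a tcplog line in the expected layout
        states[match["state"]] += 1
        # Tt is the total session time in milliseconds.
        durations_min.append(int(match["tt"].lstrip("+")) / 60000)
    return states, durations_min


# Two lines copied from the log excerpt above.
SAMPLE_LINES = [
    "haproxy[22346]: 10.243.232.62:48999 [10/Jul/2015:01:41:34.706] galera galera/pcmk-hovsh0800sdc-06 1/0/8734961 37181 cD 1369/1337/1337/1337/0 0/0",
    "haproxy[22346]: 10.243.232.14:53092 [10/Jul/2015:02:37:43.666] galera galera/pcmk-hovsh0800sdc-06 1/0/5400007 2875 cD 1375/1337/1337/1337/0 0/0",
]

states, durations = summarize(SAMPLE_LINES)
print(states, [round(d, 1) for d in durations])
```

Several sessions in the excerpt lasted almost exactly 5,400,000 ms, that is 90 minutes, which lines up with a 90-minute client timeout on the galera proxy.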
Haproxy has hit maxconn for galera!
galera: sessions max: 2000, limit: 2000
Hold on, but where did I set it? Nowhere!!!
● Then where does this limit come from? It is the default hard-coded limit for each proxy when one is not explicitly defined.
● Then why is there no proper error message? Haproxy puts the connection into a queue to wait for a free database connection, then terminates it when it hits the timeout.
listen galera
bind 10.243.232.62:3306
mode tcp
option tcplog
option httpchk
option tcpka
stick on dst
stick-table type ip size 2
timeout client 90m
timeout server 90m
server controller-1 10.243.232.14:3306 check inter 1s on-marked-down shutdown-sessions
server controller-2 10.243.232.15:3306 check inter 1s on-marked-down shutdown-sessions
server controller-3 10.243.232.16:3306 check inter 1s on-marked-down shutdown-sessions
global
daemon
group haproxy
maxconn 40000
pidfile /var/run/haproxy.pid
user haproxy
defaults
log 127.0.0.1 local2 warning
mode tcp
option tcplog
option redispatch
retries 3
timeout connect 5s
timeout client 30s
timeout server 30s
maxconn 2000
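Because the limit was never set on the galera proxy itself, it is easy to miss when reading the config. The sketch below is a simplified, hypothetical checker that reports, for each listen or frontend proxy, whether its maxconn is explicit, inherited from the preceding defaults section, or left at the 2000 default. It is not a full HAProxy config parser: it ignores includes, escaped "#" characters, and the process-wide global maxconn. The "example" proxy in the sample text is invented for illustration.

```python
SECTIONS = {"global", "defaults", "listen", "frontend", "backend"}
BUILTIN_PROXY_MAXCONN = 2000  # per-proxy default when nothing is configured


def effective_maxconn(config_text):
    """Map each listen/frontend proxy to (maxconn, where the value came from)."""
    inherited = None  # maxconn from the most recent defaults section
    section = name = None
    proxies = {}      # proxy name -> [explicit value, value inherited at declaration]
    for raw in config_text.splitlines():
        words = raw.split("#", 1)[0].split()
        if not words:
            continue
        keyword = words[0]
        if keyword in SECTIONS:
            section = keyword
            name = words[1] if len(words) > 1 else None
            if section == "defaults":
                inherited = None  # a new defaults section starts from scratch
            elif section in ("listen", "frontend"):
                proxies[name] = [None, inherited]
        elif keyword == "maxconn" and len(words) > 1:
            value = int(words[1])
            if section == "defaults":
                inherited = value
            elif section in ("listen", "frontend"):
                proxies[name][0] = value
    report = {}
    for proxy, (explicit, from_defaults) in proxies.items():
        if explicit is not None:
            report[proxy] = (explicit, "explicit")
        elif from_defaults is not None:
            report[proxy] = (from_defaults, "defaults section")
        else:
            report[proxy] = (BUILTIN_PROXY_MAXCONN, "built-in default")
    return report


# Condensed version of the config above, plus one hypothetical tuned proxy.
SAMPLE_CONFIG = """
global
    maxconn 40000
defaults
    mode tcp
    maxconn 2000
listen galera
    bind 10.243.232.62:3306
    mode tcp
listen example
    bind 10.243.232.62:3307
    maxconn 5000
"""

print(effective_maxconn(SAMPLE_CONFIG))
```

The global maxconn of 40000 does not help here: each proxy is still capped by its own limit, which is why galera stalls at 2000 sessions.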
What should be the maxconn for galera?
I solved your problem, can I go and sleep? Hold on..
● It took more time to determine the right value for the maximum number of database connections, because it depends on:
○ How many workers are spawned by each API service?
■ This depends on the api_workers/workers configuration for each service:
# Number of workers for OpenStack API service. The default will be the number of CPUs available. (integer value)
■ By default it therefore depends on how many CPU cores each controller has, which can differ from deployment to deployment.
○ Each worker process opens five long-lived database connections.
○ There are also some short-lived connections from each worker.
Now I can sleep like him.
Based on default deployment by RHEL OpenStack Platform Director. Each of the three controllers has 24 cores and runs the same services:

Service      Workers (cores x multiplier)
nova-api     24 x 3 = 72
keystone     24 x 2 = 48
neutron-s    24 x 2 = 48
glance-ap    24 x 1 = 24
cinder-api   24 x 1 = 24
glance-re    24 x 1 = 24
nova-con     24 x 1 = 24

Per controller (mariadb-galera): total = 264 x 5 = 1320
Haproxy-VIP: total is 1320 x 3 = 3960
Add 1024 for:
1 - Short lived connections
2 - Other services.
3 - New services.
Total = 3960 + 1024 = 4984
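The arithmetic on this slide is easy to automate. The following minimal sketch reproduces it, assuming the figures given above: 24 cores per controller, three controllers, five long-lived connections per worker, and 1024 connections of headroom. The service labels are copied as they appear on the slide, and the function names are hypothetical. Note that 3960 + 1024 works out to 4984.

```python
CORES_PER_CONTROLLER = 24
CONTROLLERS = 3
LONG_LIVED_CONNECTIONS_PER_WORKER = 5
HEADROOM = 1024  # short-lived connections, other services, new services

# Worker multiplier per service (workers = cores x multiplier), labels as on the slide.
WORKER_MULTIPLIER = {
    "nova-api": 3,
    "keystone": 2,
    "neutron-s": 2,
    "glance-ap": 1,
    "cinder-api": 1,
    "glance-re": 1,
    "nova-con": 1,
}


def workers_per_controller(cores, multipliers):
    """Total worker processes on one controller."""
    return sum(cores * factor for factor in multipliers.values())


def galera_maxconn(cores, controllers, multipliers, per_worker, headroom):
    """Connections arriving at the database VIP from all controllers, plus headroom."""
    per_controller = workers_per_controller(cores, multipliers) * per_worker
    return per_controller * controllers + headroom


workers = workers_per_controller(CORES_PER_CONTROLLER, WORKER_MULTIPLIER)
total = galera_maxconn(
    CORES_PER_CONTROLLER,
    CONTROLLERS,
    WORKER_MULTIPLIER,
    LONG_LIVED_CONNECTIONS_PER_WORKER,
    HEADROOM,
)
print(workers, workers * LONG_LIVED_CONNECTIONS_PER_WORKER, total)
```

Feeding the real core count and worker settings of a deployment into a function like this at install time gives a starting value for both the proxy and the database limits.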
To sleep like a …..?
Setting the right maxconn value upfront for the database proxy can save you from sleepless nights.
● Decide how many workers each API needs for optimum performance. A 96-core system does not need x3 nova worker processes.
● Automate this calculation and set the value at deployment time, for both haproxy (maxconn) and the database server (max_connections).
● If you use a different load balancer, make sure to address this problem there too, if applicable.
Decide and set the right value upfront before going to bed.
Proactive alerts
Real-time risk assessment
No infrastructure cost
Validated resolution
Tailored resolution
Quick setup
SaaS
Discover the Beta: access.redhat.com/insights
THANK YOU
plus.google.com/+RedHat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHatNews
linkedin.com/company/red-hat

More Related Content

What's hot

Building IAM for OpenStack
Building IAM for OpenStackBuilding IAM for OpenStack
Building IAM for OpenStackSteve Martinelli
 
Pulsar Functions Deep Dive_Sanjeev kulkarni
Pulsar Functions Deep Dive_Sanjeev kulkarniPulsar Functions Deep Dive_Sanjeev kulkarni
Pulsar Functions Deep Dive_Sanjeev kulkarniStreamNative
 
[오픈소스컨설팅]오픈스택에 대하여
[오픈소스컨설팅]오픈스택에 대하여[오픈소스컨설팅]오픈스택에 대하여
[오픈소스컨설팅]오픈스택에 대하여Ji-Woong Choi
 
Zabbix, garder un oeil toujours ouvert
Zabbix, garder un oeil toujours ouvertZabbix, garder un oeil toujours ouvert
Zabbix, garder un oeil toujours ouvertLook a box
 
OpenStack networking (Neutron)
OpenStack networking (Neutron) OpenStack networking (Neutron)
OpenStack networking (Neutron) CREATE-NET
 
OVN - Basics and deep dive
OVN - Basics and deep diveOVN - Basics and deep dive
OVN - Basics and deep diveTrinath Somanchi
 
VXLAN and FRRouting
VXLAN and FRRoutingVXLAN and FRRouting
VXLAN and FRRoutingFaisal Reza
 
Red Hat OpenStack 17 저자직강+스터디그룹_1주차
Red Hat OpenStack 17 저자직강+스터디그룹_1주차Red Hat OpenStack 17 저자직강+스터디그룹_1주차
Red Hat OpenStack 17 저자직강+스터디그룹_1주차Nalee Jang
 
Building Multi-Site and Multi-OpenStack Cloud with OpenStack Cascading
Building Multi-Site and Multi-OpenStack Cloud with OpenStack CascadingBuilding Multi-Site and Multi-OpenStack Cloud with OpenStack Cascading
Building Multi-Site and Multi-OpenStack Cloud with OpenStack CascadingJoe Huang
 
Network analysis Using Wireshark Lesson 11: TCP and UDP Analysis
Network analysis Using Wireshark Lesson 11: TCP and UDP AnalysisNetwork analysis Using Wireshark Lesson 11: TCP and UDP Analysis
Network analysis Using Wireshark Lesson 11: TCP and UDP AnalysisYoram Orzach
 
Overview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/NeutronOverview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/Neutronvivekkonnect
 
Issues of OpenStack multi-region mode
Issues of OpenStack multi-region modeIssues of OpenStack multi-region mode
Issues of OpenStack multi-region modeJoe Huang
 
OpenStack Tutorial
OpenStack TutorialOpenStack Tutorial
OpenStack TutorialBret Piatt
 
L3HA-VRRP-20141201
L3HA-VRRP-20141201L3HA-VRRP-20141201
L3HA-VRRP-20141201Manabu Ori
 
Introduction of OpenStack cascading solution
Introduction of OpenStack cascading solutionIntroduction of OpenStack cascading solution
Introduction of OpenStack cascading solutionJoe Huang
 
OpenStack Networking
OpenStack NetworkingOpenStack Networking
OpenStack NetworkingIlya Shakhat
 

What's hot (20)

Building IAM for OpenStack
Building IAM for OpenStackBuilding IAM for OpenStack
Building IAM for OpenStack
 
Meetup 23 - 02 - OVN - The future of networking in OpenStack
Meetup 23 - 02 - OVN - The future of networking in OpenStackMeetup 23 - 02 - OVN - The future of networking in OpenStack
Meetup 23 - 02 - OVN - The future of networking in OpenStack
 
Pulsar Functions Deep Dive_Sanjeev kulkarni
Pulsar Functions Deep Dive_Sanjeev kulkarniPulsar Functions Deep Dive_Sanjeev kulkarni
Pulsar Functions Deep Dive_Sanjeev kulkarni
 
[오픈소스컨설팅]오픈스택에 대하여
[오픈소스컨설팅]오픈스택에 대하여[오픈소스컨설팅]오픈스택에 대하여
[오픈소스컨설팅]오픈스택에 대하여
 
Zabbix, garder un oeil toujours ouvert
Zabbix, garder un oeil toujours ouvertZabbix, garder un oeil toujours ouvert
Zabbix, garder un oeil toujours ouvert
 
EMEA Airheads - Aruba Remote Access Point (RAP) Troubleshooting
EMEA Airheads - Aruba Remote Access Point (RAP) TroubleshootingEMEA Airheads - Aruba Remote Access Point (RAP) Troubleshooting
EMEA Airheads - Aruba Remote Access Point (RAP) Troubleshooting
 
OpenStack networking (Neutron)
OpenStack networking (Neutron) OpenStack networking (Neutron)
OpenStack networking (Neutron)
 
Aruba instant iap setup rev3
Aruba instant iap setup rev3Aruba instant iap setup rev3
Aruba instant iap setup rev3
 
OVN - Basics and deep dive
OVN - Basics and deep diveOVN - Basics and deep dive
OVN - Basics and deep dive
 
EMEA Airheads- Aruba Central with Instant AP
EMEA Airheads- Aruba Central with Instant APEMEA Airheads- Aruba Central with Instant AP
EMEA Airheads- Aruba Central with Instant AP
 
VXLAN and FRRouting
VXLAN and FRRoutingVXLAN and FRRouting
VXLAN and FRRouting
 
Red Hat OpenStack 17 저자직강+스터디그룹_1주차
Red Hat OpenStack 17 저자직강+스터디그룹_1주차Red Hat OpenStack 17 저자직강+스터디그룹_1주차
Red Hat OpenStack 17 저자직강+스터디그룹_1주차
 
Building Multi-Site and Multi-OpenStack Cloud with OpenStack Cascading
Building Multi-Site and Multi-OpenStack Cloud with OpenStack CascadingBuilding Multi-Site and Multi-OpenStack Cloud with OpenStack Cascading
Building Multi-Site and Multi-OpenStack Cloud with OpenStack Cascading
 
Network analysis Using Wireshark Lesson 11: TCP and UDP Analysis
Network analysis Using Wireshark Lesson 11: TCP and UDP AnalysisNetwork analysis Using Wireshark Lesson 11: TCP and UDP Analysis
Network analysis Using Wireshark Lesson 11: TCP and UDP Analysis
 
Overview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/NeutronOverview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/Neutron
 
Issues of OpenStack multi-region mode
Issues of OpenStack multi-region modeIssues of OpenStack multi-region mode
Issues of OpenStack multi-region mode
 
OpenStack Tutorial
OpenStack TutorialOpenStack Tutorial
OpenStack Tutorial
 
L3HA-VRRP-20141201
L3HA-VRRP-20141201L3HA-VRRP-20141201
L3HA-VRRP-20141201
 
Introduction of OpenStack cascading solution
Introduction of OpenStack cascading solutionIntroduction of OpenStack cascading solution
Introduction of OpenStack cascading solution
 
OpenStack Networking
OpenStack NetworkingOpenStack Networking
OpenStack Networking
 

Viewers also liked

Troubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issuesTroubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issuesMichael Klishin
 
Troubleshooting RabbitMQ and services that use it
Troubleshooting RabbitMQ and services that use itTroubleshooting RabbitMQ and services that use it
Troubleshooting RabbitMQ and services that use itMichael Klishin
 
Openstack on Fedora, Fedora on Openstack: An Introduction to cloud IaaS
Openstack on Fedora, Fedora on Openstack: An Introduction to cloud IaaSOpenstack on Fedora, Fedora on Openstack: An Introduction to cloud IaaS
Openstack on Fedora, Fedora on Openstack: An Introduction to cloud IaaSSadique Puthen
 
Multi tier-app-network-topology-neutron-final
Multi tier-app-network-topology-neutron-finalMulti tier-app-network-topology-neutron-final
Multi tier-app-network-topology-neutron-finalSadique Puthen
 
Learning From Real Practice of Providing Highly Available Hybrid Cloud Servic...
Learning From Real Practice of Providing Highly Available Hybrid Cloud Servic...Learning From Real Practice of Providing Highly Available Hybrid Cloud Servic...
Learning From Real Practice of Providing Highly Available Hybrid Cloud Servic...LF Events
 
Neutron Network Namespaces and IPtables--A Technical Deep Dive
Neutron Network Namespaces and IPtables--A Technical Deep DiveNeutron Network Namespaces and IPtables--A Technical Deep Dive
Neutron Network Namespaces and IPtables--A Technical Deep DiveMirantis
 
OpenStack Architecture and Use Cases
OpenStack Architecture and Use CasesOpenStack Architecture and Use Cases
OpenStack Architecture and Use CasesJalal Mostafa
 
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and FanoutOpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and FanoutSaju Madhavan
 
Getting Started With OpenStack (Havana)
Getting Started With OpenStack (Havana)Getting Started With OpenStack (Havana)
Getting Started With OpenStack (Havana)Kenneth Hui
 
Simplifying the OpenStack and Kubernetes network stack with Romana
Simplifying the OpenStack and Kubernetes network stack with RomanaSimplifying the OpenStack and Kubernetes network stack with Romana
Simplifying the OpenStack and Kubernetes network stack with RomanaJuergen Brendel
 
Summit 16: Cengn Experience in Opnfv Projects
Summit 16: Cengn Experience in Opnfv ProjectsSummit 16: Cengn Experience in Opnfv Projects
Summit 16: Cengn Experience in Opnfv ProjectsOPNFV
 
Monasca 를 이용한 cloud 모니터링 final
Monasca 를 이용한 cloud 모니터링 finalMonasca 를 이용한 cloud 모니터링 final
Monasca 를 이용한 cloud 모니터링 finalSangWook Byun
 
Mining Your Logs - Gaining Insight Through Visualization
Mining Your Logs - Gaining Insight Through VisualizationMining Your Logs - Gaining Insight Through Visualization
Mining Your Logs - Gaining Insight Through VisualizationRaffael Marty
 
Bridging The Gap: Explaining OpenStack To VMware Administrators
Bridging The Gap: Explaining OpenStack To VMware AdministratorsBridging The Gap: Explaining OpenStack To VMware Administrators
Bridging The Gap: Explaining OpenStack To VMware AdministratorsKenneth Hui
 
Apricot2017 Request tracing in distributed environment
Apricot2017 Request tracing in distributed environmentApricot2017 Request tracing in distributed environment
Apricot2017 Request tracing in distributed environmentHieu LE ☁
 
OpenStack本番環境の作り方 - Interop 2016
OpenStack本番環境の作り方 - Interop 2016OpenStack本番環境の作り方 - Interop 2016
OpenStack本番環境の作り方 - Interop 2016VirtualTech Japan Inc.
 
OpenStack networking-sfc flow 분석
OpenStack networking-sfc flow 분석OpenStack networking-sfc flow 분석
OpenStack networking-sfc flow 분석Yongyoon Shin
 
Internet Resource Management (IRM) & Internet Routing Registry (IRR)
Internet Resource Management (IRM) & Internet Routing Registry (IRR)Internet Resource Management (IRM) & Internet Routing Registry (IRR)
Internet Resource Management (IRM) & Internet Routing Registry (IRR)APNIC
 

Viewers also liked (20)

Troubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issuesTroubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issues
 
Troubleshooting RabbitMQ and services that use it
Troubleshooting RabbitMQ and services that use itTroubleshooting RabbitMQ and services that use it
Troubleshooting RabbitMQ and services that use it
 
RabbitMQ Operations
RabbitMQ OperationsRabbitMQ Operations
RabbitMQ Operations
 
Openstack on Fedora, Fedora on Openstack: An Introduction to cloud IaaS
Openstack on Fedora, Fedora on Openstack: An Introduction to cloud IaaSOpenstack on Fedora, Fedora on Openstack: An Introduction to cloud IaaS
Openstack on Fedora, Fedora on Openstack: An Introduction to cloud IaaS
 
Multi tier-app-network-topology-neutron-final
Multi tier-app-network-topology-neutron-finalMulti tier-app-network-topology-neutron-final
Multi tier-app-network-topology-neutron-final
 
Learning From Real Practice of Providing Highly Available Hybrid Cloud Servic...
Learning From Real Practice of Providing Highly Available Hybrid Cloud Servic...Learning From Real Practice of Providing Highly Available Hybrid Cloud Servic...
Learning From Real Practice of Providing Highly Available Hybrid Cloud Servic...
 
Neutron Network Namespaces and IPtables--A Technical Deep Dive
Neutron Network Namespaces and IPtables--A Technical Deep DiveNeutron Network Namespaces and IPtables--A Technical Deep Dive
Neutron Network Namespaces and IPtables--A Technical Deep Dive
 
OpenStack Architecture and Use Cases
OpenStack Architecture and Use CasesOpenStack Architecture and Use Cases
OpenStack Architecture and Use Cases
 
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and FanoutOpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
 
Getting Started With OpenStack (Havana)
Getting Started With OpenStack (Havana)Getting Started With OpenStack (Havana)
Getting Started With OpenStack (Havana)
 
Simplifying the OpenStack and Kubernetes network stack with Romana
Simplifying the OpenStack and Kubernetes network stack with RomanaSimplifying the OpenStack and Kubernetes network stack with Romana
Simplifying the OpenStack and Kubernetes network stack with Romana
 
Summit 16: Cengn Experience in Opnfv Projects
Summit 16: Cengn Experience in Opnfv ProjectsSummit 16: Cengn Experience in Opnfv Projects
Summit 16: Cengn Experience in Opnfv Projects
 
Monasca 를 이용한 cloud 모니터링 final
Monasca 를 이용한 cloud 모니터링 finalMonasca 를 이용한 cloud 모니터링 final
Monasca 를 이용한 cloud 모니터링 final
 
Mining Your Logs - Gaining Insight Through Visualization
Mining Your Logs - Gaining Insight Through VisualizationMining Your Logs - Gaining Insight Through Visualization
Mining Your Logs - Gaining Insight Through Visualization
 
Bridging The Gap: Explaining OpenStack To VMware Administrators
Bridging The Gap: Explaining OpenStack To VMware AdministratorsBridging The Gap: Explaining OpenStack To VMware Administrators
Bridging The Gap: Explaining OpenStack To VMware Administrators
 
Apricot2017 Request tracing in distributed environment
Apricot2017 Request tracing in distributed environmentApricot2017 Request tracing in distributed environment
Apricot2017 Request tracing in distributed environment
 
OpenStack本番環境の作り方 - Interop 2016
OpenStack本番環境の作り方 - Interop 2016OpenStack本番環境の作り方 - Interop 2016
OpenStack本番環境の作り方 - Interop 2016
 
How to Develop OpenStack
How to Develop OpenStackHow to Develop OpenStack
How to Develop OpenStack
 
OpenStack networking-sfc flow 분석
OpenStack networking-sfc flow 분석OpenStack networking-sfc flow 분석
OpenStack networking-sfc flow 분석
 
Internet Resource Management (IRM) & Internet Routing Registry (IRR)
Internet Resource Management (IRM) & Internet Routing Registry (IRR)Internet Resource Management (IRM) & Internet Routing Registry (IRR)
Internet Resource Management (IRM) & Internet Routing Registry (IRR)
 

Similar to How to Troubleshoot OpenStack Without Losing Sleep

Anatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersAnatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersSadique Puthen
 
FPC for the Masses - CoRIIN 2018
FPC for the Masses - CoRIIN 2018FPC for the Masses - CoRIIN 2018
FPC for the Masses - CoRIIN 2018Xavier Mertens
 
Velocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayVelocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayCosimo Streppone
 
Finding an unusual cause of max_user_connections in MySQL
Finding an unusual cause of max_user_connections in MySQLFinding an unusual cause of max_user_connections in MySQL
Finding an unusual cause of max_user_connections in MySQLOlivier Doucet
 
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...idsecconf
 
Evolution of Openstack Networking at CERN
Evolution of Openstack Networking at CERNEvolution of Openstack Networking at CERN
Evolution of Openstack Networking at CERNBelmiro Moreira
 
DDoS: Practical Survival Guide
DDoS: Practical Survival GuideDDoS: Practical Survival Guide
DDoS: Practical Survival GuideHLL
 
Can you trust Neutron?
Can you trust Neutron?Can you trust Neutron?
Can you trust Neutron?salv_orlando
 
Tuning TCP and NGINX on EC2
Tuning TCP and NGINX on EC2Tuning TCP and NGINX on EC2
Tuning TCP and NGINX on EC2Chartbeat
 
Couch to OpenStack: Neutron (Quantum) - August 13, 2013 Featuring Sean Winn
Couch to OpenStack: Neutron (Quantum) - August 13, 2013 Featuring Sean WinnCouch to OpenStack: Neutron (Quantum) - August 13, 2013 Featuring Sean Winn
Couch to OpenStack: Neutron (Quantum) - August 13, 2013 Featuring Sean WinnTrevor Roberts Jr.
 
Load Balancing MySQL with HAProxy - Slides
Load Balancing MySQL with HAProxy - SlidesLoad Balancing MySQL with HAProxy - Slides
Load Balancing MySQL with HAProxy - SlidesSeveralnines
 
Dockerizing the Hard Services: Neutron and Nova
Dockerizing the Hard Services: Neutron and NovaDockerizing the Hard Services: Neutron and Nova
Dockerizing the Hard Services: Neutron and Novaclayton_oneill
 
Boot Strapping in Cassandra
Boot Strapping  in CassandraBoot Strapping  in Cassandra
Boot Strapping in CassandraArunit Gupta
 

Similar to How to Troubleshoot OpenStack Without Losing Sleep (20)

Anatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersAnatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoorters
 
FPC for the Masses - CoRIIN 2018
FPC for the Masses - CoRIIN 2018FPC for the Masses - CoRIIN 2018
FPC for the Masses - CoRIIN 2018
 
Velocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayVelocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard Way
 
Finding an unusual cause of max_user_connections in MySQL
Finding an unusual cause of max_user_connections in MySQLFinding an unusual cause of max_user_connections in MySQL
Finding an unusual cause of max_user_connections in MySQL
 
Who Broke My Crypto
Who Broke My CryptoWho Broke My Crypto
Who Broke My Crypto
 
lightning talk proposal
lightning talk proposallightning talk proposal
lightning talk proposal
 
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
 
Evolution of Openstack Networking at CERN
Evolution of Openstack Networking at CERNEvolution of Openstack Networking at CERN
Evolution of Openstack Networking at CERN
 
DDoS: Practical Survival Guide
DDoS: Practical Survival GuideDDoS: Practical Survival Guide
DDoS: Practical Survival Guide
 
Long live to CMAN!
Long live to CMAN!Long live to CMAN!
Long live to CMAN!
 
Haproxy - zastosowania
Haproxy - zastosowaniaHaproxy - zastosowania
Haproxy - zastosowania
 
Nova HA
Nova HANova HA
Nova HA
 
T.Pollak y C.Yaconi - Prey
T.Pollak y C.Yaconi - PreyT.Pollak y C.Yaconi - Prey
T.Pollak y C.Yaconi - Prey
 
Can you trust Neutron?
Can you trust Neutron?Can you trust Neutron?
Can you trust Neutron?
 
Tuning TCP and NGINX on EC2
Tuning TCP and NGINX on EC2Tuning TCP and NGINX on EC2
Tuning TCP and NGINX on EC2
 
Couch to OpenStack: Neutron (Quantum) - August 13, 2013 Featuring Sean Winn
Couch to OpenStack: Neutron (Quantum) - August 13, 2013 Featuring Sean WinnCouch to OpenStack: Neutron (Quantum) - August 13, 2013 Featuring Sean Winn
Couch to OpenStack: Neutron (Quantum) - August 13, 2013 Featuring Sean Winn
 
Load Balancing MySQL with HAProxy - Slides
Load Balancing MySQL with HAProxy - SlidesLoad Balancing MySQL with HAProxy - Slides
Load Balancing MySQL with HAProxy - Slides
 
DDoS: practical survival
DDoS: practical survivalDDoS: practical survival
DDoS: practical survival
 
Dockerizing the Hard Services: Neutron and Nova
Dockerizing the Hard Services: Neutron and NovaDockerizing the Hard Services: Neutron and Nova
Dockerizing the Hard Services: Neutron and Nova
 
Boot Strapping in Cassandra
Boot Strapping  in CassandraBoot Strapping  in Cassandra
Boot Strapping in Cassandra
 

Recently uploaded

Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predieusebiomeyer
 
How to Troubleshoot OpenStack Without Losing Sleep

  • 2. WE HAVE OUR RIGHT TO SLEEP
  • 3. Sadique Puthen & Dustin Black, Cloud Success Architect, 26th April 2016. How To Troubleshoot OpenStack Without Losing Sleep
  • 5. Manifestation of a Problem
      ● “Our compute service on the compute node is stuck in a state of activating.”
      ● “Most OpenStack Overcloud neutron services inactive and disabled”
      ● No valid host was found. Exceeded max scheduling attempts 3 for instance
      ● PortLimitExceeded: Maximum number of ports exceeded
      ● “User unable to launch new instances”
      ● Instance failed to spawn
  • 7. Over-Working RabbitMQ
  • 8. Over-Working RabbitMQ
      Problem Description: Our compute service on the compute node is stuck in a state of activating.
      The initial evidence is a set of non-descriptive timeouts:
        # journalctl --all --this-boot --no-pager | grep nova
        May 27 16:20:50 host.example.com systemd[1]: openstack-nova-compute.service operation timed out. Terminating.
        May 27 16:20:50 host.example.com systemd[1]: Unit openstack-nova-compute.service entered failed state.
        May 27 16:20:50 host.example.com systemd[1]: openstack-nova-compute.service holdoff time over, scheduling restart.
      Rebooting the compute node doesn’t help.
  • 9. Over-Working RabbitMQ
      Problem Description: Our compute service on the compute node is stuck in a state of activating.
      An strace of the nova-compute service reveals trouble communicating with RabbitMQ:
        # grep :5672 compute.strace
        12938 03:29:28.320069 write(3, "2015-05-28 03:29:28.319 12938 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.100.47:5672 is unreachable: Socket closed. Trying again in 1 seconds.\n", 169) = 169 <0.000019>
        12938 03:29:29.321779 write(3, "2015-05-28 03:29:29.321 12938 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on 192.168.100.48:5672\n", 126) = 126 <0.000061>
        12938 03:29:30.333894 write(3, "2015-05-28 03:29:30.333 12938 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 192.168.100.48:5672\n", 123) = 123 <0.000013>
  • 10. Over-Working RabbitMQ
      Problem Description: Our compute service on the compute node is stuck in a state of activating.
      The strace leads to more logs...
      The logs lead to an existing bug report...
      The bug report leads to an upstream discussion...
      Yadda Yadda Yadda
      The rabbitmq-server process is out of file descriptors!
  • 12. Now You Know!
      Too few RabbitMQ file descriptors are a recipe for sleepless nights.
      Set the rabbitmq-server NOFILE limit to 65436*
      *Be careful if you’re using pacemaker: in that case the limits are set by the resource agent.
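One way to check how close a process is to its descriptor limit is to compare the entries under /proc/<pid>/fd with the "Max open files" row of /proc/<pid>/limits. The following is a minimal, Linux-only sketch; the function names are illustrative and not part of any OpenStack or RabbitMQ tooling, and reading another user's process normally requires root.

```python
from pathlib import Path
from typing import Tuple


def parse_max_open_files(limits_text: str) -> Tuple[str, str]:
    """Return the (soft, hard) 'Max open files' values from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Fields are: Max, open, files, <soft>, <hard>, <units>
            return fields[3], fields[4]
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid: int) -> Tuple[int, str, str]:
    """Return (open descriptor count, soft limit, hard limit) for a process."""
    open_fds = len(list(Path(f"/proc/{pid}/fd").iterdir()))
    soft, hard = parse_max_open_files(Path(f"/proc/{pid}/limits").read_text())
    return open_fds, soft, hard
```

Calling `fd_usage(pid)` with the PID of the RabbitMQ Erlang VM process would show whether the open-descriptor count is approaching the soft limit; the values are returned as strings because a limit can also read `unlimited`.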
  • 13. Knowledge-Centered Support
      ● Continuous improvement of the knowledgebase simplifies troubleshooting of future issues
      ● Knowledge automatically captured as a by-product of the problem-solving process
      ● Search and reuse as core disciplines of the support team
      ● Fast track to publication means easier self-resolution
      https://access.redhat.com/solutions/1465753
  • 14. Issue #2: Random failures while spawning a large number of instances
  • 15. Issue #2: The symptom is the same as in issue #1
        $ nova list
        ERROR (ConnectionRefused): Unable to establish connection to http://192.168.1.1:35357/v2.0/tokens
      ● Connections to various OpenStack service APIs (nova-api, cinder-api, neutron-api, etc.) time out randomly.
      ● Not reproducible in most environments. When it happens, the failure is random, without any pattern: sometimes 1 in 100, sometimes 1 in 500, etc.
      ● Obviously keystone is up and running perfectly fine.
      [Diagram labels: neutron-api, cinder-api, nova-api, Keystone; "connection refused!!"]
      Result: Random failures in spawning instances, creating volumes, networks, etc.
  • 16. First suspect is Keystone, but he is innocent. Where can one go wrong?
      Looking at the error message, it’s natural to point fingers at keystone.
      ● Looked at the keystone API logs. No clue!!
      ● Saw an abnormal number of keystone connections in CLOSE_WAIT state. Focused on that and wasted a lot of time investigating in that direction.
      ● It’s time to understand how connections flow from the end user to the API and on to keystone, by focusing on how the dots are connected.
  • 17. How does it work under the hood?
      [Diagram: a VIP in front of controller-1, controller-2 and controller-3, each running haproxy, nova-api, keystone and mariadb-galera; the request path is nova-api, keystone, database; label "connection refused!!"]
  • 18. Possibilities? Keystone is already ruled out.
      ● Intermittent network packet drop? No, ruled out by network troubleshooting.
      ● Haproxy (load balancer) drops connection? Likely?
        ○ end user -> nova: Highly unlikely, as the error occurs when nova connects to keystone.
        ○ nova -> keystone: Slightly likely.
        ○ keystone -> database: Highly likely. Enabled logging and found heavy client termination messages.
        haproxy[22346]: 10.243.232.62:48999 [10/Jul/2015:01:41:34.706] galera galera/pcmk-hovsh0800sdc-06 1/0/8734961 37181 cD 1369/1337/1337/1337/0 0/0
        haproxy[22346]: 10.243.232.14:53092 [10/Jul/2015:02:37:43.666] galera galera/pcmk-hovsh0800sdc-06 1/0/5400007 2875 cD 1375/1337/1337/1337/0 0/0
        haproxy[22346]: 10.243.232.62:41742 [10/Jul/2015:01:47:44.819] galera galera/pcmk-hovsh0800sdc-06 1/0/8400246 38448 cD 1376/1336/1336/1336/0 0/0
        haproxy[22346]: 10.243.232.14:53318 [10/Jul/2015:02:37:47.499] galera galera/pcmk-hovsh0800sdc-06 1/0/5400005 3414 cD 1384/1335/1335/1335/0 0/0
        haproxy[22346]: 10.243.232.62:42507 [10/Jul/2015:02:37:47.529] galera galera/pcmk-hovsh0800sdc-06 1/0/5400006 2875 cD 1383/1334/1334/1334/0 0/0
        haproxy[22346]: 10.243.232.62:42609 [10/Jul/2015:02:37:49.103] galera galera/pcmk-hovsh0800sdc-06 1/0/5400315 35783 cD 1384/1334/1334/1334/0 0/0
        haproxy[22346]: 10.243.232.62:42684 [10/Jul/2015:02:37:50.598] galera galera/pcmk-hovsh0800sdc-06 1/0/5400259 28994 cD 1384/1334/1334/1334/0 0/0
        haproxy[22346]: 10.243.232.14:53493 [10/Jul/2015:02:37:50.885] galera galera/pcmk-hovsh0800sdc-06 1/0/5400007 2875 cD 1383/1333/1333/1333/0 0/0
        haproxy[22346]: 10.243.232.14:53674 [10/Jul/2015:02:37:53.874] galera galera/pcmk-hovsh0800sdc-06 1/0/5400007 3498 cD 1404/1335/1335/1335/0 0/0
        haproxy[22346]: 10.243.232.14:54625 [10/Jul/2015:02:38:11.399] galera galera/pcmk-hovsh0800sdc-06 1/0/5400008 12461 cD 1407/1335/1335/1335/0 0/0
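The haproxy lines above follow HAProxy's TCP log layout: client address, accept date, frontend, backend/server, the Tw/Tc/Tt timers in milliseconds, bytes read, a two-character termination state, the actconn/feconn/beconn/srv_conn/retries counters, and the two queue lengths. In HAProxy's documentation, `cD` denotes a client-side timeout during the data phase. The sketch below tallies termination states and the peak backend connection count; the regular expression and the `summarize` helper are illustrative assumptions written against the lines shown here, not an official parser.

```python
import re
from collections import Counter

# Pattern for the tcplog-style lines shown in the slide (illustrative, not exhaustive).
TCPLOG = re.compile(
    r"haproxy\[(?P<pid>\d+)\]: "
    r"(?P<client>\S+) \[(?P<accepted>[^\]]+)\] "
    r"(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>\+?-?\d+) "
    r"(?P<bytes>\+?\d+) (?P<state>\S{2}) "
    r"(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/(?P<srv_conn>\d+)/(?P<retries>\+?\d+) "
    r"(?P<srv_queue>\d+)/(?P<backend_queue>\d+)"
)


def summarize(lines):
    """Count termination states and track the highest backend connection count."""
    states = Counter()
    peak_beconn = 0
    for line in lines:
        match = TCPLOG.search(line)
        if match is None:
            continue  # skip lines that are not in the expected layout
        states[match["state"]] += 1
        peak_beconn = max(peak_beconn, int(match["beconn"]))
    return states, peak_beconn
```

Several of the logged session durations sit just above 5400000 ms, which is 90 minutes and matches the `timeout client 90m` setting of the galera listener.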
  • 19. galera: sessions max: 2000, limit: 2000
      Hold on, but where did I set it? Nowhere!!!
      ● Then where does this limit come from? It is the default hard-coded limit for each proxy if one is not explicitly defined.
      ● Then why is there no proper error message? Haproxy places the connection in a queue to wait for a free database connection, then terminates it when it hits the timeout.
      Haproxy has hit maxconn for galera!
        listen galera
          bind 10.243.232.62:3306
          mode tcp
          option tcplog
          option httpchk
          option tcpka
          stick on dst
          stick-table type ip size 2
          timeout client 90m
          timeout server 90m
          server controller-1 10.243.232.14:3306 check inter 1s on-marked-down shutdown-sessions
          server controller-2 10.243.232.15:3306 check inter 1s on-marked-down shutdown-sessions
          server controller-3 10.243.232.16:3306 check inter 1s on-marked-down shutdown-sessions
        global
          daemon
          group haproxy
          maxconn 40000
          pidfile /var/run/haproxy.pid
          user haproxy
        defaults
          log 127.0.0.1 local2 warning
          mode tcp
          option tcplog
          option redispatch
          retries 3
          timeout connect 5s
          timeout client 30s
          timeout server 30s
          maxconn 2000
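A quick audit of which proxies silently run with an inherited limit can be sketched as follows. This is a simplified illustration, not HAProxy's own resolution logic: it assumes a single `defaults` section applies to every `listen`/`frontend` section regardless of order, and it falls back to the 2000 per-proxy default described on the slide. The function name and return shape are hypothetical.

```python
SECTION_KEYWORDS = {"global", "defaults", "listen", "frontend", "backend"}
BUILTIN_PROXY_MAXCONN = 2000  # per-proxy default as described on the slide


def proxy_maxconn(config_text):
    """Map each listen/frontend name to (maxconn, where the value came from)."""
    defaults_maxconn = None
    explicit = {}    # proxy name -> maxconn set inside its own section, or None
    current = None   # (section kind, section name) currently being read
    for raw in config_text.splitlines():
        words = raw.split("#", 1)[0].split()  # drop comments, tokenize
        if not words:
            continue
        if words[0] in SECTION_KEYWORDS:
            kind = words[0]
            name = words[1] if len(words) > 1 else kind
            current = (kind, name)
            if kind in ("listen", "frontend"):
                explicit[name] = None
            continue
        if words[0] == "maxconn" and len(words) > 1 and current:
            kind, name = current
            if kind == "defaults":
                defaults_maxconn = int(words[1])
            elif kind in ("listen", "frontend"):
                explicit[name] = int(words[1])
            # a maxconn in the global section is process-wide and ignored here
    result = {}
    for name, value in explicit.items():
        if value is not None:
            result[name] = (value, "section")
        elif defaults_maxconn is not None:
            result[name] = (defaults_maxconn, "defaults")
        else:
            result[name] = (BUILTIN_PROXY_MAXCONN, "built-in")
    return result
```

Run against the configuration shown on the slide, this reports that `galera` has no limit of its own and ends up with 2000, even though the global `maxconn` is 40000.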
  • 20. I solved your problem, can I go and sleep? Hold on...
      What should be the maxconn for galera?
      ● It took more time to determine the right value for the maximum number of database connections, because it depends on:
        ○ How many workers are spawned by each API?
          ■ Depends on the api_workers/workers configuration for each service.
            ● By default, depends on how many CPU cores each controller has:
              # Number of workers for OpenStack API service. The default will be the number of CPUs available. (integer value)
          ■ This can differ from deployment to deployment.
        ○ Each worker process opens five long-lived database connections.
        ○ Each worker also opens some short-lived connections.
      Now I can sleep like him.
  • 21. What should be the maxconn for galera?
      Based on the default deployment by RHEL OpenStack Platform Director. Each of the three controllers (cores = 24) runs mariadb-galera and the following workers:
        nova-api    24x3 = 72
        keystone    24x2 = 48
        neutron-s   24x2 = 48
        glance-ap   24x1 = 24
        cinder-api  24x1 = 24
        glance-re   24x1 = 24
        nova-con    24x1 = 24
      Per controller: total = 264x5 = 1320
      Haproxy-VIP: total is 3960 (3 x 1320)
      Add 1024 for:
        1 - Short-lived connections
        2 - Other services
        3 - New services
      Total = 4960 as shown on the slide (3960 + 1024 works out to 4984)
  • 22. To sleep like a …..?
      Setting the right maxconn value upfront for the database proxy can save you from sleepless nights.
      ● Decide how many workers each API needs for optimum performance. A 96-core system does not need x3 nova worker processes.
      ● Automate this calculation and set the value at deployment time, both for haproxy (maxconn) and for the database server (max_connections).
      ● If you use a different load balancer, make sure to address this problem there as well, if applicable.
      Decide and set the right value upfront before going to bed.
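The sizing arithmetic from the slides can be automated along these lines. This is a minimal sketch under the slides' stated assumptions: per-core worker multipliers as listed (service labels are truncated on the slide and kept as-is), five long-lived connections per worker, three controllers, and 1024 extra connections of headroom. The function name and parameters are illustrative. Note that 3960 + 1024 evaluates to 4984, whereas the slide shows 4960, which would correspond to a headroom of 1000.

```python
# Worker multipliers per CPU core, as listed on the sizing slide.
# Labels are truncated on the slide and reproduced unchanged.
WORKERS_PER_CORE = {
    "nova-api": 3,
    "keystone": 2,
    "neutron-s": 2,
    "glance-ap": 1,
    "cinder-api": 1,
    "glance-re": 1,
    "nova-con": 1,
}


def galera_maxconn(cores, controllers=3, conns_per_worker=5, headroom=1024,
                   workers_per_core=WORKERS_PER_CORE):
    """Return (workers per controller, long-lived connections, suggested maxconn)."""
    workers = sum(cores * multiplier for multiplier in workers_per_core.values())
    long_lived = workers * conns_per_worker * controllers
    return workers, long_lived, long_lived + headroom


workers, long_lived, suggested = galera_maxconn(cores=24)
print(workers, long_lived, suggested)  # 264 3960 4984
```

Because the worker count scales with the core count, rerunning the calculation with `cores=96` shows why the multipliers themselves should be revisited on large controllers rather than simply raising the limit.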
  • 23. Discover the Beta: access.redhat.com/insights
      ● Proactive alerts
      ● Real-time risk assessment
      ● No infrastructure cost
      ● Validated resolution
      ● Tailored resolution
      ● Quick setup
      ● SaaS