3. Booking.com
• One of the largest travel e-commerce sites in the world
• Part of the Priceline Group (NASDAQ: PCLN)
• We offer accommodation in 228 countries
• Our website and customer service in 40 languages
• More than 15,000 employees in 204 offices in 70 countries
• We use thousands of MySQL servers:
• We use orchestrator to manage the topology and to handle master and
intermediate master failures
6. Orchestrator
• Written by Shlomi Noach
• He started on this at Outbrain and now works at GitHub
• He introduced Booking.com to orchestrator about 3 years ago when we were
looking for something to handle failovers automatically
15. Integration with Internal Infrastructure
• Populate the metadata db on the master to:
• Map host or instance names to more familiar cluster names
• Define how to determine replication delay
• Configure acceptable levels of replication delay
• Add and remove servers/instances as they are deployed or
removed from service
• Set up Pseudo-GTID (if not using GTID)
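As a sketch, part of this integration maps onto orchestrator's detection-query settings, which run SQL against the polled instances; the `meta` schema and column names below are hypothetical stand-ins for whatever the metadata db actually contains:

```json
{
  "DetectClusterAliasQuery": "SELECT cluster_alias FROM meta.cluster_info",
  "ReplicationLagQuery": "SELECT lag_seconds FROM meta.heartbeat"
}
```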
27. Performance
Golang-specific-isms
• Orchestrator by default uses database/sql, which by default sends a
query with parameters using MySQL’s Prepare/Execute syntax
• This generates two round trips (RTTs), which on slower connections can increase the elapsed time
to complete a query
• There are options to disable this by interpolating parameter values prior to sending the SQL
• Go (orchestrator code) is quite happy to try to poll 10,000 servers at
once
• Sometimes that is not sensible
• Throttling to avoid thundering herd is necessary
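(In go-sql-driver/mysql, the extra Prepare/Execute round trip can be avoided with `interpolateParams=true` in the DSN.) The throttling point can be sketched with a buffered channel used as a semaphore; `pollServer` below is a hypothetical stand-in for orchestrator's per-instance probe, not its actual code:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pollServer stands in for the real per-instance probe; it just
// simulates one round trip to a server.
func pollServer(host string) string {
	time.Sleep(5 * time.Millisecond)
	return host + ": ok"
}

// pollAll probes every host concurrently, but the buffered channel
// caps the number of in-flight probes, avoiding a thundering herd
// when the host list is large.
func pollAll(hosts []string, maxInFlight int) []string {
	sem := make(chan struct{}, maxInFlight)
	results := make([]string, len(hosts))
	var wg sync.WaitGroup
	for i, h := range hosts {
		wg.Add(1)
		go func(i int, h string) {
			defer wg.Done()
			sem <- struct{}{}        // blocks once maxInFlight probes are running
			defer func() { <-sem }() // release the slot when done
			results[i] = pollServer(h)
		}(i, h)
	}
	wg.Wait()
	return results
}

func main() {
	hosts := make([]string, 100)
	for i := range hosts {
		hosts[i] = fmt.Sprintf("db%03d", i)
	}
	res := pollAll(hosts, 10) // at most 10 concurrent probes
	fmt.Println(len(res), "servers polled")
}
```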
33. Special Cases
• Testing MySQL 8.0 or MariaDB 10.3?
• “Let’s not promote to this box”
• Same applies while testing new minor versions of course
• Some topologies have slaves with aggregate data
• Do not treat them as a normal box – should not be candidate masters
• Orchestrator cannot handle Group Replication (GR) or multi-source replication yet
• Best to avoid these boxes (for automatic failover) until we have solutions
• Patches welcome to solve such missing functionality
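A sketch of how such boxes can be fenced off from promotion in the configuration; the hostname patterns below are invented examples of test or aggregate-data naming conventions:

```json
{
  "PromotionIgnoreHostnameFilters": [
    "mysql80-test",
    "mariadb103-test",
    "aggregate"
  ]
}
```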
35. Provide Wider User Access
• Orchestrator fan club
• Different groups of users like orchestrator
• DBAs, Developers, Sysadmins, Auditors, Managers
• Use nginx (or similar)
• Provides authentication
• Provides TLS
• The combination can be used with Unix groups to allow user or admin access
to orchestrator
• Combined with a load balancer provides easy access for users and
also for applications (using api calls)
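A minimal sketch of such a front end, assuming nginx terminates TLS and basic auth and proxies to orchestrator's default HTTP port 3000 (the hostname and file paths are hypothetical):

```nginx
server {
    listen 443 ssl;
    server_name orchestrator.example.com;
    ssl_certificate     /etc/nginx/certs/orchestrator.pem;
    ssl_certificate_key /etc/nginx/certs/orchestrator.key;

    location / {
        auth_basic           "orchestrator";
        auth_basic_user_file /etc/nginx/htpasswd;
        # Forward the authenticated user so orchestrator's proxy
        # AuthenticationMethod can read it from a header.
        proxy_set_header X-Forwarded-User $remote_user;
        proxy_pass http://127.0.0.1:3000;
    }
}
```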
44. Configuration settings
• Recovery settings
• Regexp filters – very site-dependent
• RecoverMasterClusterFilters – white-list masters by cluster name
• RecoverIntermediateMasterClusterFilters
• PromotionIgnoreHostnameFilters – prevent matching servers from being promoted*
• RecoveryIgnoreHostnameFilters – exclude special servers from recovery
* Does not scale well
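A sketch of what these settings might look like; the cluster names and hostname patterns are invented, and "*" matches all clusters:

```json
{
  "RecoverMasterClusterFilters": ["orders", "checkout"],
  "RecoverIntermediateMasterClusterFilters": ["*"],
  "PromotionIgnoreHostnameFilters": ["backup", "reporting"],
  "RecoveryIgnoreHostnameFilters": ["sandbox"]
}
```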
45. Configuration settings
• Failover settings
• OnFailureDetectionProcesses – what to do when a failure is detected
• PreFailoverProcesses – what to do prior to starting recovery
• PostFailoverProcesses – what to do after completing recovery
• PostUnsuccessfulFailoverProcesses – what to do if recovery fails
• PostMasterFailoverProcesses – what to do after master recovery
• PostIntermediateMasterFailoverProcesses – what to do after intermediate master recovery
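A sketch of how these hooks are typically wired up; orchestrator substitutes placeholders such as {failureType}, {failureCluster}, {failedHost} and {successorHost} into each command, and the log path here is illustrative:

```json
{
  "OnFailureDetectionProcesses": [
    "echo 'Detected {failureType} on {failureCluster}' >> /tmp/recovery.log"
  ],
  "PreFailoverProcesses": [
    "echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
  ],
  "PostFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ]
}
```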
46. Configuration settings
• Authentication settings (e.g. if using nginx with LDAP)
• "AuthenticationMethod": "proxy",
• "HTTPAuthUser": ”user1",
• "HTTPAuthPassword": ”pass1",
• "AuthUserHeader": ”SomeHeader",
• “PowerAuthUsers": [ "api-user1", "api-user2", ”realuser1" ]
• PowerAuthGroups": [ ”special_sysadmins”, “dbas” ],
50. Way Forward
• Simplify configuration and entry to orchestrator
• Shlomi is doing a very good job with the SQLite and Raft setups
• Configuration could be simpler and more automatic for most people
• Need to standardise orchestrator setups more?
• Extend functionality to cover more of the MySQL eco-system
• AWS and other cloud systems
• Group Replication or Galera
• Multi-source
52. Way Forward
• Reduce recovery time
• Speeding up the time from detection to recovery would be good, as it reduces downtime
• It should be possible to react to a failure event immediately (knowing the state of the
other servers)
• state is currently stored in the backend db
• the analysis and detection phase happens independently of server polling
• With the reduced default poll time of 5 seconds, recovery is likely to be triggered
within 10 seconds
• not critical for most people?