Virtual IPs or floating IPs have long been the workhorse mechanism for providing high-availability for database systems, however floating IP addresses have several limitations that make it problematic in modern data centers and cloud environments, notably that it requires all members be in the same Layer-2 domain. consul is a strongly consistent way of providing high-availability services in Layer-3 environments and provides fail-over across different geographic regions. In this talk we will discuss the benefits, setup, and use of consul for fail-over of PostgreSQL, both in a local data center scenario and a geographic redundancy scenario where databases are split across multiple data centers.
6. HASHICORP
Key Value Store
HTTP API
Host & Service
Level Health
Checks
Datacenter Aware
Consul solves four central challenges with SOA
Service
Discovery
HTTP + DNS
8. HASHICORP
Overview
1. Introduction to Consul
2. Review of Consul
a. Architecture
b. Agent Functionality
c. Agent Configuration
d. Features
3. Further Reading
12. HASHICORP
Glossary
Agent - Long-running daemon on every member of the Consul
cluster. The agent is able to run in either client or server mode.
Client - Agent that forwards all RPCs to a server and
participates in the LAN gossip pool.
Server - Agent that maintains cluster state, responds to RPC
queries, exchanges WAN gossip with other datacenters, and
forwards queries to leaders of remote datacenters.
Consensus - Agreement upon the elected leader
13. HASHICORP
Glossary
Gossip - Random node-to-node communication primarily over
UDP that provides membership, failure detection, and event
broadcast information to the cluster. Built on Serf. Consul has
both LAN and WAN Gossip.
Datacenter - Networking environment that is private, low latency,
and high bandwidth. A Consul cluster is run per datacenter, so its
important to have low latency for the gossip protocol.
14. HASHICORP
Consul vs. Other Software
- Opinionated framework for service discovery using DNS
or HTTP
- Scalable gossip system that links server nodes and clients
- Distributed health checking with edge triggered updates
- Globally aware with multi-datacenter support
- Operationally simple
- Incorporation into the HashiCorp ecosystem
23. HASHICORP
DNS Failover
• Works across L3 boundaries
in LAN environments
• Works across L3 boundaries
in WAN environments
• Small TTLs
• Workload Distribution
• Clients cache DNS data
• Not subject to spanning-tree
• Requires TCP connections
be reset on failover
• Clients can cache stale DNS
data
Pro Con
27. HASHICORP
Consul Server 3/3
% cat svc/run
#!/bin/sh --
set -e
exec 2>&1
exec
/usr/bin/env -i
./bin/consul agent
-config-file=./config.json
-config-dir=./conf.d/
% cat svc/log/run
#!/bin/sh —
set -e
set 2>&1
exec chpst -u _log:_log svlogd ./main
28. HASHICORP
Consul Cluster
% consul members
Node Address Status Type Build Protocol DC
vm1 172.16.139.140:8301 alive server 0.7.0dev 2 lab1
% consul join 172.16.139.139 172.16.139.138
Successfully joined cluster by contacting 2 nodes.
% consul members
Node Address Status Type Build Protocol DC
vm1 172.16.139.140:8301 alive server 0.7.0dev 2 lab1
vm2 172.16.139.138:8301 alive server 0.7.0dev 2 lab1
vm3 172.16.139.139:8301 alive server 0.7.0dev 2 lab1
29. HASHICORP
Consul Cluster
% consul info
agent:
check_monitors = 0
check_ttls = 0
checks = 0
services = 1
build:
prerelease = dev
revision = 'fa26d5f
version = 0.7.0
consul:
bootstrap = false
known_datacenters = 2
leader = false
leader_addr = 172.16.139.139:8300
server = true
[snip]
30. HASHICORP
Consul Cluster
% consul info
[snip]
raft:
applied_index = 103339
commit_index = 103339
fsm_pending = 0
last_contact = 82.95803ms
last_log_index = 103339
last_log_term = 50663
last_snapshot_index = 98437
last_snapshot_term = 2228
num_peers = 2
raft_peers =
172.16.139.139:8300,172.16.139.138:8300,172.16.139.140:8300
state = Follower
term = 50663
[snip]
43. Text Editor
HASHICORP
% cat conf.d/mem-check.json
{
"check": {
"id": "mem-util",
"name": "Memory utilization",
"script": "/usr/local/bin/mem_check.sh",
"interval": "10s"
}
}
Creating a check
Use a custom script
44. Text Editor
HASHICORP
% cat conf.d/http-check.json
{
"check": {
"id": "api",
"name": "HTTP API on port 4455",
"http": "http://localhost:4455/_health",
"interval": "10s",
"timeout": "1s"
}
}
Creating a check
Use a built-in check type
61. Use Case
• Multiple instances of a given service exist in
multiple datacenters
• Clients can talk to any of them, and always prefer
the instances with lowest latency
• Policies can change, desire to not have the clients
know the details of how to locate a healthy service
62. Prepared Queries
• New query namespace, similar to services
• Register queries to answer for parts of this
namespace
• Clients use APIs, or “.query.consul” DNS lookups
to run queries
• Magic happens :-)
65. Catch All Template
$ curl -X POST -d
'{
"Name": "",
"Template": {
"Type": "name_prefix_match"
},
"Service": {
"Service": "${name.full}",
"Failover": {
"NearestN": 3
}
}
}' localhost:8500/v1/query
*.query.consul
With a single query template, all
services can fail over to the nearest
healthy service in a different datacenter!
66. Under the Hood: Network Tomography
• Rides on pings that are part of LAN and WAN
gossip
• Models networking round trip time using simple
physics simulation with masses and springs
• Develops a set of “network coordinates” for round
trip time estimation with a simple calculation
69. HASHICORP
Key Value Store
HTTP API
Host & Service
Level Health
Checks
Datacenter Aware
Consul solves four central challenges with SOA
Service
Discovery
HTTP + DNS
70. HASHICORP
Further reading
- Consul vs. Other Software:
consul.io/intro/vs/index.html
- Consul Agent:
consul.io/docs/agent/basics.html
- Consul Commands:
consul.io/docs/commands/index.html
- Consul Internals:
consul.io/docs/internals/index.html