Overview of Zookeeper, Helix and Kafka (Oakjug)
1. @crichardson
Distributed system goodies:
Zookeeper, Helix and Kafka
Chris Richardson
Author of POJOs in Action
Founder of the original CloudFoundry.com
@crichardson
chris@chrisrichardson.net
http://plainoldobjects.com
http://microservices.io
7. Apache ZooKeeper is an open source distributed configuration service, synchronization service, and naming registry for large distributed systems
https://zookeeper.apache.org/
8. Distributed system use cases…
Name service
lookup by name, e.g. service discovery: name => [host, port]*
Group membership
E.g. distributed cache
Cluster members need to talk amongst themselves
Clients need to discover the group members
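A name service is essentially a map from a logical name to the live instances behind it. As a rough illustration (this is a toy Python sketch, not ZooKeeper's API; the class and method names are invented), with each registered instance playing the role an ephemeral znode would play:

```python
# Minimal in-memory sketch of the "name service" use case: a logical
# service name maps to the live [host, port] instances behind it.
# Not ZooKeeper's API; names are invented for illustration.
class NameService:
    def __init__(self):
        self._registry = {}  # name -> list of (host, port)

    def register(self, name, host, port):
        # each running instance registers itself under the service name
        self._registry.setdefault(name, []).append((host, port))

    def deregister(self, name, host, port):
        # mimics an ephemeral entry vanishing when its session ends
        self._registry[name].remove((host, port))

    def lookup(self, name):
        # name => [host, port]*
        return list(self._registry.get(name, []))
```

With ZooKeeper, `register` would create an ephemeral znode under the service's path, so a crashed instance disappears from lookups automatically.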
9. …Use cases
Leader election
N servers, one of which needs to be the master
e.g. master/slave replication
Distributed locking and latches
e.g. cluster wide singleton
Queues
…
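The classic ZooKeeper leader-election recipe has each candidate create an ephemeral sequential znode; the candidate owning the lowest sequence number is the leader. A toy Python sketch of just the selection rule (real implementations, e.g. Curator's recipes, also watch the next-lowest znode to react to failures):

```python
# Sketch of the leader-election selection rule: each candidate has created
# an ephemeral sequential znode, and whoever owns the lowest sequence
# number is the leader. Toy code, not a real ZooKeeper client.
def elect_leader(candidate_znodes):
    """candidate_znodes: dict of candidate -> znode name like 'n_0000000003'."""
    def seq(znode):
        # the sequence number is the zero-padded suffix appended by ZooKeeper
        return int(znode.rsplit("_", 1)[1])
    return min(candidate_znodes, key=lambda c: seq(candidate_znodes[c]))
```

Because the leader's znode is ephemeral, its death ends its session, the znode vanishes, and the remaining candidates re-run the selection.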
11. Zookeeper clients
Languages:
Ships with Java, C, Perl, and Python
Community: Scala, NodeJS, Go, Lua, …
Client connects to one of a list of servers
Client establishes a session
Survives TCP disconnects
Client-specified session timeout
https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZKClientBindings
12. Zookeeper data model
Hierarchical tree of named znodes
Znodes have binary data and children
Znodes can be ephemeral - live for as long as the client session
Clients can watch a znode - get notified of changes
13. Zookeeper operations
create(path, data, mode)
Persistent or ephemeral?
Sequential: append parent’s counter value to name?
delete(path)
exists(path)
readData(path, watch?) : Object
writeData(path, data)
getChildren(path, watch?) : List[String]
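These operations can be modeled with a toy in-memory znode tree (a Python sketch, not the real ZooKeeper API; watches and ephemeral cleanup are omitted). It shows how a SEQUENTIAL create appends the parent's zero-padded counter, producing names like /cer/baz0000000001:

```python
# Toy in-memory model of the operations above (not the real ZooKeeper API).
# Demonstrates how SEQUENTIAL create appends the parent's 10-digit
# zero-padded counter to the requested name.
class ZnodeTree:
    def __init__(self):
        self.nodes = {"/": b""}   # path -> data
        self.counters = {}        # parent path -> next sequence number

    def create(self, path, data, sequential=False):
        if sequential:
            parent = path.rsplit("/", 1)[0] or "/"
            n = self.counters.get(parent, 0)
            self.counters[parent] = n + 1
            path = "%s%010d" % (path, n)
        self.nodes[path] = data
        return path

    def delete(self, path):
        del self.nodes[path]

    def exists(self, path):
        return path in self.nodes

    def read_data(self, path):
        return self.nodes[path]

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.nodes
                      if p != "/" and p.startswith(prefix)
                      and "/" not in p[len(prefix):])
```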
15. Using the zkCli
$ bin/zkCli.sh -server $DOCKER_HOST_IP
[zk] create /cer x
Created /cer
[zk] create /cer/foo y
Created /cer/foo
[zk] get /cer/foo watch
y
[zk] set /cer/foo z
set /cer/foo z
WatchedEvent state:SyncConnected
type:NodeDataChanged path:/cer/foo
16. Creating an ephemeral sequential node
[zk] create -s -e /cer/baz aa
Created /cer/baz0000000001
[zk] ls /cer watch
ls /cer watch
[baz0000000001, foo]
[zk] exit
WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/cer
[zk] ls /cer watch
ls /cer watch
[foo]
18. Apache Curator
Open source library developed by Netflix
Simplifies connection management
Simplifies error handling
Implements recipes
Three projects: client, framework, and recipes
http://techblog.netflix.com/2011/11/introducing-curator-netflix-zookeeper.html
19. Netflix Exhibitor
Supervisory process for managing a Zookeeper instance
Watches a ZK instance and makes sure it is running
Performs periodic backups
Performs periodic cleaning of the ZK log directory
A GUI explorer for viewing ZK nodes
A rich REST API
https://github.com/Netflix/exhibitor/wiki
22. Typical distributed systems
Partitioning - e.g. use a PK (or other attribute) to choose the server
Replication - for availability
State machines, e.g. master/slave replication
One replica is the master
Other replica is the slave
23. Use cases - master/slave replication
MySQL master/slave replication or MongoDB replica sets
N machines
1 master, N slaves
If the master dies then elect a new master
24. Use cases - Cassandra
Cluster consists of N nodes
Data consists of M partitions (aka vnodes)
Each partition has R replicas
Client can read/write any replica - no master/slave concept
Dynamic assignment of M*R partition replicas to N nodes
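Assigning M*R partition replicas to N nodes can be sketched naively as round-robin placement (a hedged Python illustration; real systems like Cassandra and Helix use more sophisticated placement, e.g. rack awareness and minimizing movement on rebalance):

```python
# Naive round-robin assignment of M*R partition replicas to N nodes.
# With R <= N, consecutive slots guarantee the replicas of each
# partition land on distinct nodes. Illustrative only.
def assign_replicas(num_partitions, num_replicas, nodes):
    assignment = {}  # (partition, replica) -> node
    i = 0
    for p in range(num_partitions):
        for r in range(num_replicas):
            assignment[(p, r)] = nodes[i % len(nodes)]
            i += 1
    return assignment
```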
25. Use case - abstractly
Cluster:
Set of N nodes (machines)
One or more resources
A resource is partitioned and replicated
Resource has a state machine
e.g. offline/online, master/slave
State machine has constraints: 1 master replica, other replicas are slaves
Helix:
dynamically assigns partitions to nodes
manages state transitions and notifies nodes
29. Helix cluster setup
val admin = new ZKHelixAdmin(ZK_ADDRESS)
// register the LeaderStandby state model with the cluster
admin.addStateModelDef(clusterName, STATE_MODEL_NAME,
  new StateModelDefinition(StateModelConfigGenerator.generateConfigForLeaderStandby()))
// create a partitioned resource governed by that state model; AUTO lets Helix assign partitions
admin.addResource(clusterName, RESOURCE_NAME, NUM_PARTITIONS, STATE_MODEL_NAME, "AUTO")
// start a standalone controller that manages the cluster
HelixControllerMain.startHelixController(ZK_ADDRESS, clusterName, nodeInfo.nodeId.id,
  HelixControllerMain.STANDALONE)
30. Adding an instance to the cluster
val ic = new InstanceConfig(nodeInfo.nodeId.id)
ic.setHostName(nodeInfo.host)
ic.setPort("" + nodeInfo.port)
ic.setInstanceEnabled(true)
admin.addInstance(clusterName, ic)
// rebalance assigns partitions to the newly added nodes
admin.rebalance(clusterName, RESOURCE_NAME, NUM_REPLICAS)
31. Helix - connecting to the cluster
// connect as a participant
manager = HelixManagerFactory.getZKHelixManager(clusterName, instanceName,
  InstanceType.PARTICIPANT, ZK_ADDRESS)
// supply a factory that creates the callbacks for state transitions
val stateModelFactory = new MyStateModelFactory
val stateMach = manager.getStateMachineEngine
stateMach.registerStateModelFactory(STATE_MODEL_NAME, stateModelFactory)
manager.connect()
32. State transition callbacks
// callbacks are invoked by Helix; partitionName is <resourceName>_<partitionNumber>
class MyStateModel(partitionName: String) extends StateModel {
  def onBecomeStandbyFromOffline(message: Message, context: NotificationContext) {
    …
  }
  def onBecomeLeaderFromStandby(message: Message, context: NotificationContext) {
    …
  }
  …
}
class MyStateModelFactory extends StateModelFactory[StateModel] {
  def createNewStateModel(partitionName: String) =
    new MyStateModel(partitionName)
}
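The callback scheme above can be modeled conceptually in a few lines (a Python sketch of the idea, not Helix's API; the deck's actual code is Scala against Helix's StateModel classes). Each partition has a current state, and the controller drives it through transitions by invoking the matching onBecomeXFromY callback:

```python
# Conceptual model of a per-partition state machine with transition
# callbacks, mirroring Helix's onBecome<To>From<From> naming. Toy code,
# not the Helix API.
class PartitionStateMachine:
    def __init__(self, partition_name):
        self.partition_name = partition_name  # e.g. "MyResource_0"
        self.state = "OFFLINE"
        self.log = []

    # transition callbacks, one per (from, to) edge of the state model
    def on_become_standby_from_offline(self):
        self.log.append("standby<-offline")

    def on_become_leader_from_standby(self):
        self.log.append("leader<-standby")

    def transition(self, to_state):
        # dispatch to the callback named after the (from, to) pair,
        # the way the controller drives participants through transitions
        handler = getattr(self, "on_become_%s_from_%s"
                          % (to_state.lower(), self.state.lower()))
        handler()
        self.state = to_state
```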
33. More about Helix
Spectators
Non-participants - don’t have resource partitions assigned to them
Get notified of changes to cluster
Property store
Write through cache of properties in Zookeeper
Messaging
Intra-cluster communication
…
37. Kafka concepts - topic
Clients publish messages to a topic
A topic has a name
A topic is a partitioned log
Topics live on disk
Messages have an offset within a partition
Messages are kept for a retention period
38. Kafka is clustered
Kafka cluster consists of N machines
Each topic partition has R replicas
1 machine is the leader (think master) for the topic partition
Clients publish/consume to/from leader
R - 1 machines are followers (think slaves)
Followers consume messages from the leader
Messages are committed when all replicas have written to the log
Producers can optionally wait for a message to be committed
Consumers only ever see committed messages
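The commit rule above can be sketched simply: a message is committed once every replica has written it, so the committed boundary is the minimum log-end offset across the replicas, and consumers see only messages below it (a hedged Python illustration of the idea, roughly Kafka's high watermark; not Kafka's actual implementation):

```python
# Sketch of "messages are committed when all replicas have written them":
# the committed boundary is the minimum log-end offset across replicas
# (leader + followers); consumers only see messages below it.
def committed_offset(replica_log_end_offsets):
    return min(replica_log_end_offsets)

def visible_messages(log, replica_log_end_offsets):
    return log[:committed_offset(replica_log_end_offsets)]

log = ["m0", "m1", "m2", "m3"]
# the leader has 4 messages but one follower has replicated only 2,
# so consumers see just the first 2
visible_messages(log, [4, 2, 4])  # -> ["m0", "m1"]
```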
39. Kafka producers
Publish message to a topic
Message = (key, body)
Hash of key determines topic partition
Carefully choose the key to preserve ordering, e.g. stock ticker symbol => all prices for the same symbol end up in the same partition
Makes request to topic partition’s leader
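Key-based partitioning boils down to hashing the key modulo the partition count, so every message with the same key lands in the same partition and keeps its order. A minimal Python sketch (Kafka's real partitioner uses a different hash, e.g. murmur2; CRC32 here is just a stable stand-in):

```python
# Sketch of key-based partitioning: hash the key, mod the partition count.
# All messages with the same key (e.g. a ticker symbol) map to the same
# partition, preserving per-key ordering. Not Kafka's actual hash.
import zlib

def partition_for(key, num_partitions):
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

Note the flip side: if one key dominates the traffic, its partition becomes a hot spot, which is why key choice matters.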
40. Kafka consumer
Consumes messages from the partitions of one or more topics
Makes a fetch request to a topic partition’s leader
specifies the partition offset in each request
gets back a chunk of messages
Scale by having N topic partitions, N consumers
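The fetch loop is simple in outline: the consumer supplies an offset, gets back a chunk of messages, and advances its offset by the number received. A toy Python sketch of that contract (illustrative names, not Kafka's client API):

```python
# Sketch of an offset-based fetch against one partition's log: the
# consumer specifies the offset, receives a chunk, and advances its
# offset by the chunk length. Illustrative, not Kafka's API.
def fetch(partition_log, offset, max_messages):
    chunk = partition_log[offset:offset + max_messages]
    return chunk, offset + len(chunk)

log = ["m0", "m1", "m2", "m3", "m4"]
chunk, next_offset = fetch(log, 0, 2)            # -> (["m0", "m1"], 2)
chunk, next_offset = fetch(log, next_offset, 2)  # -> (["m2", "m3"], 4)
```

Because the consumer owns the offset, it controls its position, which is what makes replay and at-least-once processing possible.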
41. Kafka consumers - between a rock and a hard place
Simple Kafka consumer
Very flexible
BUT you are responsible for contacting the leader of each topic partition and for storing offsets
High level consumer
Does a lot: stores offsets in Zookeeper, deals with leaders, ….
BUT it assumes that if you read a message it has been processed
A more flexible consumer is on the way
43. Kafka at LinkedIn
1100 Kafka brokers organized into more than 60 clusters.
Writes:
Over 800 billion messages per day
Over 175 terabytes of data
Over 650 terabytes of messages are consumed daily
Peak
13 million messages per second
2.75 gigabytes of data per second
https://engineering.linkedin.com/kafka/running-kafka-scale