This talk covers why Apache Zookeeper is a good fit for coordinating processes in a distributed environment, earlier attempts at a Python client and the current state-of-the-art Python client library, how merging several Python client libraries into one development effort has paid off, the features available to Python processes, and how to handle failures gracefully in a set of distributed processes.
14. EPHEMERAL ZNODES
• Last for duration of client session
• Session dies when connection is closed or expires
• Can’t have children znodes
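A minimal sketch of registering an ephemeral znode. It assumes `client` is an already started kazoo `KazooClient` (or any object with the same `create` method). The function name `announce_worker` and the `/workers` path are hypothetical and not part of the talk.

```python
def announce_worker(client, worker_id, base_path="/workers"):
    """Create an ephemeral znode that disappears when the session ends.

    `client` is assumed to be a started kazoo KazooClient. The path
    layout (/workers/<worker_id>) is a made-up example.
    """
    path = f"{base_path}/{worker_id}"
    # ephemeral=True ties the znode to this client's session;
    # makepath=True creates any missing parent znodes first.
    return client.create(path, b"", ephemeral=True, makepath=True)
```

With kazoo this would typically be called after `client = KazooClient(hosts="127.0.0.1:2181")` and `client.start()`. When the session ends, Zookeeper removes the znode without any cleanup code on the client side.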
15. SEQUENTIAL ZNODES
• Supply a node name (or not), get node name back with a trailing sequence
number (0001, 0002, 0003, etc.)
• Can be combined with ephemeral flag
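The slide abbreviates the suffix. Zookeeper actually appends a zero-padded 10-digit counter, so a node created as `lock-` comes back as something like `lock-0000000007`. A small sketch of splitting such a name apart (the helper name `split_sequence` is hypothetical):

```python
import re

# Zookeeper appends a zero-padded 10-digit counter to sequential znode names.
_SEQUENCE_RE = re.compile(r"^(?P<prefix>.*?)(?P<seq>\d{10})$")


def split_sequence(node_name):
    """Split a sequential znode name into (prefix, sequence number)."""
    match = _SEQUENCE_RE.match(node_name)
    if match is None:
        raise ValueError(f"not a sequential znode name: {node_name!r}")
    return match.group("prefix"), int(match.group("seq"))
```

Because a name is optional, the prefix may be empty, in which case the node name is just the 10 digits.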
30. BASIC STEPS
• Create lock parent node if needed
• Create ephemeral+sequence node under parent, store node name
returned
• Get children of lock node
• Sort children list by sequence number
• First child in the list has the lock!
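The steps above can be sketched as a single pass over the recipe. This assumes `client` is a started kazoo `KazooClient` and uses its `ensure_path`, `create`, and `get_children` methods. The function name `try_acquire` and the `lock-` prefix are hypothetical, and the sketch does not wait or retry when the lock is held by someone else.

```python
def try_acquire(client, lock_path, prefix="lock-"):
    """Run one pass of the lock recipe; return (my_node, have_lock)."""
    # Create the lock parent node if needed.
    client.ensure_path(lock_path)
    # Create an ephemeral+sequence node and keep the name Zookeeper returns.
    created = client.create(
        f"{lock_path}/{prefix}", b"", ephemeral=True, sequence=True
    )
    my_node = created.rsplit("/", 1)[-1]
    # Get the children and sort them by their trailing 10-digit sequence number.
    children = sorted(
        client.get_children(lock_path), key=lambda name: int(name[-10:])
    )
    # The first child in the sorted list holds the lock.
    return my_node, children[0] == my_node
```

In practice kazoo ships a ready-made lock recipe (`client.Lock(path)`), which is usually a better choice than hand-rolling these steps.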
31. THINGS TO WATCH OUT FOR
• Avoid the thundering herd: use watches only when needed
• When our node isn’t the lowest, watch the one in front of us
• Only one client wanting a lock is ‘woken’ when the lock is released by a
different client
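A small sketch of picking the node "in front of us". It assumes the children are sequential znode names ending in Zookeeper's 10-digit counter. The helper name `node_to_watch` is hypothetical.

```python
def node_to_watch(children, my_node):
    """Return the znode immediately ahead of my_node, or None if we are first."""
    ordered = sorted(children, key=lambda name: int(name[-10:]))
    index = ordered.index(my_node)  # raises ValueError if our node is gone
    return None if index == 0 else ordered[index - 1]
```

With kazoo, the returned name could be passed to `client.exists(path, watch=callback)`, which sets a one-shot watch on that single znode. If `exists` returns `None`, the predecessor has already gone away and the children should be fetched and checked again. Since each waiting client watches a different znode, a release wakes only the next client in line.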
33. ROBUST CODE TAKES EFFORT
• What happens when a server fails?
• What happens when the client fails?
• What happens when we don’t know if the server has failed?
37. FAILURE WILL HAPPEN
• Fail fast, fail completely.
• Session expiration is a good time to sys.exit
• Always include jitter (kazoo includes jitter on its connection and command
retry operations)
• Consider what exceptions can occur in any code relying on a distributed
system
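To illustrate the jitter point, here is a minimal sketch of exponential backoff with random jitter. It is not kazoo's implementation (kazoo provides its own `kazoo.retry.KazooRetry`). The function name, parameter names, and default values below are assumptions chosen for illustration.

```python
import random


def backoff_delays(base=0.1, factor=2.0, max_delay=30.0,
                   max_jitter=0.8, tries=5, rng=random):
    """Yield sleep times that grow exponentially, each with random jitter."""
    delay = base
    for _ in range(tries):
        # Jitter spreads clients out so they do not all retry at the same
        # instant after a shared failure.
        jitter = rng.uniform(0, max_jitter) * delay
        yield min(delay + jitter, max_delay)
        delay *= factor
```

A caller would sleep for each yielded value between attempts, for example `for pause in backoff_delays(): time.sleep(pause)`, and give up (fail fast) once the generator is exhausted.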
38. • Distributed systems are hard
• Use existing battle-proven tools (Zookeeper, Kazoo)
• Always consider everything that can fail, and how
• Be wary of tools that don’t tell you how they fail
• Read Kyle Kingsbury’s Jepsen posts to see examples of
systems failing: http://aphyr.com/tags/jepsen