
# Distributed Consensus A.K.A. "What do we eat for lunch?"

Distributed consensus is everywhere! Even if it is not obvious at first, most apps nowadays are distributed systems, and these sometimes have to “agree on a value”; this is where consensus algorithms come in. In this session we'll look at the general problem and solve a few example cases using the Raft algorithm, implemented using Akka's Actor and Cluster modules.


### Distributed Consensus A.K.A. "What do we eat for lunch?"

1. Konrad 'ktoso' Malawski. Distributed Consensus, A.K.A. “What do we eat for lunch?” GeeCON 2014 @ Kraków, PL. Konrad `@ktosopl` Malawski
2. Konrad 'ktoso' Malawski. Distributed Consensus, A.K.A. “What do we eat for lunch?”, real world edition. GeeCON 2014 @ Kraków, PL. Konrad `@ktosopl` Malawski
3. hAkker @ Konrad `@ktosopl` Malawski
4. hAkker @ Konrad `@ktosopl` Malawski. typesafe.com, geecon.org, Java.pl / KrakowScala.pl, sckrk.com / meetup.com/Paper-Cup @ London, GDGKrakow.pl, meetup.com/Lambda-Lounge-Krakow
5. You? Distributed systems?
6. You? Distributed systems? ?
7. You? Distributed systems? ? ?
8. What is this talk about? The network. How to think about distributed systems. Some healthy madness. Code in slides covers only the “simplest possible case”.
9. Ordering[T] Slightly chronological. By no means is it “worst to best”.
10. Consensus
11. Consensus - informal: “we all agree on something”
12. Consensus - formal: Termination: every correct process decides some value. Validity: if all correct processes propose the same value v, then all correct processes decide v. Integrity: if a correct process decides v, then v must have been proposed by some correct process. Agreement: every correct process must agree on the same value.
13. Consensus
14. Consensus
15. Distributed Consensus
16. Distributed Consensus. What is a distributed system anyway?
17. Distributed system definition: “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” (Leslie Lamport, http://research.microsoft.com/en-us/um/people/lamport/pubs/distributed-system.txt)
18. Distributed system definition: a system in which participants communicate asynchronously using messages.
19. Distributed Systems - failure detection
20. Distributed Systems - failure detection
21. Distributed Systems - failure detection. Jim had quit CorpSoft a while ago, but no-one ever told Bob…
22. Distributed Systems - failure detection
23. Distributed Systems - failure detection. Failure detection: • can only rely on external knowledge • but what if there's no-one to tell you? • thus: must be in-some-way time based
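The time-based point above can be sketched in plain Scala (a hypothetical illustration, not code from the talk): a node is *suspected* failed once its last heartbeat is older than some timeout.

```scala
object HeartbeatDetector {
  // last-heartbeat timestamps (millis since start) per node name
  type Heartbeats = Map[String, Long]

  // a node is suspected failed when we have not heard from it within timeoutMs;
  // note this can only ever be a *suspicion* - a slow network and a dead node
  // look exactly the same from the outside
  def suspected(beats: Heartbeats, now: Long, timeoutMs: Long): Set[String] =
    beats.collect { case (node, last) if now - last > timeoutMs => node }.toSet
}
```

With this, a Jim who stopped heartbeating long ago would be suspected, while a Bob heard from recently would not.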
24. Two Generals Problem
25. Two Generals Problem. Yellow and Blue armies must attack Pink City. They must attack together, otherwise they'll die in vain. Now they must agree on the exact time of the attack. They can only send messengers, which Pink may intercept and kill.
26. Two Generals Problem
27. Two Generals Problem - happy case. I need to inform Blue about my attack plan. I don't know when Yellow will attack…
28. Two Generals Problem - happy case
29. 1) Initial message not lost
30. Two Generals Problem - happy case. I don't know if Blue will also attack at 13:37… I'll wait until I hear back from him.
31. Two Generals Problem - happy case. I don't know if Blue will also attack at 13:37… I'll wait until I hear back from him. Why?
32. 2) The message might not have reached Blue
33. Blue must confirm the reception of the command
34. 1) Yellow is now sure, but Blue isn't!
35. 1) Yellow is now sure, but Blue isn't! Why?
36. 2) Blue's confirmation might have been lost!
37. This exactly mirrors the initial situation!
38. 2 Generals Problem translated to Akka
39. 2 Generals translated to Akka. Akka Actors implement the Actor Model. Actors: communicate via messages; create other actors; change their behaviour on receiving a msg.
40. 2 Generals translated to Akka. Akka Actors implement the Actor Model. Actors: communicate via messages; create other actors; change their behaviour on receiving a msg. Gains? Distribution / separation / modelling abstraction.
41. 2 Generals translated to Akka: `case class AttackAt(when: Date)` (presentation-sized snippet; does not cover all cases)
42. 2 Generals translated to Akka (presentation-sized snippet; does not cover all cases):

    ```scala
    class General(general: Option[ActorRef]) extends Actor {

      val WhenIWantToAttack: Date = ???

      general foreach { _ ! AttackAt(WhenIWantToAttack) }

      def receive = {
        case AttackAt(when) =>
          println(s"General ${otherGeneralName} attacks at $when")
          println(s"I must confirm this!")
          sender() ! AttackAt(when)
      }

      def otherGeneralName =
        if (self.path.name == "blue") "yellow" else "blue"
    }
    ```
46. 2 Generals translated to Akka (presentation-sized snippet; does not cover all cases):

    ```scala
    val system = ActorSystem("two-generals")

    val blue =
      system.actorOf(Props(new General(general = None)), name = "blue")

    val yellow =
      system.actorOf(Props(new General(Some(blue))), name = "yellow")
    ```

    Output:

    ```
    The blue general attacks at 13:37, I must confirm this!
    The yellow general attacks at 13:37, I must confirm this!
    The blue general attacks at 13:37, I must confirm this!
    ...
    ```
47. 8 Fallacies of Distributed Computing
48. 8 Fallacies of Distributed Computing:
    1. The network is reliable.
    2. Latency is zero.
    3. Bandwidth is infinite.
    4. The network is secure.
    5. Topology doesn't change.
    6. There is one administrator.
    7. Transport cost is zero.
    8. The network is homogeneous.

    Peter Deutsch, “The Eight Fallacies of Distributed Computing”, https://blogs.oracle.com/jag/resource/Fallacies.html
49. Failure Models
50. Failure models: Fail-Stop, Fail-Recover, Byzantine
51. Failure models: Fail-Stop, Fail-Recover, Byzantine
52. Failure models: Fail-Stop, Fail-Recover, Byzantine
53. Failure models: Fail-Stop, Fail-Recover, Byzantine
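The three models above can be written down as a small Scala ADT (my sketch, not code from the slides), which is handy when a simulator or test harness needs to inject failures of a given kind:

```scala
sealed trait FailureModel
case object FailStop    extends FailureModel // node halts and never comes back
case object FailRecover extends FailureModel // node halts, later rejoins (possibly with stale state)
case object Byzantine   extends FailureModel // node may do anything, including lying

object FailureModel {
  // ordered roughly from easiest to hardest to tolerate
  val all: List[FailureModel] = List(FailStop, FailRecover, Byzantine)
}
```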
54. 2-phase commit
55. 2PC - step 1: Propose value
56. 2PC - step 1: Propose value
57. 2PC - step 1: Promise to agree on write
58. 2PC - step 2: Commit the write
59. 2PC - step 1: Propose value, and die
60. 2PC - step 1: Propose value to 1 node, and die
61. 2PC: Prepare needs timeouts
62. 2PC: Timeouts + recovery committer
63. 2PC: Timeouts + recovery committer
64. 2PC: Timeouts + recovery committer
65. 2PC: Timeouts + recovery committer
66. 2PC: Timeouts + recovery committer
67. Still can't tolerate the “accepted value” Actor dying
68. 2PC: Timeouts + recovery committer
69. 2PC: Timeouts + recovery committer
70. 2 Phase Commit translated to Akka
71. 2PC translated to Akka (presentation-sized snippet; does not cover all cases):

    ```scala
    case class Prepare(transactionId: Int, value: Any)
    case object Commit

    sealed class AcceptorStatus
    case object Prepared extends AcceptorStatus
    case object Conflict extends AcceptorStatus
    ```
73. 2PC translated to Akka (presentation-sized snippet; does not cover all cases):

    ```scala
    class Proposer(acceptors: List[ActorRef]) extends Actor {
      var transactionId = 0
      var preparedAcceptors = 0

      def receive = {
        case value: String =>
          transactionId += 1
          acceptors foreach { _ ! Prepare(transactionId, value) }

        case Prepared =>
          preparedAcceptors += 1
          if (preparedAcceptors == acceptors.size)
            acceptors foreach { _ ! Commit }

        case Conflict =>
          context stop self
      }
    }
    ```
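The slides only show the Proposer side. As a hypothetical sketch (not from the talk), the Acceptor's decision can be reduced to a pure function: accept a Prepare when idle, answer "Conflict" when already mid-transaction:

```scala
object AcceptorLogic {
  // the transaction this acceptor has promised to commit, if any
  type State = Option[Int] // transactionId

  // decide how to answer Prepare(transactionId, _) given the current state;
  // returns the new state and the reply name ("Prepared" / "Conflict")
  def onPrepare(state: State, transactionId: Int): (State, String) =
    state match {
      case None                  => (Some(transactionId), "Prepared")
      case Some(`transactionId`) => (state, "Prepared") // idempotent retry
      case Some(_)               => (state, "Conflict")
    }
}
```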
76. 2PC with ResumeProposer in Akka (presentation-sized snippet; does not cover all cases):

    ```scala
    case class Prepare(transactionId: Int, value: Any)
    case object Commit

    sealed class AcceptorStatus
    case object Prepared extends AcceptorStatus
    case object Conflict extends AcceptorStatus
    case class Committed(value: Any) extends AcceptorStatus
    ```
77. 2PC with ResumeProposer in Akka (presentation-sized snippet; does not cover all cases):

    ```scala
    class ResumeProposer(
        proposer: ActorRef,
        acceptors: List[ActorRef]) extends Actor {

      context watch proposer

      var anyAcceptorCommitted = false

      def receive = {
        case Terminated(`proposer`) =>
          println("Proposer died! Try to finish the transaction...")
          acceptors map { _ ! StatusPlz }

        case _: AcceptorStatus =>
          // impl of recovery here
      }
    }
    ```
78. 2PC with ResumeProposer in Akka (presentation-sized snippet; does not cover all cases)
79. Quorum
80. Quorum voting. From the perspective of the Omnipotent Observer *
81. Quorum voting. From the perspective of the Omnipotent Observer * (* does not exist in a running system)
82. Quorum voting
83. Quorum voting
84. Quorum voting
85. Quorum voting
86. Quorum voting
87. Quorum voting
88. Quorum voting – split votes
89. Quorum voting – split votes
90. Quorum voting – split votes
91. Quorum voting – split votes
92. Quorum voting – split votes
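The reason majority quorums avoid split-brain is that any two majorities of the same cluster must overlap in at least one member. A quick Scala sanity check (my sketch, not from the slides):

```scala
object Quorum {
  // smallest majority of a cluster of n nodes
  def size(n: Int): Int = n / 2 + 1

  // two majorities together exceed the cluster size,
  // so they must share at least one node
  def majoritiesOverlap(n: Int): Boolean = 2 * size(n) > n
}
```

For 5 nodes the quorum is 3; for 4 nodes it is also 3, which is one reason even-sized clusters buy no extra fault tolerance.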
94. Paxos
95. Basic Paxos = “choose exactly one value”
96. Paxos - photo by Luigi Piazzi
97. Paxos: a high-level overview. It's the distributed systems algorithm.
98. Paxos: a high-level overview. JavaZone had a full session on Paxos already today…
99. A few Paxos whitepapers: “Reaching Agreement in the Presence of Faults” (Lamport, 1980) … “FLP Impossibility Result” (Fischer et al., 1985) … “The Part-Time Parliament” (Lamport, 1998) … “Paxos Made Simple” (Lamport, 2001) … “Fast Paxos” (Lamport, 2005) … “Paxos Made Live” (Chandra et al., 2007) … “Paxos Made Moderately Complex” (Van Renesse, 2011) ;-)
100. Lamport's “Replicated State Machine”
101. Paxos: The cast
102. Paxos: The cast
103. Paxos: The cast
104. Paxos: The cast
105. Paxos: The cast
106. Paxos: The cast
107. Consensus time! Choose a value (raise your hand)
108. Consensus time! Choose a value (raise your hand): v1 = Basic Paxos + Raft, v2 = Just Raft
109. Consensus time! Choose a value (raise your hand): v1 = Basic Paxos + Raft, v2 = Just Raft
110. Consensus time! Choose a value (raise your hand): v1 = Basic Paxos + Raft, v2 = Just Raft
111. Consensus time! Choose a value (raise your hand): v1 = Basic Paxos + Raft, v2 = Just Raft (if enough time, Paxos)
112. Basic Paxos simple example
113. Paxos: Proposals. A ProposalNr must be greaterThan any prev proposalNr used by this Proposer; example: [roundNr|serverId]
114. Paxos: 2 phases. Phase 1: Prepare. Phase 2: Accept.
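The [roundNr|serverId] scheme above can be modeled as a pair ordered lexicographically, so proposal numbers are unique across servers and totally ordered (a sketch; names are mine):

```scala
final case class ProposalNr(round: Int, serverId: Int)

object ProposalNr {
  // compare by round first, then by server id as the tie-breaker
  implicit val ordering: Ordering[ProposalNr] =
    Ordering.by(p => (p.round, p.serverId))

  // a proposer picks a number greater than anything it has seen so far
  def next(seen: ProposalNr, myServerId: Int): ProposalNr =
    ProposalNr(seen.round + 1, myServerId)
}
```

Two servers can never produce the same proposal number, because the serverId component always differs.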
115. Paxos, Prepare Phase: `n = nextSeqNr()`
116. Paxos, Prepare Phase: `acceptors ! Prepare(n, value)`
117. Paxos, Prepare Phase:

    ```scala
    case Prepare(n, value) =>
      if (n > minProposal) {
        minProposal = n
        accVal = value
      }

      sender() ! Accepted(minProposal, accVal)
    ```

119. Paxos, Prepare Phase: `value = highestN(responses).accVal // replace my value with the accepted value`
120. Paxos, Accept Phase: `acceptors ! Accept(n, value)`
121. Paxos, Accept Phase:

    ```scala
    case Accept(n, value) =>
      if (n >= minProposal) {
        minProposal = n
        acceptedProposal = n
        acceptedValue = value
      }

      learners ! Learn(value)
      sender() ! minProposal
    ```

122. Paxos, Accept Phase
123. Paxos, Accept Phase
124. Paxos, Accept Phase:

    ```scala
    if (acceptedN > n) restartPaxos()
    else println(n + " was chosen!")
    ```
125. Basic Paxos needs extensions for the “real world”. Additions: a “stable leader”; performance (basic = 2 broadcast roundtrips); ensuring full replication; configuration changes.
126. Multi Paxos
127. Multi Paxos: “Basically everyone does it, but everyone does it differently.”
128. Multi Paxos: keeps the Leader; clients find and talk to the Leader; skips Phase 1 in the stable state; 2 delays instead of 4 until learning a value.
129. Raft
130. Raft – inspired by Paxos. Paxos is great. Multi-Paxos is great too, but has no single “common understanding”. Raft wants to be understandable and just as solid. “In Search of an Understandable Consensus Algorithm” (2013)
131. Raft – inspired by Paxos: leader based; fewer processes than Paxos; its goal is simplicity; “basic” Raft includes snapshotting / membership changes.
132. Raft - summarised on one page. Diego Ongaro & John Ousterhout, “In Search of an Understandable Consensus Algorithm”
133. Raft
134. Raft
135. Raft - starting the cluster
136. Raft - Election timeout
137. Raft - 1st election
138. Raft - 1st election
139. Raft - Election Timeout
140. Raft - Election Phase
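Raft deals with the split-vote problem shown around these slides by randomizing each follower's election timeout (the Raft paper suggests something like 150–300 ms), so candidates rarely wake up at the same instant. A sketch (not code from the talk or from akka-raft):

```scala
import scala.util.Random

object ElectionTimeout {
  // draw a fresh randomized timeout; a follower that hears no heartbeat
  // for this long becomes a candidate and starts an election
  def next(rnd: Random, minMs: Int = 150, maxMs: Int = 300): Int =
    minMs + rnd.nextInt(maxMs - minMs + 1)
}
```

Because each node draws its own value, one node usually times out first, wins the vote, and the others never even become candidates.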
141. Raft
142. Raft
143. Raft
144. Raft
145. Raft
146. Raft
147. Raft
148. Raft
149. Raft
150. Raft
151. Raft – heartbeat = empty entries
152. Raft – heartbeat = empty entries
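“Heartbeat = empty entries” means the leader keeps sending its ordinary AppendEntries message, just with no log entries in it; followers reset their election timeout on receipt. Modeled as data (a sketch; field names are mine, loosely following the Raft paper, not akka-raft's actual messages):

```scala
final case class AppendEntries(term: Int, leaderId: String, entries: List[String]) {
  // with no entries the message carries no log data:
  // it only asserts "the leader is alive", i.e. it is a heartbeat
  def isHeartbeat: Boolean = entries.isEmpty
}
```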
153. Akka–Raft (community project, work in progress)
154. Raft, reminder:
155. Raft translated to Akka:

    ```scala
    abstract class RaftActor
      extends Actor
      with FSM[RaftState, Metadata]
    ```

157. Raft translated to Akka:

    ```scala
    onTransition {
      case Follower -> Candidate =>
        self ! BeginElection
        resetElectionDeadline()
      // ...
    }
    ```

159. Raft translated to Akka:

    ```scala
    case Event(BeginElection, m: ElectionMeta) =>
      log.info("Init election (among {} nodes) for {}",
        m.config.members.size, m.currentTerm)

      val request = RequestVote(m.currentTerm, m.clusterSelf,
        replicatedLog.lastTerm, replicatedLog.lastIndex)

      m.membersExceptSelf foreach { _ ! request }

      val includingThisVote = m.incVote
      stay() using includingThisVote.withVoteFor(m.currentTerm, m.clusterSelf)
    ```

160. Raft translated to Akka
161. Raft Heartbeat using Akka (akka-raft is a work-in-progress community project; it may change a lot):

    ```scala
    sendHeartbeat(m)
    log.info("Starting heartbeat, with interval: {}", heartbeatInterval)
    setTimer(HeartbeatName, SendHeartbeat, heartbeatInterval, repeat = true)
    ```

163. Raft Heartbeat using Akka (akka-raft is a work-in-progress community project; it may change a lot):

    ```scala
    sendHeartbeat(m)
    log.info("Starting heartbeat, with interval: {}", heartbeatInterval)
    setTimer(HeartbeatName, SendHeartbeat, heartbeatInterval, repeat = true)

    val leaderBehaviour = {
      // ...
      case Event(SendHeartbeat, m: LeaderMeta) =>
        sendHeartbeat(m)
        stay()
    }
    ```
164. Akka-Raft in user-land (alpha; akka-raft is a work-in-progress community project; it may change a lot):

    ```scala
    class WordConcatRaftActor extends RaftActor {

      type Command = Cmnd

      var words = Vector[String]()

      /** Applied when a command is committed by Raft consensus */
      def apply = {
        case AppendWord(word) =>
          words = words :+ word
          word

        case GetWords =>
          log.info("Replying with {}", words.toList)
          words.toList
      }
    }
    ```
165. FLP Impossibility
166. FLP Impossibility Proof: “Impossibility of Distributed Consensus with One Faulty Process” (1985) by Fischer, Lynch, Paterson
167. FLP Impossibility Result: “Impossibility of Distributed Consensus with One Faulty Process” (1985) by Fischer, Lynch, Paterson
169. ktoso @ typesafe.com. twitter: ktosopl, github: ktoso, blog: project13.pl, team blog: letitcrash.com. JavaZone @ Oslo 2014. Takk! Dzięki! Thanks! ありがとう！ akka.io
170. Happy Byzantine Lunch-time! Konrad 'ktoso' Malawski, GeeCON 2014 @ Kraków, PL