Predicting your adversary's behaviour is the holy grail of threat modeling. This talk will explore the problem of adversarial reasoning under uncertainty through the lens of game theory, the study of strategic decision-making among cooperating or conflicting agents. Starting with a thorough grounding in classical two-player games such as the Prisoner's Dilemma and the Stag Hunt, we will also consider the curious patterns that emerge in iterated, round-robin, and societal iterated games.
But as a tool for the real world, game theory seems to put the cart before the horse: how can you choose the proper strategy if you don't necessarily even know what game you're playing? For this, we turn to the relatively young field of probabilistic programming, which enables us to make powerful predictions about adversaries' strategies and behaviour based on observed data.
This talk is intended for a general audience; if you can compare two numbers and know which one is bigger than the other, you have all the mathematical foundations you need.
2. I hate boring problems
I especially hate solving tiny variations on the same
boring problem over and over again
The internet is full of the same boring problems over
and over again
Both in the cloud …
… and in the circus
Not my circus, not my monkeys
MOTIVATION
3. Information theory
Probability theory
Formal language theory (of course)
Control theory
First-order logic
Haskell
ALSO APPEARING IN THIS TALK
4. When an unknown agent acts, how do you react?
Observation of side effects
Signals the agent sends
Past interactions with others
Formal language theory
(if you’re a computer)
Systematic knowledge about the
structure of interactions and the
incentives involved in them
IT IS PITCH BLACK. YOU ARE LIKELY TO
BE EATEN BY A GRUE.
5. Everything You Actually Need to Know About
Classical Game Theory
in math …
… and psychology
Changing the Game
Extensive form and signaling games
Multiplayer and long-running games
Reasoning Under Uncertainty, Over Real Data
OUTLINE
7. Players
Information available at each decision point
Possible actions at each decision point
Payoffs for each outcome
Strategies (pure or mixed)
Or behaviour, in iterated or turn-taking games
Equilibria
Different kinds of games have different kinds of equilibria
WHAT’S IN A GAME?
8. A NORMAL FORM GAME
            Cooperate   Defect
Cooperate   a, b        c, d
Defect      e, f        g, h
9. Pure strategy: fully specified set of moves for every
situation
Mixed strategy: probability assigned to each possible
move, random path through game tree
Behaviour strategies: probabilities assigned at
information sets
STRATEGIES
10. PRISONER’S DILEMMA
            Cooperate   Defect
Cooperate   -1, -1      -3, 0
Defect      0, -3       -2, -2
d, e > a, b > g, h > c, f
11. MATCHING PENNIES
        Heads    Tails
Heads   1, -1    -1, 1
Tails   -1, 1    1, -1
a = d = f = g > b = c = e = h
12. DEADLOCK
            Cooperate   Defect
Cooperate   1, 1        0, 3
Defect      3, 0        2, 2
e > g > a > c and d > h > b > f
13. STAG HUNT
       Stag    Hare
Stag   2, 2    0, 1
Hare   1, 0    1, 1
a = b > d = e = g = h > c = f
14. CHICKEN
           Swerve   Straight
Swerve     0, 0     -1, 1
Straight   1, -1    -10, -10
e > a > c > g and d > b > f > h
16. BATTLE OF THE SEXES
           Opera   Football
Opera      3, 2    0, 0
Football   0, 0    2, 3
(a > g and h > b) > c = d = e = f
17. Games can be zero-sum or non-zero-sum
Games can be about conflict or cooperation
Actions are not inherently morally valenced
Payoffs determine type of game, strategy
WHAT HAVE WE SEEN SO FAR?
18. Cournot equilibrium: each actor’s output maximizes
its profit given the outputs of other actors
Nash equilibrium: each actor is making the best
decision they can, given what they know about each
other’s decisions
Subgame perfect equilibrium: eliminates non-credible threats
Trembling hand equilibrium: considers the possibility
that a player might make an unintended move
EQUILIBRIUM
20. MIND GAMES
“As far as the theory of games is concerned,
the principle which emerges here is that any
social intercourse whatsoever has a biological
advantage over no intercourse at all.”
22. “Hands” or roles = players
Extensive form; players move in response to each
other
Advantages
Existential advantage: confirmation of existing beliefs
Internal psychological advantage: direct emotional payoff
External psychological advantage: avoiding a feared
situation
Internal social advantage: structure/position with respect
to other players
External social advantage: as above, wrt non-players
BERNE’S GAMES: STRUCTURE
23. Kick Me
Goal: Sympathy
Find someone to beat on you, then whine about it
“My misfortunes are better than yours”
Ain’t It Awful
Can be a pastime, but also manifests as a game
Player displays distress; payoff is sympathy and help
Why Don’t You – Yes, But
Player claims to want advice. Player doesn’t really want it.
Goal: Reassurance
BERNE’S GAMES: EXAMPLES
24. Now I’ve Got You, You Son Of A Bitch
Goal: Justification (or just money)
Three-handed version is the badger game
Roles
Victim
Aggressor
Confederate
Moves
Provocation → Accusation
Defence → Accusation
Defence → Punishment
THE BADGER GAME
25. “Schlemiel,” in Berne’s glossary
Moves:
Provocation → resentment
(repeat)
If B responds with anger, A appears justified in more
anger
If B keeps their cool, A still keeps pushing
TROLLING
26. Social media
Organic responses against predatory games
Predator Alert Tool
/r/TumblrInAction “known trolls” wiki
Those just happen to be ones I know about
A truly generic reputation system is probably a pipe dream
Wikipedia
eBay
But for these, we have to extend the basic
mathematical model.
OTHER MONKEY GAMEBOARDS
44. Strategies now depend on payoff matrix and history
Axelrod, 1981: how well do these strategies perform
against each other over time?
“Ecological” tournaments: players abandon bad strategies
Rapoport: if the only information you have is how
player X interacted with you last time, the best you
can do is Tit-for-Tat
TFT cannot score higher than its opponent
Axelrod: “Don’t be envious”
Against TFT, no one can do better than cooperate
Axelrod: “Don’t be too clever”
ITERATED GAMES
45. Nice: S is a nice strategy iff it will not defect on
someone who has not defected on it
Retaliatory: S is a retaliatory strategy iff it will
defect on someone who defects on it
Forgiving: S is a forgiving strategy iff it will stop
defecting on someone who stops defecting on it
PROPERTIES
46. Ord/Blair, 2002: what happens when strategies can
take into account all past interactions?
We can express strategies in convenient first-order
logic, as it turns out
Tit-for-Tat: D(c, r, p)
Tit-for-Two-Tats: D(c, r, p) ∧ D(c, r, b(p))
Grim: ∃t D(c, r, t)
Bully: ¬∃t D(c, r, t)
Spiteful-Bully: ¬∃t D(c, r, t) ∨ ∃s (D(c, r, s) ∧ D(c, r, b(s)) ∧
D(c, r, b(b(s))))
Vigilante: ∃j D(c, j, p)
Police: D(c, r, p) ∨ ∃j (D(c, j, p) ∧ ¬∃k (D(j, k, b(p))))
SOCIETAL ITERATED GAME THEORY
47. EVOLUTION IS A HARSH MISTRESS
Tit-for-Tat All-Cooperate Spiteful-Bully
49. In a society, niceness is more nuanced
Individually nice: will not defect on someone who has not
defected on it
Meta-individually nice: will not defect on individually nice
Communally nice: will not defect on someone who has not
defected at all
Meta-communally nice: will not defect on communally nice
Same applies to forgiveness and retaliation
Loyalty: will not defect on the same strategy as itself
NICENESS AND LOYALTY
50. Peacekeepers don’t always agree
Police will defect on Vigilantes and vice versa
Peacekeepers protect non-peacekeeping strategies
at their own expense
META-PEACEKEEPING
Police
All-Cooperate
Spiteful-Bully
Tit-for-Tat
54. Frequentist: probability is the long-term frequency of
events
Reasoning from absolute probabilities
What happens if an event only happens once?
Returns an estimate
Bayesian: probability is a measure of confidence that
an event will occur
Reasoning from relative probabilities
Returns a probability distribution over outcomes
Update beliefs (confidence) as new evidence arrives
TWO INTERPRETATIONS OF PROBABILITY
$P(A \mid X) = \frac{P(X \mid A) \, P(A)}{P(X)}$
55. Probability distribution function: assigns
probabilities to outcomes
Discrete: a countable set of values (enumeration)
Function also called a probability mass function
Poisson, binomial, Bernoulli, discrete uniform…
Continuous: arbitrary-precision values
Function also called a probability density function
Exponential, Gaussian (normal), chi-squared, continuous
uniform…
Mixed: both discrete and continuous
Narrower distribution = greater certainty
DISTRIBUTIONS
Poisson: $E[Z \mid \lambda] = \lambda$
Exponential: $E[Z \mid \lambda] = \frac{1}{\lambda}$
56. Game theory is great when you know the payoffs
What can you do if you don’t know the payoffs?
Or what the game tree looks like?
Well…
You usually have some educated guesses about who the
players are
You have some idea what your possible actions are, as
well as the other players’
You can look at past interactions and make inferences
Which of these can be random variables? All of them.
Deterministic: if all inputs are known, value is known
Stochastic: even if all inputs are known, still random
YOU DON’T KNOW WHAT YOU DON’T KNOW
57. Figure out what distribution to use
Figure out what parameter you need to estimate
Figure out a distribution for it, and any parameters
Observing data updates your priors
Fixing values for stochastic variables
Markov Chain Monte Carlo: sampling the posterior
distribution thousands of times
DON’T WAIT — SIMULATE
58. Prerequisites:
A Markov chain with an equilibrium distribution
A function f proportional to the density of the distribution
you care about
Choose some initial set of values for all variables
(state, S)
Modify S according to Markov chain state transitions
If f(S’)/f(S) ≥ 1, S’ is more likely than S, so accept
Otherwise, accept S’ with probability f(S’)/f(S)
Repeat
CONVERGING ON EXPECTED VALUES
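The accept/reject loop on this slide can be sketched in a few lines of plain Haskell. This is a minimal illustration under stated assumptions, not the sampler the talk's library uses: the target density `f` (proportional to a normal with mean 3), the uniform proposal, the seed, the burn-in length, and the tiny hand-rolled random number generator are all choices made here so that the sketch depends only on `base`.

```haskell
-- Hypothetical names throughout; nothing here comes from the talk's library.
type Seed = Integer

-- Tiny linear congruential generator: a pseudo-random Double in [0, 1)
-- plus the next seed. Good enough for a demo, not for real work.
nextUniform :: Seed -> (Double, Seed)
nextUniform seed = (fromIntegral seed' / 2147483648, seed')
  where seed' = (1103515245 * seed + 12345) `mod` 2147483648

-- f only needs to be proportional to the density we care about.
-- Assumed target: a normal distribution with mean 3 and standard deviation 1.
f :: Double -> Double
f x = exp (-0.5 * (x - 3) * (x - 3))

-- One Metropolis step: propose S' near S, then accept it or keep S.
step :: (Double, Seed) -> (Double, Seed)
step (s, seed0) = (if accept then s' else s, seed2)
  where
    (u1, seed1) = nextUniform seed0
    (u2, seed2) = nextUniform seed1
    s'          = s + (u1 - 0.5) * 2      -- symmetric proposal in [-1, 1)
    ratio       = f s' / f s
    accept      = ratio >= 1 || u2 < ratio

-- Run the chain from S = 0, drop a burn-in prefix, keep n samples.
samples :: Int -> [Double]
samples n = take n . drop 1000 . map fst $ iterate step (0, 42)

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

main :: IO ()
main = print (mean (samples 20000))
```

If the chain has converged, the printed sample mean should land close to 3, the expected value of the assumed target.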
59. A GAME WITHOUT PAYOFFS
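type Outcome = Measure (Bool, Bool)
type Trust = Double
-- Arguments: own trust (paranoia) value, opponent's last move, own last move.
-- True means "defect".
type Strategy = Trust -> Bool -> Bool -> Measure Bool
tit :: Trust -> Bool -> Bool -> Measure Bool
tit me True _ = conditioned $ bern 0.9   -- opponent defected: defect with p = 0.9
tit me False _ = conditioned $ bern me   -- opponent cooperated: defect with p = own paranoia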
60. CHOOSING WHICH HOLE TO FILL IN
-- Play one round: each strategy sees its own trust value and last round's moves.
play :: Strategy -> Strategy ->
        (Bool, Bool) -> (Trust, Trust) -> Outcome
play strat_a strat_b (last_a,last_b) (a,b) = do
  a_action <- strat_a a last_b last_a
  b_action <- strat_b b last_a last_b
  return (a_action, b_action)

-- Ten rounds of tit vs. tit, with both paranoia values drawn from a uniform prior.
iterated_game :: Measure (Double, Double)
iterated_game = do
  let a_initial = False
  let b_initial = False
  a <- unconditioned $ uniform 0 1
  b <- unconditioned $ uniform 0 1
  rounds <- replicateM 10 $ return (a, b)
  foldM_ (play tit tit) (a_initial, b_initial) rounds
  return (a, b)
61. LET’S PLAY A GAME
games = [Just (toDyn False), Just (toDyn False),
         Just (toDyn False), Just (toDyn True),
         Just (toDyn False), Just (toDyn False),
         Just (toDyn False), Just (toDyn True),
         Just (toDyn False), Just (toDyn True),
         Just (toDyn False), Just (toDyn False),
         Just (toDyn False), Just (toDyn True),
         Just (toDyn False), Just (toDyn True),
         Just (toDyn False), Just (toDyn True),
         Just (toDyn False), Just (toDyn False)]
do
  l <- mcmc iterated_game games
  return [makeHistogram 30 (Data.Vector.fromList $ map fst (take 5000 l)) "A's paranoia",
          makeHistogram 30 (Data.Vector.fromList $ map snd (take 5000 l)) "B's paranoia"]
65. LET’S PLAY ANOTHER GAME
iterated_game2 :: Measure (SChoice, SChoice)
iterated_game2 = do
  let a_initial = False
  let b_initial = False
  a <- unconditioned $ uniform 0 1
  b <- unconditioned $ uniform 0 1
  na <- strat
  let a_strat = chooseStrategy na
  nb <- strat
  let b_strat = chooseStrategy nb
  rounds <- replicateM 10 $ return (a, b)
  foldM_ (play a_strat b_strat) (a_initial, b_initial) rounds
  return (na, nb)
do
  l <- mcmc iterated_game2 games
  return [makeDiscrete (map fst (take 1000 l)) "A strategy",
          makeDiscrete (map snd (take 1000 l)) "B strategy"]
67. Probabilistic SIPD
Extensive form SIPD with signaling
And channels with decidable vs. heuristic recognisers
Coordination. Enough said.
System 1/System 2 conflict
Sentiment analysis → payoff data
Start small: the stroke is the smallest unit of interaction
Data where information about players is limited
IP flows
Anonymity networks
Signaling game about type: are two actors the same
person?
FUTURE WORK
This is mostly a talk about game theory, founded by John von Neumann and Oskar Morgenstern in 1944.
Game theory is part of econ, which is way more than just macro/micro “where money goes”
Weird that the study of decision-making is called “the dismal science,” though to be fair the more you look at the problem of allocating finite resources, the more hard truths you run up against about physics and human nature
Game theory provides a framework for refining our decision-making models as more information about data’s structure comes in
“the circus” = social media
I’m largely giving this talk because I’m tired of assholes being better at coordination than people who aren’t assholes.
Keith Alexander is consulting for $600K/month on the grounds of some kind of behaviour analysis secret sauce. So, other people are thinking about these problems too.
Keep the Shannon/Weaver model of communication in your head: two endpoints communicating over a possibly noisy channel of finite bandwidth, who have to serialize their messages to the channel and parse incoming messages off the channel. Both serialization and parsing can produce errors.
This isn’t really a langsec talk, but we’ll still be talking about boundaries of competence. In a signaling game, how much confidence you can have in the signal you received being the one that was transmitted depends on how reliably you can receive signals in the language of the channel – and how reliably the sender serializes them.
We won’t be getting all that deeply into feedback loops, but if you know how they work, keep them in mind.
I kinda lied about the only math you need being the ability to compare two numbers; it’ll help later in the talk if you can read first-order logic notation, but it’s not really necessary.
1: I.e., effects on the environment.
2: So important, they named a class of games after them.
3: The quality of your data is really important here.
4: Langsec won’t be making much of an appearance in this talk, but when all the agents are machines, it’s relevant. Who do you think is going to be driving all those automated exploit generators DARPA is soliciting? People? At first, maybe, but not for long. Drones are expensive and hard to build. More servers are not. And in any case, being able to tell where FLT matters and where it doesn’t is an important distinction. Decidable problems are priceless; for everything else there’s heuristics, and when those inevitably fail, there’s Mastercard.
5: Game theory is the framework we’ll be building up this knowledge around, but we’ll be pulling from all the fields I mentioned earlier.
The four elements at the top are all you need to define a game.
Strategies and equilibria are derived from the structure of the game you’re playing.
Behavior strategies and mixed strategies are functionally equivalent as long as the player has perfect recall. (Kuhn’s theorem) So behavior strategies are a bit more like how people act in real life.
First described in 1950 by Merrill Flood and Melvin Dresher
Four payoffs: Temptation, for screwing the other guy while he cooperates; Reward, for mutual cooperation; Punishment, for mutual defection; and Sucker, for cooperating and being defected on.
Because Reward > Punishment, mutual cooperation is better than mutual defection
Because Temptation > Reward and Punishment > Sucker, defection is the dominant strategy for both agents
It’s a dilemma because mutual cooperation is better than mutual defection, but at the *individual* level, defection is superior to cooperation.
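To make the dominance argument concrete, here is a minimal Haskell sketch that brute-forces a 2x2 game. The payoffs are the ones on the Prisoner's Dilemma slide; the names `Game`, `rowDominant` and `pureNash` are made up for this illustration.

```haskell
data Move = Cooperate | Defect deriving (Show, Eq, Enum, Bounded)

-- Payoffs (row player, column player) for a pair of moves.
type Game = Move -> Move -> (Int, Int)

-- Payoffs from the Prisoner's Dilemma slide.
prisonersDilemma :: Game
prisonersDilemma Cooperate Cooperate = (-1, -1)
prisonersDilemma Cooperate Defect    = (-3, 0)
prisonersDilemma Defect    Cooperate = (0, -3)
prisonersDilemma Defect    Defect    = (-2, -2)

moves :: [Move]
moves = [minBound .. maxBound]

-- Row moves that strictly beat every other row move, whatever the column player does.
rowDominant :: Game -> [Move]
rowDominant g =
  [ m
  | m <- moves
  , all (\m' -> m' == m || all (\c -> fst (g m c) > fst (g m' c)) moves) moves
  ]

-- Pure-strategy Nash equilibria: neither player gains by deviating alone.
pureNash :: Game -> [(Move, Move)]
pureNash g =
  [ (r, c)
  | r <- moves, c <- moves
  , all (\r' -> fst (g r' c) <= fst (g r c)) moves
  , all (\c' -> snd (g r c') <= snd (g r c)) moves
  ]

main :: IO ()
main = do
  print (rowDominant prisonersDilemma)  -- [Defect]
  print (pureNash prisonersDilemma)     -- [(Defect,Defect)]
```

Mutual defection comes out as the only pure equilibrium even though both players would prefer (-1, -1) to (-2, -2), which is the dilemma in one line of output.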
Basically rock-paper-scissors but with only two options.
There is no pure-strategy equilibrium here: one player wants to match and the other wants to mismatch, so whatever pair of moves gets played, one of them would rather have picked the other side of the coin.
Here, the mutually beneficial outcome is also the dominant outcome: there is no conflict between self-interest and mutual benefit. Still, it’s an interesting basis for a signaling game, since there’s still some incentive to screw the other guy.
The classic social cooperation game, originally described by Jean-Jacques Rousseau.
Two pure-strategy equilibria: both cooperate or both defect. Cooperating is payoff dominant, defecting is risk dominant.
Chicken is more of an “anti-coordination game” – choosing the same action creates negative externalities, so you want to not coordinate
Proposed by John Maynard Smith and George Price in 1973 in Nature to describe conflict among animals over resources
V is the value of the contested resource, C is the cost of getting into a fight
Often considered as a signaling game – there’s a round of threatening each other before choosing their moves
Also known as “conflicting interest coordination”
One partner wants to go to the opera, the other wants to go to the ball game, but they’d both rather be together than go to different events. They forgot which one to go to, each knows that the other forgot, and they can’t communicate. Where should each go?
Two pure strategy equilibria: both opera or both football. But this is unfair, since one person consistently gets a higher payoff than the other.
One mixed strategy: go to your preferred event with 60% probability. But this is inefficient, because players miscoordinate 52% of the time, so the expected utility is 1.2, which is worse than if either person always goes to their non-preferred event.
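The 60%, 52% and 1.2 figures can be checked by hand from the payoffs on the slide. In this sketch, p is the probability that the row player goes to the opera (their preferred event) and q is the probability that the column player goes to football (theirs); both symbols are introduced here for the derivation.

```latex
% Row's p must leave the column player indifferent between Opera (pays 2) and Football (pays 3):
2p = 3(1 - p) \;\Rightarrow\; p = \tfrac{3}{5}, \qquad \text{and by symmetry } q = \tfrac{3}{5}

% Probability that both end up at the same event:
P(\text{coordinate}) = p(1 - q) + (1 - p)q = 0.24 + 0.24 = 0.48

P(\text{miscoordinate}) = 1 - 0.48 = 0.52

% Expected payoff to the row player (3 at the opera together, 2 at football together):
E[u_{\text{row}}] = 3 \cdot p(1 - q) + 2 \cdot (1 - p)q = 0.72 + 0.48 = 1.2
```

Always going to the non-preferred event would pay 2 for certain, which is why the mixed equilibrium is called inefficient.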
Types of games overlap in various ways
Zero-sum: the gains/losses of all players balance out to zero. Matching Pennies is zero-sum; Prisoner’s Dilemma and Stag Hunt are non-zero sum.
All zero-sum games are competitive; non-zero-sum games can be competitive or noncompetitive
An action is just an action. There’s nothing inherently good or bad about choosing Heads or Tails in Matching Pennies; the morality of snitching in PD depends on your ethical framework around snitching, the morality of going off to hunt rabbits in Stag Hunt depends on whether you agreed to hunt a stag beforehand and how seriously you take keeping your word.
As we go on, we’ll look at more complicated games – ones that go on longer, have more players, where players have uncertain information about each other, and even ones where the game being played changes form as the game goes on.
Cournot equilibrium: Antoine Augustin Cournot, 1838. He was talking about businesses, e.g. factories, but it generalises.
Nash equilibrium: nobody can do better by unilaterally changing their strategy. In the Prisoner’s Dilemma, this is clear: any player who wants to cooperate knows that the other guy can defect on him and screw him, so he’s better off defecting.
A subgame is a subtree of a game’s tree that is itself a game. In a subgame perfect equilibrium, the players’ strategies form a Nash equilibrium in every subgame. To find one, start at the outcomes and work backward, removing branches that involve a player making a non-optimal move.
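Working backward from the outcomes is easy to sketch in Haskell. The tree type and the example game below are assumptions for illustration: it is the textbook market-entry game with made-up payoffs, not a game from the slides. Player 0 decides whether to enter a market, and player 1 threatens to fight if they do.

```haskell
-- Hypothetical game tree; payoffs are (player 0, player 1).
data Tree
  = Leaf (Int, Int)
  | Node Int [(String, Tree)]  -- player to move (0 or 1) and labelled branches

-- Backward induction: solve every subgame from the leaves up, keeping only
-- the mover's best branch. Assumes every Node has at least one branch.
solve :: Tree -> ((Int, Int), [String])
solve (Leaf payoff) = (payoff, [])
solve (Node player branches) = foldr1 better solved
  where
    solved = [ (payoff, label : path)
             | (label, sub) <- branches
             , let (payoff, path) = solve sub ]
    value (payoff, _) = if player == 0 then fst payoff else snd payoff
    better x y = if value x >= value y then x else y

-- Illustrative payoffs only.
entryGame :: Tree
entryGame =
  Node 0
    [ ("Stay out", Leaf (0, 2))
    , ("Enter", Node 1
        [ ("Fight", Leaf (-1, -1))
        , ("Accommodate", Leaf (1, 1))
        ])
    ]

main :: IO ()
main = print (solve entryGame)  -- ((1,1),["Enter","Accommodate"])
```

Once player 0 has entered, fighting costs player 1 more than accommodating, so the threat to fight is pruned as non-credible and entry is the equilibrium path.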
“Trembling hand” – i.e., you might miss and hit the big red button instead
Traditional game theory assumes that all agents are rational. But in the 1960s, Eric Berne looked at irrational games – the sorts of social games that people entice each other into for attention, sympathy, and other kinds of psychological payoffs, while hiding their true motives.
Berne drops the assumption that players are driven by the most rational angels of their nature, and looks at the payoffs of ulterior-motive social games as ways for players to satisfy unmet emotional needs. So in effect we’re now considering players to have two sets of preferences that impact their decision-making: one that the rational System 2 uses when making considered decisions, one that the prerational System 1 uses when making quick heuristic decisions.
Humans are social animals. We all have biological drives to interact with other members of our species to some extent or another – and when that drive is demanding to be satisfied, an argument can serve the same purpose as a productive discussion or even a hug, if what a person is fundamentally looking for is external recognition that they exist.
“Payoff” comes in the form of neurotransmitter activity. Berne didn’t go into that, and the imaging equipment we need to investigate this directly doesn’t exist yet, but we can black-box it (Skinner-box it?) with behaviorism: each player experiences some consequences from each interaction, as reinforcement or as punishment.
Positive reinforcement – a rewarding stimulus (a chocolate, a kiss, &c)
Negative reinforcement – removal of an aversive stimulus (eg when someone stops yelling at you)
Positive punishment – an aversive stimulus
Negative punishment – removal of a rewarding stimulus
Berne identified stimulus hunger, recognition hunger, and structure hunger. Status hunger is probably a combination of the latter two.
Procedure: a series of complementary transactions toward some physical end.
Operation: a set of transactions undertaken for a specific, stated purpose. If you ask explicitly for something, like reassurance or support, and you get it, that’s an operation.
Ritual: “a stereotyped series of simple complementary transactions programmed by external social forces”
Pastime: an iterated ritual, with state; can turn into status gaming (establishment of a “pecking order”)
People spend a *lot* of time on pastimes – that’s why they’re called that. Facebook is largely a pastime for most people. So is Twitter. When different clusters’ pastimes collide, you get fireworks because pastimes have a ritual quality (jargon, signaling certain beliefs, &c) and people don’t know what pre-existing state they’re walking into.
Game: “an ongoing series of complementary ulterior transactions progressing to a well-defined predictable outcome.” IOW, the initiator of the game has a goal in mind and isn’t being upfront about it. If you ask for reassurance and then turn that against the person, that’s a game.
Berne’s work is pretty heavily based in Freud; he’s got this parent/child/adult triad of “ego states”, and posits that people fall into authoritarian parent modes or contrarian child modes when they play power games with each other. It’s kind of a just-so story, so we’re not really going to get into it. But we will look at the roles that the context of various mind games establishes for the players.
Since games are a series of complementary ulterior transactions, that means there’s turn-taking. Each move is considered to be a stroke, i.e., something that affects the other player in some way.
Advantages ~ payoffs.
Existential advantage is that sense that events in the world are confirming your beliefs about how the world works, even if you manipulated the events to that end.
Emotional payoff here is analogous to positive reinforcement, external psychological advantage is analogous to negative reinforcement. If you win the game, you’re raising the likelihood that you’ll behave that way again, because you’ve reinforced the evidence that playing games works.
Internal and external social advantage are about status and limiting other players’ moves. If you signal as “oppressed”, people who prioritize oppression will limit what they do on your behalf.
“Ain’t It Awful” taken to the pathological extreme manifests as things like Munchausen syndrome or M-by-proxy
In “Why Don’t You – Yes But”, the initiator really wants reassurance that their problem is not their fault, but they get it manipulatively by challenging people to present solutions they can’t find fault with. Obviously they can nitpick anything to death.
“Courtroom” – pick a victim/scapegoat and pick them apart, most effectively in front of a “jury of their peers”
Introduce the idea of changing the game here – the mark thinks it’s one game (the one where if he wins he gets laid at the end), but what he doesn’t know is that he’s playing a different game (the one where if he wins he doesn’t get beaten up but does lose his wallet).
Can be played with just a victim and an aggressor, as long as the victim does something that the aggressor can construe as the victim screwing up in some way
Confederate lures the victim into provoking the aggressor.
Often about getting the target to embarrass themselves in some way – typically by overreacting and saying something they’ll regret later. (I’m doubtful as to whether the target ever does actually regret it later, but we’ll set that aside for now.)
Berne talks about there being an “apology->forgiveness” phase of the game, though trolls really aren’t in it for the forgiveness. So this might be better considered a modification.
Note that a troll’s actions revolve around sending signals to some receiver in an attempt to provoke an overreaction. Engaging is therefore a feedback loop providing the troll with more material to feed into its signal generation function. Proceed with caution.
And on that note, let’s take a closer look at the class of games that we can use to model interactions involving two-way communication: signaling games.
Get it out of your system now, because you’re going to hear “balls” more often than any other noun in the clips that follow. I counted.
This is the beginning of an extensive form game tree for this game.
The unfilled dot in the center is the root. It indicates who makes the first move – in this case player 1.
Traditionally the first move is made by “Nature” and is taken to be the type of the player – in a job interview, whether the candidate being interviewed is competent or incompetent; when you buy someone a drink, whether they’re interested in you or not interested in you; when you’re deciding whether to tell someone a secret, whether they’re trustworthy or untrustworthy.
But since player 1 has already decided whether he’s going to split or steal, he’s making the first move.
Similar to Prisoner’s Dilemma, except that if you decide to screw each other, you both get screwed just as badly as you would if you cooperated but the other guy defected. Being a sucker isn’t any worse for you – materially, at least – than betting you can screw the other guy and being wrong.
Poll the audience after this segment is over. What do they think Ibrahim will pick? What do they think Nick will pick?
Radiolab interviewed both these guys after the show. In the studio, the argument went on for 45 minutes and the audience was booing Nick over and over again. He stuck to his guns the whole time, so in uncompressed time, his signal was fairly unambiguous.
We don’t know whether Nick has actually chosen Split or Steal at this point. He’s signaled unambiguously that he plans to steal, which means that if Ibrahim decides his signal is credible, Ibrahim can only operate on the lower right quadrant of the graph.
At this point, Nick’s signal has changed the structure of the game they’re playing: it’s no longer Friend-or-Foe, it’s Ultimatum. <stuff about Ultimatum here> So the risk Nick is taking now is whether Ibrahim will decide that the ultimatum is so insulting that he should punish Nick by forcing them both to go home with nothing, or whether the promise of £6800 after the show is a credible enough incentive that he should cooperate.
Takeaway: extensive form helps you see how a game’s structure changes as branches of the decision tree are pruned away
Axelrod’s initial tournaments just played strategies against each other 200x and totaled up points at the end. In ecological (or evolutionary) tournaments, each strategy’s success in the previous round determines how prevalent it is in the current round – and cooperative strategies outcompeted non-cooperative ones.
It would be really great if players in the real world abandoned bad strategies as soon as they recognised the strategies weren’t working, but in practice people are actually pretty bad at recognising this. People are unusually invested in the strategies they choose. Confirmation bias, choice-supportive bias, &c.
Complex inferences just didn’t work very well – the inferences were usually wrong.
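A single match from a tournament like this is easy to reproduce. The sketch below plays two strategies against each other for 200 rounds using the payoffs from the Prisoner's Dilemma slide; the `Strategy` representation (a function of the opponent's past moves, most recent first) is an assumption made for this example, not Axelrod's code.

```haskell
data Move = Cooperate | Defect deriving (Show, Eq)

-- A strategy sees the opponent's past moves, most recent first.
type Strategy = [Move] -> Move

titForTat :: Strategy
titForTat []             = Cooperate  -- nice: opens by cooperating
titForTat (lastMove : _) = lastMove   -- then copies the opponent's last move

allDefect :: Strategy
allDefect _ = Defect

-- Payoffs from the Prisoner's Dilemma slide: (first player, second player).
payoff :: Move -> Move -> (Int, Int)
payoff Cooperate Cooperate = (-1, -1)
payoff Cooperate Defect    = (-3, 0)
payoff Defect    Cooperate = (0, -3)
payoff Defect    Defect    = (-2, -2)

-- Play n rounds and return both players' total scores.
match :: Int -> Strategy -> Strategy -> (Int, Int)
match n a b = go n [] [] (0, 0)
  where
    go 0 _ _ totals = totals
    go k histA histB (ta, tb) =
      let ma       = a histB  -- A reacts to B's history
          mb       = b histA  -- B reacts to A's history
          (pa, pb) = payoff ma mb
      in go (k - 1) (ma : histA) (mb : histB) (ta + pa, tb + pb)

main :: IO ()
main = do
  print (match 200 titForTat allDefect)  -- (-401,-398)
  print (match 200 titForTat titForTat)  -- (-200,-200)
```

Tit-for-Tat never outscores the opponent in front of it: it loses the first round to All-Defect and ties every round after. Two Tit-for-Tats still do far better together than either side of the Tit-for-Tat versus All-Defect match.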
In Axelrod’s IPD, success – i.e., doing the best you can possibly do – requires a strategy that satisfies all these properties. Such strategies also outcompete strategies that don’t satisfy these properties.
But can we do better than an eye for an eye and a tooth for a tooth? Certainly in the real world there are plenty of people whose modus operandi is moving from victim to victim, opportunistically defecting whenever they think they can get away with it; and remember Berne’s games. Are there strategies that can incorporate other information to expose social predators?
c is the column player, r is the row player (ie you); p is the last round, b() is a predecessor function
TFT: “Defect on them if they defected on me last round.”
TFTT: “Defect on them if they defected on me last round and the round before.”
Grim: “Defect on them if they ever defected on me in the past.”
Bully: “Defect on them if they’ve *never* defected on me in the past.” Spiteful-Bully similar, but also defects if it’s been defected on 3x
Vigilante: “Defect on them if they defected on anyone else last round.”
Police: “Defect on them if they defected on me last round, or if last round they defected on someone who had just cooperated with everyone.”
Vigilante and Police are peacekeeping strategies: they ignore who someone defected on, only care that they did it
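The plain-language readings above translate almost directly into predicates over a shared history. This is a sketch under assumptions: the history representation (a list of rounds, most recent first, each round a list of defector/victim pairs) and all names are invented here, and Vigilante is simplified to "defected on anyone last round".

```haskell
type Player  = Int
type Round   = [(Player, Player)]  -- (defector, victim) pairs for one round
type History = [Round]             -- most recent round first; the tail plays the role of b()

-- D(x, y, t): did x defect on y in round t?
d :: Player -> Player -> Round -> Bool
d x y t = (x, y) `elem` t

defectedOnAnyone :: Player -> Round -> Bool
defectedOnAnyone x = any ((== x) . fst)

-- A strategy answers: should row player r defect on column player c?
type Strategy = Player -> Player -> History -> Bool

titForTat, titForTwoTats, grim, vigilante, police :: Strategy

titForTat r c (p : _) = d c r p
titForTat _ _ []      = False

titForTwoTats r c (p : q : _) = d c r p && d c r q
titForTwoTats _ _ _           = False

grim r c = any (d c r)

vigilante _ c (p : _) = defectedOnAnyone c p
vigilante _ _ []      = False

-- Defect if c defected on me, or on someone who had not defected the round before.
police _ _ [] = False
police r c (p : rest) = d c r p || any innocentVictim p
  where
    before = case rest of
      (q : _) -> q
      []      -> []
    innocentVictim (x, j) = x == c && not (defectedOnAnyone j before)

-- Two hypothetical histories, judged by player 1 looking at player 3.
unprovoked, retaliation :: History
unprovoked  = [[(3, 2)], []]        -- 3 defected on 2, who had done nothing
retaliation = [[(3, 2)], [(2, 4)]]  -- 3 defected on 2 after 2 defected on 4

main :: IO ()
main = do
  print [s 1 3 unprovoked  | s <- [titForTat, vigilante, police]]  -- [False,True,True]
  print [s 1 3 retaliation | s <- [titForTat, vigilante, police]]  -- [False,True,False]
```

The second line shows where the two peacekeepers disagree: Vigilante punishes any defection, while Police lets a defection against a recent defector slide.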
All individually nice strategies are communally nice, but not necessarily vice versa. All individually forgiving strategies are communally forgiving, and all communally retaliatory strategies are individually retaliatory.
Individually retaliatory: defects on someone who defects on it.
Communally retaliatory: defects on someone who defects on anyone.
Individually forgiving: stops defecting on someone who stops defecting on it
Communally forgiving: stops defecting on someone who stops defecting on everyone
TFT is loyal; if it plays another TFT, they’ll cooperate forever. Same for Police, but Vigilante is not loyal – Vigilantes will defect on other Vigilantes. TFT is individually nice, retaliatory and forgiving; Vigilante is communally nice, retaliatory and forgiving.
Absolutist: “Defect on c iff c has ever cooperated with someone when you defected, or vice versa.”
Absolutist is loyal: it doesn’t defect on other Absolutists of its own kind. Note that if you put two groups of Absolutists into a population, they’ll defect on each other.
It’s also unforgiving: it never stops defecting on someone once it’s started, like Grim.
Neither individually nice nor communally nice, since it will defect on All-C (cooperated in the past with a defector)
Really only works when there’s no noise in players’ information or actions
The frequentist perspective operates under the assumption that the long-term absolute probability of an event occurring can be known.
The Bayesian interpretation is a subjective one, depending entirely on the information available to the agent.
For a large enough number of samples – as evidence accumulates – the Bayesian and frequentist interpretations typically converge. But you don’t always have all that many samples to choose from.
Really big data problems can be solved by frequentist analysis. But for medium-sized data and really small data, Bayesian analysis performs much better.
A is the parameters, X is the evidence.
P(A): prior probability of A. A belief, i.e., a measure of confidence.
P(A|X): posterior probability of A, given X – the conditional probability of A, based on evidence X.
P(X|A): the likelihood, i.e., the probability of the evidence X given the parameters A.
(Avoiding the post hoc ergo propter hoc fallacy, statistically.)
P(X) decomposes to P(X|A)P(A) + P(X|~A)P(~A): the probability that X occurs whether A happens or not
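Putting the pieces together, a single Bayesian update is one line of arithmetic. The sketch below is illustrative only: the function name and the numbers are hypothetical, with A standing for "this player is a habitual defector" and X for "they defected this round".

```haskell
-- P(A|X) from the prior P(A), the likelihood P(X|A), and P(X|~A).
posterior :: Double -> Double -> Double -> Double
posterior prior likeGivenA likeGivenNotA = likeGivenA * prior / evidence
  where
    -- P(X) = P(X|A)P(A) + P(X|~A)P(~A)
    evidence = likeGivenA * prior + likeGivenNotA * (1 - prior)

-- Made-up numbers: prior belief 0.2, P(X|A) = 0.9, P(X|~A) = 0.1.
-- Each observed defection feeds the previous posterior back in as the new prior.
beliefs :: [Double]
beliefs = take 4 (iterate (\p -> posterior p 0.9 0.1) 0.2)

main :: IO ()
main = mapM_ print beliefs
```

With these numbers the first update moves the belief from 0.2 to 0.18 / 0.26, roughly 0.69, and each further defection pushes it closer to 1.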
Probability mass function: gives the probability that a discrete random variable has some particular value
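Poisson gives the probability of seeing a given number of events in a fixed interval, when events occur independently at a constant average rate; binomial gives the probability of an event occurring a given number of times over N trials, given probability p that it occurs in one trial; Bernoulli is binomial with one trial.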
Expected value of Z in the Poisson distribution is equal to its parameter, lambda; in the exponential distribution, it’s equal to the inverse of the parameter.
Probability density function: gives the relative likelihood of a continuous random variable near some particular value. The probability of any single exact value is zero, so to get the probability of a range, take the integral of the variable’s density over that range.
All that we see is Z. We have to estimate lambda, and that’s why Bayesian analysis is useful: it gives us useful tools for updating our beliefs about lambda even though we can’t see it.
Figuring out the right distribution to use with your data is important. There are a lot of them, useful in different situations, and that’s outside the scope of this talk.
We’re treating “input” here as anything that influences the value of a variable. Deterministic entails decidability.
So you’ve got some data! What are you going to do with it?
Questions to ask yourself when modeling:
What am I interested in?
What does it look like?
What influences it?
Data conditions the values of random variables: the conditional distribution of Y given X is the probability distribution of Y when X is known to be a particular value.
You can keep on assigning distributions to parameters as long as it’s useful, but if you don’t have any strong beliefs about a parameter, this is probably not useful. Pick an average value and let inference update it for you. Or you can also use a uniform distribution for it, and infer what its value is likely to be. It’s just another prior, after all.
Monte Carlo simulation: developed by Stanislaw Ulam and John von Neumann, so von Neumann shows up here too. In normal MC, variables are independent and identically distributed; sample and average. In MCMC, variables can condition each other, and the conditioning defines the chain. When you combine probabilities, you’re reducing the effective volume of your search space; MCMC helps you narrow the search to the areas where you’re likely to find values that satisfy the data and the conditions.
With this definition, the payoffs are completely hidden; all we assume is that the players consider some actions to be “cooperating” and others to be “defecting,” and that whether they consider an action to be cooperative or defecting is conditioned on how trusting they are. In this case, a higher value means “more paranoid.”
If the other player defects on them (the True case), then the probability distribution of this player defecting is a Bernoulli distribution with p = 0.9 – this parameter could have been a random variable as well, but for this toy example we’re fixing its value.
If the other player cooperates, then the probability that this player defects is also a Bernoulli distribution, with p = whatever the player’s paranoia is.
Here, a and b are a’s and b’s paranoia values; we don’t know what they are, we just know that they’re chosen uniformly from values between 0 and 1, inclusive.
When we sample hypothetical games with these players, each game will last 10 rounds. The actions sampled will converge on the strategy we defined on the last slide – defecting based on whether the other player defected the last round, conditioned by how paranoid this player is – and from the values we observe in the samples after Markov chain convergence (hopefully!), we can get a better estimate of how paranoid A and B are.
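For a model this small, the same inference can be cross-checked without MCMC by scoring every candidate paranoia value on a grid. This is a swapped-in technique (grid approximation), not what the talk's code does, and it rests on one assumption: that the flat list of observations on the slide alternates A's action, then B's action, round by round. All names below are made up for the sketch.

```haskell
-- Observed rounds as (A's action, B's action); True means "defect".
-- Assumption: the slide's flat list alternates A, B, A, B, ...
observed :: [(Bool, Bool)]
observed =
  [ (False, False), (False, True), (False, False), (False, True), (False, True)
  , (False, False), (False, True), (False, True),  (False, True), (False, False) ]

-- Probability of defecting under the `tit` strategy from the slide.
pDefect :: Double -> Bool -> Double
pDefect _        True  = 0.9       -- opponent defected last round
pDefect paranoia False = paranoia  -- opponent cooperated last round

-- Likelihood of one player's actions given a paranoia value.
-- Input pairs are (opponent's previous action, this player's action).
likelihood :: Double -> [(Bool, Bool)] -> Double
likelihood paranoia = product . map term
  where
    term (oppLast, action)
      | action    = pDefect paranoia oppLast
      | otherwise = 1 - pDefect paranoia oppLast

-- Posterior mean under a uniform prior, approximated on a grid of midpoints.
posteriorMean :: [(Bool, Bool)] -> Double
posteriorMean obs = sum (zipWith (*) grid ws) / sum ws
  where
    n    = 1000 :: Int
    grid = [ (fromIntegral i + 0.5) / fromIntegral n | i <- [0 .. n - 1] ]
    ws   = [ likelihood t obs | t <- grid ]

-- Pair each action with the opponent's previous action (both start at False).
viewA, viewB :: [(Bool, Bool)]
viewA = zip (False : map snd observed) (map fst observed)
viewB = zip (False : map fst observed) (map snd observed)

main :: IO ()
main = do
  print (posteriorMean viewA)  -- close to 1/6
  print (posteriorMean viewB)  -- close to 7/12
```

A never defects, so its posterior piles up near zero paranoia. B defects in six of ten rounds against an opponent who always cooperated, so its posterior centers a little above one half.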
For Grim Trigger, the fact that we’ve defected on a previous round tells us that we should continue to defect on that person. Note that we’re not making this conditional on paranoia.
Probabilistic SIPD: How large of a sample do we actually need to infer a player’s strategy?
Inference about System 1 vs. System 2 influencing a player’s actions will require modeling the preferences and strategies of each system separately, and modeling how they interact