Practical tutorial: part 1
CHAIN Hokudai Winter School 2021
8 & 9 January 2021
Wataru Toyokawa
Step-by-step walkthrough of the model
A two-armed risky bandit task
March, J. G. (1996). Learning to be risk averse. Psychological Review, 103(2), 309-319.
Denrell, J. (2007). Adaptive learning and risk taking. Psychological Review, 114(1), 177.
Hertwig, R., Barron, G., Weber, E. U., & Erev, I. (2004). Decisions from experience and the effect of rare events in risky choice. Psychological Science, 15(8), 534-539.
[Figure, panels a-d: a risky but higher-payoff option and a safe but lower-payoff option, plotted over trials (decision horizon)]
Consider a decision-maker facing a repeated choice between a safe (i.e. certain)
alternative 𝑠 and a risky (i.e. uncertain) alternative 𝑟 over 𝑇 trials.
The decision-maker’s goal is to maximise the total payoff obtained over the trials.
Because the time horizon is limited, there is a trade-off between exploration and
exploitation.
Although this task setting might seem artificial, it captures the basic principle
underlying the exploration-exploitation dilemma and decision-making from experience,
which applies to many real-life situations, ranging from choosing better
restaurants, investing in profitable stocks, and finding better mates, to developing new
technologies and innovations.
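The task described above can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's own code: the function names, the payoff values (a safe payoff of 10; a risky payoff of 125 or 0) and the probability of the high payoff (0.1) are assumptions chosen so that the risky option has the higher expected payoff, as the slide states.

```python
import random


def draw_payoff(option, rng, safe_payoff=10.0, risky_high=125.0,
                risky_low=0.0, p_high=0.1):
    """Return one payoff from the chosen option ('safe' or 'risky').

    All payoff values and p_high are illustrative assumptions.
    """
    if option == "safe":
        return safe_payoff  # the safe option always pays the same amount
    # the risky option pays the high amount only occasionally
    return risky_high if rng.random() < p_high else risky_low


def play_random(n_trials, seed=0):
    """Total payoff of an agent that chooses at random on every trial."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        option = rng.choice(["safe", "risky"])
        total += draw_payoff(option, rng)
    return total
```

A purely random agent like `play_random` only explores; the learning models on the following slides replace the random choice with one that depends on past payoffs.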
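Reinforcement learning model
(i.e. the baseline asocial learning model)
[Figure: at t = 1, Q-values are turned into choice probabilities and a choice is made; the Q-values are then updated at t = 2, t = 3 and t = 4. Values shown in the figure: 125, 0, 10, 10]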
Rescorla-Wagner Rule for Value Updating
$$Q_{t=2} = (1-\alpha) \times Q_{t=1} + \alpha \times \text{Payoff}_{t=1}$$
(in the example on the slide, the payoff at t = 1 is 125)
A decision-maker updates the value of each of the two alternatives at time
t, following the Rescorla-Wagner rule.
α is a learning rate (i.e. a step-size parameter) that controls the step size of belief
updating. The larger α is, the more weight is given to recent experience (i.e. myopic
learning). The Q-value of the unchosen option is left unchanged.
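As a rough sketch of this update in Python (the function name and the initial Q-values of zero are assumptions made for illustration, not taken from the slides):

```python
def rescorla_wagner_update(q_values, chosen, payoff, alpha):
    """Return new Q-values after observing `payoff` from option `chosen`.

    Only the chosen option is updated; the other Q-value is copied as is.
    """
    updated = list(q_values)
    updated[chosen] = (1.0 - alpha) * q_values[chosen] + alpha * payoff
    return updated


# Example: both Q-values start at 0 (an assumption); option 1 pays 125 at t = 1.
q_next = rescorla_wagner_update([0.0, 0.0], chosen=1, payoff=125.0, alpha=0.5)
print(q_next)  # [0.0, 62.5]
```

With α = 1 the Q-value simply becomes the latest payoff, and with α = 0 it never changes, which is one way to see why a larger α means more weight on recent experience.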
The ‘Softmax’ Choice Rule
Q (decision values), passed through the ‘softmax’ transformation, give the choice probability:
$$P_i = \frac{e^{\beta Q_i}}{\sum_k e^{\beta Q_k}}$$
The Q-values are then translated into choice probabilities through a softmax (or
multinomial-logistic) function. β is an inverse temperature, regulating how sensitive the
choice probability is to the Q-values. As β decreases and approaches 0, the choice
probability approaches a uniformly random choice (i.e. highly explorative). Conversely, a large β makes choices almost
deterministic in favour of the option with the highest Q-value (i.e. highly exploitative).
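The effect of β can be checked numerically. The sketch below is one possible implementation; the function name is hypothetical, and subtracting the largest Q-value before exponentiating is a standard numerical-stability trick that does not change the resulting probabilities.

```python
import math


def softmax_choice_probs(q_values, beta):
    """Softmax choice probabilities with inverse temperature `beta`."""
    q_max = max(q_values)  # shift by the maximum to avoid overflow in exp
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


q = [10.0, 12.5]
print(softmax_choice_probs(q, beta=0.0))   # equal probabilities (random choice)
print(softmax_choice_probs(q, beta=0.5))   # moderately favours the second option
print(softmax_choice_probs(q, beta=10.0))  # almost always the second option
```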
A collective learning situation
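[Figure: example task screen. A payoff history for the Safe and Risky options over time (entries such as 10 and 5, with "?" for payoffs not shown), the round counter "Round: 2/70", the prompt "Make a next choice!", and the social cue "4 people chose this" / "2 people chose this"]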
Let’s consider a collective learning situation in which multiple individuals play the
task simultaneously and obtain social information during play.
A frequency-based social cue shows how many people chose each option in
the preceding round. Other players’ payoff information is kept private.
Social learning model
Toyokawa et al. 2017; 2019; Aplin et al. 2017; McElreath et al. 2005; 2008; Deffner et al. 2020
Relying on social information (weight σ):
$$P_i = \frac{F_i^{\theta}}{\sum_k F_k^{\theta}}$$
where F_i is the number of other individuals who chose option i (F1 for Option 1, F2 for Option 2) and θ is the conformity exponent.
Relying on private payoff-based learning (weight 1 - σ), i.e. reward-based reinforcement learning with the softmax choice rule:
$$P_i = \frac{\exp(\beta Q_i)}{\sum_k \exp(\beta Q_k)}$$
Choice_probability = (1 - σ) Asocial_choice + σ Social_influence
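
The weighted mixture in the last line can be sketched in Python as follows. This is an illustrative reading of the slide rather than the authors' implementation: the function names are hypothetical, θ is assumed to be non-negative, and falling back to equal probabilities when no social information is available (all frequencies zero) is an assumption added here.

```python
import math


def softmax_probs(q_values, beta):
    """Asocial choice: softmax over Q-values with inverse temperature beta."""
    q_max = max(q_values)  # shift for numerical stability
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def frequency_probs(frequencies, theta):
    """Social influence: F_i^theta / sum_k F_k^theta (theta >= 0 assumed)."""
    weights = [float(f) ** theta for f in frequencies]
    total = sum(weights)
    if total == 0.0:
        # No one chose anything last round: assume no social preference.
        return [1.0 / len(frequencies)] * len(frequencies)
    return [w / total for w in weights]


def choice_probs(q_values, frequencies, beta, sigma, theta):
    """(1 - sigma) * asocial choice + sigma * social influence."""
    asocial = softmax_probs(q_values, beta)
    social = frequency_probs(frequencies, theta)
    return [(1.0 - sigma) * a + sigma * s for a, s in zip(asocial, social)]


# Example: 4 others chose option 1 and 2 others chose option 2.
probs = choice_probs([10.0, 12.5], [4, 2], beta=0.2, sigma=0.3, theta=2.0)
print(probs)
```

Setting σ = 0 recovers the asocial reinforcement learning model, while θ > 1 gives the majority option more than its proportional share of the social weight.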
