2. Summary
■ Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan
Horgan, John Quan, Andrew Sendonaris, Ian Osband, Gabriel Dulac-Arnold, John
Agapiou, Joel Z. Leibo, Audrunas Gruslys
– DeepMind
■ AAAI 2018
■ Contribution
– Pre-trains a reinforcement learning policy using a small amount of demonstration data
– Proposes Deep Q-learning from Demonstrations (DQfD), a deep Q-learning algorithm that uses demonstration data