
Particle Filter on Episode


My presentation at IAS-14 (July 6th, 2016)


  1. Particle Filter on Episode for Learning Decision Making Rule
     Ryuichi Ueda (Chiba Inst. of Technology), Kotaro Mizuta (RIKEN BSI), Hiroshi Yamakawa (DWANGO), Hiroyuki Okada (Tamagawa Univ.)
  2. Navigation problems in the real world
     • Not only robots but also animals solve them.
     • Mammals have specialized cells for spatial recognition in their brains, especially around the hippocampus (e.g., place cells [O'Keefe71]).
     • These cells react differently at each place in the environment, which implies the existence of maps in the brain.
     [Figure: place cells]
     July 6th, 2016 IAS-14 Shanghai 2
  3. Map vs. memory
     • Mammals have maps in their brains.
     • Maps of environments are also of concern in robotics.
       – SLAM has been one of the most important topics.
       – Some studies introduce functions of the hippocampus, e.g., RatSLAM [Milford08].
     • How about memory?
       – Memory is also handled in the hippocampus.
       – Sequences of memories are reduced to maps (or state-space models).
       – Unlike mammals, robots can record their memories for a long time if they have TB-level storage.
  4. The purpose
     • Our intuition
       – If memory is the source of maps, a robot should be able to decide its actions not from a map but directly from memory.
       – Knowledge about how memory is handled in and around the hippocampus will help this attempt.
     • To implement a learning algorithm that directly utilizes memory
       – particle filter on episode (PFoE)
       – validation with an actual robot
  5. Related works
     • Episode-based reinforcement learning [Unemi 1999]
       – Its base idea is identical to that of PFoE; PFoE simplifies the implementation and enables real-time calculation.
     • RatSLAM [Milford08]
       – an algorithm for robotics utilizing knowledge about the hippocampus and its surroundings
  6. Outline of PFoE
     • While repeating a task for learning, the robot stores events.
       – An event: the set of sensor readings, the action, and the reward (given by a trainer) obtained at one discrete time step.
       – The episode: the sequence of all events.
     • The degree of recall of each event is represented as a probability.
     [Figure: the episode as a time series of states s, actions a, and rewards (+1/-1) along the time axis, with a belief distribution over the past events up to the present time]
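The event/episode structure above can be sketched as follows (a minimal illustration; the field names and example values are mine, not from the slides):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Event:
    """One discrete time step of experience."""
    sensors: Tuple[int, ...]  # range-sensor readings at this step
    action: str               # action taken, e.g. "left" or "right"
    reward: float             # reward given by the trainer (often 0)

# The episode is simply the chronological sequence of all events.
episode: List[Event] = [
    Event(sensors=(30, 30, 5, 5), action="forward", reward=0.0),
    Event(sensors=(5, 30, 30, 5), action="left", reward=1.0),
]
```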
  7. Decision with the belief and the episode
     • An action is chosen by calculating expectation values.
       – When the robot recalls events that gave a +1 reward, it may obtain the reward again by choosing the same action as at those times.
       – When the robot recalls events that gave a -1 reward, it should change its action to avoid the penalty.
     [Figure: the belief over the episode; recalled +1 events suggest repeating their action, recalled -1 events suggest avoiding it]
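A minimal sketch of the expectation-based choice, assuming the belief is a map from event indices to recall probabilities and reducing each event to an (action, reward) pair (this representation is my simplification, not the authors' implementation):

```python
def expected_reward_per_action(events, belief):
    """For each action, sum reward * recall probability over the events in
    which that action was taken, i.e. the expected reward of repeating it.
    `events` is a list of (action, reward) pairs; `belief` maps an event
    index to the probability of recalling that event."""
    expectation = {}
    for t, prob in belief.items():
        action, reward = events[t]
        expectation[action] = expectation.get(action, 0.0) + prob * reward
    return expectation

# The robot mostly recalls a rewarded "left" and a punished "right".
events = [("forward", 0.0), ("left", 1.0), ("forward", 0.0), ("right", -1.0)]
belief = {1: 0.6, 3: 0.4}            # recall probabilities on two past events
values = expected_reward_per_action(events, belief)
best = max(values, key=values.get)   # the action with the highest expectation
```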
  8. Representation with particles
     • The belief is represented with particles.
       – O(N) memory even if the episode has infinite length
     • Variables of a particle
       – its position on the time axis
       – its weight
     [Figure: particles placed on the time axis form the belief up to the present time]
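As a concrete sketch of this two-variable particle (the class name is mine):

```python
from dataclasses import dataclass

@dataclass
class Particle:
    t: int         # position on the episode's time axis (index of an event)
    weight: float  # share of the belief assigned to recalling that event

# N particles approximate the belief; memory stays O(N) no matter how long
# the episode grows.
N = 5
particles = [Particle(t=i, weight=1.0 / N) for i in range(N)]
```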
  9. Operation of PFoE: motion update
     • When the current time advances to the next time step, particles simply shift to their next time steps.
       – The episode is extended by one additional event.
       – The position of every particle is shifted accordingly.
     [Figure: the belief before an action and after it, with the new event appended to the episode]
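The motion update is just a shift of every particle's time index, sketched here with particles as (index, weight) pairs (the clamping at the episode's end is my assumption):

```python
def motion_update(particles, new_episode_length):
    """After the robot records a new event, every particle shifts one step
    forward on the time axis; particles are (time index, weight) pairs."""
    return [(min(t + 1, new_episode_length - 1), w) for t, w in particles]

particles = [(0, 0.5), (2, 0.5)]
# The episode grows from 4 to 5 events, and each particle advances one step.
particles = motion_update(particles, new_episode_length=5)  # -> [(1, 0.5), (3, 0.5)]
```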
  10. Operation of PFoE: sensor update
     • The event related to each particle is compared with the latest one.
       – Weights are reduced according to the difference in sensor readings, reward, or action.
     • Particles are resampled and normalized after the weight reduction.
     • When the sum of the weights before normalization falls below a threshold, all particles are replaced (a reset). How should the reset be done?
     [Figure: events along the episode compared with the latest event]
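The sensor update could look like the following sketch, assuming a simple equality test between events and a fixed mismatch penalty (the constants and the None-as-reset-signal convention are mine, not from the slides):

```python
import random

def sensor_update(particles, events, latest, match_weight=1.0,
                  mismatch_weight=0.2, reset_threshold=0.3):
    """Compare the event each particle points at with the latest event and
    reduce weights on mismatch; resample; return None to request a reset
    when the total weight has collapsed. Particles are (index, weight)."""
    reweighted = [(t, w * (match_weight if events[t] == latest
                           else mismatch_weight)) for t, w in particles]
    total = sum(w for _, w in reweighted)
    if total < reset_threshold:
        return None                  # caller performs a (retrospective) reset
    # normalize, then resample with replacement back to equal weights
    n = len(particles)
    indices = random.choices([t for t, _ in reweighted],
                             weights=[w / total for _, w in reweighted], k=n)
    return [(t, 1.0 / n) for t in indices]
```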
  11. Operation of PFoE: retrospective resets
     • Inspired by the retrospective activity of place cells
       – When a rat recalls past events, place cells become active as if the rat were virtually moving.
     • Algorithm
       1. Place particles randomly on the time axis.
       2. Replay the motion update and the sensor update for M steps, using the past M events up to the current time.
     [Figure: particles placed M steps before the current time are moved and compared]
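The two steps above can be sketched in one routine, again assuming an equality-based comparison and a fixed mismatch penalty (the helper name and constants are illustrative, not the authors' code):

```python
import random

def retrospective_reset(events, n_particles, M, mismatch_weight=0.2):
    """Re-place the particles by replaying the last M events:
    1. scatter particles uniformly at random over the episode, then
    2. for each replayed event, shift every particle one step forward and
       down-weight it when its own event differs from the replayed one.
    Returns resampled (index, weight) particles."""
    last = len(events) - 1
    positions = [random.randrange(len(events)) for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    for step in range(M):
        replayed = events[last - M + 1 + step]      # the past M events, in order
        positions = [min(t + 1, last) for t in positions]        # motion update
        weights = [w * (1.0 if events[t] == replayed else mismatch_weight)
                   for t, w in zip(positions, weights)]          # sensor update
        total = sum(weights)
        weights = [w / total for w in weights]
    # resample back to equal-weight particles
    chosen = random.choices(positions, weights=weights, k=n_particles)
    return [(t, 1.0 / n_particles) for t in chosen]
```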
  12. Experiments
     • The robot: a micromouse with four range sensors.
     • The environment: a T-maze with a reward at one of its arms.
     • The robot chooses a turn-right or a turn-left action at the T-junction.
     • The state transition is simplified to cycles of four events. The robot records an event when
       – it is placed at the initial position,
       – it reaches the T-junction,
       – it turns right or left, and
       – it reaches the end of an arm.
     [Figure: the directions of the sensors and the marker of the reward]
  13. Tasks of the experiments
     • A periodical task
       – The reward is placed on the right and the left arms alternately (cycles of 8 events).
     • A discrimination task
       – The reward is placed on the side where the robot is initially placed; right or left is chosen randomly, so the task is not periodical.
     • Settings: 1000 particles; 50 trials per episode x 5 sets
  14. Periodical task with/without the retrospective reset
     • Retrospective resets reallocate particles effectively.
     [Figure: learning curves with a random reset vs. with the retrospective reset]
  15. Discrimination task
     • Comparison of thresholds for retrospective resets
     • A higher threshold gives signs of learning.
       – Particles are replaced frequently and go over the cyclic state transition.
       – But the learning is not perfect.
     [Figure: results with threshold 0.2 (resets not frequent) vs. 0.5 (frequent)]
  16. Conclusion
     • Particle Filter on Episode (PFoE)
       – estimates the relation between the present and the past,
       – learns in real time, and
       – requires no environmental model except the Bayes model on the sensor update.
     • Experimental results
       – PFoE works on an actual robot.
       – The simple periodical task is learned within 20 trials.
       – The discrimination task is partially learned (75% success).
     • Future work: the idea of the retrospective resetting should be extended for non-periodical tasks.
  17. Periodical task again with a different threshold
     • Checked for ill effects of the high threshold for retrospective resets in the periodical task.
     • Result: no ill effects were observed.
     [Figure: periodical-task results with thresholds 0.2 and 0.5]