Reinforcement Learning with History Lists
Stephan Timmer and Martin Riedmiller
| What | Talk |
|---|---|
| When | 2008-06-30, 14:40 to 15:05 |
To represent an optimal policy for a partially observable Markov decision process (POMDP), some form of memory is necessary. Perfect memory is provided by the belief space, i.e. the space of probability distributions over states. Unfortunately, computing policies defined on the belief space requires a model and is computationally expensive. In this article, we present a model-free algorithm for solving deterministic POMDPs that uses memory based on history lists. In contrast to belief states, history lists do not in general allow optimal policies to be computed, but they are far more practical and make the learning process much more efficient. We show that, by using abstract state spaces, our method also applies to MDPs with continuous state spaces.
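The abstract does not say how history lists are built or updated. As a minimal sketch, assuming a history list is a bounded window of the most recent (action, observation) pairs, the Python code below uses that window as the state key of a tabular Q-learning agent and trains it on a hypothetical deterministic T-maze, where the goal side is visible only in the first cell. This is plain Q-learning over a finite history window, not necessarily the authors' algorithm; the names `TMaze`, `train`, `greedy`, and `greedy_success` and all parameter values are illustrative assumptions.

```python
import random
from collections import defaultdict, deque

# Sketch only: the "history list" here is an ASSUMED fixed-length window of
# (action, observation) pairs; this is not the authors' algorithm.
ACTIONS = ("forward", "left", "right")


class TMaze:
    """Hypothetical deterministic T-maze POMDP (illustrative only).

    The goal side is shown only in cell 0; corridor cells all emit the same
    observation, so a memoryless policy cannot know where to turn.
    """

    def __init__(self, length=2):
        self.length = length  # index of the junction cell

    def reset(self, goal):
        self.pos, self.goal = 0, goal
        return self._observe()

    def _observe(self):
        if self.pos == 0:
            return "signal_" + self.goal
        return "junction" if self.pos == self.length else "corridor"

    def step(self, action):
        """Deterministic transition: returns (observation, reward, done)."""
        if self.pos == self.length and action != "forward":
            return None, (1.0 if action == self.goal else -1.0), True
        if action == "forward" and self.pos < self.length:
            self.pos += 1
        return self._observe(), -0.01, False  # small per-step cost


def greedy(q, key):
    # Ties resolve to the earliest action in ACTIONS.
    return max(ACTIONS, key=lambda a: q[(key, a)])


def train(history_len, episodes=3000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning keyed on the last `history_len`
    (action, observation) entries."""
    rng = random.Random(seed)
    env, q = TMaze(), defaultdict(float)
    for _ in range(episodes):
        first_obs = env.reset(rng.choice(("left", "right")))
        history = deque([(None, first_obs)], maxlen=history_len)
        for _ in range(20):  # step cap keeps every episode finite
            key = tuple(history)
            action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, key)
            obs, reward, done = env.step(action)
            if done:
                target = reward
            else:
                history.append((action, obs))  # oldest entry drops out
                next_key = tuple(history)
                target = reward + gamma * max(q[(next_key, a)] for a in ACTIONS)
            q[(key, action)] += alpha * (target - q[(key, action)])
            if done:
                break
    return q


def greedy_success(q, history_len, goal, max_steps=20):
    """Roll out the greedy policy once; True if it turns into the goal arm."""
    env = TMaze()
    history = deque([(None, env.reset(goal))], maxlen=history_len)
    for _ in range(max_steps):
        action = greedy(q, tuple(history))
        obs, reward, done = env.step(action)
        if done:
            return reward > 0
        history.append((action, obs))
    return False
```

With `train(history_len=3)`, the window still holds the initial signal when the agent reaches the junction on the shortest path, so the greedy policy can learn the correct turn for both goals. With `history_len=1`, the key at the junction is the same for both goals, so the greedy policy can succeed for at most one of them. This illustrates, on a toy problem, the trade-off the abstract describes: a short history is cheap to learn over but may not carry enough memory.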




