A framework for solving time-delayed Markov Decision Processes
ORAL
Abstract
Reinforcement learning has revolutionized our understanding of evolved systems and our ability to engineer systems by providing a theoretical framework for maximizing expected reward. However, the time delay between observation and action is estimated to be roughly 150 ms for humans, and such delays should affect reinforcement learning algorithms. We reformulate the Markov Decision Process framework to include time delays in action, first deriving a new Bellman equation in a way that unifies previous attempts and then implementing the corresponding SARSA-like algorithm. The main ramification, potentially useful for both evolved and engineered systems, is that when the state space is smaller than the action space, the modified reinforcement learning algorithms prefer to operate on sequences of states rather than just the present state, with the sequence length equal to one plus the time delay.
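To make the idea concrete, here is a minimal tabular sketch of a SARSA-like update for an MDP whose actions take effect `delay` steps after they are chosen, with the value function indexed by the tuple of the last (delay + 1) observed states, as the abstract suggests. The environment interface (env.reset, env.step returning state, reward, done), the epsilon-greedy policy, and the padding of the action queue with a default action 0 are illustrative assumptions, not the authors' exact formulation.

    from collections import defaultdict, deque
    import random

    def delayed_sarsa(env, n_actions, delay, episodes=500,
                      alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular SARSA where Q is indexed by the last (delay + 1) states."""
        Q = defaultdict(float)  # Q[(state_window, action)] -> value estimate

        def policy(window):
            # Epsilon-greedy over the augmented state (a tuple of recent states).
            if random.random() < epsilon:
                return random.randrange(n_actions)
            return max(range(n_actions), key=lambda a: Q[(window, a)])

        for _ in range(episodes):
            s = env.reset()
            # Augmented state: the last (delay + 1) observations, padded at the start.
            window = deque([s] * (delay + 1), maxlen=delay + 1)
            # Actions chosen but not yet executed; padded with a default action 0.
            pending = deque([0] * delay)

            w = tuple(window)
            a = policy(w)
            done = False
            while not done:
                pending.append(a)              # queue the newly chosen action
                executed = pending.popleft()   # the action chosen `delay` steps ago runs now
                s_next, r, done = env.step(executed)

                window.append(s_next)
                w_next = tuple(window)
                a_next = policy(w_next)

                # SARSA-style bootstrapped update on the augmented representation.
                target = r + (0.0 if done else gamma * Q[(w_next, a_next)])
                Q[(w, a)] += alpha * (target - Q[(w, a)])

                w, a = w_next, a_next
        return Q

With delay = 0 this reduces to ordinary tabular SARSA on the present state; with delay > 0 the table grows with the number of distinct state windows, which is why the trade-off described in the abstract depends on the relative sizes of the state and action spaces.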
Presenters
-
Sarah Marzen
Scripps, Pitzer & CMC
Authors
-
Sarah Marzen
Scripps, Pitzer & CMC
-
Yorgo Sawaya
Temple University
-
George Issa
U.C. Davis