Novel approaches and bounds for maximum entropy reinforcement learning using nonequilibrium statistical mechanics
ORAL
Abstract
Reinforcement learning (RL) is a subfield of AI that holds great promise for applications such as robotic control and autonomous driving. Maximum entropy RL (MaxEnt RL) is a robust and flexible generalization of RL which has recently been connected to applications of large deviation theory in nonequilibrium statistical mechanics. In this approach, the scaled cumulant generating function (SCGF) from large deviation theory can be mapped onto the soft value functions in MaxEnt RL. Using this mapping, we have developed novel algorithms to determine the optimal policy and soft value functions in MaxEnt RL. Furthermore, the connections of the SCGF to Perron-Frobenius theory allow us to use results from linear algebra to derive bounds and develop useful approximations for the optimal policy and soft value functions. The formalism developed leads to new results for the problem of compositionality in MaxEnt RL that provide insights into how we can combine previously learned behaviors to obtain solutions for more complex tasks.
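For context, the sketch below illustrates the standard soft Bellman optimality equation of MaxEnt RL and the Perron-Frobenius eigenvalue form of the SCGF for a Markov chain; it is a minimal illustration of the kind of mapping the abstract refers to, not the presented algorithm itself. The notation (prior policy $\pi_0$, inverse temperature $\beta$, tilted matrix $\tilde{P}_\beta$, undiscounted setting) is an assumption made for the example.

```latex
% Minimal sketch (illustrative notation, undiscounted setting assumed;
% not taken verbatim from the presented work).
\documentclass{article}
\usepackage{amsmath}
\begin{document}

Soft Bellman optimality equation of MaxEnt RL, with prior policy $\pi_0$
and inverse temperature $\beta$:
\begin{equation}
  Q^{*}(s,a) \;=\; r(s,a) \;+\;
  \frac{1}{\beta}\,
  \mathbb{E}_{s' \sim p(\cdot \mid s,a)}
  \Bigl[\log \textstyle\sum_{a'} \pi_0(a' \mid s')\,
  e^{\beta Q^{*}(s',a')}\Bigr].
\end{equation}

Scaled cumulant generating function of the cumulative reward, expressed
(for a finite ergodic chain) as the Perron--Frobenius eigenvalue of a
tilted transition matrix:
\begin{equation}
  \lambda(\beta) \;=\; \lim_{N \to \infty} \frac{1}{N}
  \log \mathbb{E}\Bigl[e^{\beta \sum_{t=1}^{N} r_t}\Bigr]
  \;=\; \log \rho\bigl(\tilde{P}_\beta\bigr),
  \quad
  \bigl[\tilde{P}_\beta\bigr]_{(s,a),(s',a')}
  = e^{\beta r(s,a)}\, p(s' \mid s,a)\, \pi_0(a' \mid s'),
\end{equation}
where $\rho(\cdot)$ denotes the spectral radius (dominant eigenvalue).

\end{document}
```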
Presenters
-
Jacob Adamczyk
University of Massachusetts Boston
Authors
-
Jacob Adamczyk
University of Massachusetts Boston
-
Argenis Arriojas Maldonado
University of Massachusetts Boston
-
Stas Tiomkin
San Jose State University
-
Rahul V Kulkarni
University of Massachusetts Boston