Novel approaches and bounds for maximum entropy reinforcement learning using nonequilibrium statistical mechanics

ORAL

Abstract

Reinforcement learning (RL) is an important subfield of AI that holds great promise for applications such as robotic control and autonomous driving. Maximum entropy RL (MaxEnt RL) is a robust and flexible generalization of RL that has recently been connected to applications of large deviation theory in nonequilibrium statistical mechanics. In this approach, the scaled cumulant generating function (SCGF) from large deviation theory can be mapped onto the soft value functions of MaxEnt RL. Using this mapping, we have developed novel algorithms to determine the optimal policy and soft value functions in MaxEnt RL. Furthermore, the connection of the SCGF to Perron-Frobenius theory allows us to use results from linear algebra to derive bounds and develop useful approximations for the optimal policy and soft value functions. The resulting formalism leads to new results for the problem of compositionality in MaxEnt RL, providing insights into how previously learned behaviors can be combined to obtain solutions for more complex tasks.
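
To illustrate the kind of mapping the abstract describes, the following is a minimal sketch (not the authors' released code) of computing soft value functions for a small tabular MDP from the dominant Perron-Frobenius eigenpair of a "tilted" transition matrix, with the log of the dominant eigenvalue playing the role of the SCGF. The MDP sizes, the random reward table, the uniform prior policy, and the inverse-temperature parameter beta are all hypothetical placeholders chosen for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    nS, nA = 4, 2          # hypothetical small MDP
    beta = 2.0             # entropy-regularization (inverse-temperature) strength

    P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, s'] transition dynamics
    r = rng.normal(size=(nS, nA))                   # r[s, a] rewards
    prior = np.full((nS, nA), 1.0 / nA)             # uniform prior policy

    # Tilted matrix over state-action pairs:
    # M[(s,a),(s',a')] = exp(beta * r(s,a)) * P(s'|s,a) * prior(a'|s')
    M = (np.exp(beta * r)[:, :, None, None]
         * P[:, :, :, None]
         * prior[None, None, :, :]).reshape(nS * nA, nS * nA)

    # Power iteration for the dominant (Perron-Frobenius) eigenvector.
    # M is elementwise positive here, so the iteration converges to a
    # strictly positive eigenvector.
    v = np.ones(nS * nA)
    for _ in range(2000):
        v = M @ v
        v /= np.linalg.norm(v)
    lam = v @ (M @ v)                        # dominant eigenvalue (Rayleigh quotient)

    Q = np.log(v.reshape(nS, nA)) / beta     # soft Q-values, up to an additive constant
    policy = prior * np.exp(beta * Q)        # optimal MaxEnt policy: softmax over Q
    policy /= policy.sum(axis=1, keepdims=True)

    print("SCGF-like rate (log dominant eigenvalue):", np.log(lam))
    print("optimal policy:\n", policy)

Because the dominant eigenvalue of a nonnegative matrix can be bounded by standard linear-algebra results (e.g., between the minimum and maximum row sums), sketches of this form also suggest how eigenvalue bounds translate into bounds on soft value functions, as the abstract indicates.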

Presenters

  • Jacob Adamczyk

    University of Massachusetts Boston

Authors

  • Jacob Adamczyk

    University of Massachusetts Boston

  • Argenis Arriojas Maldonado

    University of Massachusetts Boston

  • Stas Tiomkin

    San Jose State University

  • Rahul V Kulkarni

    University of Massachusetts Boston