Novel approaches and bounds for maximum entropy reinforcement learning using nonequilibrium statistical mechanics

ORAL

Abstract

Reinforcement learning (RL) is an important subfield of AI that holds great promise for applications such as robotic control and autonomous driving. Maximum entropy RL (MaxEnt RL) is a robust and flexible generalization of RL that has recently been connected to applications of large deviation theory in nonequilibrium statistical mechanics. In this approach, the scaled cumulant generating function (SCGF) from large deviation theory can be mapped onto the soft value functions of MaxEnt RL. Using this mapping, we have developed novel algorithms to determine the optimal policy and soft value functions in MaxEnt RL. Furthermore, the connection of the SCGF to Perron-Frobenius theory allows us to use results from linear algebra to derive bounds and develop useful approximations for the optimal policy and soft value functions. The resulting formalism leads to new results for the problem of compositionality in MaxEnt RL, providing insights into how previously learned behaviors can be combined to obtain solutions for more complex tasks.
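
To illustrate the kind of mapping the abstract describes, the following is a minimal sketch (not the authors' released code) of computing soft value functions for a small tabular MDP from the dominant Perron-Frobenius eigenpair of a "tilted" transition matrix, with the log of the dominant eigenvalue playing the role of the SCGF. The MDP sizes, the random reward table, the uniform prior policy, and the inverse-temperature parameter beta are all hypothetical placeholders chosen for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    nS, nA = 4, 2          # hypothetical small MDP
    beta = 2.0             # entropy-regularization (inverse-temperature) strength

    P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, s'] transition dynamics
    r = rng.normal(size=(nS, nA))                   # r[s, a] rewards
    prior = np.full((nS, nA), 1.0 / nA)             # uniform prior policy

    # Tilted matrix over state-action pairs:
    # M[(s,a),(s',a')] = exp(beta * r(s,a)) * P(s'|s,a) * prior(a'|s')
    M = (np.exp(beta * r)[:, :, None, None]
         * P[:, :, :, None]
         * prior[None, None, :, :]).reshape(nS * nA, nS * nA)

    # Power iteration for the dominant (Perron-Frobenius) eigenvector.
    # M is elementwise positive here, so the iteration converges to a
    # strictly positive eigenvector.
    v = np.ones(nS * nA)
    for _ in range(2000):
        v = M @ v
        v /= np.linalg.norm(v)
    lam = v @ (M @ v)                        # dominant eigenvalue (Rayleigh quotient)

    Q = np.log(v.reshape(nS, nA)) / beta     # soft Q-values, up to an additive constant
    policy = prior * np.exp(beta * Q)        # optimal MaxEnt policy: softmax over Q
    policy /= policy.sum(axis=1, keepdims=True)

    print("SCGF-like rate (log dominant eigenvalue):", np.log(lam))
    print("optimal policy:\n", policy)

Because the dominant eigenvalue of a nonnegative matrix can be bounded by standard linear-algebra results (e.g., between the minimum and maximum row sums), sketches of this form also suggest how eigenvalue bounds translate into bounds on soft value functions, as the abstract indicates.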

Presenters

  • Jacob Adamczyk

    University of Massachusetts Boston

Authors

  • Jacob Adamczyk

    University of Massachusetts Boston

  • Argenis Arriojas Maldonado

    University of Massachusetts Boston

  • Stas Tiomkin

    San Jose State University

  • Rahul V Kulkarni

    University of Massachusetts Boston