Closed-Form Analytical Results for Maximum Entropy Reinforcement Learning Using Large Deviation Theory

ORAL

Abstract

Reinforcement learning (RL) is an important field of current research in artificial intelligence that has seen tremendous accomplishments in recent years. Important advances in RL have resulted from the infusion of ideas from statistical physics, leading to successful approaches such as maximum entropy reinforcement learning (Maxent RL). With the addition of an entropy-based regularization term, the optimal control problem in RL can be transformed into a problem in Bayesian inference. While this control-as-inference approach to RL has led to several advances, obtaining analytical results for the general case of stochastic dynamics has remained an open problem. We establish a mapping between Maxent RL and research in non-equilibrium statistical mechanics based on applications of large deviation theory. In the long-time limit, we apply approaches from large deviation theory to derive exact analytical results for the optimal policy and optimal dynamics in Markov Decision Process models of RL. The established mapping connects research in reinforcement learning and non-equilibrium statistical mechanics, thereby opening further avenues for the application of analytical and computational approaches from physics to cutting-edge problems in machine learning.
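For context, the entropy-regularized objective underlying Maxent RL and the softmax form of its optimal policy can be summarized as follows. This is a background sketch using standard notation (inverse regularization strength beta and prior policy pi_0 are notational choices, not taken from the abstract), not the paper's own derivation.

% Standard Maxent RL objective: expected return plus an entropy-based
% (KL) regularizer with inverse strength \beta relative to a prior policy \pi_0.
\begin{align}
  \pi^{*} &= \arg\max_{\pi}\;
    \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T}
      \Big( r(s_t, a_t)
        - \tfrac{1}{\beta}\,
          \log\frac{\pi(a_t \mid s_t)}{\pi_0(a_t \mid s_t)} \Big)\right], \\
  % Soft Bellman backup under stochastic dynamics p(s' | s, a)
  Q_{\mathrm{soft}}(s,a) &= r(s,a)
    + \mathbb{E}_{s' \sim p(\cdot \mid s,a)}
      \Big[ \tfrac{1}{\beta}
        \log \sum_{a'} \pi_0(a' \mid s')\,
          e^{\beta Q_{\mathrm{soft}}(s',a')} \Big], \\
  % Optimal policy takes a Boltzmann (softmax) form in the soft Q-function
  \pi^{*}(a \mid s) &\propto \pi_0(a \mid s)\,
    e^{\beta Q_{\mathrm{soft}}(s,a)}.
\end{align}

In this standard formulation the regularizer turns the Bellman maximization into a log-sum-exp (softmax), which is what makes the control problem expressible as Bayesian inference; the abstract's contribution concerns obtaining exact results for this setting with stochastic dynamics in the long-time limit via large deviation theory.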

Publication: Arriojas, A.; Tiomkin, S.; and Kulkarni, R. V. 2021. Closed-Form Analytical Results for Maximum Entropy Reinforcement Learning. arXiv preprint arXiv:2106.03931. Submitted to the AAAI Conference on Artificial Intelligence.

Presenters

  • Argenis Arriojas Maldonado

    University of Massachusetts Boston

Authors

  • Argenis Arriojas Maldonado

    University of Massachusetts Boston

  • Jacob Adamczyk

    University of Massachusetts Boston

  • Stas Tiomkin

    San Jose State University

  • Rahul V Kulkarni

    University of Massachusetts Boston