Statistical and Algorithmic Foundations of Reinforcement Learning
Yuejie Chi, Yuxin Chen, Yuting Wei
As a paradigm for sequential decision making in unknown environments, reinforcement learning (RL) has received a flurry of attention in recent years. However, the explosion of model complexity in emerging applications and the presence of nonconvexity exacerbate the challenge of achieving efficient RL in sample-starved situations, where data collection is expensive, time-consuming, or even high-stakes (e.g., in clinical trials, autonomous systems, and online advertising). How to understand and enhance the sample and computational efficiencies of RL algorithms is thus of great interest. This tutorial aims to introduce several important algorithmic and theoretical developments in RL, highlighting the connections between new ideas and classical topics. Employing Markov decision processes as the central mathematical model, we cover several distinctive RL scenarios (i.e., RL with a simulator, online RL, offline RL, robust RL, and RL with human feedback), and present several mainstream RL approaches (i.e., the model-based approach, the value-based approach, and policy optimization). Our discussions gravitate around the issues of sample complexity and computational efficiency, as well as algorithm-dependent and information-theoretic lower bounds, from a non-asymptotic viewpoint.