Statistical and Algorithmic Foundations of Reinforcement Learning
Yuejie Chi, Yuxin Chen, Yuting Wei
As a paradigm for sequential decision making in unknown environments, reinforcement learning (RL) has received a flurry of attention in recent years. However, the explosion of model complexity in emerging applications and the presence of nonconvexity exacerbate the challenge of achieving efficient RL in sample-starved situations, where data collection is expensive, time-consuming, or even high-stakes (e.g., in clinical trials, autonomous systems, and online advertising). How to understand and enhance the sample and computational efficiencies of RL algorithms is thus of great interest. This tutorial aims to introduce several important algorithmic and theoretical developments in RL, highlighting the connections between new ideas and classical topics. Employing Markov decision processes as the central mathematical model, we cover several distinctive RL scenarios (i.e., RL with a simulator, online RL, offline RL, robust RL, and RL with human feedback), and present several mainstream RL approaches (i.e., the model-based approach, the value-based approach, and policy optimization). Our discussions gravitate around the issues of sample complexity and computational efficiency, as well as algorithm-dependent and information-theoretic lower bounds, from a non-asymptotic viewpoint.