Multi-CALF: A Policy Combination Approach with Statistical Guarantees

Georgiy Malaniya, Anton Bolychev, Grigory Yaremenko, Anastasia Krasnaya, Pavel Osinenko

arXiv

May 18, 2025

We introduce Multi-CALF, an algorithm that intelligently combines reinforcement learning policies based on their relative value improvements. Our approach integrates a standard RL policy with a theoretically-backed alternative policy, inheriting formal stability guarantees while often achieving better performance than either policy individually. We prove that our combined policy converges to a specified goal set with known probability and provide precise bounds on maximum deviation and convergence time. Empirical validation on control tasks shows improved performance while maintaining stability guarantees.
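To make "combining policies based on their relative value improvements" concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm: the selection rule (take the RL action only when its one-step value gain strictly exceeds the fallback's), the scalar dynamics `step`, the function `value`, and both policies are hypothetical choices made purely for illustration, and the sketch carries none of the statistical guarantees described in the abstract.

```python
from typing import Callable, List

# All names below are hypothetical; none come from the Multi-CALF paper.
Policy = Callable[[float], float]


def step(x: float, a: float) -> float:
    """Assumed toy dynamics: the action is added to a scalar state."""
    return x + a


def value(x: float) -> float:
    """Assumed value function: higher is better, maximal at the goal x = 0."""
    return -x * x


def rl_policy(x: float) -> float:
    """Stand-in for a learned policy: aggressive far away, poor near the goal."""
    return -0.9 * x if abs(x) > 1.0 else 0.5


def fallback_policy(x: float) -> float:
    """Stand-in for the theoretically-backed policy: a steady contraction."""
    return -0.5 * x


def value_improvement(x: float, policy: Policy) -> float:
    """One-step change in value obtained by following `policy` from state x."""
    return value(step(x, policy(x))) - value(x)


def combined_action(x: float) -> float:
    """Use the RL action only if it improves the value strictly more than
    the fallback does; ties go to the fallback policy."""
    if value_improvement(x, rl_policy) > value_improvement(x, fallback_policy):
        return rl_policy(x)
    return fallback_policy(x)


def rollout(x0: float, n_steps: int) -> List[float]:
    """Simulate the combined policy and return the visited states."""
    trajectory = [x0]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1], combined_action(trajectory[-1])))
    return trajectory


if __name__ == "__main__":
    print(rollout(4.0, 10))
```

In this toy setting the combined policy takes the RL policy's large first step and then hands control to the fallback near the goal, where the stand-in RL policy would otherwise move the state away from the goal.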
