活水快报 - 42Digest

Continuous-Time Reinforcement Learning for Asset-Liability Management

Yilie Huang

arXiv

September 27, 2025

This paper proposes a novel approach for Asset-Liability Management (ALM) by employing continuous-time Reinforcement Learning (RL) with a linear-quadratic (LQ) formulation that incorporates both interim and terminal objectives. We develop a model-free, policy gradient-based soft actor-critic algorithm tailored to ALM for dynamically synchronizing assets and liabilities. To ensure an effective balance between exploration and exploitation with minimal tuning, we introduce adaptive exploration for the actor and scheduled exploration for the critic. Our empirical study evaluates this approach against two enhanced traditional financial strategies, a model-based continuous-time RL method, and three state-of-the-art RL algorithms. Evaluated across 200 randomized market scenarios, our method achieves higher average rewards than all alternative strategies, with rapid initial gains and sustained superior performance. The outperformance stems not from complex neural networks or improved parameter estimation, but from directly learning the optimal ALM strategy without learning the environment.
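To make the linear-quadratic setup more concrete, the following is a minimal sketch and not the paper's algorithm. It simulates a scalar gap between assets and liabilities under a Gaussian exploratory policy and scores each episode with a quadratic interim penalty plus a quadratic terminal penalty. The dynamics, the coefficient names (`a`, `b`, `c`, `d`, `q`, `h`), the function names, and all default values are illustrative assumptions; none of them come from the abstract.

```python
import math
import random


def simulate_episode(gain, explore_std, *, a=0.0, b=1.0, c=0.1, d=0.1,
                     q=1.0, h=1.0, x0=1.0, horizon=1.0, steps=100, seed=0):
    """Simulate one episode and return its total (non-positive) reward.

    Hypothetical model, assumed for illustration only:
      state x   : gap between assets and liabilities
      action u  : drawn from N(-gain * x, explore_std**2)
      dynamics  : dx = (a*x + b*u) dt + (c*x + d*u) dW  (Euler-Maruyama)
      reward    : -integral of q*x^2 dt  -  h*x(T)^2
    """
    rng = random.Random(seed)
    dt = horizon / steps
    x, total = x0, 0.0
    for _ in range(steps):
        # Gaussian exploratory policy centred on a linear feedback rule.
        u = rng.gauss(-gain * x, explore_std)
        # Interim objective: penalise the squared gap along the path.
        total -= q * x * x * dt
        # Brownian increment over one time step.
        dw = rng.gauss(0.0, math.sqrt(dt))
        x += (a * x + b * u) * dt + (c * x + d * u) * dw
    # Terminal objective: penalise the squared gap at the horizon.
    total -= h * x * x
    return total


def average_reward(gain, explore_std, episodes=200, **kwargs):
    """Average the episode reward over independently seeded scenarios."""
    rewards = [simulate_episode(gain, explore_std, seed=s, **kwargs)
               for s in range(episodes)]
    return sum(rewards) / episodes
```

For example, `average_reward(2.0, 0.1)` can be compared with `average_reward(0.0, 0.1)` to see how a feedback gain that pulls the gap toward zero changes the average reward over 200 simulated scenarios. A model-free method of the kind the abstract describes would adjust the policy parameters from such sampled rewards instead of first estimating the coefficients of the dynamics.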

Machine Learning, Artificial Intelligence, Optimization and Control, Mathematical Finance