活水快报 - 42Digest

RL-Exec: Impact-Aware Reinforcement Learning for Opportunistic Optimal Liquidation, Outperforms TWAP and a Book-Liquidity VWAP on BTC-USD Replays

Enzo Duflot, Stanislas Robineau

arXiv

October 30, 2025

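We study opportunistic optimal liquidation over fixed deadlines on BTC-USD limit-order books (LOB). We present RL-Exec, a PPO agent trained on historical replays augmented with endogenous transient impact (resilience), partial fills, maker/taker fees, and latency. The policy observes depth-20 LOB features plus microstructure indicators and acts under a sell-only inventory constraint to reach a residual target. Evaluation follows a strict time split (train: Jan-2020; test: Feb-2020) and a per-day protocol: for each test day, we run ten independent start times and aggregate them into a single daily score, avoiding pseudo-replication. We compare the agent with (i) TWAP and (ii) a VWAP-like baseline that allocates using opposite-side order-book liquidity (top 20 levels), both executed at the same timestamps and under the same costs. Statistical inference uses one-sided Wilcoxon signed-rank tests on daily RL-minus-baseline differences, with Benjamini-Hochberg FDR correction and bootstrap confidence intervals. On the Feb-2020 test set, RL-Exec substantially outperforms both baselines, and the gap widens with the execution horizon (+2-3 bps at 30 min, +7-8 bps at 60 min, +23 bps at 120 min). Code: github.com/Giafferri/RL-Exec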

We study opportunistic optimal liquidation over fixed deadlines on BTC-USD limit-order books (LOB). We present RL-Exec, a PPO agent trained on historical replays augmented with endogenous transient impact (resilience), partial fills, maker/taker fees, and latency. The policy observes depth-20 LOB features plus microstructure indicators and acts under a sell-only inventory constraint to reach a residual target. Evaluation follows a strict time split (train: Jan-2020; test: Feb-2020) and a per-day...
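The per-day evaluation protocol described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, averaging the runs within a day is an assumption (the abstract says only that runs are aggregated into a single daily score), and the bootstrap is a plain percentile bootstrap of the mean. The one-sided p-values passed to the correction step would come from a signed-rank test on the daily differences, for example `scipy.stats.wilcoxon(diffs, alternative="greater")`.

```python
import random
from statistics import mean


def daily_scores(runs_by_day):
    """Collapse all runs of a day into one score.

    Using the mean is an assumption; the point is that each day
    contributes a single observation to the test.
    """
    return {day: mean(runs) for day, runs in runs_by_day.items()}


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample the daily values with replacement and record each mean.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

In this sketch, `runs_by_day` would map each test day to the ten RL-minus-baseline differences (in bps) from its ten start times, `bootstrap_ci` would be applied to the resulting daily values, and `benjamini_hochberg` would be applied across the p-values of the horizon and baseline combinations being tested.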

Statistical Finance, Machine Learning, Trading and Market Microstructure
