UCPO: A Universal Constrained Combinatorial Optimization Method via Preference Optimization
Zhanhong Fang, Debing Wang, Jinbiao Chen, Jiahai Wang, Zizhen Zhang
Neural solvers have demonstrated remarkable success in combinatorial optimization, often surpassing traditional heuristics in speed, solution quality, and generalization. However, their efficacy deteriorates significantly when confronted with complex constraints that cannot be effectively managed through simple masking mechanisms. To address this limitation, we introduce Universal Constrained Preference Optimization (UCPO), a novel plug-and-play framework that seamlessly integrates preference learning into existing neural solvers via a specially designed loss function, without requiring architectural modifications. UCPO embeds constraint satisfaction directly into a preference-based objective, eliminating the need for meticulous hyperparameter tuning. Using a lightweight warm-start fine-tuning protocol, UCPO enables pre-trained models to consistently produce near-optimal, feasible solutions on challenging tasks, achieving superior performance with just over 1% of the original training budget.
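The abstract does not spell out UCPO's loss function. The sketch below is hypothetical and is not the authors' formulation. It shows one generic way to embed constraint satisfaction into a preference-based objective, built from two pieces:

- A ranking of sampled solutions. Any feasible solution is preferred over any infeasible one, infeasible solutions are ordered by total constraint violation, and feasible ones are ordered by objective cost.
- A pairwise Bradley-Terry-style loss applied to the solver's log-likelihoods of each ranked pair.

The names `Solution`, `preferred`, `neg_log_sigmoid`, `pairwise_preference_loss`, and the scale `beta` are illustrative assumptions. The code computes only the loss value. In practice the log-probabilities would be tensors from the solver's training framework, so that the loss can be backpropagated.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Solution:
    cost: float       # objective value (lower is better)
    violation: float  # total constraint violation; 0.0 means feasible
    log_prob: float   # log-likelihood of the solution under the neural solver


def preferred(a: Solution, b: Solution) -> bool:
    """Hypothetical constraint-aware ordering: True if a should be preferred over b."""
    a_feasible, b_feasible = a.violation <= 0.0, b.violation <= 0.0
    if a_feasible != b_feasible:
        return a_feasible                 # feasible always beats infeasible
    if not a_feasible:
        return a.violation < b.violation  # both infeasible: smaller violation wins
    return a.cost < b.cost                # both feasible: lower cost wins


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x)) = log(1 + exp(-x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def pairwise_preference_loss(samples: list[Solution], beta: float = 1.0) -> float:
    """Mean of -log sigmoid(beta * (log p_winner - log p_loser)) over ranked pairs."""
    losses = []
    for a, b in combinations(samples, 2):
        if preferred(a, b):
            winner, loser = a, b
        elif preferred(b, a):
            winner, loser = b, a
        else:
            continue  # ties carry no preference signal
        losses.append(neg_log_sigmoid(beta * (winner.log_prob - loser.log_prob)))
    return sum(losses) / len(losses) if losses else 0.0


# Usage: two feasible solutions and one infeasible solution sampled for the same instance.
samples = [Solution(10.0, 0.0, -1.0), Solution(12.0, 0.0, -0.5), Solution(8.0, 2.0, -0.2)]
print(pairwise_preference_loss(samples))
```

In this sketch, the ranking alone decides the winner of each pair. As a result, no penalty weight is needed to trade off cost against violation. This is one way, among others, that such an objective could avoid the hyperparameter tuning mentioned in the abstract.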