活水快报 - 42Digest

Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models

Tianrui Song, Wen-Shuo Chao, Hao Liu

arXiv

November 10, 2025

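Implicit feedback, employed in training recommender systems, unavoidably confronts noise due to factors such as misclicks and position bias. Previous studies have attempted to identify noisy samples through their divergent data patterns, such as higher loss values, and to mitigate their influence through sample dropping or reweighting. However, we observe that noisy samples and hard samples display similar patterns, leading to a hard-noisy confusion issue. Such confusion is problematic because hard samples are vital for modeling user preferences. To address this problem, we propose the LLMHNI framework, which leverages two auxiliary user-item relevance signals generated by Large Language Models (LLMs) to differentiate hard samples from noisy ones. LLMHNI obtains user-item semantic relevance from LLM-encoded embeddings; this relevance is used in negative sampling to select hard negatives while filtering out noisy false negatives. An objective alignment strategy is proposed to project the LLM-encoded embeddings, originally built for general language tasks, into a representation space optimized for user-item relevance modeling. LLMHNI also exploits LLM-inferred logical relevance within user-item interactions to identify hard and noisy samples. These LLM-inferred interactions are integrated into the interaction graph and guide denoising through cross-graph contrastive alignment. To eliminate the impact of unreliable interactions induced by LLM hallucination, we propose a graph contrastive learning strategy that aligns representations from randomly edge-dropped views to suppress unreliable edges. Empirical results demonstrate that LLMHNI significantly improves denoising and recommendation performance.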
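As a rough illustration of the negative-sampling idea in the abstract (choose hard negatives by semantic relevance while skipping likely false negatives), the following is a minimal sketch and not the authors' implementation. The abstract does not state the scoring function or the filtering rule, so the use of cosine similarity, the `fn_threshold` cutoff, and the names `cosine` and `sample_hard_negative` are all assumptions made for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sample_hard_negative(user_vec, item_vecs, interacted, candidates,
                         fn_threshold=0.9, rng=random):
    """Pick one hard negative item id for a user, or None if none qualifies.

    user_vec, item_vecs: embeddings assumed to come from an LLM encoder
        (already projected into the recommendation space).
    interacted: set of item ids the user has interacted with.
    candidates: how many uninteracted items to draw at random.
    fn_threshold: hypothetical cutoff (an assumption, not from the paper);
        uninteracted items scoring at or above it are treated as likely
        false negatives and skipped.
    """
    pool = [i for i in item_vecs if i not in interacted]
    drawn = rng.sample(pool, min(candidates, len(pool)))
    scored = [(cosine(user_vec, item_vecs[i]), i) for i in drawn]
    # Drop candidates that look too relevant to be true negatives.
    kept = [(s, i) for s, i in scored if s < fn_threshold]
    if not kept:
        return None
    # The most relevant remaining candidate is the hardest negative.
    return max(kept)[1]


if __name__ == "__main__":
    user = [1.0, 0.0]
    items = {
        "a": [1.0, 0.0],    # interacted item
        "b": [0.99, 0.05],  # nearly identical to the user: likely false negative
        "c": [0.6, 0.8],    # moderately relevant: a hard negative
        "d": [-1.0, 0.0],   # irrelevant: an easy negative
    }
    print(sample_hard_negative(user, items, {"a"}, candidates=3))  # -> c
```

In this toy setup the item closest to the user is filtered out as a probable false negative, and the next most relevant item is returned as the hard negative. A real system would tune the cutoff, or replace it with a learned criterion.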

Information Retrieval, Artificial Intelligence
