Generative Recommendation with Semantic IDs: A Practitioner's Handbook
Clark Mingxuan Ju, Liam Collins, Leonardo Neves, Bhuvesh Kumar, Louis Yufeng Wang, Tong Zhao, Neil Shah
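Generative recommendation (GR) has gained increasing attention for its promising performance compared to traditional models. A key factor contributing to the success of GR is the semantic ID (SID), which converts continuous semantic representations (e.g., from large language models) into discrete ID sequences. This enables GR models with SIDs to both incorporate semantic information and learn collaborative filtering signals, while retaining the benefits of discrete decoding. However, varied modeling techniques, hyperparameters, and experimental setups in the existing literature make direct comparison between GR proposals difficult. Furthermore, the lack of an open-source, unified framework hinders systematic benchmarking and extension, slowing model iteration. To address this challenge, we propose and open-source GRID, a framework for generative recommendation with semantic IDs, whose modular design makes it easy to swap components and speeds up the iteration of ideas. Using GRID, we systematically experiment with and ablate different components of GR models with SIDs on public benchmarks. Our comprehensive experiments with GRID reveal that many overlooked architectural components substantially affect the performance of GR models with SIDs. This both offers new insights and validates the utility of an open-source platform for robust benchmarking and for advancing GR research. GRID is open-sourced at https://github.com/snap-research/GRID.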
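To make the idea of a semantic ID concrete, the following is a minimal sketch of how a continuous item embedding could be turned into a short sequence of discrete codes. It assumes residual quantization as the tokenizer, which is one common choice and not necessarily the configuration used in GRID. The function names (`nearest_index`, `to_semantic_id`, `random_codebooks`) are hypothetical, and the codebooks are random rather than learned, so the output only illustrates the data flow.

```python
import random


def nearest_index(vector, codebook):
    """Return the index of the codeword closest to `vector` (squared L2 distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def to_semantic_id(embedding, codebooks):
    """Quantize one embedding into a tuple of code indices, one per level.

    Assumption: residual quantization. Each level encodes what the
    previous levels failed to capture.
    """
    residual = list(embedding)
    sid = []
    for codebook in codebooks:
        idx = nearest_index(residual, codebook)
        sid.append(idx)
        # Subtract the chosen codeword; the remainder goes to the next level.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(sid)


def random_codebooks(num_levels, codebook_size, dim, seed=0):
    """Build untrained Gaussian codebooks (illustration only, not learned)."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(codebook_size)]
        for _ in range(num_levels)
    ]


if __name__ == "__main__":
    # Hypothetical 4-dimensional "semantic" embedding of one item.
    item_embedding = [0.3, -1.2, 0.7, 0.1]
    codebooks = random_codebooks(num_levels=3, codebook_size=8, dim=4)
    print(to_semantic_id(item_embedding, codebooks))
```

In a real pipeline the embedding would come from a pretrained encoder and the codebooks would be fit to the item catalog. A sequence model would then be trained to generate the next item's code tuple one token at a time, which is the discrete decoding the abstract refers to.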