Living Water Express (活水快报) - 42Digest

TurkEmbed4Retrieval: Turkish Embedding Model for Retrieval Task

Özay Ezerceli, Gizem Gümüşçekiçci, Tuğba Erkoç, Berke Özenç

arXiv

November 10, 2025

In this work, we introduce TurkEmbed4Retrieval, a retrieval-specialized variant of the TurkEmbed model, which was originally designed for Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks. We fine-tune the base model on the MS MARCO TR dataset with advanced training techniques, including Matryoshka representation learning and a tailored multiple negatives ranking loss, and achieve SOTA performance on Turkish retrieval tasks. Extensive experiments show that our model outperforms Turkish colBERT by 19.26% on key retrieval metrics for the Scifact TR dataset. This result establishes a new benchmark for Turkish information retrieval.
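The abstract names two training ingredients but does not spell them out. The sketch below is a rough illustration only and is not the authors' code. It computes an in-batch multiple negatives ranking loss (MNRL), in which each query's paired passage is the positive and every other passage in the batch is a negative. It then wraps the loss Matryoshka-style by summing it over truncated prefixes of each embedding. The nested dimensions (768 down to 64), the equal weights, and the similarity scale of 20 are hypothetical choices. The paper's "tailored" loss and its hyperparameters are not described here. Libraries such as sentence-transformers ship ready-made versions of both ideas (`MultipleNegativesRankingLoss`, `MatryoshkaLoss`).

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mnrl_loss(queries: Sequence[Sequence[float]],
              passages: Sequence[Sequence[float]],
              scale: float = 20.0) -> float:
    """In-batch multiple negatives ranking loss (mean cross-entropy).

    queries[i] is paired with passages[i]; every other passage in the batch
    acts as a negative for queries[i].
    """
    q = [normalize(v) for v in queries]
    p = [normalize(v) for v in passages]
    total = 0.0
    for i, qi in enumerate(q):
        # Scaled cosine similarity of query i against every passage in the batch.
        logits = [scale * sum(a * b for a, b in zip(qi, pj)) for pj in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true pair
    return total / len(q)


def matryoshka_mnrl_loss(queries: Sequence[Sequence[float]],
                         passages: Sequence[Sequence[float]],
                         dims: Sequence[int] = (768, 512, 256, 128, 64),  # hypothetical
                         weights: Optional[Sequence[float]] = None,
                         scale: float = 20.0) -> float:
    """Weighted sum of MNRL losses on the leading d coordinates, for each d in dims."""
    if weights is None:
        weights = [1.0] * len(dims)  # hypothetical: equal weight per prefix size
    total = 0.0
    for d, w in zip(dims, weights):
        q_d = [v[:d] for v in queries]   # truncate embeddings to their first d coordinates
        p_d = [v[:d] for v in passages]
        total += w * mnrl_loss(q_d, p_d, scale)
    return total


if __name__ == "__main__":
    # Toy batch: each "passage" embedding is a noisy copy of its query embedding.
    rng = random.Random(0)
    queries = [[rng.gauss(0, 1) for _ in range(768)] for _ in range(4)]
    passages = [[x + 0.1 * rng.gauss(0, 1) for x in q] for q in queries]
    print(matryoshka_mnrl_loss(queries, passages))
```

Summing the loss over prefixes pushes the first d coordinates to work as a standalone embedding. At retrieval time, an index can then keep only a prefix of each vector and trade some accuracy for a smaller index.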

Information Retrieval
