Same model, better performance: the impact of shuffling on DNA Language Models benchmarking
Davide Greco, Konrad Rawlik
Large Language Models are increasingly popular in genomics due to their potential to decode complex biological sequences. Hence, researchers require a standardized benchmark to evaluate the capabilities of DNA Language Models (DNA LMs). However, evaluating DNA LMs is a complex task that intersects genomics' domain-specific challenges with machine learning methodologies, where seemingly minor implementation details can significantly compromise benchmark validity. We demonstrate this through BEND (Benchmarking DNA Language Models), where hardware-dependent hyperparameters, namely the number of data loading workers and the buffer size, create spurious performance variations of up to 4%.