NoLBERT: A No Lookahead(back) Foundational Language Model

Ali Kakhbod and Peiyao Li

arXiv
September 1, 2025

We present NoLBERT, a lightweight, timestamped foundational language model for empirical research, particularly for forecasting in economics, finance, and the social sciences. By pretraining exclusively on text from 1976 to 1995, NoLBERT avoids both lookback and lookahead biases (information leakage) that can undermine econometric inference. It exceeds domain-specific baselines on NLP benchmarks while maintaining temporal consistency. Applied to patent texts, NoLBERT enables the construction of firm-level innovation networks and shows that gains in innovation centrality predict higher long-run profit growth.
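
As a rough illustration of what restricting pretraining text to a fixed time window might look like, the sketch below filters a toy corpus down to documents dated 1976 through 1995. The `Document` class, the `within_window` helper, and the sample records are hypothetical and are not taken from the paper; the authors' actual data pipeline is not described in this abstract.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    """Hypothetical record: a piece of text plus its publication date."""
    text: str
    published: date


def within_window(docs, start_year=1976, end_year=1995):
    """Keep only documents published inside the inclusive year window.

    Dropping later documents guards against lookahead bias; dropping
    earlier ones guards against lookback bias.
    """
    return [d for d in docs if start_year <= d.published.year <= end_year]


# Toy corpus (made-up examples, for illustration only).
corpus = [
    Document("text written before the window", date(1970, 5, 1)),
    Document("text written inside the window", date(1985, 3, 12)),
    Document("text written after the window", date(2001, 1, 1)),
]

pretraining_corpus = within_window(corpus)
```

Only the 1985 document survives the filter here. A real pipeline would also need reliable timestamps for every document, which this sketch simply assumes.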