InterFeat: An Automated Pipeline for Finding Interesting Hypotheses in Structured Biomedical Data
Dan Ofer, Michal Linial, Dafna Shahaf
Finding interesting phenomena is the core of scientific discovery, but it is a manual, ill-defined concept. We present an integrative pipeline for automating the discovery of interesting simple hypotheses (feature-target relations with effect direction and a potential underlying mechanism) in structured biomedical data. The pipeline combines machine learning, knowledge graphs, literature search and Large Language Models. We formalize "interestingness" as a combination of novelty, utility and plausibility. On 8 major diseases from the UK Biobank, our pipeline consistently recovers risk factors before their appearance in the literature. 40-53% of candidates were validated as interesting, compared to 0-7% for the baseline. Overall, 28 [...]. The pipeline addresses the challenge of operationalizing "interestingness" for any target. We release data and code: https://github.com/LinialLab/InterFeat
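To make the idea of "interestingness" as a combination of novelty, utility and plausibility concrete, the following is a minimal illustrative sketch and not the authors' implementation. The `Candidate` fields, the [0, 1] score ranges, the 0.5 threshold and the product-based ranking are all assumptions introduced here; the abstract does not state how the three criteria are scored or combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A simple hypothesis: a feature-target relation with an effect direction."""
    feature: str
    target: str
    direction: str       # "+" = associated with higher risk, "-" = lower risk
    novelty: float       # hypothetical score in [0, 1]
    utility: float       # hypothetical score in [0, 1]
    plausibility: float  # hypothetical score in [0, 1]


def is_interesting(c: Candidate, threshold: float = 0.5) -> bool:
    # Assumption: a candidate must clear the threshold on all three criteria,
    # so a very novel but implausible relation is rejected.
    return min(c.novelty, c.utility, c.plausibility) >= threshold


def rank_interesting(candidates, threshold: float = 0.5):
    # Keep only candidates passing every criterion, then order them by the
    # product of the three scores (an arbitrary choice made for illustration).
    kept = [c for c in candidates if is_interesting(c, threshold)]
    return sorted(
        kept,
        key=lambda c: c.novelty * c.utility * c.plausibility,
        reverse=True,
    )


if __name__ == "__main__":
    # Placeholder names and scores only; these are not results from the paper.
    demo = [
        Candidate("feature_a", "disease_x", "+", 0.9, 0.8, 0.7),
        Candidate("feature_b", "disease_x", "-", 0.9, 0.2, 0.9),
        Candidate("feature_c", "disease_x", "+", 0.6, 0.6, 0.6),
    ]
    for c in rank_interesting(demo):
        print(c.feature, c.direction, c.target)
```

Requiring every criterion to pass, rather than averaging the scores, reflects the reading that an interesting hypothesis needs all three properties at once; a weighted sum would be an equally valid assumption.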