DeepVRegulome: DNABERT-based deep-learning framework for predicting the functional impact of short genomic variants on the human regulome
Pratik Dutta, Matthew Obusan, Rekha Sathian, Max Chao, Pallavi Surana, Nimisha Papineni, Yanrong Ji, Zhihan Zhou, Han Liu, Alisa Yurovsky, Ramana V Davuluri
Whole-genome sequencing (WGS) has revealed numerous non-coding short variants whose functional impacts remain poorly understood. Despite recent advances in deep-learning genomic approaches, accurately predicting and prioritizing clinically relevant mutations in gene regulatory regions remains a major challenge. Here we introduce DeepVRegulome, a deep-learning method for prediction and interpretation of functionally disruptive variants in the human regulome, which combines 700 DNABERT fine-tuned models, trained on a large set of ENCODE gene regulatory regions, with variant scoring, motif analysis, attention-based visualization, and survival analysis. We demonstrate its application in prioritizing survival-associated mutations and regulatory regions in the TCGA glioblastoma WGS dataset. The analysis identified 572 splice-disrupting and 9,837 transcription-factor-binding-site-altering mutations occurring in more than 10% of glioblastoma samples. Survival analysis linked 1,352 mutations and 563 disrupted regulatory regions to patient outcomes, enabling stratification via non-coding mutation signatures. All code, fine-tuned models, and an interactive data portal are publicly available.
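The abstract does not spell out how variant scoring is computed, so the following is only a minimal sketch of one common way to score a short variant with a sequence classifier: predict on the reference and the mutated sequence and compare the two probabilities. The overlapping k-mer tokenization mirrors how DNABERT represents DNA, but the log-odds score, the function names (`kmer_tokens`, `apply_variant`, `log_odds_change`, `score_variant`), and the `toy_predict` stand-in are all assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Callable, List


def kmer_tokens(seq: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    if len(seq) < k:
        return []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def apply_variant(ref_seq: str, pos: int, ref: str, alt: str) -> str:
    """Replace allele `ref` at 0-based `pos` with `alt` (handles SNVs and short indels)."""
    if ref_seq[pos:pos + len(ref)].upper() != ref.upper():
        raise ValueError("reference allele does not match the sequence")
    return ref_seq[:pos] + alt + ref_seq[pos + len(ref):]


def log_odds_change(p_ref: float, p_alt: float, eps: float = 1e-6) -> float:
    """Hypothetical score: log2 of odds(alt) / odds(ref).

    Negative values mean the variant lowers the predicted probability
    that the region is functional.
    """
    p_ref = min(max(p_ref, eps), 1 - eps)
    p_alt = min(max(p_alt, eps), 1 - eps)
    return math.log2((p_alt / (1 - p_alt)) / (p_ref / (1 - p_ref)))


def score_variant(
    predict: Callable[[List[str]], float],
    ref_seq: str,
    pos: int,
    ref: str,
    alt: str,
    k: int = 6,
) -> float:
    """Score a variant by comparing predictions on reference and mutated sequences."""
    alt_seq = apply_variant(ref_seq, pos, ref, alt)
    p_ref = predict(kmer_tokens(ref_seq, k))
    p_alt = predict(kmer_tokens(alt_seq, k))
    return log_odds_change(p_ref, p_alt)


def toy_predict(tokens: List[str]) -> float:
    """Stand-in for a fine-tuned model: high probability if a TATAAA 6-mer is present."""
    return 0.9 if "TATAAA" in tokens else 0.1


if __name__ == "__main__":
    region = "GCGCTATAAAGGCG"
    # T>G substitution at 0-based position 6 destroys the TATAAA 6-mer.
    print(score_variant(toy_predict, region, pos=6, ref="T", alt="G"))
```

In practice `toy_predict` would be replaced by a call to a fine-tuned classifier for the regulatory region of interest; the comparison of reference and alternate predictions stays the same.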