An Active Learning-Based Streaming Pipeline for Reduced Data Training of Structure Finding Models in Neutron Diffractometry
Tianle Wang, Jorge Ramirez, Cristina Garcia-Cardona, Thomas Proffen, Shantenu Jha and Sudip K. Seal
Structure determination workloads in neutron diffractometry are computationally expensive and routinely require several hours to many days to determine the structure of a material from its neutron diffraction patterns. The potential for machine learning models trained on simulated neutron scattering patterns to significantly speed up these tasks has been reported recently. However, the amount of simulated data needed to train these models grows exponentially with the number of structural parameters to be predicted and poses a significant computational challenge. To overcome this challenge, we introduce a new batch-mode active learning (AL) policy that uses uncertainty sampling to simulate training data drawn from a probability distribution that prefers labeled examples about which the model is least certain. We confirm its efficacy in training the same models with about 75% less training data while improving the accuracy. We then discuss the design of an efficient stream-based training workflow that uses this AL policy and present a performance study on two heterogeneous platforms to demonstrate that, compared with a conventional training workflow, the streaming workflow delivers about 20...
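To make the idea of batch-mode uncertainty sampling concrete, here is a minimal sketch in Python. It rests on two assumptions that the abstract does not state. First, model uncertainty at a candidate set of structural parameters is estimated as the spread of an ensemble's predictions. Second, the sampling probability is simply proportional to that spread. The ensemble, the candidate pool, and the names `ensemble_uncertainty` and `sample_batch` are hypothetical illustrations, not the authors' implementation. In a real pipeline, the selected candidates would be passed to the neutron-scattering simulator to produce new labeled training patterns.

```python
import random
import statistics
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained model: maps a candidate vector of
# structural parameters to a scalar prediction.
Model = Callable[[Sequence[float]], float]


def ensemble_uncertainty(models: List[Model], x: Sequence[float]) -> float:
    """Assumed uncertainty measure: population std. dev. of ensemble predictions."""
    return statistics.pstdev([m(x) for m in models])


def sample_batch(
    models: List[Model],
    pool: List[Sequence[float]],
    batch_size: int,
    rng: random.Random,
    eps: float = 1e-12,
) -> List[int]:
    """Draw batch_size distinct pool indices, each draw weighted by uncertainty.

    Sampling (rather than taking the top-k most uncertain points) still favours
    uncertain candidates but keeps some diversity within a batch.
    """
    if batch_size > len(pool):
        raise ValueError("batch_size exceeds pool size")
    # eps keeps every weight positive so the total weight is never zero.
    weights = [ensemble_uncertainty(models, x) + eps for x in pool]
    remaining = list(range(len(pool)))
    chosen: List[int] = []
    for _ in range(batch_size):
        w = [weights[i] for i in remaining]
        pick = rng.choices(range(len(remaining)), weights=w, k=1)[0]
        chosen.append(remaining.pop(pick))  # remove to sample without replacement
    return chosen


# Usage: three toy "models" that agree at x = 0 and diverge as |x| grows.
models = [lambda x, a=a: a * x[0] for a in (0.9, 1.0, 1.1)]
pool = [(0.0,), (0.1,), (5.0,), (10.0,)]
print(sample_batch(models, pool, batch_size=2, rng=random.Random(0)))
```

In this toy setup, the point at x = 0, where all three models agree, receives only the tiny `eps` weight and is therefore almost never selected. The point at x = 10, where the models disagree most, is the most likely pick.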