Fault Tolerant Reconfigurable ML Multiprocessor
Tangrui Li, Justin Y. Shi, Matteo Spatola, Hongzheng Wang
This paper reports three computational experiments on a von Neumann-inspired, reconfigurable, fault-tolerant multiprocessor for neural network (NN) training workflows. The experiments are intended to demonstrate the feasibility of the proposed reconfigurable multiprocessor architecture for irregular workflows, with emphasis on the robustness of its adaptability. We also discuss a potential integration with MLIR compilers, which would allow diverse accelerator hardware to be brought into existing practical applications.