活水快报 - 42Digest

msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML

Zhaolan Huang, Emmanuel Baccelli

arXiv

May 16, 2025

AI spans from large language models to tiny models running on microcontrollers (MCUs). Extremely memory-efficient model architectures are decisive to fit within an MCU's tiny memory budget, e.g., 128 kB of RAM. However, inference latency must remain small to fit real-time constraints. An approach to tackle this is patch-based fusion, which aims to optimize data flows across neural network layers. In this paper, we introduce msf-CNN, a novel technique that efficiently finds optimal fusion settings for convolutional neural networks (CNNs) by walking through the fusion solution space represented as a directed acyclic graph. Compared to previous work on CNN fusion for MCUs, msf-CNN identifies a wider set of solutions. We published an implementation of msf-CNN running on various microcontrollers (ARM Cortex-M, RISC-V, ESP32). We show that msf-CNN can achieve inference using 50% less RAM compared to the prior art (MCUNetV2 and StreamNet). We thus demonstrate how msf-CNN offers additional flexibility for system designers.
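As a rough illustration of what searching a graph of fusion settings can look like, the sketch below treats layer boundaries as nodes and each candidate fused block as an edge. It is not the msf-CNN algorithm: the function name `best_fusion`, the cost model (one peak-RAM figure and one compute cost per block), and every number are assumptions made up for this example. It picks the cheapest chain of blocks whose peak RAM stays within a budget.

```python
from math import inf

def best_fusion(n_layers, blocks, ram_budget):
    """Cheapest way to cover layers 0..n_layers with fused blocks.

    blocks maps (start, end) -> (peak_ram_bytes, compute_cost), meaning
    layers start..end-1 run as one fused block. All values are hypothetical.
    Returns (total_cost, [(start, end), ...]) or None if nothing fits.
    """
    best = [inf] * (n_layers + 1)   # best[i]: min cost to reach boundary i
    prev = [None] * (n_layers + 1)  # back-pointers to rebuild the path
    best[0] = 0
    # Boundaries are already in topological order, so one forward pass suffices.
    for end in range(1, n_layers + 1):
        for start in range(end):
            if (start, end) not in blocks or best[start] == inf:
                continue
            ram, cost = blocks[(start, end)]
            if ram > ram_budget:
                continue  # this block alone would exceed the RAM budget
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                prev[end] = start
    if best[n_layers] == inf:
        return None
    path, node = [], n_layers
    while node != 0:
        path.append((prev[node], node))
        node = prev[node]
    path.reverse()
    return best[n_layers], path

# Hypothetical 3-layer network: fusing more layers lowers peak RAM
# but adds recomputation cost.
BLOCKS = {
    (0, 1): (200_000, 10), (1, 2): (150_000, 10), (2, 3): (60_000, 10),
    (0, 2): (90_000, 35),  (1, 3): (100_000, 30), (0, 3): (50_000, 80),
}

if __name__ == "__main__":
    for budget in (256_000, 128_000, 64_000):
        print(budget, best_fusion(3, BLOCKS, budget))
```

With these made-up numbers, a looser RAM budget admits unfused layers at low compute cost, while a tighter one forces larger fused blocks at higher cost, which is the memory-versus-latency trade-off the abstract refers to.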

Machine Learning, Performance
