Quantizing Whisper-small: How design choices affect ASR performance
Arthur Söhler, Julian Irigoyen and Andreas Søeborg Kirkedal
Large speech recognition models like Whisper-small achieve high accuracy but are difficult to deploy on edge devices due to their high computational demands. To this end, we present a unified, cross-library evaluation of post-training quantization (PTQ) on Whisper-small that disentangles the impact of quantization scheme, method, granularity, and bit-width. Our study is based on four libraries: PyTorch, Optimum-Quanto, HQQ, and bitsandbytes. Experiments on LibriSpeech test-clean and test-other show that dynamic int8 quantization with Quanto offers the best trade-off, reducing model size by 57% while improving word error rate over the baseline. Static quantization performs worse, likely due to Whisper's Transformer architecture, while more aggressive formats (e.g., nf4, int3) achieve up to 71% compression at the cost of accuracy under noisy conditions. Overall, our findings show that carefully chosen PTQ methods can substantially reduce model size and inference cost without retraining, enabling efficient deployment of Whisper-small on constrained hardware.
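To make the best-performing configuration concrete, the following is a minimal sketch of applying int8 PTQ to Whisper-small with Optimum-Quanto. It assumes the Hugging Face `openai/whisper-small` checkpoint and the public `optimum.quanto` API; it illustrates the general approach, not the exact experimental pipeline of the paper.

```python
# Minimal sketch: int8 post-training quantization of Whisper-small
# with Optimum-Quanto (weights quantized, activations left in float).
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from optimum.quanto import quantize, freeze, qint8

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
processor = WhisperProcessor.from_pretrained("openai/whisper-small")

# Quantize Linear weights to int8; with no static activation scheme,
# no calibration pass over data is required.
quantize(model, weights=qint8)
freeze(model)  # materialize the quantized weights in place

model.eval()  # ready for inference on LibriSpeech-style audio
```

Because only the weights are quantized ahead of time, this configuration avoids the calibration step that static schemes need, which is consistent with the abstract's finding that the dynamic scheme transfers more robustly to Whisper's Transformer architecture.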