Towards a Multimodal Stream Processing System
Uélison Jean Lopes dos Santos, Alessandro Ferri, Szilard Nistor, Riccardo Tommasini, Carsten Binnig, Manisha Luthra
In this paper, we present a vision for a new generation of multimodal streaming systems that embed multimodal large language models (MLLMs) as first-class operators, enabling real-time query processing across multiple modalities. Achieving this is non-trivial. Recent work has integrated MLLMs into databases for multimodal queries, but streaming systems require fundamentally different approaches because of their strict latency and throughput requirements. Our approach proposes novel optimizations at all levels, including logical, physical, and semantic query transformations that reduce model load to improve throughput while preserving accuracy. We demonstrate this with Samsara, a prototype that leverages these optimizations to improve performance. Furthermore, we discuss a research roadmap that outlines the open research challenges in building a scalable and efficient multimodal stream processing system.