活水快报 - 42Digest


What's Producible May Not Be Reachable: Measuring the Steerability of Generative Models

Keyon Vafa, Sarah Bentley, Jon Kleinberg, Sendhil Mullainathan

arXiv

March 21, 2025

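How should we evaluate the quality of generative models? Many existing metrics focus on a model's producibility, i.e. the quality and breadth of outputs it can generate. However, the actual value of using a generative model stems not just from what it can produce but from whether a user with a specific goal can produce an output that satisfies that goal. We refer to this property as steerability. In this paper, we first introduce a mathematical decomposition for quantifying steerability independently of producibility. Steerability is more challenging to evaluate than producibility because it requires knowing a user's goals. We address this issue by creating a benchmark task that relies on one key idea: sample an output from a generative model and ask users to reproduce it. We implement this benchmark in user studies of text-to-image models and large language models. Although these models can produce high-quality outputs, they all perform poorly on steerability. These results suggest that we need to focus on improving the steerability of generative models. We show that such improvements are indeed possible: simple image-based steering mechanisms achieve more than a 2x improvement on this benchmark.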

How should we evaluate the quality of generative models? Many existing metrics focus on a model's producibility, i.e. the quality and breadth of outputs it can generate. However, the actual value from using a generative model stems not just from what it can produce but whether a user with a specific goal can produce an output that satisfies that goal. We refer to this property as steerability. In this paper, we first introduce a mathematical decomposition for quantifying steerability independent...
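The benchmark idea in the abstract (sample an output from the model, then ask a user to reproduce it) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `toy_model` is a hypothetical stand-in for a generative model, `similarity` is an arbitrary closeness measure, and the `user` callable simulates a person choosing prompts. The paper's actual models, metrics, and study design are not described in this digest.

```python
import random
from typing import Callable, List, Optional, Sequence

Output = List[float]


def toy_model(prompt: str) -> Output:
    """Hypothetical stand-in for a generative model.

    Maps a prompt to its 26-dimensional letter-frequency vector.
    """
    counts = [0.0] * 26
    for ch in prompt.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    total = sum(counts) or 1.0  # avoid division by zero for letter-free prompts
    return [c / total for c in counts]


def similarity(a: Output, b: Output) -> float:
    """Illustrative closeness score: 1 minus half the L1 distance.

    Identical outputs score 1.0.
    """
    return 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def reproduction_score(
    model: Callable[[str], Output],
    hidden_prompts: Sequence[str],
    user: Callable[[Output, Optional[Output]], str],
    attempts: int,
    rng: random.Random,
) -> float:
    """Sample a target from the model, then let the user try to reproduce it.

    The target is producible by construction, so a low score reflects
    difficulty in steering the model rather than a limit on what it can produce.
    """
    target = model(rng.choice(hidden_prompts))
    best = 0.0
    last_output: Optional[Output] = None
    for _ in range(attempts):
        # The user sees the target and their previous output, never the hidden prompt.
        prompt = user(target, last_output)
        last_output = model(prompt)
        best = max(best, similarity(last_output, target))
    return best
```

As a usage example under the same assumptions, a simulated user who happens to type the hidden prompt reproduces the target exactly, while one who types unrelated letters does not: `reproduction_score(toy_model, ["abc"], lambda target, last: "abc", 3, random.Random(0))` compares identical outputs, whereas replacing `"abc"` in the lambda with `"zzz"` compares outputs with no letters in common.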

Machine Learning · Artificial Intelligence · Computer Vision and Pattern Recognition · Human-Computer Interaction
