ElicitationGPT: Text Elicitation Mechanisms via Language Models
Yifan Wu, Jason Hartline
Scoring rules evaluate probabilistic forecasts of an unknown state against the realized state and are a fundamental building block in the incentivized elicitation of information. This paper develops mechanisms for scoring elicited text against ground truth text by reducing the textual information elicitation problem to a forecast elicitation problem, via domain-knowledge-free queries to a large language model (specifically ChatGPT), and empirically evaluates their alignment with human preferences. Our theoretical analysis shows that the reduction achieves provable properness via black-box language models. The empirical evaluation is conducted on peer reviews from a peer-grading dataset and compared against manual instructor scores for those peer reviews. Our results suggest a paradigm of algorithmic artificial intelligence that may be useful for developing artificial intelligence technologies with provable guarantees.
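To make the scoring-rule building block concrete, the following is a minimal Python sketch of one proper scoring rule, the quadratic (Brier-style) rule for a binary state, together with a check that a truthful report maximizes the expected score. The choice of the quadratic rule is ours, not necessarily the one used in the paper. The `score_text_as_forecasts` helper is a hypothetical illustration of the text-to-forecast reduction: it assumes the elicited text has already been converted into per-state probabilities (in the paper that step uses queries to ChatGPT; here it is stubbed with plain numbers) and the ground-truth text into realized binary states. It is not the authors' implementation.

```python
from typing import Sequence


def quadratic_score(p: float, outcome: int) -> float:
    """Quadratic (Brier-style) scoring rule for a binary state.

    p is the reported probability that the state equals 1; outcome is the
    realized state (0 or 1). Higher is better; the maximum score is 1.
    """
    return 1.0 - (outcome - p) ** 2


def expected_score(report: float, belief: float) -> float:
    """Expected quadratic score of `report` when the true chance of state 1 is `belief`."""
    return belief * quadratic_score(report, 1) + (1 - belief) * quadratic_score(report, 0)


def score_text_as_forecasts(forecasts: Sequence[float], realized: Sequence[int]) -> float:
    """Hypothetical reduction: score elicited text as a list of forecasts.

    forecasts[i]: probability that ground-truth state i equals 1, ASSUMED to be
    extracted from the elicited text by an LLM query (stubbed as numbers here).
    realized[i]: the value (0 or 1) of state i according to the ground-truth text.
    Returns the average per-state quadratic score.
    """
    if len(forecasts) != len(realized) or not forecasts:
        raise ValueError("forecasts and realized must be non-empty and of equal length")
    return sum(quadratic_score(p, y) for p, y in zip(forecasts, realized)) / len(forecasts)


if __name__ == "__main__":
    # Properness check: with belief 0.7, the best report on a 0.01 grid is 0.7.
    belief = 0.7
    grid = [i / 100 for i in range(101)]
    best = max(grid, key=lambda r: expected_score(r, belief))
    print(best)

    # Three ground-truth states; the stubbed forecasts stand in for LLM output.
    print(score_text_as_forecasts([0.9, 0.2, 0.5], [1, 0, 1]))
```

Properness is what makes the rule incentive compatible: because the expected quadratic score is maximized exactly at the reporter's true belief, misreporting cannot help. Any other proper rule, such as the logarithmic rule, could be substituted in `quadratic_score` without changing the structure of the sketch.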