Wisdom from Diversity: Bias Mitigation Through Hybrid Human-LLM Crowds
Axel Abels, Tom Lenaerts
Despite their performance, large language models (LLMs) can inadvertently perpetuate biases found in the data they are trained on. By analyzing LLM responses to bias-eliciting headlines, we find that these models often mirror human biases. To address this, we explore crowd-based strategies for mitigating bias through response aggregation. We first demonstrate that simply averaging responses from multiple LLMs, intended to leverage the "wisdom of the crowd", can exacerbate existing biases due to the limited diversity within LLM crowds. In contrast, we show that locally weighted aggregation methods more effectively leverage the wisdom of the LLM crowd, achieving both bias mitigation and improved accuracy. Finally, recognizing the complementary strengths of LLMs (accuracy) and humans (diversity), we demonstrate that hybrid crowds containing both significantly enhance performance and further reduce bias in race- and gender-related contexts.
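To make the contrast drawn in the abstract concrete, the sketch below compares a simple crowd average with one possible form of locally weighted aggregation, in which each member's response is weighted by its historical accuracy on contexts similar to the current query. This is a minimal illustration under assumed details: the Gaussian-kernel weighting, the `bandwidth` parameter, and all variable names are assumptions for exposition, not the paper's actual method.

```python
import numpy as np

def simple_average(estimates):
    """Unweighted 'wisdom of the crowd': the plain mean of all member estimates."""
    return np.mean(estimates, axis=0)

def locally_weighted_average(estimates, member_contexts, query_context,
                             member_accuracy, bandwidth=1.0):
    """Locally weighted aggregation (illustrative assumption): each member's
    weight combines (a) its historical accuracy and (b) a similarity kernel
    between the current query and the contexts on which that accuracy was
    measured, so locally reliable members dominate the aggregate."""
    # Gaussian kernel similarity between the query and each member's reference context.
    dists = np.linalg.norm(member_contexts - query_context, axis=1)
    similarity = np.exp(-(dists ** 2) / (2 * bandwidth ** 2))
    weights = similarity * member_accuracy
    weights = weights / weights.sum()
    return np.average(estimates, axis=0, weights=weights)

# Example: three crowd members (human or LLM) rate a headline's bias on a 0-1 scale.
estimates = np.array([0.2, 0.8, 0.7])
member_contexts = np.array([[0.1, 0.9], [0.8, 0.2], [0.7, 0.3]])  # assumed context embeddings
query_context = np.array([0.75, 0.25])
member_accuracy = np.array([0.6, 0.9, 0.85])  # assumed historical accuracies

print(simple_average(estimates))
print(locally_weighted_average(estimates, member_contexts, query_context, member_accuracy))
```

The point of the contrast: the simple average gives every member equal influence, so a crowd of similarly biased LLMs reinforces its shared bias, whereas a locally weighted scheme can shift influence toward members that have proven reliable on comparable items.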