活水快报 - 42Digest

样本高效的"聚类并治"方法用于并行大规模排序选择

Sample-Efficient "Clustering and Conquer" Procedures for Parallel Large-Scale Ranking and Selection

Zishi Zhang, Yijie Peng

arXiv

2024年2月3日

本工作旨在通过利用相关性信息来提高并行大规模排序选择（R S）问题的样本效率。我们修改了并行计算中常用的"分治"框架，增加了一个基于相关性的聚类步骤，将其转变为"聚类并治"。在对称基准场景下的分析结果表明，这一看似简单的修改为广泛使用的一类样本最优R S过程带来了𝒪(p)的样本复杂度降低。我们的方法具有两个关键优势：1）不需要高精度的相关性估计或精确聚类，2）可以与各种现有R S过程无缝集成，同时实现最优样本复杂度。理论上，我们开发了一个新颖的梯度分析框架来分析样本效率并指导大规模R S过程的设计。我们还引入了一种专为大规模场景定制的新型并行聚类算法。最后，在神经架构搜索等大规模人工智能应用中，我们的方法展示了优越的性能。

This work aims to improve the sample efficiency of parallel large-scale ranking and selection (R S) problems by leveraging correlation information. We modify the commonly used "divide and conquer" framework in parallel computing by adding a correlation-based clustering step, transforming it into "clustering and conquer". Analytical results under a symmetric benchmark scenario show that this seemingly simple modification yields an 𝒪(p) reduction in sample complexity for a widely used class of sa...

统计方法论机器学习

View Source