DeepAtlas: a tool for effective manifold learning

Serena Hughes, Timothy Hamilton, Tom Kolokotrones and Eric J. Deeds

arXiv

August 26, 2025

Manifold learning builds on the "manifold hypothesis," which posits that data in high-dimensional datasets are drawn from lower-dimensional manifolds. Current tools generate global embeddings of data, rather than the local maps used to define manifolds mathematically. These tools also cannot assess whether the manifold hypothesis holds true for a dataset. Here, we describe DeepAtlas, an algorithm that generates lower-dimensional representations of the data's local neighborhoods, then trains deep neural networks that map between these local embeddings and the original data. Topological distortion is used to determine whether a dataset is drawn from a manifold and, if so, its dimensionality. Application to test datasets shows that DeepAtlas can successfully learn manifold structures. Interestingly, many real datasets, including single-cell RNA-sequencing data, do not conform to the manifold hypothesis. In cases where the data are drawn from a manifold, DeepAtlas builds a model that can be used generatively and promises to allow the application of powerful tools from differential geometry to a variety of datasets.
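The abstract outlines the recipe (local charts built from neighborhoods, maps between charts and data, and a distortion score used to decide whether the data form a manifold and of what dimension) without implementation details. As a rough, hypothetical illustration of that recipe only, the Python sketch below swaps in much simpler parts. Each chart is a linear local-PCA projection of a k-nearest-neighbor patch rather than a learned embedding with deep networks. "Distortion" is assumed to be the fraction of m-nearest-neighbor relations lost inside the chart, a stand-in and not the paper's topological-distortion measure. All function names and the parameters `k`, `m`, `stride` and `tol` are made up for this example.

```python
import math
import random


def dist(u, v):
    """Euclidean distance between two equal-length coordinate lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn(points, i, m):
    """Indices of the m points nearest to points[i], excluding i itself."""
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: dist(points[i], points[j]))
    return set(others[:m])


def top_eigenvectors(cov, d, sweeps=50):
    """Cyclic Jacobi eigendecomposition of a small symmetric matrix.

    Returns the d eigenvectors with the largest eigenvalues.
    """
    n = len(cov)
    a = [row[:] for row in cov]
    v = [[float(r == c) for c in range(n)] for r in range(n)]
    for _ in range(sweeps):
        if sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < 1e-24:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # columns: A <- A J and V <- V J
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rows: A <- J^T A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    order = sorted(range(n), key=lambda i: -a[i][i])
    return [[v[r][i] for r in range(n)] for i in order[:d]]


def local_chart(patch, d):
    """Linear chart: coordinates of the centered patch in its top-d principal directions."""
    dim, n = len(patch[0]), len(patch)
    mean = [sum(p[c] for p in patch) / n for c in range(dim)]
    centered = [[p[c] - mean[c] for c in range(dim)] for p in patch]
    cov = [[sum(x[r] * x[c] for x in centered) / n for c in range(dim)] for r in range(dim)]
    basis = top_eigenvectors(cov, d)
    return [[sum(b[c] * x[c] for c in range(dim)) for b in basis] for x in centered]


def chart_distortion(points, d, k=12, m=2, stride=20):
    """1 minus the mean fraction of each point's m nearest neighbors (within its
    k-point patch) that are still its m nearest neighbors in the d-dim chart."""
    scores = []
    for anchor in range(0, len(points), stride):
        idx = sorted(range(len(points)), key=lambda j: dist(points[anchor], points[j]))[:k]
        patch = [points[j] for j in idx]
        chart = local_chart(patch, d)
        for i in range(k):
            scores.append(len(knn(patch, i, m) & knn(chart, i, m)) / m)
    return 1 - sum(scores) / len(scores)


def estimate_dimension(points, tol=0.2, **kwargs):
    """Smallest chart dimension whose distortion is at most tol (None if none is)."""
    for d in range(1, len(points[0]) + 1):
        if chart_distortion(points, d, **kwargs) <= tol:
            return d
    return None


# Toy data: a circle (1-D manifold) and a sphere (2-D manifold), both in R^3.
circle = [[math.cos(2 * math.pi * i / 200), math.sin(2 * math.pi * i / 200), 0.0]
          for i in range(200)]
rng = random.Random(0)
sphere = []
for _ in range(800):
    g = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(x * x for x in g))
    sphere.append([x / norm for x in g])

print(estimate_dimension(circle), estimate_dimension(sphere))
```

On this toy data the circle should come out as one-dimensional and the sphere as two-dimensional. Under these simplified assumptions, a dataset whose distortion only drops below `tol` near its full ambient dimension would be a hint that a low-dimensional manifold describes it poorly; linear charts were chosen here only to keep the sketch dependency-free.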

Machine Learning; Quantitative Methods
