DiRe-RAPIDS: Topology-faithful dimensionality reduction at scale
DiRe-RAPIDS is a new dimensionality reduction method that preserves topological structure better than UMAP while running at similar speed. It addresses a weakness of conventional local metrics, which reward memorizing noise and thereby allow fictitious cycles and isolated clusters to appear in an embedding. DiRe is optimized against a topology-faithfulness benchmark built from noisy manifolds with known homology. It attains Pareto-optimal results, matching or exceeding GPU-accelerated UMAP on classification tasks and correctly recovering first Betti numbers in stress tests. On 723K arXiv paper embeddings, DiRe retains 3-4 times more topological structure than UMAP in comparable time. The work is available on arXiv under computer science and machine learning.
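To make "first Betti number" concrete: for a graph, it counts independent cycles and equals E - V + C (edges minus vertices plus connected components). The sketch below is illustrative only and is not DiRe's benchmark or code; the function names `first_betti_number` and `cycle_graph` are hypothetical, and real topological evaluation of point clouds would typically use persistent homology rather than a single fixed graph.

```python
def first_betti_number(num_vertices, edges):
    """Return b1 = E - V + C for an undirected simple graph.

    Assumption: vertices are labeled 0..num_vertices-1. Self-loops and
    duplicate edges are ignored so the input is treated as a simple graph.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    unique_edges = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    components = num_vertices
    for u, v in unique_edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return len(unique_edges) - num_vertices + components


def cycle_graph(n):
    """Edges of a ring on n vertices (a discretized circle)."""
    return [(i, (i + 1) % n) for i in range(n)]


ring = cycle_graph(8)                 # one loop, like a circle
ring_with_chord = ring + [(0, 4)]     # an extra edge creates a second cycle
path = ring[:-1]                      # opening the ring removes the loop

print(first_betti_number(8, ring))             # 1
print(first_betti_number(8, ring_with_chord))  # 2
print(first_betti_number(8, path))             # 0
```

In this toy setting, an embedding that invents a spurious cycle would raise the count above the known value, while one that tears a loop open would lower it.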
Key facts
- DiRe-RAPIDS is a new dimensionality reduction method.
- It preserves topological structure better than UMAP.
- Standard local metrics reward noise memorization.
- UMAP can invent cycles and disconnected islands.
- DiRe uses a topology-faithfulness benchmark with noisy manifolds.
- DiRe matches or beats GPU-accelerated UMAP on classification.
- DiRe recovers exact first Betti numbers on stress tests.
- Tested on 723K arXiv paper embeddings.
- DiRe preserves 3-4 times more topological structure than UMAP.
- Published on arXiv under computer science and machine learning.
Entities
Institutions
- arXiv