Hierarchical Clustering Unveils Latent Patterns in Speaker Recognition Networks

other · 2026-04-29

A recent arXiv preprint (2604.23354) proposes using Single-Linkage Clustering (SLINK) together with Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to uncover hierarchical patterns in the representations that neural networks learn for speaker recognition. Earlier work based on t-SNE and K-means had identified flat clusters in these latent spaces. The study aims to expose deeper structural relationships in the representations of networks trained to recognize speaker identity from spoken utterances, and thereby to advance explainable AI (XAI) by clarifying how these networks reach their decisions.
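
To make the difference between flat and hierarchical clustering concrete, here is a minimal Python sketch. It is not the paper's code: the eight 2-D points are invented stand-ins for speaker embeddings, the function names single_linkage and cut are hypothetical, and the merge order is computed with a simple minimum-spanning-tree style pass rather than Sibson's SLINK algorithm (both yield the single-linkage hierarchy). HDBSCAN, which adds density-based noise handling on top of a similar hierarchy, is not shown.

```python
import math
from itertools import combinations


def single_linkage(points):
    """Return the single-linkage merge history as (distance, cluster_a, cluster_b)."""
    parent = list(range(len(points)))
    members = {i: [i] for i in range(len(points))}

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Visiting point pairs from closest to farthest and joining their clusters
    # reproduces single linkage (the same idea as Kruskal's spanning tree).
    pairs = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    merges = []
    for d, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same cluster
        merges.append((d, sorted(members[ri]), sorted(members[rj])))
        parent[rj] = ri
        members[ri] = members[ri] + members.pop(rj)
    return merges


def cut(n_points, merges, threshold):
    """Flat clusters obtained by applying only merges at or below `threshold`."""
    clusters = [{i} for i in range(n_points)]
    for d, a, b in merges:  # merges are in ascending distance order
        if d > threshold:
            break
        merged = set(a) | set(b)
        clusters = [c for c in clusters if not c <= merged] + [merged]
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy "embeddings": two coarse groups, each with two sub-groups.
points = [
    (0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0),
    (10.0, 0.0), (10.1, 0.0), (11.0, 0.0), (11.1, 0.0),
]

merges = single_linkage(points)
print(cut(len(points), merges, 0.5))  # fine-grained level of the hierarchy
print(cut(len(points), merges, 2.0))  # coarse level of the hierarchy
```

On this toy data, cutting at 0.5 yields four small groups and cutting at 2.0 yields two larger ones that contain them. A single flat clustering reports only one of these levels, whereas the merge history preserves how the levels nest.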

Key facts

arXiv paper 2604.23354 proposes hierarchical clustering for speaker recognition network representations.
The paper uses the SLINK and HDBSCAN algorithms to analyze latent representations.
This approach contrasts with prior studies that employed t-SNE and K-means for flat clustering analysis.
The work focuses on making neural network decisions understandable, placing it within the XAI domain.
Networks are trained to recognize speaker identity from utterances.
Study aims to uncover unknown organizational patterns in network representations.
Hierarchical relationships in clusters are the new focus.
Paper is categorized as a cross submission on arXiv.
