Single-Channel Speaker Distance Estimation Depends on Early Reflections
A recent paper on arXiv (2605.07694) examines how single-channel speaker distance estimation models use the different components of the room impulse response (RIR). The authors decomposed simulated RIRs into four variants (full, direct-only, no-late, and no-early), using a mixing time estimated from the echo density function to separate early reflections from late reverberation. They evaluated the models under four calibration scenarios, ranging from fully calibrated (synchronized capture with a known source level) to fully uncalibrated (random onset and unknown level). Without time calibration, the mean absolute error (MAE) rises to 1.29 m and the model relies on reverberation cues, with early reflections proving the most informative component. Further analysis against the direct-to-reverberant ratio (DRR) is in progress.
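The RIR decomposition described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes the normalized echo density profile proposed by Abel and Huang: the Hann-weighted fraction of taps exceeding the local RMS in a roughly 20 ms window, divided by erfc(1/√2). It also assumes the mixing time is the first sample where that profile reaches 1, and that the direct part is everything up to 2.5 ms after the strongest peak. The window length, threshold, direct-window width, and the helper names (`echo_density`, `mixing_time_index`, `rir_variants`, `drr_db`) are all hypothetical choices.

```python
import numpy as np
from math import erfc, sqrt
from numpy.lib.stride_tricks import sliding_window_view


def echo_density(h: np.ndarray, fs: int, win_ms: float = 20.0) -> np.ndarray:
    """Normalized echo density, one value per sample of the RIR h (assumed method).

    Counts the window-weighted fraction of taps whose magnitude exceeds the
    local weighted RMS and divides by the fraction expected for Gaussian
    noise, erfc(1/sqrt(2)) ~= 0.317. Near 0: sparse echoes; near 1: diffuse.
    """
    n = int(round(win_ms * 1e-3 * fs)) | 1           # force an odd window length
    w = np.hanning(n)
    w /= w.sum()                                     # weights sum to 1
    half = n // 2
    # One length-n frame centered on every sample (zero-padded at both ends).
    frames = sliding_window_view(np.pad(h, (half, half)), n)
    sigma = np.sqrt((frames ** 2) @ w)               # local weighted RMS
    above = (np.abs(frames) > sigma[:, None]).astype(float)
    return (above @ w) / erfc(1.0 / sqrt(2.0))


def mixing_time_index(h: np.ndarray, fs: int, threshold: float = 1.0) -> int:
    """First sample where the echo density reaches `threshold` (assumed criterion)."""
    hits = np.flatnonzero(echo_density(h, fs) >= threshold)
    return int(hits[0]) if hits.size else len(h)


def _direct_end(h: np.ndarray, fs: int, direct_ms: float) -> int:
    """Index just past the assumed direct-path window after the strongest peak."""
    peak = int(np.argmax(np.abs(h)))
    return min(peak + int(round(direct_ms * 1e-3 * fs)) + 1, len(h))


def rir_variants(h: np.ndarray, fs: int, direct_ms: float = 2.5):
    """Split h into direct / early / late parts and build the four variants.

    Direct: everything up to direct_ms after the strongest peak (this includes
    the leading propagation delay). Early: direct end to mixing time. Late: rest.
    """
    d_end = _direct_end(h, fs, direct_ms)
    t_mix = max(mixing_time_index(h, fs), d_end)     # mixing time never precedes direct end
    direct, early, late = (np.zeros_like(h) for _ in range(3))
    direct[:d_end] = h[:d_end]
    early[d_end:t_mix] = h[d_end:t_mix]
    late[t_mix:] = h[t_mix:]
    variants = {
        "full": h.copy(),
        "direct_only": direct,
        "no_late": direct + early,     # direct path + early reflections
        "no_early": direct + late,     # direct path + late reverberation
    }
    return variants, t_mix


def drr_db(h: np.ndarray, fs: int, direct_ms: float = 2.5) -> float:
    """Direct-to-reverberant ratio in dB, using the same assumed direct window."""
    d_end = _direct_end(h, fs, direct_ms)
    return float(10.0 * np.log10(np.sum(h[:d_end] ** 2) / np.sum(h[d_end:] ** 2)))
```

With these definitions, "no-late" keeps the direct sound plus early reflections and "no-early" keeps the direct sound plus the diffuse tail, so comparing a model's error across the variants indicates which part of the RIR it exploits.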
Key facts
- arXiv:2605.07694
- Single-channel speaker distance estimation
- Room impulse response (RIR) decomposition into four variants
- Mixing time estimated from echo density function
- Four calibration scenarios: fully calibrated to fully uncalibrated
- Without time calibration, MAE increases to 1.29 m
- Early reflections are the most informative component
- Further analysis against the direct-to-reverberant ratio (DRR)
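The calibration scenarios and the error metric listed above can be emulated roughly as below. This sketch assumes the four scenarios are the 2×2 combination of time calibration (synchronized onset or not) and level calibration (known gain or not). The random-delay and gain ranges, the scenario labels, and the helper names (`apply_calibration`, `SCENARIOS`, `mae`) are hypothetical and not taken from the paper. MAE is the standard mean absolute difference between predicted and true distances, in meters.

```python
import numpy as np

# Hypothetical labels: (time_calibrated, level_calibrated) for each scenario.
SCENARIOS = {
    "fully calibrated": (True, True),
    "level unknown": (True, False),
    "onset unknown": (False, True),
    "fully uncalibrated": (False, False),
}


def apply_calibration(x, fs, time_calibrated, level_calibrated, rng,
                      max_onset_s=0.1, gain_db_range=(-20.0, 20.0)):
    """Perturb a reverberant recording x to mimic one calibration scenario.

    Without time calibration, a random stretch of leading silence is added,
    so the delay before the direct sound no longer encodes distance alone.
    Without level calibration, a random gain hides the absolute level.
    """
    y = np.asarray(x, dtype=float)
    if not time_calibrated:
        pad = int(rng.integers(0, int(max_onset_s * fs) + 1))
        y = np.concatenate([np.zeros(pad), y])
    if not level_calibrated:
        gain_db = rng.uniform(*gain_db_range)
        y = y * 10.0 ** (gain_db / 20.0)
    return y


def mae(pred, true):
    """Mean absolute error between predicted and true distances (meters)."""
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(true, float))))
```

For example, `mae([1.0, 2.0], [2.0, 4.0])` gives 1.5.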
Entities
Institutions
- arXiv