NAACA: Training-Free Audio Model Boosts Salience Detection
Researchers have unveiled NAACA, a training-free NeuroAuditory Attentive Cognitive Architecture that improves the performance of audio language models (ALMs) on long recordings by reframing attention allocation as an auditory salience filtering problem. At its core is OWM, a neuro-inspired Oscillatory Working Memory that sustains stable attractor-like states and activates higher cognitive processing only when adaptive energy variations signal perceptual salience. On the XD-Violence dataset, NAACA raised AudioQwen's average precision from 53.50% to 70.60% while reducing unnecessary ALM invocations. Qualitative analyses on the Urban Soundscapes of the World (USoW) dataset further showed that OWM detects novel events and subcategory transitions while remaining robust to brief pauses.
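The gating idea described above can be illustrated with a minimal sketch: track a running estimate of frame energy and invoke the heavier model only when the current frame's energy deviates sharply from that estimate. Note this is a hypothetical illustration of energy-variation gating in general, not the paper's actual OWM mechanism; the class name, parameters, and thresholding rule are all assumptions.

```python
import math

class SalienceGate:
    """Hypothetical sketch of an energy-variation salience gate.

    Keeps an exponential moving average (EMA) of per-frame energy and
    flags a frame as salient when its energy deviates from the EMA by
    more than `k` running standard deviations. Illustrative only; not
    the OWM mechanism from the paper.
    """

    def __init__(self, alpha=0.1, k=3.0):
        self.alpha = alpha  # EMA smoothing factor (assumed value)
        self.k = k          # deviation threshold in std-dev units (assumed)
        self.mean = None    # running mean of frame energy
        self.var = 0.0      # running variance of frame energy

    def update(self, frame):
        # Frame energy: mean squared amplitude of the samples.
        energy = sum(x * x for x in frame) / len(frame)
        if self.mean is None:
            # First frame only initializes the statistics.
            self.mean = energy
            return False
        deviation = energy - self.mean
        # Salient if the deviation exceeds k running std-devs
        # (small epsilon avoids triggering on numerical noise).
        salient = abs(deviation) > self.k * math.sqrt(self.var) + 1e-8
        # Update EMA statistics after the salience decision.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return salient
```

In use, a stream of near-constant quiet frames keeps the gate closed, and a sudden loud burst opens it, which is the pattern that would route only salient segments to the ALM.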
Key facts
- NAACA is a training-free architecture for audio language models.
- It reframes attention allocation as an auditory salience filtering problem.
- OWM is a neuro-inspired Oscillatory Working Memory.
- OWM triggers higher-cognition processing only on perceptual salience.
- On XD-Violence, NAACA improved AudioQwen's AP from 53.50% to 70.60%.
- NAACA reduced unnecessary ALM invocations.
- Qualitative studies used the USoW dataset.
- OWM captures novel events and subcategory shifts.