ARTFEED — Contemporary Art Intelligence

Sony AI Releases Woosh, an Open Sound Effects Foundation Model

ai-technology · 2026-04-30

Sony AI has unveiled Woosh, a foundation model for sound effects, publishing its architecture, training methodology, and evaluations against other open models. Woosh comprises a high-quality audio encoder/decoder, a text-audio alignment model, and generative models for both text-to-audio and video-to-audio synthesis. Distilled versions are also provided for efficient operation and rapid inference. Evaluations on both public and private datasets indicate that Woosh performs competitively with, and in some cases surpasses, existing open alternatives such as StableAudio-Open and TangoFlux. The inference code and model weights are available on GitHub. The release is intended to support the audio research community with tools for building novel approaches and establishing baselines.

Key facts

  • Sony AI released Woosh, an open sound effects foundation model.
  • The model includes audio encoder/decoder, text-audio alignment, text-to-audio, and video-to-audio components.
  • Distilled versions for low-resource operation are included.
  • Evaluations show competitive performance against StableAudio-Open and TangoFlux.
  • Inference code and model weights are publicly available on GitHub.
  • The release targets the audio research community for building novel approaches and baselines.
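The component list above describes a common two-stage design: a latent audio codec (encoder/decoder) paired with a text-conditioned generator that produces codec latents. The sketch below illustrates that structure only; every class name, method, and shape is an illustrative assumption, not the actual Woosh API.

```python
# Hypothetical sketch of a two-stage text-to-audio pipeline of the kind
# described in the release: an audio codec plus a text-conditioned
# latent generator. All names and shapes are assumptions for illustration,
# not the real Woosh interface.
import numpy as np


class AudioCodec:
    """Stand-in for the audio encoder/decoder: maps waveforms to a
    compact latent sequence and back."""

    def __init__(self, hop: int = 320, latent_dim: int = 64):
        self.hop = hop            # samples of audio per latent frame
        self.latent_dim = latent_dim

    def encode(self, waveform: np.ndarray) -> np.ndarray:
        # Placeholder: a real codec uses learned convolutions.
        n_frames = len(waveform) // self.hop
        return np.zeros((n_frames, self.latent_dim))

    def decode(self, latents: np.ndarray) -> np.ndarray:
        # Placeholder: reconstructs one hop of audio per latent frame.
        return np.zeros(latents.shape[0] * self.hop)


class TextToAudioGenerator:
    """Stand-in for the generative model: emits codec latents
    conditioned on a text prompt."""

    def __init__(self, latent_dim: int = 64):
        self.latent_dim = latent_dim

    def generate(self, text: str, n_frames: int) -> np.ndarray:
        # Deterministic noise keyed on the prompt, standing in for
        # an actual conditional sampler.
        rng = np.random.default_rng(abs(hash(text)) % 2**32)
        return rng.standard_normal((n_frames, self.latent_dim))


def text_to_audio(prompt: str, seconds: float, sr: int = 16000) -> np.ndarray:
    """Generate `seconds` of audio for `prompt`: text -> latents -> waveform."""
    codec = AudioCodec()
    gen = TextToAudioGenerator(latent_dim=codec.latent_dim)
    n_frames = int(seconds * sr) // codec.hop
    latents = gen.generate(prompt, n_frames)
    return codec.decode(latents)


wave = text_to_audio("glass shattering", seconds=2.0)
print(wave.shape)
```

A video-to-audio variant would swap the text conditioning for video-frame features, and the distilled versions mentioned above would replace the generator with a smaller model sampled in fewer steps.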

Entities

Institutions

  • Sony AI
