ARTFEED — Contemporary Art Intelligence

NVIDIA Launches Cosmos 3, First Open Omni-Model for Physical AI

ai-technology · 2026-06-01

NVIDIA has released Cosmos 3, an open omni-model for physical AI reasoning and action, built on a Mixture-of-Transformers (MoT) architecture. Earlier Cosmos versions required separate models for world generation, controlled generation, scene understanding, and policy generation; Cosmos 3 unifies these capabilities in a single model.

The model processes text, image, video, audio, and action modalities within one architecture. Dedicated encoders (a ViT for visual understanding, a VAE for visual and audio generation, and domain-aware vectors for actions) map their inputs into a shared representation space. The input sequence is split into an autoregressive subsequence for reasoning and a diffusion subsequence for generation, and the two interact through joint attention.

Two model sizes are available: Cosmos 3 Nano (8B parameters) for workstation-grade compute such as the RTX PRO 6000 GPU, and Cosmos 3 Super (32B parameters) for large-scale synthetic data generation on NVIDIA Hopper and Blackwell GPUs. Both are hosted on Hugging Face. Cosmos 3 supports video generation from detailed narrative prompts and action generation from concise spatial prompts, and it integrates with the Hugging Face Diffusers library via Cosmos3OmniPipeline.

NVIDIA has also released synthetic data generation datasets for physical AI training. The Cosmos Framework provides end-to-end training and serving scripts, including post-training tools and agent skills. Applications include robotics, autonomous vehicles, and smart spaces.
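The split into an autoregressive subsequence and a diffusion subsequence that interact through joint attention can be pictured as a single attention mask over one combined sequence. The sketch below is illustrative only and is not taken from NVIDIA's implementation. It assumes a layout and masking rule that the article does not specify: autoregressive tokens come first and attend causally to earlier autoregressive tokens, while diffusion tokens attend to every autoregressive token and to each other bidirectionally. The function name and parameters are hypothetical.

```python
def build_joint_attention_mask(num_ar: int, num_diff: int) -> list[list[bool]]:
    """Return mask[q][k] == True when query position q may attend to key position k.

    ASSUMPTION (not from the source): autoregressive (AR) tokens occupy
    positions [0, num_ar) and diffusion tokens occupy the positions after them.
    """
    total = num_ar + num_diff
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if q < num_ar:
                # AR query: causal, sees itself and earlier AR tokens only.
                mask[q][k] = k <= q
            else:
                # Diffusion query: sees all AR tokens and all diffusion tokens.
                mask[q][k] = True
    return mask


if __name__ == "__main__":
    # 3 reasoning tokens followed by 2 generation tokens.
    for row in build_joint_attention_mask(num_ar=3, num_diff=2):
        print("".join("x" if allowed else "." for allowed in row))
```

Under these assumptions the reasoning tokens never see the noisy generation tokens, while the generation tokens are conditioned on the full reasoning context. Other masking rules are possible, and the actual Cosmos 3 scheme may differ.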

Key facts

  • Cosmos 3 is the first open omni-model for physical AI reasoning and action.
  • It uses a Mixture-of-Transformers (MoT) architecture.
  • Two model sizes: Cosmos 3 Nano (8B params) and Cosmos 3 Super (32B params).
  • Available on Hugging Face at nvidia/Cosmos3-Nano and nvidia/Cosmos3-Super.
  • Integrates with Hugging Face Diffusers via Cosmos3OmniPipeline.
  • Supports text, image, video, audio, and action modalities.
  • NVIDIA has also released synthetic data generation datasets for physical AI training.
  • Applications include robotics, autonomous vehicles, and smart spaces.
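The two parameter counts give a rough sense of why Nano targets workstations and Super targets data-center GPUs. The following back-of-the-envelope estimate counts weight memory only, assuming dense storage at a fixed number of bytes per parameter. The choice of precisions is an assumption, since the article does not say which formats the checkpoints use, and activations, caches, and framework overhead are ignored.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

# Parameter counts as reported in the article.
MODEL_PARAMS = {
    "Cosmos 3 Nano": 8_000_000_000,
    "Cosmos 3 Super": 32_000_000_000,
}


def weight_memory_gb(num_params: int, dtype: str = "bf16") -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in MODEL_PARAMS.items():
        print(f"{name}: {weight_memory_gb(params):.0f} GB of weights in bf16")
    # Cosmos 3 Nano: 16 GB of weights in bf16
    # Cosmos 3 Super: 64 GB of weights in bf16
```

Actual memory use during inference or training will be higher than these figures.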

Entities

Institutions

  • NVIDIA
  • Hugging Face
