Gemma 4 VLA Demo Runs on Jetson Orin Nano Super
NVIDIA's Asier Arranz published a tutorial demonstrating Gemma 4, a vision-language-action (VLA) model, running on a Jetson Orin Nano Super (8 GB). The demo uses a Logitech C920 webcam and a USB keyboard for voice interaction, and the model decides on its own, from the content of each question rather than keyword triggers, whether to use vision. The setup requires llama.cpp with a Gemma 4 GGUF and a vision projector (mmproj). The tutorial walks through system packages, the Python environment, RAM optimization, model serving, and running the demo; a Docker-based text-only alternative is also provided for Jetson Orin. The project is on GitHub at asierarranz/Google_Gemma.
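To make the llama.cpp piece concrete: when llama-server is started with a GGUF model plus its mmproj vision projector, it exposes an OpenAI-compatible chat endpoint where images ride along as base64 data URIs. The helper below is a minimal sketch of how a client could build such a request, attaching a webcam frame only when one is supplied; the function and model names are illustrative, not taken from the tutorial.

```python
import base64

def build_chat_payload(question, image_bytes=None, model="gemma"):
    """Build a chat-completions payload, attaching a webcam frame if given.

    Hypothetical client helper: llama.cpp's llama-server accepts multimodal
    messages whose content mixes a text part with an image_url part carrying
    a base64 data URI.
    """
    content = [{"type": "text", "text": question}]
    if image_bytes is not None:
        b64 = base64.b64encode(image_bytes).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })
    return {"model": model, "messages": [{"role": "user", "content": content}]}
```

A text-only question yields a single text part; passing raw JPEG bytes adds a second, image part, which is what lets the pipeline include or skip the camera frame per request.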
Key facts
- Gemma 4 VLA demo runs on Jetson Orin Nano Super (8 GB).
- Model decides autonomously when to use vision based on context.
- Hardware includes Logitech C920 webcam and USB keyboard.
- Uses llama.cpp with Gemma 4 GGUF and vision projector (mmproj).
- Tutorial covers system packages, Python environment, RAM optimization.
- Docker-based text-only alternative available for Jetson Orin.
- Project on GitHub: asierarranz/Google_Gemma.
- First run downloads Parakeet STT, Kokoro TTS, and voice prompts.
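The "model decides autonomously when to use vision" behavior could be wired up in more than one way; the tutorial does not publish its logic, so the following is only a hypothetical two-step sketch: ask the model (text only) whether the question needs the camera, and capture a frame and re-ask with the image only if it says so. Both callables (`query_model`, `capture_frame`) are stand-ins for the real STT/LLM/camera plumbing.

```python
def answer(question, query_model, capture_frame):
    """Answer a spoken question, consulting the camera only when needed.

    Hypothetical flow: query_model(prompt, image=None) -> str is a call to
    the served model; capture_frame() -> bytes grabs one webcam JPEG.
    """
    probe = (
        "Reply with exactly VISION if answering the next question requires "
        "looking through the camera; otherwise reply TEXT.\n" + question
    )
    # Step 1: cheap text-only probe, no frame captured yet.
    if query_model(probe).strip().upper().startswith("VISION"):
        # Step 2: the model asked to look, so grab a frame and re-ask.
        return query_model(question, image=capture_frame())
    return query_model(question)
```

Letting the model route its own requests avoids keyword triggers, at the cost of one extra (fast, text-only) inference per question; a single-pass design that always attaches a frame would also work but burns vision compute on every turn.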
Entities
People
- Asier Arranz
Institutions
- NVIDIA
- Hugging Face
- Jetson AI Lab
- GitHub