Gemma 4 VLA Demo Runs on Jetson Orin Nano Super
NVIDIA's Asier Arranz published a tutorial demonstrating Gemma 4, a vision-language-action (VLA) model, running on a Jetson Orin Nano Super (8 GB). The demo uses a Logitech C920 webcam and a USB keyboard for voice interaction, and the model decides on its own, from the content of each question rather than keyword triggers, whether to use vision. The setup requires llama.cpp with a Gemma 4 GGUF and a vision projector (mmproj). The tutorial walks through system packages, the Python environment, RAM optimization, model serving, and running the demo; a Docker-based text-only alternative is also provided for Jetson Orin. The project is on GitHub at asierarranz/Google_Gemma.
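To make the llama.cpp piece concrete: when llama-server is started with a GGUF model plus its mmproj vision projector, it exposes an OpenAI-compatible chat endpoint where images ride along as base64 data URIs. The helper below is a minimal sketch of how a client could build such a request, attaching a webcam frame only when one is supplied; the function and model names are illustrative, not taken from the tutorial.

```python
import base64

def build_chat_payload(question, image_bytes=None, model="gemma"):
    """Build a chat-completions payload, attaching a webcam frame if given.

    Hypothetical client helper: llama.cpp's llama-server accepts multimodal
    messages whose content mixes a text part with an image_url part carrying
    a base64 data URI.
    """
    content = [{"type": "text", "text": question}]
    if image_bytes is not None:
        b64 = base64.b64encode(image_bytes).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })
    return {"model": model, "messages": [{"role": "user", "content": content}]}
```

A text-only question yields a single text part; passing raw JPEG bytes adds a second, image part, which is what lets the pipeline include or skip the camera frame per request.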
Key facts
- Gemma 4 VLA demo runs on Jetson Orin Nano Super (8 GB).
- Model decides autonomously when to use vision based on context.
- Hardware includes Logitech C920 webcam and USB keyboard.
- Uses llama.cpp with Gemma 4 GGUF and vision projector (mmproj).
- Tutorial covers system packages, Python environment, RAM optimization.
- Docker-based text-only alternative available for Jetson Orin.
- Project on GitHub: asierarranz/Google_Gemma.
- First run downloads Parakeet STT, Kokoro TTS, and voice prompts.
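The "model decides autonomously when to use vision" behavior could be wired up in more than one way; the tutorial does not publish its logic, so the following is only a hypothetical two-step sketch: ask the model (text only) whether the question needs the camera, and capture a frame and re-ask with the image only if it says so. Both callables (`query_model`, `capture_frame`) are stand-ins for the real STT/LLM/camera plumbing.

```python
def answer(question, query_model, capture_frame):
    """Answer a spoken question, consulting the camera only when needed.

    Hypothetical flow: query_model(prompt, image=None) -> str is a call to
    the served model; capture_frame() -> bytes grabs one webcam JPEG.
    """
    probe = (
        "Reply with exactly VISION if answering the next question requires "
        "looking through the camera; otherwise reply TEXT.\n" + question
    )
    # Step 1: cheap text-only probe, no frame captured yet.
    if query_model(probe).strip().upper().startswith("VISION"):
        # Step 2: the model asked to look, so grab a frame and re-ask.
        return query_model(question, image=capture_frame())
    return query_model(question)
```

Letting the model route its own requests avoids keyword triggers, at the cost of one extra (fast, text-only) inference per question; a single-pass design that always attaches a frame would also work but burns vision compute on every turn.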
Entities
People
- Asier Arranz
Institutions
- NVIDIA
- Hugging Face
- Jetson AI Lab
- GitHub