Reachy Mini Robot Runs Fully Local Speech-to-Speech Pipeline

ai-technology · 2026-05-27

Pollen Robotics and Hugging Face have enabled fully local voice conversations with the Reachy Mini robot using a cascaded speech-to-speech pipeline. The system runs entirely on the user's hardware: no cloud dependency, no API keys, and no data leaving the machine. The pipeline chains four stages: Silero VAD for voice activity detection, Parakeet-TDT for speech-to-text (STT), an LLM (recommended: Gemma 4 via llama.cpp, or Qwen3-4B via MLX or vLLM), and Qwen3-TTS for speech synthesis. Users deploy the backend locally and connect the robot to its WebSocket server at /v1/realtime. Running locally gives privacy, zero API costs, and full control over each component. The blog post gives step-by-step setup instructions for llama.cpp with Gemma 4, MLX on Apple Silicon, vLLM, Hugging Face Inference Endpoints, and OpenAI-compatible providers. Each component can be swapped to target a different language or a different quality-speed tradeoff. The project is open-source, with repositories on Hugging Face and GitHub.
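To make the cascade concrete, here is a minimal sketch of how the four stages could be wired together so that each one stays swappable. It is not the project's actual code: the class name CascadePipeline, the callable signatures, and the fake_* stand-in stages are all hypothetical, and the real models (Silero VAD, Parakeet-TDT, the LLM, Qwen3-TTS) are replaced by trivial placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Each stage is a plain callable, so any single stage can be replaced
# without touching the others. These signatures are assumptions made
# for illustration, not the project's real interfaces.
VAD = Callable[[bytes], bool]   # True if the audio chunk contains speech
STT = Callable[[bytes], str]    # audio -> transcript
LLM = Callable[[str], str]      # transcript -> reply text
TTS = Callable[[str], bytes]    # reply text -> synthesized audio


@dataclass
class CascadePipeline:
    vad: VAD
    stt: STT
    llm: LLM
    tts: TTS

    def respond(self, audio: bytes) -> Optional[bytes]:
        """Run one conversational turn; return None if no speech is detected."""
        if not self.vad(audio):
            return None
        transcript = self.stt(audio)
        reply = self.llm(transcript)
        return self.tts(reply)


# Hypothetical stand-in stages: they treat the "audio" as UTF-8 text so the
# control flow can be exercised without loading any model.
def fake_vad(audio: bytes) -> bool:
    return len(audio) > 0


def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadePipeline(fake_vad, fake_stt, fake_llm, fake_tts)
```

Swapping a component then amounts to passing a different callable, for example an LLM function that forwards the transcript to a local llama.cpp server instead of fake_llm.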

Key facts

Reachy Mini can now run fully local voice conversations with no cloud dependency.
The pipeline uses Silero VAD, Parakeet-TDT STT, an LLM, and Qwen3-TTS.
Recommended LLM setup: llama.cpp with Gemma 4, or MLX with Qwen3-4B.
The system exposes a WebSocket server at /v1/realtime compatible with Reachy Mini.
Users can swap any component of the cascade pipeline.
The approach ensures privacy and eliminates API costs.
The LLM stage supports multiple backends: local (llama.cpp, MLX, Transformers, vLLM) or hosted (OpenAI, Gemini, HF Inference Endpoints).
The project is open-source, with repositories on Hugging Face and GitHub.
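As a small illustration of the connection step, the sketch below builds the WebSocket URL a client would use to reach the backend. Only the /v1/realtime path comes from the article; the function name realtime_url, the default host localhost, and the port 8765 are assumptions chosen for the example and should be replaced with whatever the deployed backend actually listens on.

```python
from urllib.parse import urlunsplit

REALTIME_PATH = "/v1/realtime"  # path given in the article


def realtime_url(host: str = "localhost", port: int = 8765, secure: bool = False) -> str:
    """Build the WebSocket URL for the speech-to-speech backend.

    host and port are hypothetical defaults, not documented values.
    """
    scheme = "wss" if secure else "ws"
    return urlunsplit((scheme, f"{host}:{port}", REALTIME_PATH, "", ""))


print(realtime_url())
```

Keeping the path in one constant means only the host and port change when the backend moves from a laptop to another machine on the local network.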

Entities

Institutions

Pollen Robotics
Hugging Face
OpenAI
Gemini
Together
Fireworks
Replicate
OpenRouter

Sources

Hugging Face Blog — 2026-05-27