Reachy Mini Robot Runs Fully Local Speech-to-Speech Pipeline
Pollen Robotics and Hugging Face have enabled fully local voice conversations with the Reachy Mini robot through a cascaded speech-to-speech pipeline. Everything runs on the user's own hardware: no cloud dependency, no API keys, and no data leaving the machine. The pipeline chains Silero VAD (voice activity detection), Parakeet-TDT (speech-to-text), an LLM (recommended: Gemma 4 via llama.cpp, or Qwen3-4B via MLX or vLLM), and Qwen3-TTS (text-to-speech). Users deploy the backend locally, and the robot connects to its WebSocket server at /v1/realtime. Running locally gives privacy, zero API costs, and full control over each component, any of which can be swapped to suit a different language or quality-speed tradeoff. The blog post gives step-by-step setup instructions for llama.cpp with Gemma 4, MLX on Apple Silicon, vLLM, Hugging Face Inference Endpoints, and OpenAI-compatible providers. The project is open-source, with repositories on Hugging Face and GitHub.
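The cascade described above can be pictured as four swappable stages run in sequence. The following is a minimal sketch only: the class name `CascadePipeline`, the stage signatures, and the `fake_*` stand-ins are all hypothetical and are not taken from the project's code. Real stages would wrap Silero VAD, Parakeet-TDT, an LLM backend, and Qwen3-TTS.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional

# Hypothetical stage signatures; the actual project defines its own interfaces.
VAD = Callable[[bytes], bool]   # True if the audio chunk contains speech
STT = Callable[[bytes], str]    # audio -> transcript
LLM = Callable[[str], str]      # transcript -> reply text
TTS = Callable[[str], bytes]    # reply text -> audio


@dataclass(frozen=True)
class CascadePipeline:
    vad: VAD
    stt: STT
    llm: LLM
    tts: TTS

    def process(self, audio: bytes) -> Optional[bytes]:
        """Run one conversational turn; return None when no speech is detected."""
        if not self.vad(audio):
            return None
        transcript = self.stt(audio)
        reply = self.llm(transcript)
        return self.tts(reply)


# Stand-in stages so the sketch runs without any models installed.
def fake_vad(audio: bytes) -> bool:
    return any(audio)  # treat all-zero bytes as silence


def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8", errors="ignore")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadePipeline(fake_vad, fake_stt, fake_llm, fake_tts)

# Swapping one component leaves the other three untouched.
shouting = replace(pipeline, llm=lambda text: text.upper())
```

Because each stage is just a callable, replacing the LLM or the TTS engine does not require touching the rest of the chain, which is the property the article highlights.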
Key facts
- Reachy Mini can now run fully local voice conversations with no cloud dependency.
- The pipeline uses Silero VAD, Parakeet-TDT STT, an LLM, and Qwen3-TTS.
- Recommended LLM setup: llama.cpp with Gemma 4, or MLX with Qwen3-4B.
- The system exposes a WebSocket server at /v1/realtime compatible with Reachy Mini.
- Users can swap any component of the cascade pipeline.
- Running locally keeps all data on the user's machine and eliminates API costs.
- Support for multiple LLM backends: local (llama.cpp, MLX, Transformers, vLLM) or hosted (OpenAI, Gemini, HF Inference Endpoints).
- The project is open-source, with repositories on Hugging Face and GitHub.
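As an illustration of the /v1/realtime endpoint mentioned above, the sketch below builds the WebSocket URL a client would connect to. Only the path comes from the article. The helper name `realtime_url`, the default host `localhost`, and the default port `8000` are assumptions made for the example; check the project's instructions for the actual values.

```python
from urllib.parse import urlunsplit

REALTIME_PATH = "/v1/realtime"  # path stated in the article


def realtime_url(host: str = "localhost", port: int = 8000, secure: bool = False) -> str:
    """Build the WebSocket URL for the speech-to-speech backend.

    The default host and port are placeholders, not values from the project.
    """
    scheme = "wss" if secure else "ws"
    return urlunsplit((scheme, f"{host}:{port}", REALTIME_PATH, "", ""))


print(realtime_url())
```

For a backend running on another machine on the local network, pass its address explicitly, for example `realtime_url("192.168.1.20", 9000)`.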
Entities
Institutions
- Pollen Robotics
- Hugging Face
- OpenAI
- Google (Gemini)
- Together
- Fireworks
- Replicate
- OpenRouter