ARTFEED — Contemporary Art Intelligence

Reachy Mini Robot Runs Fully Local Speech-to-Speech Pipeline

ai-technology · 2026-05-27

Pollen Robotics and Hugging Face have enabled fully local voice conversations with the Reachy Mini robot using a cascaded speech-to-speech pipeline. The system runs entirely on the user's hardware with no cloud dependency, no API keys, and no data leaving the machine. The pipeline combines Silero VAD, Parakeet-TDT STT, an LLM (recommended: Gemma 4 via llama.cpp or Qwen3-4B via MLX/vLLM), and Qwen3-TTS. Users can deploy the backend locally and connect the robot via a WebSocket server at /v1/realtime. The approach offers privacy, zero API costs, and full control over each component. The blog provides step-by-step instructions for setting up llama.cpp with Gemma 4, MLX on Apple Silicon, vLLM, Hugging Face Inference Endpoints, and OpenAI-compatible providers. The system supports multiple LLM backends and can be customized for different languages or quality-speed tradeoffs. The project is open-source, with repositories on Hugging Face and GitHub.

Key facts

  • Reachy Mini can now run fully local voice conversations with no cloud dependency.
  • The pipeline uses Silero VAD, Parakeet-TDT STT, an LLM, and Qwen3-TTS.
  • Recommended LLM setup: llama.cpp with Gemma 4, or MLX with Qwen3-4B.
  • The system exposes a WebSocket server at /v1/realtime compatible with Reachy Mini.
  • Users can swap any component of the cascade pipeline.
  • The approach ensures privacy and eliminates API costs.
  • Support for multiple LLM backends: local (llama.cpp, MLX, Transformers, vLLM) or hosted (OpenAI, Gemini, HF Inference Endpoints).
  • The project is open-source with repositories on Hugging Face.

Entities

Institutions

  • Pollen Robotics
  • Hugging Face
  • OpenAI
  • Gemini
  • Together
  • Fireworks
  • Replicate
  • OpenRouter

Sources