Briefing

Back to Articles Reachy Mini goes fully local

ai-dev
Llama

Deploy: Run speech‑to‑speech locally with llama.cpp Gemma 4 and connect Reachy Mini to the local backend.

What to do now

Deploy: Run speech‑to‑speech locally with llama.cpp Gemma 4 and connect Reachy Mini to the local backend.

Summary

On May 27 2026, the Reachy Mini community released a guide to run the entire speech‑to‑speech stack locally, eliminating the need for cloud APIs. The new stack uses the open‑source speech‑to‑speech library, which exposes a Realtime API‑compatible /v1/realtime WebSocket and chains a cascaded VAD → STT → LLM → TTS pipeline. The recommended LLM backend is llama.cpp serving the Gemma 4 E4B‑it‑GGUF model via the command `llama-server -hf ggml-org/gemma-4-E4B-it-GGUF -np 2 -c 65536 -fa on --swa-full`. For voice processing, the guide suggests Silero VAD v5 Tiny, Parakeet‑TDT 0.6B v3 for STT, and Qwen3‑TTS for multilingual TTS. The speech‑to‑speech binary can run in `--mode local` to keep all audio on the host, or in `--mode realtime` to stream to the Reachy Mini. The approach allows swapping any component in the cascade, so developers can experiment with newer models from the Hugging Face Hub each week. Privacy is preserved because no audio leaves the machine, and there are no per‑minute API costs.

Key changes

  • Full local speech‑to‑speech stack via speech‑to‑speech library exposing /v1/realtime WebSocket
  • Uses llama.cpp serving Gemma 4 E4B‑it‑GGUF with flags -hf, -np 2, -c 65536, -fa on, --swa-full
  • Recommended components: Silero VAD v5 Tiny, Parakeet‑TDT 0.6B v3 STT, Qwen3‑TTS
  • Supports multiple LLM backends: local llama.cpp, vLLM, Hugging Face Inference Endpoints, OpenAI‑compatible providers
  • Two modes: --mode local for local inference, --mode realtime for streaming to robot
  • Enables swapping any cascade component
  • No API keys, no data leaves machine
  • Privacy and cost savings

Affects

internal

Customer impact

Analyzing matches…

Ask about this story

Impact on an agency? Which customers? Compare historically Risks of waiting