# Aboleth STT
GPU-accelerated speech-to-text for Unreal Engine 5.7
Aboleth STT brings real-time speech recognition to your Unreal Engine project. Powered by whisper.cpp with Silero VAD, streaming transcription, and CUDA/Vulkan GPU inference — everything runs locally on the player's hardware. No cloud APIs, no per-minute billing, no internet required.
## Highlights
- GPU-Accelerated — CUDA (NVIDIA) and Vulkan (AMD/Intel/NVIDIA) backends
- Silero VAD — Neural voice activity detection with streaming LSTM
- Streaming Transcription — See text appear word-by-word as the player speaks
- Push-to-Talk — Choose automatic VAD-triggered capture or manual push-to-talk
- 99 Languages — Auto-detect or force a specific language, with translation to English
- Runtime Tunable — Every setting adjustable from Blueprint or C++ without reloading
- In-Editor Model Downloader — One-click download from HuggingFace in Project Settings
- Formal Pipeline State Machine — Clean state transitions with full event coverage
## Get Started in 60 Seconds
1. Download a model — Project Settings > Aboleth Speech-to-Text > Model Management
2. Drop an actor — Drag `AbolethSTTListenerActor` into your level
3. Bind one event — `OnUtteranceProcessed` gives you the transcribed text
4. Play — Talk into your mic. Text appears.
## How It Works
```
Microphone → MicCapture → Ring Buffer → Silero VAD → Whisper (GPU) → Text
                                                        ↓
                                              [Streaming passes]
                                           Local Agreement (n=2)
                                            Word-by-word output
```
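The "Local Agreement (n=2)" stage commits a word only once two consecutive streaming hypotheses agree on it, so the unstable tail of each partial transcription never reaches the player. A minimal sketch of that policy in plain Python (the function name and word-list representation are illustrative, not the plugin's actual API):

```python
def agree_new_words(prev_hyp, curr_hyp, n_committed):
    """Return words confirmed by two consecutive hypotheses (local agreement, n=2).

    prev_hyp / curr_hyp: word lists from the last two streaming passes.
    n_committed: how many words have already been emitted downstream.
    """
    agreed = []
    for prev_word, curr_word in zip(prev_hyp, curr_hyp):
        if prev_word != curr_word:
            break  # hypotheses diverge here; everything after is unstable
        agreed.append(curr_word)
    # Only the stable words we have not yet emitted are new output.
    return agreed[n_committed:]


# As the audio window grows, the tail of each hypothesis keeps changing,
# but the agreed prefix only ever extends:
committed = []
prev = []
for hyp in [
    ["the"],                           # pass 1
    ["the", "quick", "bound"],         # pass 2: tail misheard, still unstable
    ["the", "quick", "brown", "fox"],  # pass 3
]:
    committed += agree_new_words(prev, hyp, len(committed))
    prev = hyp
print(committed)  # prints ['the', 'quick']
```

The tradeoff is one pass of extra latency per word in exchange for output that never has to be retracted.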
Aboleth STT captures audio from the microphone, resamples it to 16 kHz mono, and writes it to a 30-second ring buffer. Silero VAD (streaming LSTM) detects speech onset and offset. When speech ends, the audio segment is sent to Whisper on a background thread for GPU inference. With streaming enabled, growing audio windows are sent to Whisper during speech for near-real-time word-by-word output.
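The capture stage above boils down to a fixed-capacity buffer that silently overwrites the oldest audio; at 16 kHz mono, 30 seconds is 480,000 float samples. A toy sketch in Python (the class name and the tiny 5-sample capacity in the demo are illustrative, not the plugin's implementation):

```python
class AudioRingBuffer:
    """Fixed-capacity ring buffer: new samples overwrite the oldest ones."""

    def __init__(self, capacity):
        self.buf = [0.0] * capacity
        self.capacity = capacity
        self.write_pos = 0   # next slot to write
        self.filled = 0      # how many valid samples the buffer holds

    def write(self, samples):
        for s in samples:
            self.buf[self.write_pos] = s
            self.write_pos = (self.write_pos + 1) % self.capacity
        self.filled = min(self.filled + len(samples), self.capacity)

    def latest(self, n):
        """Return the n most recent samples, oldest first."""
        n = min(n, self.filled)
        start = (self.write_pos - n) % self.capacity
        return [self.buf[(start + i) % self.capacity] for i in range(n)]


rb = AudioRingBuffer(capacity=5)
rb.write([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # sixth sample overwrites the oldest
print(rb.latest(3))  # prints [4.0, 5.0, 6.0]
```

This shape is why the VAD matters: the buffer guarantees the last 30 seconds are always available, and the VAD's onset/offset marks decide which slice of it actually gets handed to Whisper.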