Aboleth STT

GPU-accelerated speech-to-text for Unreal Engine 5.7

Aboleth STT brings real-time speech recognition to your Unreal Engine project. It is built on whisper.cpp and combines Silero VAD, streaming transcription, and CUDA/Vulkan GPU inference. Everything runs locally on the player's hardware: no cloud APIs, no per-minute billing, no internet connection required.

Highlights

GPU-Accelerated — CUDA (NVIDIA) and Vulkan (AMD/Intel/NVIDIA) backends
Silero VAD — Neural voice activity detection with streaming LSTM
Streaming Transcription — See text appear word-by-word as the player speaks
Push-to-Talk — Automatic (VAD-driven) or manual capture modes
99 Languages — Auto-detect or force a specific language, with translation to English
Runtime Tunable — Every setting adjustable from Blueprint or C++ without reloading
In-Editor Model Downloader — One-click download from HuggingFace in Project Settings
Formal Pipeline State Machine — Clean state transitions with full event coverage

Get Started in 60 Seconds

Download a model — Project Settings > Aboleth Speech-to-Text > Model Management
Drop an actor — Drag AbolethSTTListenerActor into your level
Bind one event — OnUtteranceProcessed gives you the transcribed text
Play — Talk into your mic. Text appears.

Quick Start Guide
API Reference

How It Works

Microphone → MicCapture → Ring Buffer → Silero VAD → Whisper (GPU) → Text
                                            ↓
                                    [Streaming passes]
                                    Local Agreement (n=2)
                                    Word-by-word output
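
The "Local Agreement (n=2)" step in the diagram is not spelled out on this page. One common reading, assumed here, is that a word is emitted only once two consecutive streaming passes over the growing audio window agree on it. The sketch below illustrates that idea in plain C++. The class `LocalAgreement2` and its methods are hypothetical names for illustration, not part of the Aboleth STT API, and the plugin's actual implementation may differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not the plugin's API): commits a word only after
// two consecutive passes produce the same prefix up to that word.
class LocalAgreement2 {
public:
    // Feed the full word list from the latest pass over the growing window.
    // Returns only the words newly confirmed by this pass.
    std::vector<std::string> Update(const std::vector<std::string>& current) {
        // Length of the prefix shared with the previous pass.
        std::size_t agreed = 0;
        while (agreed < previous_.size() && agreed < current.size() &&
               previous_[agreed] == current[agreed]) {
            ++agreed;
        }

        // Emit agreed words that have not been emitted yet.
        std::vector<std::string> newly_committed;
        for (std::size_t i = committed_; i < agreed; ++i) {
            newly_committed.push_back(current[i]);
        }
        if (agreed > committed_) {
            committed_ = agreed;
        }

        previous_ = current;
        return newly_committed;
    }

    std::size_t CommittedCount() const { return committed_; }

private:
    std::vector<std::string> previous_;  // hypothesis from the prior pass
    std::size_t committed_ = 0;          // number of words already emitted
};
```

In this sketch, already-emitted words are never retracted: if a later pass revises an earlier word, the committed count simply stays where it was. That trades a little accuracy for stable on-screen text, which is usually what a word-by-word display needs.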

Aboleth STT captures audio from the microphone, resamples it to 16 kHz mono, and writes it to a 30-second ring buffer. Silero VAD (a streaming LSTM) detects speech onset and offset. When speech ends, the audio segment is sent to Whisper on a background thread for GPU inference. With streaming enabled, progressively longer audio windows are also sent to Whisper while the player is still speaking, which produces near-real-time, word-by-word output.
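
To make the buffer size concrete: 30 seconds at 16 kHz mono is 480,000 samples, or roughly 1.9 MB if samples are stored as 32-bit floats (the sample format is an assumption; this page does not state it). A minimal single-threaded sketch of such a ring buffer follows. `AudioRingBuffer` and its methods are hypothetical names, not the plugin's classes, and a real capture path would also need synchronization between the audio thread and its consumers.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity ring buffer; names are illustrative only.
// Assumes capacity > 0 and single-threaded access.
class AudioRingBuffer {
public:
    explicit AudioRingBuffer(std::size_t capacity) : data_(capacity, 0.0f) {}

    // Append samples, overwriting the oldest audio once the buffer is full.
    void Write(const float* samples, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            data_[head_] = samples[i];
            head_ = (head_ + 1) % data_.size();
        }
        size_ = std::min(size_ + count, data_.size());
    }

    // Copy the most recent `count` samples in chronological order.
    std::vector<float> ReadLatest(std::size_t count) const {
        count = std::min(count, size_);
        std::vector<float> out(count);
        const std::size_t start = (head_ + data_.size() - count) % data_.size();
        for (std::size_t i = 0; i < count; ++i) {
            out[i] = data_[(start + i) % data_.size()];
        }
        return out;
    }

    std::size_t Size() const { return size_; }
    std::size_t Capacity() const { return data_.size(); }

private:
    std::vector<float> data_;
    std::size_t head_ = 0;  // next write position
    std::size_t size_ = 0;  // valid samples, saturates at capacity
};

constexpr std::size_t kSampleRate = 16000;    // 16 kHz mono, per the text above
constexpr std::size_t kBufferSeconds = 30;    // 30-second window, per the text above
constexpr std::size_t kCapacity = kSampleRate * kBufferSeconds;  // 480000 samples
```

With this layout, a speech segment flagged by the VAD can be pulled out with `ReadLatest` as long as it is no longer than the buffer; anything older than 30 seconds has already been overwritten.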

Architecture Details