Subsystem API

UAbolethSTTSubsystem is a GameInstanceSubsystem that serves as the central interface for all speech-to-text (STT) operations. It manages the full STT pipeline: model loading, microphone capture, voice activity detection (VAD), and transcription.

Accessing the Subsystem

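C++:

#include "AbolethSTTSubsystem.h"

// WorldContextObject is any object that can provide a world context.
UAbolethSTTSubsystem* STT = UAbolethSTTSubsystem::GetSTTSubsystem(WorldContextObject);
if (STT && STT->IsSTTReady())
{
    // Start capture only once the model is loaded and the pipeline reports ready.
    STT->StartListening();
}

Blueprint: use the Get Subsystem node with class AbolethSTTSubsystem. Because this is a GameInstanceSubsystem, it is available from any world context.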


Core Lifecycle

Functions for loading, starting, and stopping the STT system.

| Function | Specifier | Return | Description |
| --- | --- | --- | --- |
| LoadSTTSystem | BlueprintCallable | bool | Loads the Whisper model and initializes the full pipeline. Returns true on success. |
| UnloadSTTSystem | BlueprintCallable | void | Tears down the pipeline and releases all model resources. |
| IsSTTReady | BlueprintPure | bool | Returns true if the model is loaded and the pipeline is ready to accept audio. |
| StartListening | BlueprintCallable | void | Opens the microphone and begins VAD-driven capture. |
| StopListening | BlueprintCallable | void | Stops microphone capture and halts processing. |
| IsMicrophoneActive | BlueprintPure | bool | Returns true if the microphone is currently capturing audio. |
| ForceResetProcessing | BlueprintCallable | void | Aborts any in-flight transcription and resets the pipeline to Idle. |

State Queries

Read-only queries for inspecting the current state of the STT pipeline.

| Function | Specifier | Return | Description |
| --- | --- | --- | --- |
| GetPipelineState | BlueprintPure | EAbolethPipelineState | Current pipeline state (Idle, SpeechDetected, Processing, etc.). |
| IsSpeechDetected | BlueprintPure | bool | Returns true if the VAD is currently detecting speech. |
| IsProcessingSTT | BlueprintPure | bool | Returns true if a transcription pass is in progress. |
| GetCurrentRMSLevel | BlueprintPure | float | Current audio RMS level. Metering only; not used for VAD gating. |
| GetCurrentVADProbability | BlueprintPure | float | Raw Silero VAD probability (0.0 to 1.0) for the most recent window. |
| GetCurrentUtteranceDuration | BlueprintPure | float | Duration in seconds of the current utterance being accumulated. |
| GetTimeSinceLastSpeech | BlueprintPure | float | Seconds elapsed since the last detected speech frame. |
| GetAccumulatedSampleCount | BlueprintPure | int32 | Number of audio samples buffered for the current utterance. |
| GetLastTranscribedText | BlueprintPure | FString | The most recent final transcription result. |
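
GetAccumulatedSampleCount and GetCurrentUtteranceDuration describe the same buffered audio in different units. The sketch below shows the usual conversion between a sample count and seconds. The helper name SamplesToSeconds is hypothetical, and the 16 kHz rate used in the usage note is an assumption based on the input rate Whisper models expect; the source does not state the plugin's internal sample rate.

```cpp
#include <cassert>

// Hypothetical helper (not part of the plugin API): converts a count of
// mono samples into seconds at the given sample rate.
float SamplesToSeconds(int SampleCount, int SampleRateHz)
{
    assert(SampleCount >= 0);
    assert(SampleRateHz > 0);
    return static_cast<float>(SampleCount) / static_cast<float>(SampleRateHz);
}
```

Under the 16 kHz assumption, SamplesToSeconds(24000, 16000) evaluates to 1.5 seconds.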

RMS vs VAD

GetCurrentRMSLevel is provided for audio-level UI display (volume meters, etc.). Speech detection is handled entirely by the Silero VAD model via GetCurrentVADProbability.
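
To make the distinction concrete, here is a minimal standalone sketch, not the plugin's implementation. ComputeRmsLevel and IsSpeechByVad are hypothetical names. The first computes a root-mean-square level suitable for a volume meter; the second makes a speech decision from a VAD probability and a threshold alone. Treating a probability equal to the threshold as speech is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: RMS level of a block of float samples (nominally in [-1, 1]).
// Useful for driving a meter; it carries no information about whether the sound is speech.
float ComputeRmsLevel(const std::vector<float>& Samples)
{
    if (Samples.empty())
    {
        return 0.0f;
    }
    double SumOfSquares = 0.0;
    for (float Sample : Samples)
    {
        SumOfSquares += static_cast<double>(Sample) * static_cast<double>(Sample);
    }
    return static_cast<float>(std::sqrt(SumOfSquares / static_cast<double>(Samples.size())));
}

// Hypothetical gate: the speech decision depends only on the VAD probability.
// Assumption: a probability equal to the threshold counts as speech.
bool IsSpeechByVad(float VadProbability, float VadThreshold)
{
    assert(VadThreshold >= 0.0f && VadThreshold <= 1.0f);
    return VadProbability >= VadThreshold;
}
```

A loud but non-speech sound (a high RMS level with a low VAD probability) would move the meter without being treated as speech.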


Audio Status

Query the state of the audio capture hardware and buffers.

| Function | Specifier | Return | Description |
| --- | --- | --- | --- |
| GetMicrophoneStatus | BlueprintPure | FAudioCaptureStatus | Full capture status including audio levels, queue depth, and pipeline state. |
| GetActiveMic | BlueprintPure | UMicCaptureComponent* | Pointer to the currently active microphone capture component. |
| GetUnifiedAudioBuffer | BlueprintPure | UAbolethUnifiedAudioBuffer* | Access the shared audio ring buffer used across the pipeline. |

Manual Processing

Manually trigger transcription outside the standard VAD-driven pipeline.

| Function | Specifier | Return | Description |
| --- | --- | --- | --- |
| ProcessUtteranceAsync | BlueprintCallable | void | Submits the current audio buffer for async transcription. Results arrive via OnUtteranceProcessed. |
| ProcessUtteranceImmediate | BlueprintCallable | FString | Blocking. Transcribes the current buffer and returns the result immediately. Use with caution on the game thread. |
| ResetVADState | BlueprintCallable | void | Clears accumulated VAD state and audio buffers without stopping capture. |

Blocking Call

ProcessUtteranceImmediate blocks the calling thread until transcription completes. Prefer ProcessUtteranceAsync for gameplay code.


Language

Configure the source language for transcription and the optional translation to English.

| Function | Specifier | Return | Description |
| --- | --- | --- | --- |
| SetLanguage | BlueprintCallable | void | Set the source language code (e.g., "en", "ja", "de"). Use "auto" for automatic detection. |
| GetLanguage | BlueprintPure | FString | Returns the currently configured language code. |
| SetTranslateToEnglish | BlueprintCallable | void | Enable or disable Whisper's built-in translation-to-English mode. |
| GetTranslateToEnglish | BlueprintPure | bool | Returns true if translation to English is active. |
| GetAvailableLanguages | Static | TArray<FString> | Returns all language codes supported by the loaded Whisper model. |

Runtime Settings

Adjust VAD sensitivity, microphone gain, and debug options at runtime.

| Function | Specifier | Return | Description |
| --- | --- | --- | --- |
| SetVADThreshold | BlueprintCallable | void | Set the Silero VAD speech probability threshold (0.0 to 1.0). Default: 0.5. |
| GetVADThreshold | BlueprintPure | float | Returns the current VAD threshold. |
| SetMicGainDb | BlueprintCallable | void | Set microphone gain in decibels. 0.0 = unity gain. |
| GetMicGainDb | BlueprintPure | float | Returns the current microphone gain in dB. |
| ReloadVADSettings | BlueprintCallable | void | Re-reads VAD parameters from project settings and applies them. |
| SetDebugLogging | BlueprintCallable | void | Enable or disable verbose STT debug logging at runtime. |
| IsDebugLoggingEnabled | BlueprintPure | bool | Returns true if debug logging is active. |
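
SetMicGainDb takes its value in decibels, with 0.0 meaning unity gain. The sketch below shows the standard conversion from an amplitude gain in dB to a linear multiplier, 10^(dB/20). DbToLinear and ApplyGainDb are hypothetical names, and how the plugin applies gain internally (including any clipping) is not documented here, so treat this as an illustration of the unit only.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: converts an amplitude gain in decibels to a linear
// multiplier. 0 dB maps to 1.0 (unity), +20 dB to 10.0, and -6 dB to roughly 0.5.
float DbToLinear(float GainDb)
{
    assert(std::isfinite(GainDb));
    return std::pow(10.0f, GainDb / 20.0f);
}

// Hypothetical helper: scales every sample in place by the given gain.
// No clipping is applied in this sketch.
void ApplyGainDb(std::vector<float>& Samples, float GainDb)
{
    const float Gain = DbToLinear(GainDb);
    for (float& Sample : Samples)
    {
        Sample *= Gain;
    }
}
```

For example, a gain of +6 dB roughly doubles the sample amplitude, while 0 dB leaves the samples unchanged.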

Capture Mode

Switch between VAD-driven automatic capture and push-to-talk.

| Function | Specifier | Return | Description |
| --- | --- | --- | --- |
| SetCaptureMode | BlueprintCallable | void | Switch between VADAutomatic and PushToTalk. |
| GetCaptureMode | BlueprintPure | EAbolethCaptureMode | Returns the active capture mode. |
| StartManualCapture | BlueprintCallable | void | Begin recording (push-to-talk). Only valid when capture mode is PushToTalk. |
| StopManualCapture | BlueprintCallable | void | End recording and submit audio for transcription. |
| IsManualCaptureActive | BlueprintPure | bool | Returns true if a manual capture session is in progress. |

Streaming

Control real-time streaming transcription with Local Agreement confirmation.

| Function | Specifier | Return | Description |
| --- | --- | --- | --- |
| SetStreamingEnabled | BlueprintCallable | void | Enable or disable streaming transcription. |
| IsStreamingEnabled | BlueprintPure | bool | Returns true if streaming is active. |
| SetStreamingChunkIntervalMs | BlueprintCallable | void | Set the interval between streaming snapshot passes in milliseconds. |
| GetStreamingChunkIntervalMs | BlueprintPure | int32 | Returns the current streaming chunk interval. |
| GetStreamingPassCount | BlueprintPure | int32 | Number of streaming passes executed for the current utterance. |
| GetStreamingAccumulatedText | BlueprintPure | FString | Returns all confirmed (committed) text from the current streaming session. |

Local Agreement

Streaming uses Local Agreement with n=2 to confirm words. Only words that appear consistently across two consecutive snapshot passes are committed, reducing hallucination in partial results.
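
The confirmation rule can be sketched in standalone C++ as follows. This is a minimal illustration of Local Agreement with n=2 under stated assumptions, not the plugin's code: LocalAgreementSketch is a hypothetical class, words are compared by exact string equality at the same position, and text that has been committed is never revised.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of Local Agreement with n=2. Each call to AddPass
// supplies the full word list produced by one snapshot pass.
class LocalAgreementSketch
{
public:
    // Commits every word, beyond the already-committed prefix, on which this
    // pass and the previous pass agree. Returns only the newly committed words.
    std::vector<std::string> AddPass(const std::vector<std::string>& Words)
    {
        const std::size_t CommittedBefore = Committed.size();

        std::size_t Index = CommittedBefore;
        while (Index < Words.size() && Index < Previous.size() && Words[Index] == Previous[Index])
        {
            Committed.push_back(Words[Index]);
            ++Index;
        }

        // Committed text only ever grows in this sketch.
        assert(Committed.size() >= CommittedBefore);

        Previous = Words;
        return std::vector<std::string>(
            Committed.begin() + static_cast<std::ptrdiff_t>(CommittedBefore),
            Committed.end());
    }

    const std::vector<std::string>& GetCommitted() const { return Committed; }

private:
    std::vector<std::string> Previous;  // hypothesis from the previous pass
    std::vector<std::string> Committed; // confirmed words so far
};
```

With passes "open the", then "open the door", the second pass commits "open" and "the". The trailing "door" stays tentative until a later pass repeats it in the same position.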


Beam Search

Configure beam search decoding for higher-quality transcription at the cost of latency.

| Function | Specifier | Return | Description |
| --- | --- | --- | --- |
| SetBeamSearchEnabled | BlueprintCallable | void | Enable or disable beam search decoding. |
| IsBeamSearchEnabled | BlueprintPure | bool | Returns true if beam search is active. |
| SetBeamSize | BlueprintCallable | void | Set the number of beams (e.g., 5). Higher values improve quality but increase latency. |
| GetBeamSize | BlueprintPure | int32 | Returns the current beam size. |
| SetLengthPenalty | BlueprintCallable | void | Set the length penalty factor for beam search scoring. |
| GetLengthPenalty | BlueprintPure | float | Returns the current length penalty. |
| SetBeamSearchDuringStreaming | BlueprintCallable | void | Allow beam search to run during streaming snapshot passes. |
| IsBeamSearchDuringStreaming | BlueprintPure | bool | Returns true if beam search is used during streaming. |
| SetBeamSearchGateEnabled | BlueprintCallable | void | Enable duration-based gating: beam search only activates after a minimum utterance length. |
| IsBeamSearchGateEnabled | BlueprintPure | bool | Returns true if the beam search gate is active. |
| SetBeamSearchGateMs | BlueprintCallable | void | Minimum utterance duration in milliseconds before beam search activates. |
| GetBeamSearchGateMs | BlueprintPure | int32 | Returns the current beam search gate threshold. |
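
The settings above interact: the enable flag, the streaming flag, and the duration gate each decide whether a given pass uses beam search. The sketch below is one plausible reading of that interaction, not the plugin's actual logic. BeamSearchSettings and ChooseBeamCount are hypothetical names, the default values are purely illustrative, and falling back to a single beam (greedy decoding) when beam search is inactive is an assumption.

```cpp
#include <cassert>

// Hypothetical mirror of the beam search settings; values are illustrative only.
struct BeamSearchSettings
{
    bool bEnabled = false;         // SetBeamSearchEnabled
    int BeamSize = 5;              // SetBeamSize
    bool bDuringStreaming = false; // SetBeamSearchDuringStreaming
    bool bGateEnabled = false;     // SetBeamSearchGateEnabled
    int GateMs = 0;                // SetBeamSearchGateMs
};

// Returns the number of beams to use for one pass. A result of 1 stands for
// greedy decoding (assumed fallback when beam search does not apply).
int ChooseBeamCount(const BeamSearchSettings& Settings, int UtteranceMs, bool bIsStreamingPass)
{
    assert(Settings.BeamSize >= 1);

    if (!Settings.bEnabled)
    {
        return 1;
    }
    if (bIsStreamingPass && !Settings.bDuringStreaming)
    {
        return 1;
    }
    // Duration gate: utterances shorter than the gate skip beam search.
    if (Settings.bGateEnabled && UtteranceMs < Settings.GateMs)
    {
        return 1;
    }
    return Settings.BeamSize;
}
```

Under this reading, a short utterance is decoded greedily for low latency, while an utterance at or beyond the gate duration pays the extra cost of beam search.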

Audio Devices

Enumerate and select audio input devices at runtime.

| Function | Specifier | Return | Description |
| --- | --- | --- | --- |
| RefreshAudioDevices | BlueprintCallable | void | Re-enumerates available audio input devices. Fires OnAudioDevicesRefreshed. |
| GetAvailableAudioDevices | BlueprintPure | TArray<FAudioDeviceInfo> | Returns all detected audio input devices. |
| SetAudioDeviceByIndex | BlueprintCallable | void | Switch to a specific device by its index in the device list. |
| UseDefaultAudioDevice | BlueprintCallable | void | Revert to the system default audio input device. |
| GetSelectedDeviceIndex | BlueprintPure | int32 | Returns the index of the currently selected device. -1 for system default. |
| GetSelectedDeviceInfo | BlueprintPure | FAudioDeviceInfo | Returns full info for the currently selected audio device. |

Static Helpers (C++ Only)

| Function | Scope | Return | Description |
| --- | --- | --- | --- |
| GetSTTSubsystem | Static | UAbolethSTTSubsystem* | Retrieves the subsystem from any world context object. Returns nullptr if unavailable. |

Blueprint Access

GetSTTSubsystem is C++ only. In Blueprint, use the engine-provided Get Subsystem node and select AbolethSTTSubsystem as the class. This works because UAbolethSTTSubsystem is a UGameInstanceSubsystem.