Structs & Enums¶
Core data types used throughout the Aboleth STT API.
Enums¶
EAbolethPipelineState¶
Current state of the STT processing pipeline. Reported by GetPipelineState() and the OnPipelineStateChanged delegate.
| Value | Description |
|---|---|
Uninitialized |
The STT system has not been loaded. Call LoadSTTSystem() to initialize. |
Idle |
System is loaded and listening, but no speech is detected. |
SpeechDetected |
The VAD has detected speech. Audio is being monitored for sustained voice activity. |
Accumulating |
Speech is confirmed and audio samples are being buffered for transcription. |
Processing |
Audio has been submitted to Whisper and transcription is in progress. |
Error |
A pipeline error occurred. Check logs or OnSTTProcessingFailed for details. |
Uninitialized ──LoadSTTSystem()──► Idle
│
VAD speech ──► SpeechDetected
│
sustained ──► Accumulating
│
silence ──► Processing
│
result ──► Idle
EAbolethCaptureMode¶
Determines how audio capture is initiated.
| Value | Description |
|---|---|
VADAutomatic |
The Silero VAD automatically detects speech and silence. No user input required. |
PushToTalk |
Audio is only captured between StartManualCapture() and StopManualCapture() calls. |
EAbolethGPUBackend¶
GPU acceleration backend for Whisper inference.
| Value | Description |
|---|---|
Auto |
Automatically selects the best available backend. Prefers CUDA if available, falls back to Vulkan. |
CUDA |
Use NVIDIA CUDA for GPU acceleration. Requires an NVIDIA GPU with CUDA support. |
Vulkan |
Use Vulkan for GPU acceleration. Works on NVIDIA, AMD, and Intel GPUs with Vulkan drivers. |
Structs¶
FAudioLevelInfo¶
Real-time audio level metrics from the microphone. Used for visual metering --- not for VAD decisions.
| Field | Type | Description |
|---|---|---|
CurrentAmplitude |
float |
Instantaneous peak amplitude of the current audio frame (0.0--1.0). |
PeakAmplitude |
float |
Highest amplitude observed since the last reset. Useful for peak-hold meters. |
RMSLevel |
float |
Root-mean-square level of the current frame. Represents average loudness. |
bIsSilent |
bool |
true if the current frame is below the silence threshold. Based on amplitude, not VAD. |
Metering vs VAD
FAudioLevelInfo is computed in MicCaptureComponent for audio level display purposes only. Speech detection is handled entirely by the Silero VAD model, which operates independently of these amplitude metrics.
FAudioDeviceInfo¶
Describes an audio input device detected by the system.
| Field | Type | Description |
|---|---|---|
DeviceName |
FString |
Human-readable device name (e.g., "Blue Yeti Stereo Microphone"). |
DeviceId |
FString |
Platform-specific unique device identifier. |
DeviceIndex |
int32 |
Index in the device list. Pass to SetAudioDeviceByIndex(). |
InputChannels |
int32 |
Number of input channels supported by the device. |
PreferredSampleRate |
int32 |
The device's preferred sample rate in Hz (e.g., 44100, 48000). |
bSupportsHardwareAEC |
bool |
true if the device supports hardware acoustic echo cancellation. |
FAudioCaptureStatus¶
Snapshot of the full audio capture state. Returned by GetMicrophoneStatus().
| Field | Type | Description |
|---|---|---|
AudioLevels |
FAudioLevelInfo |
Current audio level metrics. |
bIsCapturing |
bool |
true if the microphone is actively capturing audio. |
AudioQueueSamples |
int32 |
Number of audio samples currently buffered in the processing queue. |
bIsProcessingSTT |
bool |
true if a transcription pass is currently in progress. |
PipelineState |
EAbolethPipelineState |
Current pipeline state at the time of the query. |
FWhisperWordResult¶
A single word from Whisper's word-level timestamp output. Returned when word timestamps are enabled.
| Field | Type | Description |
|---|---|---|
Text |
FString |
The transcribed word text. |
T0 |
int32 |
Start time of the word in centiseconds (1/100th of a second) from the beginning of the audio segment. |
T1 |
int32 |
End time of the word in centiseconds from the beginning of the audio segment. |
Probability |
float |
Whisper's confidence score for this word (0.0--1.0). |