Skip to content

Project Settings

All Aboleth STT settings are configured in Project Settings > Plugins > Aboleth Speech-to-Text. These map to DefaultGame.ini under the section [/Script/AbolethSTT.WhisperDeveloperSettings].

Runtime changes

Many of these settings can also be changed at runtime without reloading the model. See Runtime Settings API for the full list of setters and getters.


Model

Setting Type Default Description
ModelFilename String ggml-large-v3-turbo-q5_0.bin Active Whisper model filename. Resolved relative to the Plugins/AbolethSTT/Models/ directory.
VADModelFilename String ggml-silero-v6.2.0.bin Silero VAD model filename. Read-only — ships with the plugin.

Note

Changing ModelFilename requires reloading the STT system (LoadSTTSystem). Use the Model Downloader to fetch additional models.


Silero VAD

These settings map directly to whisper_vad_params in the whisper.cpp backend. They control when speech is detected and how segment boundaries are drawn.

Setting Type Default Range Description
threshold Float 0.5 0.0 -- 1.0 Speech probability threshold. Higher values require more confidence before triggering onset.
min_speech_duration_ms Int 250 0 -- 5000 Minimum consecutive milliseconds above threshold before speech onset is confirmed. Prevents transient noise from triggering detection.
min_silence_duration_ms Int 100 0 -- 5000 Minimum consecutive milliseconds below threshold before speech offset is confirmed. Higher values tolerate longer pauses mid-sentence.
max_speech_duration_s Float 0.0 0.0 -- 3600.0 Maximum duration of a single speech segment before forcing a new one. 0.0 disables the limit.
speech_pad_ms Int 30 0 -- 1000 Padding added before and after detected speech. The onset reach-back captures audio slightly before the VAD triggered.
samples_overlap Float 0.1 0.0 -- 2.0 Overlap in seconds when copying speech audio between segments. Ensures no audio is lost at segment boundaries.

Tuning VAD

See the Tuning Guide for practical advice on adjusting these values for different environments and use cases.


Performance

Setting Type Default Description
NumThreads Int 4 Number of CPU threads used by whisper.cpp. Only affects CPU-bound work — GPU inference is not thread-limited.
GPUBackend Enum Auto GPU acceleration backend. Auto selects CUDA if available, then Vulkan, then CPU. Options: Auto, CUDA, Vulkan.
bFlashAttention Bool true Enables Flash Attention for faster encoder inference. Recommended on for all GPU backends.
bDynamicAudioContext Bool false Scales the encoder window to match actual audio length instead of always processing 30 seconds. Reduces latency on short utterances.
MinAudioContext Int 512 Floor value for dynamic audio context. Must be a power of 2. Prevents the encoder window from becoming too small on very short audio.

MinAudioContext

Setting MinAudioContext below 512 may cause artifacts with Whisper models. The default of 512 is recommended for all supported models.


Language

Setting Type Default Description
Language String "auto" Language code for transcription (e.g. "en", "es", "ja"). "auto" enables automatic language detection per utterance.
bTranslateToEnglish Bool false When enabled, translates non-English speech to English. Uses Whisper's built-in translation capability.

Audio

Setting Type Default Range Description
InputGainDb Float 0.0 0.0 -- 20.0 Software microphone gain in decibels. 0 is passthrough (no amplification). 6 dB is approximately 2x gain. 12 dB is approximately 4x gain.
TargetSampleRate Int 16000 -- Target sample rate for Whisper inference. Read-only — Whisper requires 16kHz mono audio.

Tip

If your microphone input is too quiet, increase InputGainDb before adjusting VAD thresholds. A gain of 6 dB doubles the signal amplitude and often resolves low-sensitivity issues without changing detection behavior.


Streaming

Setting Type Default Range Description
bEnableStreaming Bool true -- Enables Local Agreement streaming. When on, partial transcription results are emitted while the user is still speaking.
StreamingChunkIntervalMs Int 1000 50 -- 2000 Interval in milliseconds between streaming inference passes. Lower values give faster feedback but increase GPU load.
BufferTrimmingSec Int 6 3 -- 30 When the accumulated audio buffer exceeds this length in seconds, earlier audio is trimmed. Keeps memory and inference time bounded during long utterances.

Beam search is disabled by default. It improves accuracy on the final transcription pass at the cost of additional compute.

Setting Type Default Range Description
bEnableBeamSearch Bool false -- Enables beam search decoding on the final (post-utterance) inference pass.
BeamSize Int 5 1 -- 10 Number of beams to maintain during search. Higher values explore more hypotheses.
LengthPenalty Float -1.0 -1.0 -- 3.0 Penalty applied to longer sequences. -1.0 disables the penalty. Values above 1.0 favor longer transcriptions.
bBeamSearchDuringStreaming Bool false -- Enables beam search during streaming passes. Significantly increases streaming latency.
bEnableBeamSearchGate Bool true -- Enables adaptive beam dropout. Beams that fall too far behind the leader are pruned early to save compute.
BeamSearchGateMs Int 250 50 -- 2000 Time threshold in milliseconds for the beam search gate. Beams not contributing within this window are dropped.

Note

Enabling beam search during streaming (bBeamSearchDuringStreaming) is generally not recommended. The latency cost outweighs the accuracy gain for real-time use cases. Reserve beam search for the final pass where accuracy matters most.


Debug

Setting Type Default Description
bEnableDebugLogging Bool false Logs per-utterance diagnostics including timing, segment lengths, and VAD decisions. Output goes to the LogAbolethSTT log category.
bEnableProbabilityLogging Bool false Writes per-frame speech probability values to a CSV file for offline analysis. Useful for tuning VAD thresholds against specific audio environments.

Info

For more on debugging tools, see Debug & Profiling.