6.2.1 - Real-Time Appliance

March 18th, 2025

New

GPU & CPU
- New Virtual Appliance architecture with support for our latest generation of Ursa GPU models
- New languages - Hebrew (he), Persian (fa), Irish (ga), Maltese (mt), Urdu (ur), Bengali (bn) and Swahili (sw)
- Automatic Usage Reporting is enabled by default
- Audio Filtering: pre-process audio to remove low-volume background speech that might otherwise be detected and transcribed. Refer to the documentation to get started
- Disfluency removal: automatically remove disfluencies from your transcript. Refer to the documentation to get started
GPU
- GPU Ursa models - all 58 languages are now available on GPU
  - Major transcription accuracy gains
  - Major improvement in Speaker Diarization accuracy
  - Faster transcription
- Bilingual Spanish and English language pack - this enables Spanish and English to be transcribed accurately within the same file
- Audio Events: detection of music, laughter and applause in media files is now supported. Refer to the documentation to get started

Removed

The legacy Speaker Change Detection feature is now obsolete. Any session that uses the speaker_change or channel_and_speaker_change parameters will be rejected.
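
Because such sessions are now rejected, it can help to scan existing session configs before upgrading. The following is a minimal sketch, not a tool shipped with the appliance: the function name find_removed_settings is hypothetical, and the config layout in the usage note is an assumption. Only the two removed names come from these notes. The scan flags either name whether it appears as a key or as a string value, since the exact placement depends on how your configs are written.

```python
# Names taken from the release notes; everything else here is illustrative.
REMOVED_NAMES = {"speaker_change", "channel_and_speaker_change"}


def find_removed_settings(config, path=""):
    """Return dotted paths where a removed name appears as a key or a string value."""
    hits = []
    if isinstance(config, dict):
        for key, value in config.items():
            here = f"{path}.{key}" if path else str(key)
            if key in REMOVED_NAMES:
                hits.append(here)
            # Recurse so nested sections of the config are checked too.
            hits.extend(find_removed_settings(value, here))
    elif isinstance(config, list):
        for index, item in enumerate(config):
            hits.extend(find_removed_settings(item, f"{path}[{index}]"))
    elif isinstance(config, str) and config in REMOVED_NAMES:
        hits.append(path)
    return hits
```

For example, calling it on a hypothetical config such as {"transcription_config": {"language": "en", "diarization": "speaker_change"}} reports the path transcription_config.diarization, which would need to be changed before the session is sent to a 6.2.1 appliance.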

Improvements

GPU
- Ursa2 models released, giving a broad accuracy uplift across languages (Enhanced operating point only):
  - Uplift for all languages, including a major improvement for Arabic dialects
  - Updated vocabulary for English
- Improved music detection accuracy in Audio Events
GPU & CPU
- Ursa2 models released, giving an accuracy uplift for the following languages (Standard operating point only):
  - Uplift for Basque (eu), Estonian (et), Polish (pl), Swedish (sv), Tamil (ta), Turkish (tr), Uyghur (ug)
- Improvements to speaker diarization accuracy
- Lower-latency Finals: the minimum allowed value of the max_delay parameter has been reduced from 2 to 0.7, enabling lower-latency final transcripts. Refer to the documentation for more details
- Faster real-time transcription with more consistent latency
- Major efficiency improvements for real-time transcription on GPU for English, French, German and Spanish with the Enhanced operating point
- Partial transcripts now include numeral formatting, enhanced punctuation, and better casing
- Improved accuracy for recognition of Arabic numbers and currency
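
The lower max_delay floor listed above can be checked on the client before a session is started. The following is a minimal sketch, not the appliance's own validation: the helper build_transcription_config and the constant MIN_MAX_DELAY are hypothetical, and the other field names (language, operating_point, enable_partials) are assumptions about the session config layout. Only max_delay and its new minimum of 0.7 come from these notes.

```python
import json

# New minimum from these release notes (the previous minimum was 2).
MIN_MAX_DELAY = 0.7


def build_transcription_config(language, max_delay, operating_point="enhanced"):
    """Build an illustrative config dict, rejecting max_delay below the new minimum."""
    if max_delay < MIN_MAX_DELAY:
        raise ValueError(
            f"max_delay={max_delay} is below the minimum of {MIN_MAX_DELAY}"
        )
    return {
        "language": language,                 # assumed field name
        "max_delay": max_delay,               # parameter named in the release notes
        "operating_point": operating_point,   # assumed field name
        "enable_partials": True,              # assumed field name
    }


if __name__ == "__main__":
    # Print the config as JSON, ready to embed in a session start message.
    print(json.dumps(build_transcription_config("en", 0.7), indent=2))
```

Failing early on the client gives a clearer error than waiting for the server to reject the session, and keeps the minimum in one place if it changes again in a later release.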

Fixes

- Negative percentages in German transcription are now written with "%" instead of "Prozent"
- "Kyiv" is now output consistently in English transcription
- Updated English profanity tagging to remove a small number of non-profane words
- Fixed an issue with Mandarin (cmn) where consecutive English words were concatenated
- Fixed a delay in emitting Final transcripts for a subset of languages when using a config containing punctuation_overrides={"permitted_marks":[],...
- After the server receives an EndOfStream message, any further AddAudio messages are dropped. On the first AddAudio received after EndOfStream, the server sends an error message to the client but does not close the connection
- Security fixes. A Software Bill of Materials (SBOM) is available for download from the corresponding release page in our Support Portal
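
The EndOfStream handling described in the fixes above can be pictured with a toy model. This is a sketch for illustration only, not the server's implementation: the class ToySession and the reply labels "accepted", "error" and "dropped" are hypothetical, and the assumption that later AddAudio messages are dropped silently after the first error is one reading of the note rather than documented behavior.

```python
class ToySession:
    """Toy model of how AddAudio is treated before and after EndOfStream."""

    def __init__(self):
        self.stream_ended = False
        self.error_sent = False
        self.connection_open = True  # the notes say the connection is not closed

    def handle(self, message_type):
        """Return an illustrative label for what happens to the message."""
        if message_type == "EndOfStream":
            self.stream_ended = True
            return None
        if message_type != "AddAudio":
            raise ValueError(f"unexpected message type: {message_type}")
        if not self.stream_ended:
            return "accepted"
        # Audio arriving after EndOfStream is never processed.
        if not self.error_sent:
            self.error_sent = True
            return "error"    # first late AddAudio triggers an error message
        return "dropped"      # assumption: later ones are dropped silently
```

Feeding the sequence AddAudio, EndOfStream, AddAudio, AddAudio through handle yields "accepted", None, "error", "dropped", with connection_open still true at the end. For clients, the practical point is to stop sending audio once EndOfStream has been sent.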

Known Issues

Security

- Vulnerabilities in NVIDIA gpu-operator. We upgraded to the latest version, but some vulnerabilities remain unresolved by the vendor. These packages are inaccessible from outside the appliance.
- Vulnerabilities in packages shipped with k3s version v1.29.14+k3s1. We upgraded to the latest version, but some vulnerabilities remain unresolved by the vendor. These packages are inaccessible from outside the appliance.
- Vulnerabilities in version 12.0.1 of our Real-Time Container; see the 12.0.1 Container release notes for details.
