March 18th, 2025
Real-Time Appliance
GPU & CPU
New Virtual Appliance architecture with support for our latest generation of Ursa GPU models
New languages - Hebrew (he), Persian (fa), Irish (ga), Maltese (mt), Urdu (ur), Bengali (bn) and Swahili (sw)
Automatic Usage Reporting is enabled by default
Audio Filtering: pre-process audio to remove low-volume background speech that might otherwise be detected and transcribed. Refer to the documentation to get started
Disfluency removal: automatically remove disfluencies from your transcript. Refer to the documentation to get started
GPU
GPU Ursa models - all 58 languages are now available on GPU
Major transcription accuracy gains
Major improvement in Speaker Diarization accuracy
Faster transcription
Bilingual Spanish and English language pack - this enables Spanish and English to be transcribed accurately within the same file
Audio Events: detection of music, laughter and applause in media files is now supported. Refer to the documentation to get started
The legacy Speaker Change Detection feature is now obsolete. Any sessions using the speaker_change and channel_and_speaker_change parameters will be rejected
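Because sessions that still request the legacy feature are now rejected, it may help to check configs before connecting. The following is a minimal Python sketch, not part of the appliance: find_legacy_speaker_change is a hypothetical helper, and it assumes the config is a plain structure of dicts and lists in which the legacy names may appear as either keys or string values.

```python
LEGACY_NAMES = {"speaker_change", "channel_and_speaker_change"}


def find_legacy_speaker_change(config, path="config"):
    """Return the paths at which a legacy Speaker Change Detection name appears.

    Hypothetical helper: walks nested dicts and lists and reports any key or
    string value equal to one of the legacy names.
    """
    hits = []
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{path}.{key}"
            if key in LEGACY_NAMES:
                hits.append(child)
            hits.extend(find_legacy_speaker_change(value, child))
    elif isinstance(config, list):
        for index, item in enumerate(config):
            hits.extend(find_legacy_speaker_change(item, f"{path}[{index}]"))
    elif isinstance(config, str) and config in LEGACY_NAMES:
        hits.append(path)
    return hits


# Illustrative config only; the key names here are assumptions.
example = {"transcription_config": {"language": "en", "diarization": "speaker_change"}}
for hit in find_legacy_speaker_change(example):
    print("legacy setting found at", hit)
```

An empty result means none of the legacy names were found, so the config should not be rejected for this reason.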
GPU
Ursa2 models released, giving a broad accuracy uplift across languages (Enhanced operating point only):
Uplift for all languages, including a major improvement for Arabic dialects
Updated vocabulary for English
Improved music detection accuracy in Audio Events
GPU & CPU
Ursa2 models released, giving a broad accuracy uplift for the languages below (Standard operating point only):
Uplift for Basque (eu), Estonian (et), Polish (pl), Swedish (sv), Tamil (ta), Turkish (tr), Uyghur (ug)
Improvements to speaker diarization accuracy
Lower-latency Finals: the minimum allowed value of the max_delay parameter has been reduced from 2 to 0.7, enabling lower-latency final transcripts. Refer to the documentation for more details
Faster real-time transcription with more consistent latency
Major efficiency improvements for real-time transcription on GPU for English, French, German and Spanish with the Enhanced operating point
Partial transcripts now include numeral formatting, enhanced punctuation, and better casing.
Improved accuracy for recognition of Arabic numbers and currency
Negative percentages in German transcription are now written with "%" instead of "Prozent"
"Kyiv" is now output consistently in English transcription
Updated English profanity tagging to remove a small number of non-profane words
Fixed an issue with Mandarin (cmn) where consecutive English words were concatenated together
Fixed a delay in emitting Final transcripts for a subset of languages when using a config containing punctuation_overrides={"permitted_marks":[],...
After the server receives an EndOfStream message, any further AddAudio messages are dropped. On the first AddAudio received after EndOfStream, the server sends an error message to the client but does not close the connection
Security fixes. A Software Bill of Materials (SBOM) is available for download from the corresponding release page in our Support Portal.
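The max_delay change and the EndOfStream behavior described in this release both affect client code. The Python sketch below illustrates one way a client might account for them. It is an illustration under stated assumptions, not the appliance's API: the 0.7 minimum comes from this release note, while check_max_delay, StreamGuard and its method names are hypothetical, and no upper bound on max_delay is checked because these notes do not state one.

```python
MIN_MAX_DELAY = 0.7  # new minimum from this release (previously 2)


def check_max_delay(max_delay):
    """Return max_delay as a float, or raise ValueError if below the minimum."""
    value = float(max_delay)
    if value < MIN_MAX_DELAY:
        raise ValueError(
            f"max_delay {value} is below the minimum of {MIN_MAX_DELAY}"
        )
    return value


class StreamGuard:
    """Hypothetical client-side guard that tracks whether EndOfStream was sent."""

    def __init__(self):
        self.ended = False
        self.withheld = 0

    def end_of_stream(self):
        """Record that the client has sent EndOfStream."""
        self.ended = True

    def may_send_audio(self):
        """Return True if an AddAudio message may still be sent."""
        if self.ended:
            # The server would drop this audio and report an error,
            # so withhold it locally and keep a count instead.
            self.withheld += 1
            return False
        return True
```

Calling may_send_audio() before each AddAudio lets the client skip audio that the server would drop anyway, which avoids triggering the error message on an otherwise healthy connection.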
Security
Vulnerabilities in NVIDIA gpu-operator. We upgraded to the latest version, but some vulnerabilities remain unresolved by the vendor. These packages are inaccessible from outside the appliance.
Vulnerabilities in packages shipped with k3s version v1.29.14+k3s1. We upgraded to the latest version, but some vulnerabilities remain unresolved by the vendor. These packages are inaccessible from outside the appliance.
Vulnerabilities in version 12.0.1 of our Real-Time Container; see the 12.0.1 Container release notes for details.