August 22nd, 2024
Real-Time Container
GPU & CPU
Audio Filtering: pre-process audio to remove low-volume background speech that might otherwise be detected and transcribed. Refer to the Audio Filtering documentation to get started
Disfluency removal: automatically remove disfluencies from your transcript. Refer to the Disfluency removal documentation to get started
The legacy Speaker Change Detection feature is now obsolete. Any session that uses the speaker_change or channel_and_speaker_change parameter will be rejected
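The three changes above all come down to what a client puts in its session configuration. The following is a minimal Python sketch, not the official client. It assumes that speaker_change and channel_and_speaker_change are values of a diarization setting, and that the new features are enabled through keys named audio_filtering_config (with a volume_threshold) and transcript_filtering_config (with remove_disfluencies). Those key names and the 3.4 threshold are assumptions for illustration, so check them against the container documentation before relying on them. The helper function name is hypothetical.

```python
import json

# Values rejected by the container after this release (assumed to be diarization values).
OBSOLETE_DIARIZATION = {"speaker_change", "channel_and_speaker_change"}


def build_transcription_config(language, diarization="none",
                               volume_threshold=None, remove_disfluencies=False):
    """Hypothetical helper: build a transcription_config dict.

    Fails fast on the obsolete speaker-change values instead of letting the
    session be rejected by the server.
    """
    if diarization in OBSOLETE_DIARIZATION:
        raise ValueError(
            f"diarization={diarization!r} is obsolete; sessions using it are rejected"
        )
    config = {"language": language, "diarization": diarization}
    if volume_threshold is not None:
        # Assumed key: audio quieter than this threshold is filtered out
        # before transcription (Audio Filtering).
        config["audio_filtering_config"] = {"volume_threshold": volume_threshold}
    if remove_disfluencies:
        # Assumed key: strip disfluencies from the output transcript.
        config["transcript_filtering_config"] = {"remove_disfluencies": True}
    return config


if __name__ == "__main__":
    cfg = build_transcription_config("en", volume_threshold=3.4,
                                     remove_disfluencies=True)
    print(json.dumps(cfg, indent=2))
```

Validating on the client side is optional. It simply surfaces the now-obsolete settings as a clear local error rather than a rejected session.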
GPU
Initial improvements from our Ursa2 accuracy uplift
Improved transcription accuracy and updated vocabulary for 31 languages (Enhanced Operating Point only): Bashkir (ba), Basque (eu), Belarusian (be), Bulgarian (bg), Cantonese (yue), Catalan (ca), Danish (da), Esperanto (eo), Estonian (et), Finnish (fi), French (fr), Galician (gl), Greek (el), Hindi (hi), Indonesian (id), Interlingua (ia), Japanese (ja), Korean (ko), Latvian (lv), Malay (ms), Marathi (mr), Mongolian (mn), Norwegian (no), Romanian (ro), Slovenian (sl), Spanish (es), Swedish (sv), Turkish (tr), Ukrainian (uk), Uyghur (ug), Vietnamese (vi)
Updated vocabulary for English (Enhanced Operating Point only)
Improved music detection accuracy in Audio Events
GPU & CPU
Lower-latency Finals: the minimum allowed value of the max_delay parameter has been reduced from 2 seconds to 0.7 seconds, enabling lower-latency final transcripts. Refer to the max_delay documentation for more details
Improved transcription accuracy around endpoints, especially for lower values of max_delay
When a Final transcript omits words that appeared in earlier Partials, an AddPartialTranscript message containing the missing words is now sent immediately after the Final
Start and end times in AddTranscript and AddPartialTranscript messages are now always rounded to 2 decimal places
In German transcripts, the written form of negative percentages now uses "%" instead of "Prozent"
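Several of the real-time changes above affect client code that opens a session and consumes transcript messages. The following Python sketch illustrates three of them under stated assumptions. First, it rejects a max_delay below the new 0.7 second minimum. Second, it treats every AddPartialTranscript as replacing the previous partial, so a Partial that arrives right after a Final, carrying words the Final dropped, is displayed rather than lost. Third, it converts the 2-decimal-place times to integer hundredths for exact comparison. The StartRecognition layout, the audio_format values, and the metadata.transcript field are assumptions modelled on the real-time API, and start_recognition, to_centiseconds, and TranscriptAssembler are hypothetical names.

```python
MIN_MAX_DELAY = 0.7  # new lower bound for max_delay, in seconds


def start_recognition(language, max_delay):
    """Hypothetical helper: build a StartRecognition message (assumed layout)."""
    if max_delay < MIN_MAX_DELAY:
        raise ValueError(f"max_delay must be >= {MIN_MAX_DELAY}, got {max_delay}")
    return {
        "message": "StartRecognition",
        # Assumed audio format for illustration only.
        "audio_format": {"type": "raw", "encoding": "pcm_f32le", "sample_rate": 16000},
        "transcription_config": {"language": language, "max_delay": max_delay},
    }


def to_centiseconds(seconds):
    # Times are rounded to 2 decimal places, so integer hundredths are exact
    # and safe to compare or use as keys.
    return round(seconds * 100)


class TranscriptAssembler:
    """Keeps committed Finals plus the most recent Partial."""

    def __init__(self):
        self.finals = []
        self.partial = ""

    def handle(self, msg):
        kind = msg.get("message")
        text = msg.get("metadata", {}).get("transcript", "").strip()
        if kind == "AddTranscript":
            if text:
                self.finals.append(text)
            # A Final supersedes the partial that preceded it.
            self.partial = ""
        elif kind == "AddPartialTranscript":
            # Each partial replaces the previous one, including a partial
            # sent immediately after a Final with the words it omitted.
            self.partial = text

    def display(self):
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)


if __name__ == "__main__":
    asm = TranscriptAssembler()
    for msg in [
        {"message": "AddPartialTranscript", "metadata": {"transcript": "hello there world"}},
        {"message": "AddTranscript", "metadata": {"transcript": "hello there"}},
        {"message": "AddPartialTranscript", "metadata": {"transcript": "world"}},
    ]:
        asm.handle(msg)
    print(asm.display())  # hello there world
```

In the usage example, the Final drops "world". The Partial that follows restores it, so the displayed text still reads "hello there world" until a later Final commits it.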