10.7.0 - Real-Time Container

August 22nd, 2024

Real-Time Container

GPU & CPU
- Audio Filtering: pre-process audio to remove low-volume background speech that might otherwise be detected and transcribed. Refer to the documentation to get started
- Disfluency removal: automatically remove disfluencies from your transcript. Refer to the documentation to get started

The legacy Speaker Change Detection feature has been removed. Any session that uses the speaker_change or channel_and_speaker_change parameters will be rejected
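
Because affected sessions are now rejected, it may help to check stored session configurations before upgrading. The following is a minimal sketch, not an official tool: `find_removed_settings` is a hypothetical helper, and the layout of the example configuration (including the `transcription_config` and `diarization` keys) is an assumption. The helper scans a nested configuration for the two removed names, whether they appear as keys or as values, so it does not depend on where they sit in your configuration.

```python
# Names removed in 10.7.0, as listed in these release notes.
REMOVED_NAMES = {"speaker_change", "channel_and_speaker_change"}


def find_removed_settings(config, path=""):
    """Return the paths in a nested config where a removed name appears.

    Hypothetical helper: checks dict keys, list items, and string values.
    """
    hits = []
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{path}.{key}" if path else str(key)
            if key in REMOVED_NAMES:
                hits.append(child)
            hits.extend(find_removed_settings(value, child))
    elif isinstance(config, list):
        for index, item in enumerate(config):
            hits.extend(find_removed_settings(item, f"{path}[{index}]"))
    elif isinstance(config, str) and config in REMOVED_NAMES:
        hits.append(path)
    return hits


# Example configuration; the key layout here is assumed for illustration.
example = {
    "transcription_config": {
        "language": "en",
        "diarization": "speaker_change",
    }
}
print(find_removed_settings(example))
```

An empty list means neither removed name was found in the configuration.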

GPU
- Initial improvements from our Ursa2 accuracy uplift
  - Improved transcription accuracy and updated vocabulary for 31 languages (Enhanced Operating Point only): Bashkir (ba), Basque (eu), Belarusian (be), Bulgarian (bg), Cantonese (yue), Catalan (ca), Danish (da), Esperanto (eo), Estonian (et), Finnish (fi), French (fr), Galician (gl), Greek (el), Hindi (hi), Indonesian (id), Interlingua (ia), Japanese (ja), Korean (ko), Latvian (lv), Malay (ms), Marathi (mr), Mongolian (mn), Norwegian (no), Romanian (ro), Slovenian (sl), Spanish (es), Swedish (sv), Turkish (tr), Ukrainian (uk), Uyghur (ug), Vietnamese (vi)
  - Updated vocabulary for English (Enhanced Operating Point only)
- Improved music detection accuracy in Audio Events

GPU & CPU
- Lower-latency Finals: the minimum allowed value of the max_delay parameter has been reduced from 2 seconds to 0.7 seconds, enabling lower-latency final transcripts. Refer to the documentation for more details
- Improved transcription accuracy around endpoints, especially for lower values of max_delay
- When a transcription Final does not contain words which appeared in previous Partials, an AddPartialTranscript message containing the missing words is now sent immediately after the Final
- Start and end times in AddTranscript and AddPartialTranscript messages are now always rounded to 2 decimal places
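
The changes to max_delay and to Partial delivery can affect client code. The sketch below is illustrative only and makes several assumptions: `build_transcription_config` and `TranscriptView` are hypothetical names, and the message shape (a `message` field holding the message type and a `metadata.transcript` field holding the text) is assumed rather than taken from these notes. It shows a client-side check against the new 0.7 minimum, and a display buffer that keeps a Partial arriving immediately after a Final as pending text instead of discarding it.

```python
MIN_MAX_DELAY = 0.7  # new minimum stated in these release notes


def build_transcription_config(language, max_delay):
    """Build a config dict, rejecting max_delay values below the minimum."""
    if max_delay < MIN_MAX_DELAY:
        raise ValueError(
            f"max_delay must be at least {MIN_MAX_DELAY}, got {max_delay}"
        )
    return {"language": language, "max_delay": max_delay}


class TranscriptView:
    """Hypothetical display buffer for Final and Partial transcripts."""

    def __init__(self):
        self.final_parts = []   # text confirmed by AddTranscript messages
        self.partial_text = ""  # latest unconfirmed text

    def handle(self, message):
        # Assumed message shape: {"message": <type>, "metadata": {"transcript": <text>}}
        kind = message.get("message")
        text = message.get("metadata", {}).get("transcript", "").strip()
        if kind == "AddTranscript":
            if text:
                self.final_parts.append(text)
            # A Final supersedes earlier Partials; any words it left out
            # arrive in the AddPartialTranscript sent right after it.
            self.partial_text = ""
        elif kind == "AddPartialTranscript":
            self.partial_text = text

    def render(self):
        parts = self.final_parts + ([self.partial_text] if self.partial_text else [])
        return " ".join(parts)


view = TranscriptView()
view.handle({"message": "AddPartialTranscript", "metadata": {"transcript": "hello world"}})
view.handle({"message": "AddTranscript", "metadata": {"transcript": "hello"}})
view.handle({"message": "AddPartialTranscript", "metadata": {"transcript": "world"}})
print(view.render())
```

Clearing the pending text on every Final is safe under this behaviour because the words missing from the Final are re-sent as a Partial straight away.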

Written form for negative percentages in German transcription is now output as "%" instead of "Prozent"

Speechmatics