10.5.0 - Real-Time Container

December 15th, 2023

Real-Time Container

- New transcription language: Persian (fa)
- New bilingual Spanish and English container, which allows Spanish and English to be transcribed accurately within the same audio stream. To pull the new container see here. Available on GPU only.
- New GPU Ursa models: all 49 languages are now available on GPU
  - Major transcription accuracy gains
  - Major improvement in Speaker Diarization accuracy
  - Faster transcription
  - The following languages now have an associated GPU Inference Container: Arabic (ar), Bashkir (ba), Basque (eu), Belarusian (be), Bulgarian (bg), Cantonese (yue), Catalan (ca), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), Esperanto (eo), Estonian (et), Finnish (fi), Galician (gl), Greek (el), Hindi (hi), Hungarian (hu), Indonesian (id), Interlingua (ia), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Lithuanian (lt), Malay (ms), Mandarin (cmn), Marathi (mr), Mongolian (mn), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Slovakian (sk), Slovenian (sl), Swedish (sv), Tamil (ta), Thai (th), Turkish (tr), Ukrainian (uk), Uyghur (ug), Vietnamese (vi), Welsh (cy)

GPU & CPU
- Improved models for English transcription (Standard and Enhanced operating points):
  - Improved transcription of disfluencies in English. The model now captures common disfluencies such as "um" and "uh" more accurately. This makes verbatim transcription more accurate, which helps use cases such as audio editing, analytics on hesitations for call centers, and legal transcription. For details on how to identify disfluencies in output, see the documentation here.
  - More accurate transcription of short utterances of the word "I" in English
  - More accurate transcription of acronyms in English
- Appropriate punctuation is now provided for finals after a pause in speech, improving transcription for downstream workflows such as translation
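
As an illustration of picking disfluencies out of a transcript, the sketch below filters word results by tag. It assumes, rather than confirms from these notes, that each result has a `type` and a list of `alternatives`, and that a disfluent word carries `"disfluency"` in a `tags` list on its top alternative. The function name `find_disfluencies` and the sample data are hypothetical; check the linked documentation for the actual output schema.

```python
from typing import Any


def find_disfluencies(results: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return word results whose top alternative is tagged as a disfluency.

    ASSUMPTION: the result shape ("type", "alternatives", "tags") is a guess
    at the JSON output format, not something stated in these release notes.
    """
    found = []
    for item in results:
        if item.get("type") != "word":
            continue  # skip punctuation and any other non-word results
        alternatives = item.get("alternatives") or []
        if not alternatives:
            continue
        if "disfluency" in alternatives[0].get("tags", []):
            found.append(item)
    return found


# Hypothetical sample data, shaped according to the assumption above.
sample_results = [
    {"type": "word", "alternatives": [{"content": "So", "tags": []}]},
    {"type": "word", "alternatives": [{"content": "um", "tags": ["disfluency"]}]},
    {"type": "punctuation", "alternatives": [{"content": ","}]},
    {"type": "word", "alternatives": [{"content": "yes"}]},
]

disfluent_words = [r["alternatives"][0]["content"] for r in find_disfluencies(sample_results)]
```

A caller could count the returned items per speaker or per minute to build the hesitation analytics mentioned above, or drop them to produce a cleaned-up transcript.
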
CPU
- Significantly improved transcription accuracy for English
- Significantly improved transcription accuracy for Norwegian

Known issue: when Speaker Diarization is enabled, punctuation is occasionally labeled as Speaker:UU (unknown speaker).
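
One possible client-side way to handle this is to give punctuation labeled UU the speaker of the word that precedes it. The sketch below is not an official workaround. It assumes each result has a `type` and that the speaker label sits in a `speaker` field on the top alternative. The function name `reassign_unknown_punctuation` and the sample data are hypothetical.

```python
import copy
from typing import Any


def reassign_unknown_punctuation(
    results: list[dict[str, Any]], unknown: str = "UU"
) -> list[dict[str, Any]]:
    """Return a copy of results where UU punctuation inherits the previous speaker.

    ASSUMPTION: the result shape ("type", "alternatives", "speaker") is a guess
    at the JSON output format, not something stated in these release notes.
    """
    fixed = copy.deepcopy(results)  # leave the caller's data untouched
    last_speaker = None
    for item in fixed:
        alternatives = item.get("alternatives") or []
        if not alternatives:
            continue
        top = alternatives[0]
        speaker = top.get("speaker")
        if item.get("type") == "punctuation":
            # Only relabel when a known speaker has already been seen.
            if speaker == unknown and last_speaker is not None:
                top["speaker"] = last_speaker
        elif speaker is not None and speaker != unknown:
            last_speaker = speaker
    return fixed


# Hypothetical sample data, shaped according to the assumption above.
sample = [
    {"type": "word", "alternatives": [{"content": "Hello", "speaker": "S1"}]},
    {"type": "punctuation", "alternatives": [{"content": ".", "speaker": "UU"}]},
    {"type": "word", "alternatives": [{"content": "Hi", "speaker": "S2"}]},
    {"type": "punctuation", "alternatives": [{"content": ".", "speaker": "S2"}]},
]

speakers = [r["alternatives"][0]["speaker"] for r in reassign_unknown_punctuation(sample)]
```

Words labeled UU are left unchanged, since an unknown speaker on a word may be a genuine diarization result rather than an instance of this issue.
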

Speechmatics