Using generative AI speech-to-text output to provide automated monitoring of television subtitles

This technical paper is brought to you by IBC2025.

Abstract

This paper describes a proof-of-concept approach to monitoring timing errors and word loss in TV subtitles. It reviews previous attempts at subtitle monitoring and the problems that timing errors and word loss cause for viewers. It then introduces speech-to-text technology and the subtitling conventions that complicate its use: repetition, non-speech content and errors all make it harder to align the speech-to-text transcript with the subtitles. The paper describes the approach taken to remove non-speech content from the subtitles and the transcript, along with the natural language processing techniques used to ensure a sufficiently accurate alignment between the two. It then gives examples of how the results are displayed, together with sample results showing the scale of the problems with subtitle quality. The paper concludes by reviewing the limits of this approach in terms of accuracy and points out the need for human oversight. Finally, it discusses where the approach could be used and which other subtitle quality issues could be monitored automatically.
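
To make the idea concrete, the following is a minimal Python sketch of the kind of comparison the abstract outlines: strip non-speech content, align subtitle words with transcript words, then estimate delay and word loss. It is not the authors' implementation. The `NON_SPEECH` pattern, the `normalise` and `compare` helpers and the input shapes are all assumptions made for illustration, and the standard-library `difflib.SequenceMatcher` stands in for whatever natural language processing techniques the paper actually uses.

```python
import re
from difflib import SequenceMatcher

# Assumed convention: non-speech content appears as [LABELS], (labels) or music markers.
NON_SPEECH = re.compile(r"\[[^\]]*\]|\([^)]*\)|[♪#]")

def normalise(text):
    """Drop non-speech labels, lower-case, and split into word tokens."""
    return re.findall(r"[a-z0-9']+", NON_SPEECH.sub(" ", text).lower())

def compare(subtitles, transcript):
    """Align subtitle words with speech-to-text words.

    subtitles:  list of (start_seconds, subtitle_text)   (hypothetical shape)
    transcript: list of (start_seconds, spoken_word)     (hypothetical shape)
    Returns (mean_delay_seconds or None, fraction_of_spoken_words_missing).
    """
    sub_words, sub_times = [], []
    for start, text in subtitles:
        for word in normalise(text):
            sub_words.append(word)
            sub_times.append(start)  # coarse: every word inherits its subtitle's start time

    spoken_words, spoken_times = [], []
    for start, word in transcript:
        for token in normalise(word):
            spoken_words.append(token)
            spoken_times.append(start)

    matcher = SequenceMatcher(None, spoken_words, sub_words, autojunk=False)
    delays = []
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            delays.append(sub_times[b + k] - spoken_times[a + k])

    mean_delay = sum(delays) / len(delays) if delays else None
    missing = 1 - len(delays) / len(spoken_words) if spoken_words else 0.0
    return mean_delay, missing

if __name__ == "__main__":
    subtitles = [(2.0, "[DOOR SLAMS] Hello there"), (5.0, "How are you?")]
    transcript = [(0.5, "hello"), (0.9, "there"), (1.5, "well"),
                  (3.0, "how"), (3.3, "are"), (3.6, "you")]
    delay, loss = compare(subtitles, transcript)
    print(f"mean delay {delay:.2f} s, word loss {loss:.0%}")
```

In this toy input the subtitles lag the speech by roughly a second and a half and omit one spoken word in six. A sketch like this would still need the human oversight the abstract calls for, since speech-to-text errors and deliberate subtitle edits both show up as apparent word loss.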