Name: Unlocking the Potential of the Archives: Combination of Strengths in Advancing Speech Recognition
Start: 2024-10-16T16:00:00+0300
End: 2024-10-16T16:30:00+0300

For more information on the FIAT/IFTA World Conference, visit the FIAT/IFTA website.

Wednesday October 16, 2024 4:00pm - 4:30pm EEST

Hotel Sheraton Bucharest - Arizona

We will present the results of combining the latest research in automatic speech recognition (ASR) with European high-performance computing (HPC) and the large quantities of raw audiovisual data held in national radio and television archives. The aim of the work was twofold: first, to advance ASR by building models on large public data collections, and second, to open up the large audiovisual media archives for large-scale qualitative and quantitative media research by automatically indexing all spoken content that ASR can decode. Only the largest global companies have access to all three of the latest ASR developments, huge computing resources and huge audio collections, and their commercial interests do not treat all languages equally.

In Europe, most languages are spoken in small countries which nevertheless have advanced radio and television archives containing millions of hours of broadcast media content. The latest publicly funded HPC initiatives have also given researchers access to unprecedented computational resources. By combining this computing capacity with the archives, researchers can develop and publish large pre-trained speech models for many languages without depending on the commercial interests of the large global companies. The large speech models can be pre-trained in a self-supervised fashion, which means they can also benefit from untranscribed and uncategorized audio collections. When openly published, these models make it quick and easy to develop speech technology applications for these languages, such as accurate ASR systems and recognizers of speech, speaker and audio characteristics, by fine-tuning the models on a feasible amount of transcribed target data.

Speakers

Tommi Lehtonen

Technical Planner, Finland's National Audiovisual Institute

Mr. Lehtonen has a Master’s degree in Folklore Studies from the University of Helsinki and a Master’s degree in Information Technology from Metropolia University of Applied Sciences. He has been working at the National Audiovisual Institute of Finland (KAVI) for over twelve years. Focus of...

Mikko Kurimo

Aalto University

Prof. Mikko Kurimo (D.Sc.Tech. 1997, Helsinki University of Technology) is a Full Professor of Speech and Language Processing and the head of the speech recognition research group at Aalto University. He has supervised 18 doctoral theses and 79 master’s theses and co-ordinated several...

Parallel Session, Session 1

FIAT/IFTA World Conference 2024
