The SpeeD-IL Project: Speech Datasets and Models for Indian Languages

Speech technologies have shifted successfully towards data-driven techniques such as machine learning and deep learning. However, most languages in India have been left out of this technological revolution because there is not enough data to train such systems. The gap is especially pronounced for non-scheduled Indo-Aryan and Dravidian languages, and even for scheduled languages from the Tibeto-Burman and Austro-Asiatic language families, which are mostly spoken in the eastern and north-eastern parts of India.

To address this situation, we have partnered with UnReaL-TecE LLP, Karya Inc. and other organizations to launch the SpeeD-IL project. The aim of this project is to develop speech corpora, resources, and models for underrepresented languages in India. The SpeeD-IL project is a significant step towards making speech technologies accessible to all Indian languages, including those that have been overlooked so far.

The Project Website – https://sites.google.com/view/speed-il/

Tools and Applications – Karya (available on the Google Play Store) and LiFE App (http://life.unreal-tece.co.in/)
