The SpeeD-IL Project: Speech Datasets and Models for Indian Languages

The SpeeD-IL project, in collaboration with UnReaL-TecE LLP and Karya Inc., aims to develop speech resources for India’s underrepresented languages, bridging the data gap in speech technology for non-scheduled and lesser-known languages. Visit the project site and check out the Karya and LiFE App tools.

Speech technologies have shifted successfully towards data-driven techniques such as machine learning and deep learning. However, most languages in India have been left out of this shift because there is not enough data to train such systems. The gap is most acute for non-scheduled Indo-Aryan and Dravidian languages, and it extends even to scheduled languages from the Tibeto-Burman and Austro-Asiatic language families, which are spoken mostly in the eastern and north-eastern parts of India.

To address this situation, we have partnered with UnReaL-TecE LLP, Karya Inc. and other organizations to launch the SpeeD-IL project. The project aims to develop speech corpora, resources, and models for underrepresented languages in India. It is a significant step towards making speech technologies accessible to all Indian languages, including those that have been overlooked so far.

The Project Website – https://sites.google.com/view/speed-il/

Tools and Applications – Karya (available on the Google Play Store) and LiFE App (http://life.unreal-tece.co.in/)
