SpeeD-IA

Repository of speech datasets and models for Indo-Aryan languages, prepared by Dr. Bhimrao Ambedkar University and the Council for Strategic and Defense Research under various projects, in collaboration with Karya Inc. and UnReaL-TecE LLP.

This repository currently contains the transcriptions of the speech data collected through the Karya App for the pilot phase of the SpeeD-IA project in four languages: Awadhi, Bhojpuri, Braj and Magahi.

The audio can be downloaded here. SpeeD-IA Audio and Transcription is licensed under CC BY-NC-SA 4.0. For commercial licensing of the dataset, contact UnReaL-TecE LLP.

If you use the data, please cite the following paper:

@inproceedings{interspeech2022,
    author = {Kumar, Ritesh and Singh, Siddharth and Ratan, Shyam and Raj, Mohit and Sinha, Sonal and Lahiri, Bornini and Seshadri, Vivek and Bali, Kalika and Ojha, Atul Kr.},
    title = {Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi},
    booktitle = {Proceedings of Speech for Social Good Workshop, Interspeech 2022},        
    year = {2022}
}

For any queries, please feel free to contact riteshkr[dot]kmi; the address is at the most popular email domain starting with 'g'.
