Dataset Persistent ID
|
doi:10.21979/N9/4RHC3D |
Publication Date
|
2023-08-11 |
Title
| MERLIon CCS Challenge Development and Evaluation Datasets Open Preview (Documentation) |
Author
| Chua, Victoria Yi Han (Nanyang Technological University) - ORCID: 0000-0002-0755-3148
Styles, Suzy J (Nanyang Technological University) - ORCID: 0000-0003-3517-9680 |
Contact
|
Suzy J Styles (Nanyang Technological University) |
Description
| The inaugural Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge focuses on developing robust language identification and language diarization systems that are reliable for non-standard, accented, spontaneous code-switched, child-directed speech collected via Zoom.
The inaugural MERLIon CCS Challenge is a special session at INTERSPEECH 2023. This repository is an open preview containing documentation about the files that can be downloaded in the development and evaluation sets for the two Tasks in the 2023 MERLIon CCS Challenge.
In work arising from this corpus, please cite the dataset:
Chua, Victoria Yi Han; Garcia Perera, Leibny Paola; Khudanpur, Sanjeev; Khong, Andy W. H.; Dauwels, Justin; Woon, Fei Ting; Styles, Suzy J, 2023, "Development and Evaluation data for Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge", https://doi.org/10.21979/N9/ANXS8Z, DR-NTU (Data), V1 |
Subject
| Engineering; Social Sciences |
Keyword
| Speech Processing
Engineering Challenge
Codeswitched Speech
Accented Speech
Language Identification
Language Diarization
Child-directed Speech |
Related Publication
| Woon, F. T., Yogarrajah, E. C., Fong, S., Salleh, N. S. M., Sundaray, S., & Styles, S. J. (2021). Creating a corpus of multilingual parent-child speech remotely: Lessons learned in a large-scale onscreen picturebook sharing task. Frontiers in Psychology, 12, 734936. doi: 10.3389/fpsyg.2021.734936 https://www.frontiersin.org/articles/10.3389/fpsyg.2021.734936/full
Chua, Y. H. V., Liu, H., Garcia Perera, L. P., Woon, F. T., Wong, J., Zhang, X., Khudanpur, S., Khong, A. W. H., Dauwels, J. & Styles, S. J. (2023). MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization. Accepted for Proc. Interspeech 2023. arXiv: arxiv:2305.18881 http://arxiv.org/abs/2305.18881
Styles, S. J., Chua, Y. H. V., Woon, F. T., Liu, H., Garcia Perera, L. P., Khudanpur, S., Khong, A. W. H., & Dauwels, J. (2023). Investigating model performance in language identification: beyond simple error statistics. Accepted for Proc. Interspeech 2023. arXiv: arxiv:2305.18925 http://arxiv.org/abs/2305.18925
Garcia Perera, L. P., Chua, Y. H. V., Liu, H., Woon, F. T., Khong, A. W. H., Dauwels, J. & Styles, S. J., "MERLIon CCS Challenge Evaluation Plan Version 1.2". ArXiv https://doi.org/10.48550/arXiv.2305.19493 arXiv: arxiv:2305.19493 https://arxiv.org/abs/2305.19493 |
Grant Information
| Nanyang Technological University: NAP Start Up M4081215.100
Nanyang Technological University: CRADLE@NTU JHU IO 90071537
National Research Foundation (NRF): NRF2016-SOL002-011 |
Depositor
| Chua, Victoria Yi Han |
Deposit Date
| 2023-07-20 |
Kind of Data
| Description text data |
Software
| .csv
.txt |
Related Material
| GitHub page: "merlion-ccs-2023" in MERLIon-Challenge, GitHub, https://github.com/MERLIon-Challenge/merlion-ccs-2023, last updated 21 February 2023. |