This dataverse contains specially selected datasets from the Talk Together Study that have been re-coded and adapted for use in the MERLIon speech processing challenges. The inaugural challenge was run as part of INTERSPEECH 2023.

MERLIon CCS Challenge 2023 (INTERSPEECH 2023)
MERLIon CCS Challenge Development and Evaluation Datasets Open Preview (Documentation)
Development and Evaluation data for Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge

MERLIon CCS Challenge 2024: to be confirmed.
Tabular Data - 20.4 KB - 3 Variables, 247 Observations - UNF:6:bDmCMlXNpTBHaqDY+57ZdA==
Contains the timestamps of evaluated regions for language diarization in each audio recording in the MERLIon CCS Challenge development set.
Tabular Data - 10.5 KB - 1 Variable, 151 Observations - UNF:6:DBZ0LJYDQuBp+JpHC4e2wQ==
Contains the filenames of all audio recordings in the MERLIon CCS Challenge development set.
Plain Text - 4.1 KB - MD5: 2a4512e853e7ffdd47d4bb24550c55ca
Contains the release notes and dataset description of the MERLIon CCS Challenge development set.
Tabular Data - 13.2 KB - 5 Variables, 151 Observations - UNF:6:cor05m8DAi1IsQux4AFq2w==
Contains total lengths of English and Mandarin speech in milliseconds and number of English and Mandarin segments in each audio recording in the MERLIon CCS Challenge development set.
Unknown - 1000.0 MB - MD5: 6b49a389e5d0463f19364bd54d7de3d2
The MERLIon CCS Challenge Development Set Audio and Metadata is split into 5 parts. This is part 1 of 5. Download all 5 parts together and extract the data via 7zip.
Unknown - 1000.0 MB - MD5: 3205814e04c75d8e0b30228729cc4716
The MERLIon CCS Challenge Development Set Audio and Metadata is split into 5 parts. This is part 2 of 5. Download all 5 parts together and extract the data via 7zip.
Unknown - 1000.0 MB - MD5: 0b4198c578646b6818233ee91dc16986
The MERLIon CCS Challenge Development Set Audio and Metadata is split into 5 parts. This is part 3 of 5. Download all 5 parts together and extract the data via 7zip.
Unknown - 1000.0 MB - MD5: 38e9fd24ce55ced246cd52211aa1300a
The MERLIon CCS Challenge Development Set Audio and Metadata is split into 5 parts. This is part 4 of 5. Download all 5 parts together and extract the data via 7zip.
Unknown - 293.4 MB - MD5: 9c290ac8af8279949ec13663f2505984
The MERLIon CCS Challenge Development Set Audio and Metadata is split into 5 parts. This is part 5 of 5. Download all 5 parts together and extract the data via 7zip.
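
Because the archive must be reassembled from all 5 parts, it may be worth checking each downloaded part against the MD5 values listed above before extracting. The following is a minimal sketch, not an official tool of the challenge: the function names `md5_of_file` and `verify_parts` are hypothetical, and the file paths are assumed to be supplied by the caller in part order (1 to 5), since the listing does not give the part filenames.

```python
import hashlib

# MD5 digests copied from the listing above, in order: part 1 to part 5.
EXPECTED_MD5 = [
    "6b49a389e5d0463f19364bd54d7de3d2",
    "3205814e04c75d8e0b30228729cc4716",
    "0b4198c578646b6818233ee91dc16986",
    "38e9fd24ce55ced246cd52211aa1300a",
    "9c290ac8af8279949ec13663f2505984",
]


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that a 1000 MB part does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_parts(paths, expected=EXPECTED_MD5):
    """Compare each file against the expected digest at the same position.

    `paths` is a list of local file paths chosen by the caller
    (hypothetical; the real part filenames are not given in the listing).
    Returns a dict mapping each path to True (match) or False (mismatch).
    """
    if len(paths) != len(expected):
        raise ValueError("expected %d files, got %d" % (len(expected), len(paths)))
    return {str(p): md5_of_file(p) == e.lower() for p, e in zip(paths, expected)}
```

Calling `verify_parts` with the five local paths returns one boolean per part; any part reported as `False` would need to be downloaded again before extracting with 7zip.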