Development and Evaluation data for Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge

Version 1.0

Chua, Victoria Yi Han; Garcia Perera, Leibny Paola; Khudanpur, Sanjeev; Khong, Andy W. H.; Dauwels, Justin; Woon, Fei Ting; Styles, Suzy J, 2023, "Development and Evaluation data for Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge", https://doi.org/10.21979/N9/ANXS8Z, DR-NTU (Data), V1, UNF:6:QFBERdU0YulYhMohwDaNWg== [fileUNF]



Make Data Count (MDC) Metrics (since 2021-11-01): 13,522 Views; 946 Downloads; 0 Citations

Description	The inaugural Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge focuses on developing robust language identification and language diarization systems that are reliable for non-standard, accented, spontaneous, code-switched, child-directed speech collected via Zoom. This dataset contains the development set and the evaluation sets for the two tasks of the 2023 MERLIon CCS Challenge, a special session at INTERSPEECH 2023 (theme: 'Inclusive Spoken Language Science and Technology – Breaking Down Barriers'). As video calls become increasingly ubiquitous, we present a first-of-its-kind Zoom video-call dataset. The MERLIon CCS Challenge tackles automatic language identification and language diarization in a subset of audio recordings from the Talk Together Study, in which parents narrated an onscreen wordless picture book to their child. The main objectives of this inaugural challenge are:
• to benchmark current and novel language identification and language diarization systems in a code-switching scenario that includes extremely short utterances;
• to test the robustness of such systems under accented speech;
• to challenge the research community to propose novel solutions in terms of adaptation, training, and embedding extraction for this particular set of tasks.
The challenge features language identification (Task 1) and language diarization (Task 2). Two tracks, open and closed, are available; they differ in the data that may be used for system training. More information can be found in the MERLIon CCS Challenge Evaluation Plan and the MERLIon CCS Challenge GitHub. The public release of the Challenge audio data includes minor revisions made after the conclusion of the challenge, affecting no more than 0.0001% of the labeled data.
Due to the nature of the audio and the data release agreement with the participants, all downloads from this repository require agreement to the terms of use. The documentation, including the metadata associated with the datasets, can be previewed without downloading any files. This collection contains two versions of the data: a legacy archive (LEGACY_ARCHIVE) containing all original files in their original file structure, and a set of download files (DOWNLOAD_FILES) formatted for efficient download. The Tree view of the file listing below shows the file structure.
Subject	Engineering; Social Sciences
Keyword	Speech Processing, Engineering Challenge, Codeswitched Speech, Accented Speech, Language Identification, Language Diarization, Child-directed Speech
Related Publication	Woon, F. T., Yogarrajah, E. C., Fong, S., Salleh, N. S. M., Sundaray, S., & Styles, S. J. (2021). Creating a corpus of multilingual parent-child speech remotely: Lessons learned in a large-scale onscreen picturebook sharing task. Frontiers in Psychology, 12. doi: 10.3389/fpsyg.2021.734936
Funding Information	National Research Foundation (NRF): NRF2016-SOL002-011; Nanyang Technological University: CRADLE@NTU grant JHU IO 90071537; Nanyang Technological University: NAP Start Up Grant M4081215.100
License/Data Use Agreement	Custom Dataset Terms


Files (1 to 10 of 495 shown; all listed files have public access)
File name	Folder	Type	Size	Published	Downloads	MD5
MERLIon-CCS-Challenge_Development-Set_Audio-and-Metadata_v001.zip.001	DOWNLOAD_FILES/MERLIon-CCS-Challenge_Development-Set_v001/	Unknown	1000.0 MB	Aug 11, 2023	50	6b49a389e5d0463f19364bd54d7de3d2
MERLIon-CCS-Challenge_Development-Set_Audio-and-Metadata_v001.zip.002	DOWNLOAD_FILES/MERLIon-CCS-Challenge_Development-Set_v001/	Unknown	1000.0 MB	Aug 11, 2023	34	3205814e04c75d8e0b30228729cc4716
MERLIon-CCS-Challenge_Development-Set_Audio-and-Metadata_v001.zip.003	DOWNLOAD_FILES/MERLIon-CCS-Challenge_Development-Set_v001/	Unknown	1000.0 MB	Aug 11, 2023	20	0b4198c578646b6818233ee91dc16986
MERLIon-CCS-Challenge_Development-Set_Audio-and-Metadata_v001.zip.004	DOWNLOAD_FILES/MERLIon-CCS-Challenge_Development-Set_v001/	Unknown	1000.0 MB	Aug 11, 2023	22	38e9fd24ce55ced246cd52211aa1300a
MERLIon-CCS-Challenge_Development-Set_Audio-and-Metadata_v001.zip.005	DOWNLOAD_FILES/MERLIon-CCS-Challenge_Development-Set_v001/	Unknown	293.4 MB	Aug 11, 2023	26	9c290ac8af8279949ec13663f2505984
MERLIon-CCS-Challenge_Development-Set_Labels_v001.zip	DOWNLOAD_FILES/MERLIon-CCS-Challenge_Development-Set_v001/	ZIP Archive	798.6 KB	Aug 11, 2023	19	a9187cb03609055c99fc8bfca8c2c78d
MERLIon-CCS-Challenge_Task-1_Evaluation-Set_Audio-and-Metadata_v001.zip.001	DOWNLOAD_FILES/MERLIon-CCS-Challenge_Task-1_Evaluation-Set_v001/	Unknown	1000.0 MB	Aug 11, 2023	15	c343c4a4d6553fa1b9be40f396d43662
MERLIon-CCS-Challenge_Task-1_Evaluation-Set_Audio-and-Metadata_v001.zip.002	DOWNLOAD_FILES/MERLIon-CCS-Challenge_Task-1_Evaluation-Set_v001/	Unknown	1000.0 MB	Aug 11, 2023	10	9fdf42de354294c51e948501012fdea0
MERLIon-CCS-Challenge_Task-1_Evaluation-Set_Audio-and-Metadata_v001.zip.003	DOWNLOAD_FILES/MERLIon-CCS-Challenge_Task-1_Evaluation-Set_v001/	Unknown	1000.0 MB	Aug 11, 2023	8	3a0015301895eca0a90ea1dd8eaab1cb
MERLIon-CCS-Challenge_Task-1_Evaluation-Set_Audio-and-Metadata_v001.zip.004	DOWNLOAD_FILES/MERLIon-CCS-Challenge_Task-1_Evaluation-Set_v001/	Unknown	1000.0 MB	Aug 11, 2023	10	2cdf714da49c8897d5e62fbbc8906006

The MERLIon CCS Challenge Development Set Audio and Metadata is split into 5 parts (.zip.001 to .zip.005). Download all 5 parts together and extract the data via 7-Zip.
The MERLIon CCS Challenge Evaluation Set for Task 1: Audio and Metadata is split into 5 parts (parts 1 to 4 appear in the listing above). Download all 5 parts together and extract the data via 7-Zip.
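Each part in the listing comes with an MD5 checksum, so a download can be checked before extraction. The following is a minimal Python sketch, assuming the five Development Set parts sit in the current directory under their listed names; the helper names `md5_of`, `verify_parts`, and `join_parts` are hypothetical. The joining step additionally assumes that the `.zip.001` to `.zip.005` parts are plain byte splits of a single ZIP file (the usual layout of 7-Zip split volumes); this is not stated in the dataset documentation, and extracting with 7-Zip remains the documented route.

```python
import hashlib
import shutil
from pathlib import Path

# MD5 checksums copied from the file listing (Development Set audio and metadata, parts 1-5).
EXPECTED_MD5 = {
    "MERLIon-CCS-Challenge_Development-Set_Audio-and-Metadata_v001.zip.001": "6b49a389e5d0463f19364bd54d7de3d2",
    "MERLIon-CCS-Challenge_Development-Set_Audio-and-Metadata_v001.zip.002": "3205814e04c75d8e0b30228729cc4716",
    "MERLIon-CCS-Challenge_Development-Set_Audio-and-Metadata_v001.zip.003": "0b4198c578646b6818233ee91dc16986",
    "MERLIon-CCS-Challenge_Development-Set_Audio-and-Metadata_v001.zip.004": "38e9fd24ce55ced246cd52211aa1300a",
    "MERLIon-CCS-Challenge_Development-Set_Audio-and-Metadata_v001.zip.005": "9c290ac8af8279949ec13663f2505984",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks so large parts never sit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_parts(folder: Path, expected: dict[str, str]) -> list[str]:
    """Names of parts that are missing or whose MD5 differs from the listed value."""
    bad = []
    for name, checksum in sorted(expected.items()):
        part = folder / name
        if not part.is_file() or md5_of(part) != checksum:
            bad.append(name)
    return bad


def join_parts(folder: Path, names: list[str], out_path: Path) -> None:
    """Concatenate parts in suffix order (.001, .002, ...).

    Assumption: the parts are plain byte splits of one ZIP file.
    """
    with out_path.open("wb") as out:
        for name in sorted(names):
            with (folder / name).open("rb") as part:
                shutil.copyfileobj(part, out)


if __name__ == "__main__":
    here = Path(".")
    failed = verify_parts(here, EXPECTED_MD5)
    if failed:
        print("Missing or corrupted, re-download:", ", ".join(failed))
    else:
        target = here / "MERLIon-CCS-Challenge_Development-Set_Audio-and-Metadata_v001.zip"
        join_parts(here, list(EXPECTED_MD5), target)
        print("All parts verified; joined into", target)
```

Checking the checksums first means a part truncated in transit is caught by name rather than surfacing later as an opaque extraction error. If the byte-split assumption does not hold, skip `join_parts` and open the `.zip.001` file in 7-Zip, which reads the remaining volumes itself.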

Citation Metadata

Persistent Identifier	doi:10.21979/N9/ANXS8Z
Publication Date	2023-08-11
Title	Development and Evaluation data for Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge
Author	Chua, Victoria Yi Han (Nanyang Technological University), ORCID: 0000-0002-0755-3148; Garcia Perera, Leibny Paola (Johns Hopkins University), ORCID: 0000-0002-7449-5726; Khudanpur, Sanjeev (Johns Hopkins University), ORCID: 0000-0001-5976-0897; Khong, Andy W. H. (Nanyang Technological University), ORCID: 0000-0002-0708-4791; Dauwels, Justin (TU Delft); Woon, Fei Ting (Nanyang Technological University), ORCID: 0000-0003-0096-0784; Styles, Suzy J (Nanyang Technological University), ORCID: 0000-0003-3517-9680
Point of Contact	Styles, Suzy J. (Nanyang Technological University)
Related Publication	Woon, F. T., Yogarrajah, E. C., Fong, S., Salleh, N. S. M., Sundaray, S., & Styles, S. J. (2021). Creating a corpus of multilingual parent-child speech remotely: Lessons learned in a large-scale onscreen picturebook sharing task. Frontiers in Psychology, 12. https://doi.org/10.3389/fpsyg.2021.734936
Chua, Y. H. V., Liu, H., Garcia Perera, L. P., Woon, F. T., Wong, J., Zhang, X., Khudanpur, S., Khong, A. W. H., Dauwels, J., & Styles, S. J. (2023). MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization. Accepted for Proc. Interspeech 2023. arXiv:2305.18881. http://arxiv.org/abs/2305.18881
Styles, S. J., Chua, Y. H. V., Woon, F. T., Liu, H., Garcia Perera, L. P., Khudanpur, S., Khong, A. W. H., & Dauwels, J. (2023). Investigating model performance in language identification: beyond simple error statistics. Accepted for Interspeech 2023. arXiv:2305.18925. http://arxiv.org/abs/2305.18925
Garcia Perera, L. P., Chua, Y. H. V., Liu, H., Woon, F. T., Khong, A. W. H., Dauwels, J., & Styles, S. J. MERLIon CCS Challenge Evaluation Plan Version 1.2. arXiv:2305.19493. https://doi.org/10.48550/arXiv.2305.19493
Depositor	Styles, Suzy J.
Deposit Date	2023-01-18
Data Type	Development Data: Audio and annotations; Evaluation Data: Audio with limited annotations
Software	.wav .txt 7-Zip
Related Material	GitHub page: "merlion-ccs-2023" in MERLIon-Challenge, GitHub, https://github.com/MERLIon-Challenge/merlion-ccs-2023, last updated 21 February 2023.
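The Software field lists .wav audio alongside .txt files. As a hedged sketch, Python's standard `wave` module can report the basic properties of an extracted recording before further processing. The function name `describe_wav` is hypothetical, the sketch assumes the files are PCM-encoded (which `wave` requires), and no particular sample rate or channel count is assumed for this corpus.

```python
import wave
from pathlib import Path


def describe_wav(path: Path) -> dict:
    """Basic properties of a PCM WAV file, using only the standard library."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    import sys

    # Usage: python describe_wav.py recording1.wav recording2.wav ...
    for arg in sys.argv[1:]:
        print(arg, describe_wav(Path(arg)))
```

Reading only the header fields keeps this cheap even across many recordings; loading the samples themselves is left to whichever audio library a pipeline already uses.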

Dataset Terms

License/Data Use Agreement

Our Community Norms as well as good scientific practices expect that proper credit is given via citation. Please use the data citation shown on the dataset page.

Custom Dataset Terms — the following Custom Dataset Terms have been defined for this dataset.

To access recordings in these collections, users must agree to the following terms:
I agree to respect the audio recordings:
• I will not present audio in a context which might cause harm or embarrassment to the speaker. This includes association with racial prejudice, criminality, sexual orientation, religious affiliation, other sensitive material, or distressing or unpleasant stimuli in another sensory domain (e.g., unpleasant pictures, unpleasant smells).
• I will not identify any individual as ‘bad at’ any aspect of the task.

I agree to cite the corpus:
• In work arising from this corpus, I will cite the original corpus appropriately.
• Any AI/deep learning derivatives that were developed by training on the data in this corpus must acknowledge this corpus.
• If a speaker has contributed a 'username' for their contribution, the 'username' will appear alongside the filename and citation information in any derivatives. Any derivatives incorporating audio from this dataset must indicate 'some rights reserved' or point to the usage rights of the corpus.

I agree to open access:
• I will not charge others to access the materials in the corpus, or bundle the corpus audio into a for-profit product.
• As an extension of 'fair use,' example audio files can be used to describe the nature of the dataset, a phenomenon of interest in the audio file, or illustrate a procedure in work arising from the corpus, even if the resulting work is a for-profit publication or derivative. Any such transfer of rights to a third party is limited to 1% of the total corpus or 6 whole recordings from the total corpus, whichever is larger. Such uses must include citations to the original along with the statement 'some rights reserved'.
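The limit in the last term is the larger of two quantities. The following Python sketch illustrates the arithmetic only; it is not part of the terms. It assumes (an interpretation, not stated above) that "1% of the total corpus" is counted in whole recordings and rounded down so the limit is never exceeded. The helper name `max_shareable_recordings` is hypothetical, and the corpus sizes in the example are placeholders, not figures from this dataset.

```python
def max_shareable_recordings(total_recordings: int) -> int:
    """Larger of 1% of the corpus (whole recordings, rounded down) and 6 recordings.

    Hypothetical reading of the terms: '1% of the total corpus' is counted
    by recordings, not by duration or file size.
    """
    if total_recordings < 0:
        raise ValueError("total_recordings must be non-negative")
    one_percent = total_recordings // 100  # round down so the cap is never exceeded
    return max(one_percent, 6)


if __name__ == "__main__":
    # Placeholder corpus sizes, not figures from this dataset.
    print(max_shareable_recordings(250))   # 6  (1% would be only 2)
    print(max_shareable_recordings(1200))  # 12 (1% exceeds the 6-recording floor)
```

For small collections the 6-recording floor dominates; 1% only becomes the binding figure once a collection exceeds 699 recordings under this reading.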

Confidentiality Declaration

Identifiable information has been redacted with a 440 Hz beep. All participants consented to the release of the redacted audio recordings in the dataset.
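Because redactions are marked by a 440 Hz beep, a processing pipeline may want to flag those stretches, for example to keep them out of training data. The sketch below, in plain Python, applies the Goertzel algorithm frame by frame and flags frames in which the 440 Hz component carries most of the frame's energy. The frame length (400 samples, i.e. 25 ms at 16 kHz), the 0.5 energy-ratio threshold, and the function names are hypothetical choices, not values from the dataset documentation; the beep's amplitude and duration are not documented here, so the thresholds would need tuning on real recordings.

```python
import math
from typing import List, Sequence


def goertzel_power(frame: Sequence[float], freq_hz: float, sample_rate: int) -> float:
    """Squared magnitude of the frame's DTFT at freq_hz, via the Goertzel recurrence."""
    omega = 2.0 * math.pi * freq_hz / sample_rate
    coeff = 2.0 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # |s[N-1] - e^{-j*omega} * s[N-2]|^2 expanded into real arithmetic
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def flag_beep_frames(
    samples: Sequence[float],
    sample_rate: int,
    freq_hz: float = 440.0,
    frame_len: int = 400,          # hypothetical: 25 ms at 16 kHz
    ratio_threshold: float = 0.5,  # hypothetical: tune on real recordings
    min_energy: float = 1e-8,
) -> List[bool]:
    """One flag per complete frame: True where the tone dominates the frame's energy."""
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        if energy < min_energy:
            flags.append(False)  # (near-)silent frame: nothing to flag
            continue
        # By Parseval, a pure sinusoid at freq_hz spanning whole cycles gives a ratio of 1.
        ratio = 2.0 * goertzel_power(frame, freq_hz, sample_rate) / (frame_len * energy)
        flags.append(ratio >= ratio_threshold)
    return flags


if __name__ == "__main__":
    rate = 16000
    beep = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(800)]
    other = [0.5 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(800)]
    print(flag_beep_frames(beep + other, rate))  # [True, True, False, False]
```

Goertzel evaluates a single frequency in O(N) per frame, which suits a detector that only cares about one known tone. With 400-sample frames at 16 kHz, 440 Hz falls exactly on a DFT bin (11 cycles per frame); at other sample rates, choosing `frame_len` so that `frame_len * 440 / sample_rate` is a whole number keeps spectral leakage low.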

Restrictions

Some files in the dataset contain timestamp annotations and language labels for the segments of each audio recording in the MERLIon CCS Evaluation Sets for Tasks 1 and 2; these files are locked for use in future challenges. They are preserved here for archive integrity and future release.

Citation Requirements

Any usage of the data in this corpus must be accompanied by a citation. Any AI/deep learning derivatives that were developed by training on the data in this corpus must acknowledge this corpus. Where usernames have been given, they must be presented alongside any vocal samples used as illustrations of method or results: for example, named or listed in the credits of a documentary, named in a digital file published as supplementary material in a journal article, or listed in live demonstrations (e.g., presentations at academic conferences or public science lectures). Any derivatives incorporating audio from this dataset must contain the same usage terms.

Restricted Files + Terms of Access

Restricted Files

There are 4 restricted files in this dataset.

Terms of Access for Restricted Files

These files contain timestamp annotations and language labels for the segments of each audio recording in the MERLIon CCS Evaluation Sets for Tasks 1 and 2, and are locked for use in future challenges. They are preserved here for archive integrity and future release.

Request Access

Users may not request access to files.

Guestbook

The following guestbook will prompt a user to provide additional information when downloading a file.

MERLIon CCS Growing Collection Access Agreement


