Growing Collection

This dataverse is a repository for collections of audio and video recordings and transcriptions that have been recorded and released according to the terms of the Growing Collection. Each speaker has been asked whether they would prefer their audio to be identifiable or anonymous. To gain access to audio, you must agree to the terms of the Growing Collection.

Audio Collections (Words and sentences read aloud)
• Spring-Village Corpus of minimal pairs in Singapore Mandarin Chinese (Mar 2022)
• Pitch-Peach Corpus of minimal pairs and homophones in Singapore English (under construction)
• TOWRE Skilled Adult Readers of Singapore English (under construction)
• Early Word Lists (under construction)
• Zoo-Lǎohǔ Word Set: Cross-linguistic lexical priming word set for animacy judgements (English and Mandarin Chinese)
• Finding Mushroom Corpus of Malay formal and informal sentences

Audio Collections (Picture Naming Tasks)
• /i/ /a/ /u/ SESAME Picture Card Corpus - Adults speaking English, Mandarin, Malay and Tamil (coming soon)

Audio Collections (Sentences/Narrations/Conversations)
• Laksa Corpus (Archived Jan 2020)
• Leaf & Stone Corpus (Archived Jan 2020)
• Green Grass Park Picture Description Corpus in Singapore Mandarin (Feb 2022)
• SESAME Topic Prompts (under construction)

Video Collections
• Dog-Dark Corpus of hand-shapes in SgSL Corpus (Coming Soon)
• SESAME Topic Prompts SgSL Corpus (Under Development)
• "What a Scary Storm!" SgSL Corpus (Under Development)

Synthesized speech derived from natural speech tokens
• Boat-Vote continuum for categorical perception studies

MERLIon Challenge Collections
• MERLIon CCS Challenge 2023

Corpus Administration
• Growing Collection Release Form (Archived Jan 2020)
• Growing Collection Corpus metadata template (Coming soon)

Terms of use: To access recordings in these collections, users must agree to the following terms. Recordings of audio and video must be treated with respect and should not be presented in any context which might cause harm or embarrassment to the speaker. For example, recordings should not be associated with assessments of racial prejudice, evaluations of likely criminality, sexual orientation, religious affiliation, or any other sensitive material. Recordings should not be paired with distressing or unpleasant stimuli in another sensory domain (e.g., unpleasant pictures, unpleasant smells). No individual should be identified as ‘bad at’ any aspect of the task. Where Usernames have been given, Usernames must be presented alongside any vocal samples used as illustrations of method or results: for example, named or listed in the credits of a documentary; named in a digital file published as supplementary material in a journal article; or listed in live demonstrations (e.g., presentations at academic conferences, public science lectures). Any use of files from the Growing Collection must be credited as specified in the Terms of Use for that dataset.

191 to 200 of 769 Results

vote-boat_stimulus008.wav
Apr 17, 2024 - Boat-Vote continuum for categorical perception studies
audio/vnd.wave - 42.6 KB - MD5: d3bb3e8a0a280a8eb7e40cc44b6ab227

vote-boat_stimulus009.wav
Apr 17, 2024 - Boat-Vote continuum for categorical perception studies
audio/vnd.wave - 42.6 KB - MD5: ffd9b4fdcc1a6f38461063f868b6f584

vote-boat_stimulus010.wav
Apr 17, 2024 - Boat-Vote continuum for categorical perception studies
audio/vnd.wave - 42.7 KB - MD5: 122edafcff871524564ca0a51b9e463a

Development and Evaluation data for Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge
Aug 11, 2023 - MERLIon CCS Challenges
Chua, Victoria Yi Han; Garcia Perera, Leibny Paola; Khudanpur, Sanjeev; Khong, Andy W. H.; Dauwels, Justin; Woon, Fei Ting; Styles, Suzy J, 2023, "Development and Evaluation data for Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge", https://doi.org/10.21979/N9/ANXS8Z, DR-NTU (Data), V1, UNF:6:QFBERdU0YulYhMohwDaNWg== [fileUNF]
The inaugural Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge focuses on developing robust language identification and language diarization systems that are reliable for non-standard, accented, spontaneous...

MERLIon-CCS-Challenge-2023_Development-Set_Evaluated-Regions_v001.tab
Aug 11, 2023 - Development and Evaluation data for Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge
Tabular Data - 20.4 KB - 3 Variables, 247 Observations - UNF:6:bDmCMlXNpTBHaqDY+57ZdA==
Contains the timestamps of evaluated regions for language diarization in each audio recording in the MERLIon CCS Challenge development set.

MERLIon-CCS-Challenge-2023_Development-Set_v001_File-List.tab
Aug 11, 2023 - Development and Evaluation data for Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge
Tabular Data - 10.5 KB - 1 Variables, 151 Observations - UNF:6:DBZ0LJYDQuBp+JpHC4e2wQ==
Contains the filenames of all audio recordings in the MERLIon CCS Challenge development set.

MERLIon-CCS-Challenge-2023_Development-Set_v001_METADATA.txt
Aug 11, 2023 - Development and Evaluation data for Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge
Plain Text - 45.6 KB - MD5: 5729c747c57f7719d16ab554173daf1f
Contains metadata of the MERLIon CCS Challenge development set.

MERLIon-CCS-Challenge-2023_Development-Set_v001_RELEASE-NOTES.txt
Aug 11, 2023 - Development and Evaluation data for Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge
Plain Text - 4.1 KB - MD5: 2a4512e853e7ffdd47d4bb24550c55ca
Contains the release notes and dataset description of the MERLIon CCS Challenge development set.

MERLIon-CCS-Challenge-2023_Development-Set_v001_Segment-Lengths-Counts.tab
Aug 11, 2023 - Development and Evaluation data for Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge
Tabular Data - 13.2 KB - 5 Variables, 151 Observations - UNF:6:cor05m8DAi1IsQux4AFq2w==
Contains total lengths of English and Mandarin speech in milliseconds and number of English and Mandarin segments in each audio recording in the MERLIon CCS Challenge development set.

MERLIon-CCS-Challenge_Development-Set_Audio-and-Metadata_v001.zip.001
Aug 11, 2023 - Development and Evaluation data for Multilingual Everyday Recordings - Language Identification on Code-Switched Child-Directed Speech (MERLIon CCS) Challenge
Unknown - 1000.0 MB - MD5: 6b49a389e5d0463f19364bd54d7de3d2
The MERLIon CCS Challenge Development Set Audio and Metadata is split into 5 parts. This is part 1 of 5. Download all 5 parts together and extract the data via 7zip.
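
Several entries above list an MD5 checksum, which can be used to confirm that a downloaded file (for example, each part of the split archive before extraction) arrived intact. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `matches_listed_md5` are illustrative and not part of the repository, and the local file path in the note below is an assumption about where a download was saved.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use low for large files such as
    the 1000 MB archive parts.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed_md5):
    """Compare a local file's MD5 with the value shown in the listing.

    Whitespace and letter case in the listed value are ignored.
    """
    return md5_of_file(path) == listed_md5.strip().lower()
```

For instance, assuming vote-boat_stimulus008.wav was saved to the current directory, `matches_listed_md5("vote-boat_stimulus008.wav", "d3bb3e8a0a280a8eb7e40cc44b6ab227")` should return `True` if the download is intact. MD5 is suitable here for detecting transfer corruption, not for security purposes.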
