Dataset for: Probably Pleasant? A Neural-Probabilistic Approach to Automatic Masker Selection for Urban Soundscape Augmentation (doi:10.21979/N9/YSJQKD)

Document Description
Citation
Title:	Dataset for: Probably Pleasant? A Neural-Probabilistic Approach to Automatic Masker Selection for Urban Soundscape Augmentation
Identification Number:	doi:10.21979/N9/YSJQKD
Distributor:	DR-NTU (Data)
Date of Distribution:	2022-01-28
Version:	1
Bibliographic Citation:	Ooi, Kenneth; Watcharasupat, Karn N.; Lam, Bhan; Ong, Zhen-Ting; Gan, Woon-Seng, 2022, "Dataset for: Probably Pleasant? A Neural-Probabilistic Approach to Automatic Masker Selection for Urban Soundscape Augmentation", https://doi.org/10.21979/N9/YSJQKD, DR-NTU (Data), V1
Study Description
Citation
Title:	Dataset for: Probably Pleasant? A Neural-Probabilistic Approach to Automatic Masker Selection for Urban Soundscape Augmentation
Identification Number:	doi:10.21979/N9/YSJQKD
Authoring Entity:	Ooi, Kenneth (Nanyang Technological University)
	Watcharasupat, Karn N. (Nanyang Technological University)
	Lam, Bhan (Nanyang Technological University)
	Ong, Zhen-Ting (Nanyang Technological University)
	Gan, Woon-Seng (Nanyang Technological University)
Software used in Production:	Python
Grant Number:	COT-V4-2020-1
Distributor:	DR-NTU (Data)
Access Authority:	Ooi Wen Rui Kenneth
Depositor:	Ooi Wen Rui Kenneth
Date of Deposit:	2021-10-04
Holdings Information:	https://doi.org/10.21979/N9/YSJQKD
Study Scope
Keywords:	Computer and Information Science, Engineering, Soundscape
Abstract:	This dataset contains the log-mel spectrograms for the augmented soundscapes described in our ICASSP 2022 paper "Probably Pleasant? A Neural-Probabilistic Approach to Automatic Masker Selection for Urban Soundscape Augmentation", in <code>.npy</code> format. The data can be loaded in Python with the <code>numpy.load</code> function of the <code>numpy</code> package. <br><br> The dataset is organized for 5-fold cross-validation: the log-mel spectrograms for each fold have filenames of the format <code>fold_#_features.npy</code>, and the subjective ratings for the augmented soundscapes have filenames of the format <code>fold_#_labels.npy</code>, where <code>#</code> is the fold number in the set {1,2,3,4,5}. The independent test set has fold index 0. <h1> Generation of augmented soundscapes</h1> Each augmented soundscape was created by adding 30-second excerpts of recordings of sounds known as <i>maskers</i> to binaural recordings of urban soundscapes (element-wise addition in the time domain). Each masker recording has only one class ("construction", "traffic", "water", or "wind") active for the entire duration of the recording, whereas each binaural recording of an urban soundscape may have multiple sound sources active at any point in the recording, including sound sources outside of the four masker classes. <h2> Cross-validation set </h2> The masker samples were obtained from <a href="https://freesound.org">Freesound</a> by searching for the names of the masker classes (i.e. "construction", "traffic", "water", and "wind") and randomly selecting tracks containing 30-second sections of sound that corresponded only to that particular masker class. 
The soundscape samples were obtained from the <a href="https://urban-soundscapes.org">Urban Soundscapes of the World (USotW) dataset</a>, and consisted of all binaural recordings available in the public dataset, minus those with <ol> <li> audible electrical noise,</li> <li> measured in-situ L<sub>A,eq</sub> values below 52 dB, and</li> <li> measured in-situ L<sub>A,eq</sub> values above 77 dB,</li> </ol> in order to <ol> <li> reflect only the accurately-captured real-life soundscapes,</li> <li> ensure that reproduction levels were significantly above the noise floor of the location with the highest noise floor (~36 dB) where the subjective responses were obtained, and</li> <li> ensure safe listening levels for our participants.</li> </ol> In total, 120 out of the 127 publicly-available recordings in the USotW dataset were used for the cross-validation set. <h2> Test set </h2> The masker samples were obtained from <a href="https://freesound.org">Freesound</a> in the same manner as that for the cross-validation set, but ensuring that no overlap in recordings occurred between the test set and cross-validation set maskers. The soundscape samples were taken from binaural recordings of locations in Singapore (which was not represented in any of the soundscapes in the <a href="https://urban-soundscapes.org">USotW dataset</a> and hence the cross-validation set). 
They were recorded following the similar <a href="https://www.mdpi.com/2076-3417/10/7/2397">Soundscape Indices Protocol</a>, in urban contexts similar to those of the <a href="https://urban-soundscapes.org">USotW dataset</a>. Specifically, they were from <ul> <li>a road facing a construction site,</li> <li>a gazebo in a park,</li> <li>a walkway facing a lake,</li> <li>a walkway facing a crowded canteen,</li> <li>a path facing a lake, and</li> <li>a path facing a lake with an aircraft flying overhead.</li> </ul> <h1>Participant information</h1> The participants of the listening test were a sample of people who were able to attend our laboratory (at Nanyang Technological University, Singapore) in person to listen to the stimuli and provide their responses. Their mean age was 28.4 ± 11.8 years, and there were a total of 151 female and 149 male participants. All participants were tested to have normal hearing (mean hearing threshold below 20 dB (resp. 30 dB) at 0.5, 1, 2, 4, and 6 kHz for participants below (resp. equal to or above) 30 years of age).
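As an illustration of the access method described in the abstract, the following is a minimal sketch of loading one fold with <code>numpy.load</code>. The helper name <code>load_fold</code> and the <code>data_dir</code> argument are hypothetical and not part of the dataset; the array shapes and dtypes are not documented here, so the sketch makes no assumption about them and simply returns the arrays as stored.

```python
from pathlib import Path

import numpy as np


def load_fold(data_dir, fold):
    """Load the features and labels for one fold.

    Hypothetical helper: fold 0 is the independent test set and
    folds 1 to 5 are the cross-validation folds, as stated in the abstract.
    """
    if fold not in range(6):
        raise ValueError(f"fold must be an integer from 0 to 5, got {fold!r}")
    data_dir = Path(data_dir)
    # File names follow the fold_#_features.npy / fold_#_labels.npy pattern.
    features = np.load(data_dir / f"fold_{fold}_features.npy")
    labels = np.load(data_dir / f"fold_{fold}_labels.npy")
    return features, labels
```

For example, <code>features, labels = load_fold("path/to/download", 0)</code> would load the test set, after which <code>features.shape</code> and <code>labels.shape</code> can be inspected to see how the spectrograms and ratings are laid out.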
Kind of Data:	Processed audio data (log-mel spectrograms)
Methodology and Processing
Sources Statement
Data Access
Other Study Description Materials
Related Publications
Citation
Identification Number:	10.1109/ICASSP43922.2022.9746897
Bibliographic Citation:	Ooi, K., Watcharasupat, K. N., Lam, B., Ong, Z. & Gan, W. (2022). Probably pleasant? A neural-probabilistic approach to automatic masker selection for urban soundscape augmentation. 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2022), 8887-8891.
Citation
Identification Number:	10356/158000
Bibliographic Citation:	Ooi, K., Watcharasupat, K. N., Lam, B., Ong, Z. & Gan, W. (2022). Probably pleasant? A neural-probabilistic approach to automatic masker selection for urban soundscape augmentation. 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2022), 8887-8891.
Other Study-Related Materials
Label:	fold_0_features.npy
Notes:	application/octet-stream
Other Study-Related Materials
Label:	fold_0_labels.npy
Notes:	application/octet-stream
Other Study-Related Materials
Label:	fold_1_features.npy
Notes:	application/octet-stream
Other Study-Related Materials
Label:	fold_1_labels.npy
Notes:	application/octet-stream
Other Study-Related Materials
Label:	fold_2_features.npy
Notes:	application/octet-stream
Other Study-Related Materials
Label:	fold_2_labels.npy
Notes:	application/octet-stream
Other Study-Related Materials
Label:	fold_3_features.npy
Notes:	application/octet-stream
Other Study-Related Materials
Label:	fold_3_labels.npy
Notes:	application/octet-stream
Other Study-Related Materials
Label:	fold_4_features.npy
Notes:	application/octet-stream
Other Study-Related Materials
Label:	fold_4_labels.npy
Notes:	application/octet-stream
Other Study-Related Materials
Label:	fold_5_features.npy
Notes:	application/octet-stream
Other Study-Related Materials
Label:	fold_5_labels.npy
Notes:	application/octet-stream
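The files listed above can be combined into the 5-fold cross-validation splits described in the abstract. The following is a sketch only: the generator name <code>cross_validation_splits</code> and the <code>data_dir</code> argument are hypothetical, and the code assumes that samples are stacked along the first axis of every array, which is not stated in this codebook and should be checked against the actual files.

```python
from pathlib import Path

import numpy as np

CV_FOLDS = (1, 2, 3, 4, 5)  # fold 0 is the independent test set


def cross_validation_splits(data_dir):
    """Yield (val_fold, train_x, train_y, val_x, val_y) for each CV fold.

    Hypothetical helper, not part of the dataset.
    """
    data_dir = Path(data_dir)
    x = {k: np.load(data_dir / f"fold_{k}_features.npy") for k in CV_FOLDS}
    y = {k: np.load(data_dir / f"fold_{k}_labels.npy") for k in CV_FOLDS}
    for val_fold in CV_FOLDS:
        train_folds = [k for k in CV_FOLDS if k != val_fold]
        # Assumption: axis 0 indexes samples in both features and labels.
        train_x = np.concatenate([x[k] for k in train_folds], axis=0)
        train_y = np.concatenate([y[k] for k in train_folds], axis=0)
        yield val_fold, train_x, train_y, x[val_fold], y[val_fold]
```

Each iteration holds out one fold for validation and trains on the other four; <code>fold_0_features.npy</code> and <code>fold_0_labels.npy</code> are left untouched so that they can serve as the independent test set.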