Dataset for: Probably Pleasant? A Neural-Probabilistic Approach to Automatic Masker Selection for Urban Soundscape Augmentation (doi:10.21979/N9/YSJQKD)

Document Description

Citation

Title:

Dataset for: Probably Pleasant? A Neural-Probabilistic Approach to Automatic Masker Selection for Urban Soundscape Augmentation

Identification Number:

doi:10.21979/N9/YSJQKD

Distributor:

DR-NTU (Data)

Date of Distribution:

2022-01-28

Version:

1

Bibliographic Citation:

Ooi, Kenneth; Watcharasupat, Karn N.; Lam, Bhan; Ong, Zhen-Ting; Gan, Woon-Seng, 2022, "Dataset for: Probably Pleasant? A Neural-Probabilistic Approach to Automatic Masker Selection for Urban Soundscape Augmentation", https://doi.org/10.21979/N9/YSJQKD, DR-NTU (Data), V1

Study Description

Citation

Title:

Dataset for: Probably Pleasant? A Neural-Probabilistic Approach to Automatic Masker Selection for Urban Soundscape Augmentation

Identification Number:

doi:10.21979/N9/YSJQKD

Authoring Entity:

Ooi, Kenneth (Nanyang Technological University)

Watcharasupat, Karn N. (Nanyang Technological University)

Lam, Bhan (Nanyang Technological University)

Ong, Zhen-Ting (Nanyang Technological University)

Gan, Woon-Seng (Nanyang Technological University)

Software used in Production:

Python

Grant Number:

COT-V4-2020-1

Distributor:

DR-NTU (Data)

Access Authority:

Ooi Wen Rui Kenneth

Depositor:

Ooi Wen Rui Kenneth

Date of Deposit:

2021-10-04

Holdings Information:

https://doi.org/10.21979/N9/YSJQKD

Study Scope

Keywords:

Computer and Information Science, Engineering, Soundscape

Abstract:

This dataset contains the log-mel spectrograms of the augmented soundscapes described in our ICASSP 2022 paper "Probably Pleasant? A Neural-Probabilistic Approach to Automatic Masker Selection for Urban Soundscape Augmentation", in <code>.npy</code> format. The files can be loaded in Python with the <code>numpy.load</code> function of the <code>numpy</code> package. <br><br> The dataset is provided as a 5-fold cross-validation dataset. The log-mel spectrograms of each fold are stored in <code>fold_#_features.npy</code> and the subjective ratings of the corresponding augmented soundscapes in <code>fold_#_labels.npy</code>, where <code>#</code> is the fold number in the set {1,2,3,4,5}. The independent test set has fold index 0. <h1>Generation of augmented soundscapes</h1> Each augmented soundscape was created by adding 30-second excerpts of recordings of sounds known as <i>maskers</i> to binaural recordings of urban soundscapes (element-wise addition in the time domain). Each masker recording has only one class ("construction", "traffic", "water", or "wind") active for its entire duration, whereas each binaural recording of an urban soundscape may have multiple sound sources active at any point, including sound sources outside the four masker classes. <h2>Cross-validation set</h2> The masker samples were obtained from <a href="https://freesound.org">Freesound</a> by searching for the names of the masker classes (i.e. "construction", "traffic", "water", and "wind") and randomly picking tracks containing 30-second sections of sound that corresponded only to the searched masker class.
The soundscape samples were obtained from the <a href="https://urban-soundscapes.org">Urban Soundscapes of the World (USotW) dataset</a>, and consisted of all binaural recordings available in the public dataset, excluding those with <ol> <li> audible electrical noise,</li> <li> measured in-situ L<sub>A,eq</sub> values below 52 dB, or</li> <li> measured in-situ L<sub>A,eq</sub> values above 77 dB,</li> </ol> in order to, respectively, <ol> <li> include only accurately captured real-life soundscapes,</li> <li> ensure that reproduction levels were significantly above the noise floor (~36 dB) of the noisiest location where the subjective responses were obtained, and</li> <li> ensure safe listening levels for our participants.</li> </ol> In total, 120 of the 127 publicly available recordings in the USotW dataset were used for the cross-validation set. <h2>Test set</h2> The masker samples were obtained from <a href="https://freesound.org">Freesound</a> in the same manner as for the cross-validation set, while ensuring that no recording was shared between the test set and cross-validation set maskers. The soundscape samples were taken from binaural recordings of locations in Singapore (which is not represented in any of the soundscapes in the <a href="https://urban-soundscapes.org">USotW dataset</a>, and hence not in the cross-validation set).
They were recorded under the <a href="https://www.mdpi.com/2076-3417/10/7/2397">Soundscape Indices Protocol</a>, a similar recording protocol, and in urban contexts similar to those of the <a href="https://urban-soundscapes.org">USotW dataset</a>. Specifically, they were recorded at <ul> <li>a road facing a construction site,</li> <li>a gazebo in a park,</li> <li>a walkway facing a lake,</li> <li>a walkway facing a crowded canteen,</li> <li>a path facing a lake, and</li> <li>a path facing a lake with an aircraft flying overhead.</li> </ul> <h1>Participant information</h1> The participants of the listening test were a sample of people who were able to come in person to our laboratory (at Nanyang Technological University, Singapore) to listen to the stimuli and provide their responses. Their mean age was 28.4 ± 11.8 years, and there were a total of 151 female and 149 male participants. All participants were tested to have normal hearing, defined as a mean hearing threshold at 0.5, 1, 2, 4, and 6 kHz below 20 dB for participants under 30 years of age, and below 30 dB for participants aged 30 years or above.
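
The file naming scheme described in the abstract lends itself to a small loading helper. The following is a minimal sketch, not the authors' code: it assumes the twelve <code>.npy</code> files have been downloaded into one local directory, that the first axis of every array indexes the augmented soundscapes, and that the arrays are plain numeric arrays. The function names <code>load_fold</code> and <code>cross_validation_splits</code> are hypothetical.

```python
from pathlib import Path

import numpy as np


def load_fold(data_dir, fold):
    """Load one fold: 0 is the independent test set, 1 to 5 are the cross-validation folds."""
    data_dir = Path(data_dir)
    features = np.load(data_dir / f"fold_{fold}_features.npy")
    labels = np.load(data_dir / f"fold_{fold}_labels.npy")
    return features, labels


def cross_validation_splits(data_dir, folds=(1, 2, 3, 4, 5)):
    """Yield (held-out fold, training data, validation data), holding out each fold in turn.

    Assumption: axis 0 of every array indexes the samples, so the training
    folds can be concatenated along that axis.
    """
    loaded = {k: load_fold(data_dir, k) for k in folds}
    for held_out in folds:
        train_x = np.concatenate([loaded[k][0] for k in folds if k != held_out], axis=0)
        train_y = np.concatenate([loaded[k][1] for k in folds if k != held_out], axis=0)
        val_x, val_y = loaded[held_out]
        yield held_out, (train_x, train_y), (val_x, val_y)
```

Under these assumptions, iterating over <code>cross_validation_splits("path/to/files")</code> gives five train/validation pairs, and <code>load_fold("path/to/files", 0)</code> returns the test set, which stays out of the cross-validation loop.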

Kind of Data:

Processed audio data (log-mel spectrograms)
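
The spectrograms were computed from augmented soundscapes that, according to the abstract, were formed by element-wise addition of a masker and a soundscape in the time domain. The sketch below only illustrates that one step. It assumes both signals are already NumPy arrays with the same sampling rate and number of samples, with a binaural soundscape of shape (samples, 2). How a single-channel masker was mapped to the two channels and what gain was applied are not stated in this record, so the equal-copy broadcast here is an assumption, and <code>augment_soundscape</code> is a hypothetical name.

```python
import numpy as np


def augment_soundscape(soundscape, masker):
    """Add a masker to a soundscape sample by sample (element-wise, in the time domain)."""
    soundscape = np.asarray(soundscape, dtype=np.float64)
    masker = np.asarray(masker, dtype=np.float64)
    if masker.ndim == 1 and soundscape.ndim == 2:
        # Assumption: a single-channel masker is added equally to every channel.
        masker = masker[:, np.newaxis]
    if masker.shape[0] != soundscape.shape[0]:
        raise ValueError("soundscape and masker must have the same number of samples")
    return soundscape + masker
```

No gain or calibration is applied in this sketch; any level adjustment used for the actual stimuli would have to be taken from the related publication.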

Methodology and Processing

Sources Statement

Data Access

Other Study Description Materials

Related Publications

Citation

Identification Number:

10.1109/ICASSP43922.2022.9746897

Bibliographic Citation:

Ooi, K., Watcharasupat, K. N., Lam, B., Ong, Z. & Gan, W. (2022). Probably pleasant? A neural-probabilistic approach to automatic masker selection for urban soundscape augmentation. 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2022), 8887-8891.

Citation

Identification Number:

10356/158000

Bibliographic Citation:

Ooi, K., Watcharasupat, K. N., Lam, B., Ong, Z. & Gan, W. (2022). Probably pleasant? A neural-probabilistic approach to automatic masker selection for urban soundscape augmentation. 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2022), 8887-8891.

Other Study-Related Materials

Label:

fold_0_features.npy

Notes:

application/octet-stream

Other Study-Related Materials

Label:

fold_0_labels.npy

Notes:

application/octet-stream

Other Study-Related Materials

Label:

fold_1_features.npy

Notes:

application/octet-stream

Other Study-Related Materials

Label:

fold_1_labels.npy

Notes:

application/octet-stream

Other Study-Related Materials

Label:

fold_2_features.npy

Notes:

application/octet-stream

Other Study-Related Materials

Label:

fold_2_labels.npy

Notes:

application/octet-stream

Other Study-Related Materials

Label:

fold_3_features.npy

Notes:

application/octet-stream

Other Study-Related Materials

Label:

fold_3_labels.npy

Notes:

application/octet-stream

Other Study-Related Materials

Label:

fold_4_features.npy

Notes:

application/octet-stream

Other Study-Related Materials

Label:

fold_4_labels.npy

Notes:

application/octet-stream

Other Study-Related Materials

Label:

fold_5_features.npy

Notes:

application/octet-stream

Other Study-Related Materials

Label:

fold_5_labels.npy

Notes:

application/octet-stream