Commit d171bfa — deploy: 5c9d40c
hagenw committed Jan 3, 2024
1 parent 330dae8
Showing 40 changed files with 2,779 additions and 34 deletions.
Binary file modified .doctrees/datasets.doctree
Binary file added .doctrees/datasets/air.doctree
Binary file added .doctrees/datasets/cough-speech-sneeze.doctree
Binary file added .doctrees/datasets/crema-d.doctree
Binary file modified .doctrees/datasets/emodb.doctree
Binary file added .doctrees/datasets/micirp.doctree
Binary file added .doctrees/datasets/musan.doctree
Binary file added .doctrees/datasets/vadtoolkit.doctree
Binary file modified .doctrees/environment.pickle
Binary file added _images/air.png
Binary file added _images/cough-speech-sneeze.png
Binary file added _images/crema-d.png
Binary file added _images/emodb.png
Binary file added _images/micirp.png
Binary file added _images/musan.png
Binary file added _images/vadtoolkit.png
6 changes: 6 additions & 0 deletions _sources/datasets.rst.txt
@@ -18,4 +18,10 @@ For each dataset, the latest version is shown.
:maxdepth: 1
:hidden:

datasets/air
datasets/cough-speech-sneeze
datasets/crema-d
datasets/emodb
datasets/micirp
datasets/musan
datasets/vadtoolkit
64 changes: 64 additions & 0 deletions _sources/datasets/air.rst.txt
@@ -0,0 +1,64 @@
.. _air:

air
---

Created by Marco Jeub, Magnus Schäfer, Hauke Krüger, Christoph Matthias Nelke, Christophe Beaugeant, Peter Vary


============= ======================
version `1.4.2 <https://github.com/audeering/air/blob/main/CHANGELOG.md>`__
license `MIT <https://opensource.org/licenses/MIT>`__
source https://www.iks.rwth-aachen.de/en/research/tools-downloads/databases/aachen-impulse-response-database/
usage commercial
languages
format wav
channel 2
sampling rate 48000
bit depth 16
duration 0 days 00:04:43.719958333
files 107
repository `data-public <https://audeering.jfrog.io/artifactory/webapp/#/artifacts/browse/tree/General/data-public/air>`__
published 2023-12-21 by audeering-unittest
============= ======================


Description
^^^^^^^^^^^

The Aachen Impulse Response (AIR) database is a set of impulse responses that were measured in a wide variety of rooms. The initial aim of the AIR database was to allow for realistic studies of signal processing algorithms in reverberant environments, with a special focus on hearing aid applications. The first version was published in 2009 and offers binaural room impulse responses (BRIR) measured with a dummy head in different locations with different acoustical properties, such as reverberation time and room volume. Besides the evaluation of dereverberation algorithms and perceptual investigations of reverberant speech, this part of the database allows for the investigation of head shadowing influence, since all recordings were made both with and without the dummy head. In a first update, the database was extended to BRIRs with various azimuth angles between head and desired source. This further allows the investigation of (binaural) direction-of-arrival (DOA) algorithms as well as of the influence of signal processing algorithms on the binaural cues. Since dereverberation can also be applied to telephone speech, the latest extension includes (dual-channel) impulse responses between the artificial mouth of a dummy head and a mock-up phone. The measurements were carried out in compliance with the ITU standards for both the hand-held and the hands-free position. Additional microphone configurations were added in the latest extension. For the third big extension, the IKS carried out measurements of binaural room impulse responses in the Aula Carolina Aachen. The former church, with a ground area of 570 m² and a high ceiling, shows very strong reverberation effects. The database will successively be extended to further application scenarios.

Example
^^^^^^^

:file:`data/air_binaural_stairway_1_1_0.wav`

.. image:: ../air.png

.. raw:: html

<p><audio controls src="air/data/air_binaural_stairway_1_1_0.wav"></audio></p>

Tables
^^^^^^

.. csv-table::
:header: ID,Type,Columns
:widths: 20, 10, 70

"brir", "filewise", "room, azimuth"
"phone", "filewise", "room, mode"
"rir", "filewise", "room, distance, reverberation-time"


Schemes
^^^^^^^

.. csv-table::
:header: ID,Dtype,Labels,Mappings

"azimuth", "float", ""
"distance", "float", ""
"mode", "str", "hand-held, hands-free"
"reverberation-time", "float", ""
"room", "str", "aula_carolina, bathroom, booth, corridor, kitchen, lecture, meeting, office, stairway", "floor cover, furniture, room height, room length, room width, wall surface"
59 changes: 59 additions & 0 deletions _sources/datasets/cough-speech-sneeze.rst.txt
@@ -0,0 +1,59 @@
.. _cough-speech-sneeze:

cough-speech-sneeze
-------------------

Created by S Amiriparian, S Pugachevskiy, N Cummins, D Hantke, J Pohjalainen, G Keren, BW Schuller


============= ======================
version `2.0.1 <https://github.com/audeering/cough-speech-sneeze/blob/main/CHANGELOG.md>`__
license `CC-BY-4.0 <https://creativecommons.org/licenses/by/4.0/>`__
source Dataset based on the publication of Shahin Amiriparian: "Amiriparian, S., Pugachevskiy, S., Cummins, N., Hantke, S., Pohjalainen, J., Keren, G., Schuller, B., 2017. CAST a database: Rapid targeted large-scale big data acquisition via small-world modelling of social media platforms, in: 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE, pp. 340–345. https://doi.org/10.1109/ACII.2017.8273622"
usage commercial
languages
format wav
channel 1
sampling rate 16000, 44100
bit depth 16
duration 0 days 03:02:29.436148526
files 4310
repository `data-public <https://audeering.jfrog.io/artifactory/webapp/#/artifacts/browse/tree/General/data-public/cough-speech-sneeze>`__
published 2024-01-02 by audeering
============= ======================


Description
^^^^^^^^^^^

Cough-speech-sneeze: a data set of human sounds. This dataset was collected by Dr. Shahin Amiriparian. It contains samples of human speech, coughing, and sneezing collected from YouTube, as well as silence clips. The original publication of this (possibly then extended) dataset is the following: Amiriparian, S., Pugachevskiy, S., Cummins, N., Hantke, S., Pohjalainen, J., Keren, G., Schuller, B., 2017. CAST a database: Rapid targeted large-scale big data acquisition via small-world modelling of social media platforms, in: 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE, pp. 340–345. https://doi.org/10.1109/ACII.2017.8273622

Example
^^^^^^^

:file:`coughing/6hw6_4eb_hq_18.41-19.81.wav`

.. image:: ../cough-speech-sneeze.png

.. raw:: html

<p><audio controls src="cough-speech-sneeze/coughing/6hw6_4eb_hq_18.41-19.81.wav"></audio></p>

Tables
^^^^^^

.. csv-table::
:header: ID,Type,Columns
:widths: 20, 10, 70

"files", "filewise", "category, duration"


Schemes
^^^^^^^

.. csv-table::
:header: ID,Dtype,Labels

"category", "str", "coughing, silence, sneezing, speech"
"duration", "time", ""
96 changes: 96 additions & 0 deletions _sources/datasets/crema-d.rst.txt
@@ -0,0 +1,96 @@
.. _crema-d:

crema-d
-------

Created by Houwei Cao, David G. Cooper, Michael K. Keutmann, Ruben C. Gur, Ani Nenkova, Ragini Verma, Samantha L Moore, Adam Savitt


============= ======================
version `1.2.0 <https://github.com/audeering/crema-d/blob/main/CHANGELOG.md>`__
license `Open Data Commons Open Database License (ODbL) v1.0 <http://opendatacommons.org/licenses/odbl/1.0/>`__
source https://github.com/CheyneyComputerScience/CREMA-D
usage commercial
languages English
format wav
channel 1
sampling rate 16000
bit depth 16
duration 0 days 05:15:21.404187500
files 7441
repository `data-public <https://audeering.jfrog.io/artifactory/webapp/#/artifacts/browse/tree/General/data-public/crema-d>`__
published 2024-01-02 by audeering
============= ======================


Description
^^^^^^^^^^^

CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset. CREMA-D is a data set of 7,442 original clips from 91 actors. These clips were recorded by 48 male and 43 female actors between the ages of 20 and 74, coming from a variety of races and ethnicities (African American, Asian, Caucasian, Hispanic, and Unspecified). When using the database commercially, the database must be referenced together with its license.

Example
^^^^^^^

:file:`1001/1001_TAI_HAP_XX.wav`

.. image:: ../crema-d.png

.. raw:: html

<p><audio controls src="crema-d/1001/1001_TAI_HAP_XX.wav"></audio></p>

Tables
^^^^^^

.. csv-table::
:header: ID,Type,Columns
:widths: 20, 10, 70

"emotion.categories.desired.dev", "filewise", "emotion, emotion.intensity"
"emotion.categories.desired.test", "filewise", "emotion, emotion.intensity"
"emotion.categories.desired.train", "filewise", "emotion, emotion.intensity"
"emotion.categories.dev", "filewise", "emotion.0, emotion.0.level, emotion.1, emotion.1.level, emotion.2, emotion.2.level, emotion.3, emotion.3.level, emotion.4, emotion.4.level"
"emotion.categories.dev.gold_standard", "filewise", "emotion, emotion.level, emotion.agreement"
"emotion.categories.dev.votes", "filewise", "anger, disgust, fear, happiness, neutral, sadness"
"emotion.categories.face.dev", "filewise", "emotion.0, emotion.0.level, emotion.1, emotion.1.level, emotion.2, emotion.2.level, emotion.3, emotion.3.level, emotion.4, emotion.4.level"
"emotion.categories.face.dev.gold_standard", "filewise", "emotion, emotion.level, emotion.agreement"
"emotion.categories.face.dev.votes", "filewise", "anger, disgust, fear, happiness, neutral, sadness"
"emotion.categories.face.test", "filewise", "emotion.0, emotion.0.level, emotion.1, emotion.1.level, emotion.2, emotion.2.level, emotion.3, emotion.3.level, emotion.4, emotion.4.level"
"emotion.categories.face.test.gold_standard", "filewise", "emotion, emotion.level, emotion.agreement"
"emotion.categories.face.test.votes", "filewise", "anger, disgust, fear, happiness, neutral, sadness"
"emotion.categories.face.train", "filewise", "emotion.0, emotion.0.level, emotion.1, emotion.1.level, emotion.2, emotion.2.level, emotion.3, emotion.3.level, emotion.4, emotion.4.level"
"emotion.categories.face.train.gold_standard", "filewise", "emotion, emotion.level, emotion.agreement"
"emotion.categories.face.train.votes", "filewise", "anger, disgust, fear, happiness, neutral, sadness"
"emotion.categories.multimodal.dev", "filewise", "emotion.0, emotion.0.level, emotion.1, emotion.1.level, emotion.2, emotion.2.level, emotion.3, emotion.3.level"
"emotion.categories.multimodal.dev.gold_standard", "filewise", "emotion, emotion.level, emotion.agreement"
"emotion.categories.multimodal.dev.votes", "filewise", "anger, disgust, fear, happiness, neutral, sadness"
"emotion.categories.multimodal.test", "filewise", "emotion.0, emotion.0.level, emotion.1, emotion.1.level, emotion.2, emotion.2.level, emotion.3, emotion.3.level"
"emotion.categories.multimodal.test.gold_standard", "filewise", "emotion, emotion.level, emotion.agreement"
"emotion.categories.multimodal.test.votes", "filewise", "anger, disgust, fear, happiness, neutral, sadness"
"emotion.categories.multimodal.train", "filewise", "emotion.0, emotion.0.level, emotion.1, emotion.1.level, emotion.2, emotion.2.level, emotion.3, emotion.3.level"
"emotion.categories.multimodal.train.gold_standard", "filewise", "emotion, emotion.level, emotion.agreement"
"emotion.categories.multimodal.train.votes", "filewise", "anger, disgust, fear, happiness, neutral, sadness"
"emotion.categories.test", "filewise", "emotion.0, emotion.0.level, emotion.1, emotion.1.level, emotion.2, emotion.2.level, emotion.3, emotion.3.level, emotion.4, emotion.4.level"
"emotion.categories.test.gold_standard", "filewise", "emotion, emotion.level, emotion.agreement"
"emotion.categories.test.votes", "filewise", "anger, disgust, fear, happiness, neutral, sadness"
"emotion.categories.train", "filewise", "emotion.0, emotion.0.level, emotion.1, emotion.1.level, emotion.2, emotion.2.level, emotion.3, emotion.3.level, emotion.4, emotion.4.level"
"emotion.categories.train.gold_standard", "filewise", "emotion, emotion.level, emotion.agreement"
"emotion.categories.train.votes", "filewise", "anger, disgust, fear, happiness, neutral, sadness"
"files", "filewise", "speaker, corrupted"
"sentence", "filewise", "sentence"


Schemes
^^^^^^^

.. csv-table::
:header: ID,Dtype,Min,Max,Labels,Mappings

"corrupted", "bool", "", "", ""
"emotion", "str", "", "", "anger, disgust, fear, happiness, neutral, no_agreement, sadness"
"emotion.agreement", "float", "", "1", ""
"emotion.intensity", "str", "", "", "high, low, mid, unspecified"
"emotion.level", "float", "", "100", ""
"sentence", "str", "", "", "DFA, IEO, IOM, ITH, ITS, IWL, IWW, MTI, TAI, TIE, TSI, WSI", "✓"
"speaker", "int", "", "", "1001, 1002, 1003, 1004, 1005, 1006, 1007, [...], 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091", "age, ethnicity, race, sex"
"votes", "int", "", "", ""
6 changes: 4 additions & 2 deletions _sources/datasets/emodb.rst.txt
@@ -7,7 +7,7 @@ Created by Felix Burkhardt, Astrid Paeschke, Miriam Rolfes, Walter Sendlmeier, B


============= ======================
-version       `1.3.0 <https://github.com/audeering/emodb/blob/main/CHANGELOG.md>`__
+version       `1.4.1 <https://github.com/audeering/emodb/blob/main/CHANGELOG.md>`__
license `CC0-1.0 <https://creativecommons.org/publicdomain/zero/1.0/>`__
source http://emodb.bilderbar.info/download/download.zip
usage unrestricted
@@ -19,7 +19,7 @@ bit depth 16
duration 0 days 00:24:47.092187500
files 535
repository `data-public <https://audeering.jfrog.io/artifactory/webapp/#/artifacts/browse/tree/General/data-public/emodb>`__
-published     2022-08-05 by audeering-unittest
+published     2023-04-05 by audeering-unittest
============= ======================


@@ -33,6 +33,8 @@ Example

:file:`wav/13b09La.wav`

+.. image:: ../emodb.png
+
.. raw:: html

<p><audio controls src="emodb/wav/13b09La.wav"></audio></p>
58 changes: 58 additions & 0 deletions _sources/datasets/micirp.rst.txt
@@ -0,0 +1,58 @@
.. _micirp:

micirp
------

Created by Stewart Tavener (Xaudia.com)


============= ======================
version `1.0.0 <https://github.com/audeering/micirp/blob/main/CHANGELOG.md>`__
license `CC-BY-SA-4.0 <https://creativecommons.org/licenses/by-sa/4.0/>`__
source http://micirp.blogspot.com/
usage commercial
languages
format wav
channel 1
sampling rate 44100, 48000
bit depth 24
duration 0 days 00:00:27.341591837
files 66
repository `data-public <https://audeering.jfrog.io/artifactory/webapp/#/artifacts/browse/tree/General/data-public/micirp>`__
published 2023-12-21 by audeering
============= ======================


Description
^^^^^^^^^^^

The Microphone Impulse Response Project (MicIRP) contains impulse response data for vintage microphones. The impulse response files were created with the analysis software FuzzMeasure. The microphones were tested using a swept-sine method in a small booth treated with plenty of acoustic foam, placed about 20 to 30 cm from the source. Although the recording system and booth are calibrated regularly with a Beyerdynamic measurement microphone, there are problems when comparing, for example, a figure-8 ribbon with an omnidirectional standard, as they will pick up different amounts of reflections from the sides. It should therefore be noted that the impulse response files describe the microphones as measured in the booth, rather than in free space.

Example
^^^^^^^

:file:`dirs/IR_AKGD12.wav`

.. image:: ../micirp.png

.. raw:: html

<p><audio controls src="micirp/dirs/IR_AKGD12.wav"></audio></p>

Tables
^^^^^^

.. csv-table::
:header: ID,Type,Columns
:widths: 20, 10, 70

"files", "filewise", "manufacturer"


Schemes
^^^^^^^

.. csv-table::
:header: ID,Dtype,Labels

"manufacturer", "str", "AKG, Altec, American, Amperite, Astatic, B&O, BBC, [...], Oktava, RCA, Reslo, STC, Shure, Sony, Telefunken, Toshiba"
77 changes: 77 additions & 0 deletions _sources/datasets/musan.rst.txt
@@ -0,0 +1,77 @@
.. _musan:

musan
-----

Created by David Snyder, Guoguo Chen, Daniel Povey


============= ======================
version `1.0.0 <https://github.com/audeering/musan/blob/main/CHANGELOG.md>`__
license `CC-BY-4.0 <https://creativecommons.org/licenses/by/4.0/>`__
source http://www.openslr.org/17/
usage commercial
languages ara, zho, dan, nld, eng, fra, deu, heb, hun, ita, jpn, lat, pol, por, rus, spa, tgl
format wav
channel 1
sampling rate 16000
bit depth 16
duration 4 days 13:17:22.582937499
files 2016
repository `data-public <https://audeering.jfrog.io/artifactory/webapp/#/artifacts/browse/tree/General/data-public/musan>`__
published 2023-12-20 by audeering-unittest
============= ======================


Description
^^^^^^^^^^^

The goal of this corpus is to provide data for music/speech discrimination, speech/nonspeech detection, and voice activity detection. The corpus is divided into music, speech, and noise portions. In total there are approximately 109 hours of audio. Reference: https://arxiv.org/abs/1510.08484

Example
^^^^^^^

:file:`noise/free-sound/noise-free-sound-0324.wav`

.. image:: ../musan.png

.. raw:: html

<p><audio controls src="musan/noise/free-sound/noise-free-sound-0324.wav"></audio></p>

Tables
^^^^^^

.. csv-table::
:header: ID,Type,Columns
:widths: 20, 10, 70

"files", "filewise", "duration"
"music", "filewise", "genre, vocals, artist, composer"
"music.fma", "filewise", "genre, vocals, artist, composer"
"music.fma-western-art", "filewise", "genre, vocals, artist, composer"
"music.hd-classical", "filewise", "genre, vocals, artist, composer"
"music.jamendo", "filewise", "genre, vocals, artist, composer"
"music.rfm", "filewise", "genre, vocals, artist, composer"
"noise", "filewise", "background_noise"
"noise.free-sound", "filewise", "background_noise"
"noise.sound-bible", "filewise", "background_noise"
"speech", "filewise", "gender, language"
"speech.librivox", "filewise", "gender, language"
"speech.us-gov", "filewise", "gender, language"


Schemes
^^^^^^^

.. csv-table::
:header: ID,Dtype,Labels

"artist", "str", ""
"background_noise", "bool", ""
"composer", "str", ""
"duration", "time", ""
"gender", "str", "female, male"
"genre", "str", ""
"language", "str", "ara, dan, deu, eng, fra, heb, hun, [...], lat, nld, pol, por, rus, spa, tgl, zho"
"vocals", "bool", ""