update the egyptian fruit bat dataset
Radek Osmulski committed Jun 25, 2021
1 parent 9bc7893 commit 1d8e2c1
Showing 15 changed files with 2,458 additions and 3,419 deletions.
306 changes: 306 additions & 0 deletions egyptian_fruit_bat/01_Download_Dataset.ipynb
@@ -0,0 +1,306 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Download Dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's begin by downloading and extracting the dataset.\n",
"\n",
"- To download the full dataset, use this URL: https://archive.org/details/egyptian_fruit_bats\n",
"- To download the 10k flavor of the dataset, use this URL: https://archive.org/details/egyptian_fruit_bats_10k\n",
"\n",
"All other steps remain the same."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2021-01-26 11:29:23-- https://archive.org/download/egyptian_fruit_bats/egyptian_fruit_bats.zip\n",
"Resolving archive.org (archive.org)... 207.241.224.2\n",
"Connecting to archive.org (archive.org)|207.241.224.2|:443... connected.\n",
"HTTP request sent, awaiting response... 302 Found\n",
"Location: https://ia601500.us.archive.org/0/items/egyptian_fruit_bats/egyptian_fruit_bats.zip [following]\n",
"--2021-01-26 11:29:27-- https://ia601500.us.archive.org/0/items/egyptian_fruit_bats/egyptian_fruit_bats.zip\n",
"Resolving ia601500.us.archive.org (ia601500.us.archive.org)... 207.241.227.110\n",
"Connecting to ia601500.us.archive.org (ia601500.us.archive.org)|207.241.227.110|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 59031044836 (55G) [application/zip]\n",
"Saving to: ‘egyptian_fruit_bats.zip’\n",
"\n",
"egyptian_fruit_bats 100%[===================>] 54.98G 4.20MB/s in 4h 28m \n",
"\n",
"2021-01-26 15:57:34 (3.50 MB/s) - ‘egyptian_fruit_bats.zip’ saved [59031044836/59031044836]\n",
"\n"
]
}
],
"source": [
"!mkdir -p data\n",
"!cd data && wget https://archive.org/download/egyptian_fruit_bats/egyptian_fruit_bats.zip\n",
"!cd data && unzip -q egyptian_fruit_bats.zip"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And we are all set!"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"annotations.csv \u001b[0m\u001b[01;34maudio\u001b[0m/ \u001b[01;31megyptian_fruit_bats.zip\u001b[0m\r\n"
]
}
],
"source": [
"ls data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`annotations.csv` contains the labels, while the audio files are stored under `data/audio`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"91081\r\n"
]
}
],
"source": [
"ls -lht data/audio | wc -l"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`ls -l` prints a leading `total` line, so the count above is one too high: the directory holds 91,080 bat vocalizations, all recorded at 250 kHz (250,000 Hz)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[0m\u001b[00;36m0.wav\u001b[0m\n",
"\u001b[00;36m10000.wav\u001b[0m\n",
"\u001b[00;36m10001.wav\u001b[0m\n",
"\u001b[00;36m10002.wav\u001b[0m\n",
"\u001b[00;36m10003.wav\u001b[0m\n",
"\u001b[00;36m10004.wav\u001b[0m\n",
"\u001b[00;36m10005.wav\u001b[0m\n",
"\u001b[00;36m10006.wav\u001b[0m\n",
"\u001b[00;36m10007.wav\u001b[0m\n",
"\u001b[00;36m10008.wav\u001b[0m\n",
"ls: write error\n"
]
}
],
"source": [
"ls data/audio | head"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"anno = pd.read_csv('data/annotations.csv')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Emitter</th>\n",
" <th>Addressee</th>\n",
" <th>Context</th>\n",
" <th>Emitter pre-vocalization action</th>\n",
" <th>Addressee pre-vocalization action</th>\n",
" <th>Emitter post-vocalization action</th>\n",
" <th>Addressee post-vocalization action</th>\n",
" <th>File Name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>118</td>\n",
" <td>0</td>\n",
" <td>9</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>0.wav</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>11</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1.wav</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>118</td>\n",
" <td>0</td>\n",
" <td>12</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>2.wav</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>12</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3.wav</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>12</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>4.wav</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Emitter Addressee Context Emitter pre-vocalization action \\\n",
"0 118 0 9 2 \n",
"1 0 0 11 0 \n",
"2 118 0 12 2 \n",
"3 0 0 12 0 \n",
"4 0 0 12 0 \n",
"\n",
" Addressee pre-vocalization action Emitter post-vocalization action \\\n",
"0 2 3 \n",
"1 0 0 \n",
"2 2 3 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" Addressee post-vocalization action File Name \n",
"0 3 0.wav \n",
"1 0 1.wav \n",
"2 3 2.wav \n",
"3 0 3.wav \n",
"4 0 4.wav "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"anno.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each audio file has been assigned seven labels (every column except `File Name`). The `File Name` column links each row of labels to its audio file."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
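
The notebook counts the audio files with `ls | wc -l` and notes that labels are tied to audio through the file name. A minimal sketch of the same two checks in Python follows. It assumes the layout shown in the notebook (`data/audio` and `data/annotations.csv` with a `File Name` column). The helper names `count_wav_files` and `missing_audio_files` are hypothetical and not part of the repository, and the standard-library `csv` module is used instead of pandas only to keep the sketch dependency-free.

```python
import csv
from pathlib import Path


def count_wav_files(audio_dir):
    """Count the .wav files directly inside audio_dir.

    Unlike `ls -l | wc -l`, there is no `total` header line to subtract.
    """
    return sum(1 for p in Path(audio_dir).iterdir() if p.suffix == ".wav")


def missing_audio_files(annotations_csv, audio_dir):
    """Return the 'File Name' values that have no matching file in audio_dir.

    Assumes the CSV has a 'File Name' column, as in the notebook's output.
    """
    audio_dir = Path(audio_dir)
    with open(annotations_csv, newline="") as f:
        return [
            row["File Name"]
            for row in csv.DictReader(f)
            if not (audio_dir / row["File Name"]).is_file()
        ]
```

With the dataset extracted as in the notebook, `count_wav_files("data/audio")` gives the number of recordings, and an empty list from `missing_audio_files("data/annotations.csv", "data/audio")` would indicate that every annotated row has its audio file on disk.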