Add balancing for finetune and update data README #162

Merged 23 commits on Sep 18, 2024
12 changes: 12 additions & 0 deletions .github/workflows/format_code.yml
@@ -0,0 +1,12 @@
name: Check code formatting

on: [push, pull_request]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: psf/black@stable
      # - uses: psf/black@552baf822992936134cbd31a38f69c8cfe7c0f05

16 changes: 16 additions & 0 deletions .github/workflows/isort.yaml
@@ -0,0 +1,16 @@
name: Run isort

on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: isort/isort-action@v1
      # - uses: isort/isort-action@master
        with:
          # isortVersion: 5.13.2
          sortPaths: 'nkululeko'
          configuration: '--profile black'

11 changes: 8 additions & 3 deletions CONTRIBUTING.md
@@ -21,7 +21,7 @@ The preferred way to contribute to nkululeko is to fork the [main repository](ht

```bash
git clone https://github.com/YourLogin/nkululeko.git
-cd spafe
+cd nkululeko
```

3. Remove any previously installed nkululeko versions, then install your local copy with testing dependencies:
@@ -43,9 +43,14 @@ The preferred way to contribute to nkululeko is to fork the [main repository](ht
-> Please never work directly on the `master` branch!
```

-6. Once you are done, make sure to format the code using black to fit spafe's codestyle.
+6. Once you are done, make sure to format the code using black to fit Nkululeko's codestyle.

-```black nkululeko/```
+```bash
+black nkululeko/
+isort --profile black nkululeko/
+# Alternatively and additionally, use ruff:
+ruff check --fix --output-format=full nkululeko
+```

7. Make sure that the tests succeed and have enough coverage.
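
Reviewer note: the three formatting commands in step 6 could be chained from a small script so contributors run them in one go. The following is only a sketch. The function names `build_commands` and `run_formatters` are hypothetical and not part of nkululeko; the command lines themselves are copied from step 6.

```python
import shutil
import subprocess


def build_commands(package="nkululeko"):
    """Return the formatter command lines from step 6 as argument lists."""
    return [
        ["black", f"{package}/"],
        ["isort", "--profile", "black", f"{package}/"],
        ["ruff", "check", "--fix", "--output-format=full", package],
    ]


def run_formatters(package="nkululeko", dry_run=False):
    """Run each formatter in order and map tool name to its exit code.

    A tool is recorded as None when it was not executed, either because
    dry_run is set or because the executable is not on PATH.
    """
    results = {}
    for cmd in build_commands(package):
        tool = cmd[0]
        if dry_run or shutil.which(tool) is None:
            print("not run:", " ".join(cmd))
            results[tool] = None
            continue
        # check=False: a linter may exit non-zero when it still finds
        # issues, and the remaining tools should run regardless.
        results[tool] = subprocess.run(cmd, check=False).returncode
    return results
```

Calling `run_formatters(dry_run=True)` only prints the commands, which is a cheap way to confirm what would be executed before touching the working tree.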

12 changes: 6 additions & 6 deletions data/README.md
@@ -5,7 +5,7 @@ Nkululeko database repository
# Data


-This is the default top directory for Nkululeko data import.Each database should be in its own subfolder (you can also use `ln -sf`` to soft link original database path to these subfolders) and contain a README how to import the data to Nkululeko CSV or audformat.
+This is the default top directory for Nkululeko data import. Each database should be in its own subfolder (you can also use `ln -sf` to soft link original database path to these subfolders) and contain a README how to import the data to Nkululeko CSV or audformat.
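
Reviewer note: for contributors who prefer to script the `ln -sf` step mentioned in this paragraph, a Python equivalent might look like the sketch below. `link_database` and `demo_link` are hypothetical helpers, and the folder name `mydb` is a placeholder, not a real nkululeko dataset. Unlike `ln -sf`, this sketch only replaces an existing symlink; it raises `FileExistsError` if a real file or directory is already in the way. Creating symlinks on Windows may need extra privileges.

```python
import tempfile
from pathlib import Path


def link_database(original: Path, data_dir: Path, name: str) -> Path:
    """Create data_dir/name as a symlink pointing at the original database."""
    data_dir.mkdir(parents=True, exist_ok=True)
    link = data_dir / name
    if link.is_symlink():
        link.unlink()  # replace a stale link, similar to the -f flag
    link.symlink_to(original.resolve(), target_is_directory=True)
    return link


def demo_link():
    """Exercise link_database in a throwaway directory (placeholder paths)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "corpora" / "mydb"
        original.mkdir(parents=True)
        link = link_database(original, root / "data", "mydb")
        # Linking a second time must succeed and keep a valid link.
        link = link_database(original, root / "data", "mydb")
        return link.is_symlink(), link.resolve() == original.resolve()
```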
## Accessibility


@@ -15,12 +15,10 @@ The column `access` in the table below indicates the database's accessability. T
- `private`: the database is not publicly available on the internet and requires the private information of the owner of the dataset.


-To support open science and reproducible research, we only accept PR and recipes for public dataset for now on.
## Databases

+To support open science and reproducible research, we encourage submitting PRs and recipes for public datasets from now on.
|Name|Target|Description|Access|
| :--- | :--- | :--- | :--- |
-|emorynlp|emotion|English, From Friends TV|public|
+|emorynlp|emotion|English Emotion Dataset from Friends TV Show|public|
|emns|emotion,intensity|British, single speaker, UAR=.479|public|
|test|none|Test data for nkululeko|public|
|catsvsdogs|cats_dogs|kaggle test set|public|
@@ -72,11 +70,13 @@ To support open science and reproducible research, we only accept PR and recipes
|urdu|emotion|Urdu|public|
|polish|emotion|Polish|public|
|cmu-mosei|sentiment,emotion|English, original link dead|public|
-|SVD|pathologicalspeech|German|public|
+|svd|pathological speech|German speech data for detecting various pathological voices|public|
|msp-improv|emotion,VAD,naturalness|English|restricted|
|shemo|emotion|Persian|public|
|esd|emotion|English,Chinese|public|


This recipe contains information about 56 datasets.
## Performance

![Nkululeko performance](../meta/images/nkululeko_ser_20240719.png)
3 changes: 2 additions & 1 deletion data/androids/process_database.py
@@ -15,9 +15,10 @@

"""

-import pandas as pd
import os
+
import audeer
+import pandas as pd

dataset_name = 'androids'
data_root = './Androids-Corpus/'
8 changes: 5 additions & 3 deletions data/banglaser/process_database.py
@@ -14,12 +14,14 @@
GG = Actor ID, 01-34 (odd: male, even: female)
"""

-import pandas as pd
-from nkululeko.utils.files import find_files
import argparse
-from sklearn.model_selection import train_test_split
from pathlib import Path

+import pandas as pd
+from sklearn.model_selection import train_test_split

+from nkululeko.utils.files import find_files


def process_database(data_dir, output_dir):
# check if data_dir exists
1 change: 0 additions & 1 deletion data/crema-d/load_db.py
@@ -2,7 +2,6 @@

import audb


# set download directory to current
cwd = os.getcwd()
audb.config.CACHE_ROOT = cwd