DOC: simplify API documentation #14

Merged
merged 5 commits into from
Aug 28, 2024
Changes from 3 commits
3 changes: 0 additions & 3 deletions audpsychometric/__init__.py
@@ -1,13 +1,10 @@
import audpsychometric.core
from audpsychometric.core import datasets
from audpsychometric.core.datasets import list_datasets
from audpsychometric.core.datasets import read_dataset
from audpsychometric.core.gold_standard import agreement_categorical
from audpsychometric.core.gold_standard import agreement_numerical
from audpsychometric.core.gold_standard import evaluator_weighted_estimator
from audpsychometric.core.gold_standard import mode
from audpsychometric.core.gold_standard import rater_agreement_pearson
import audpsychometric.core.reliability
from audpsychometric.core.reliability import congeneric_reliability
from audpsychometric.core.reliability import cronbachs_alpha
from audpsychometric.core.reliability import intra_class_correlation
40 changes: 28 additions & 12 deletions audpsychometric/core/datasets/__init__.py
@@ -1,6 +1,3 @@
"""Provide example datasets for package."""


__all__ = ["read_dataset", "list_datasets"]

import os
@@ -19,12 +16,20 @@ def read_dataset(data_set_name: str) -> pd.DataFrame:
retrieves a test dataset from within the package.

Args:
data_set_name(str): string identifier of the dataset.
This does not need to be identical to the filename
data_set_name(str): dataset name

Returns:
table containing dataset

dataframe containing dataset

Examples:
>>> df = read_dataset("wine")
>>> df.head()
Wine Judge Scores
0 1 A 1
1 2 A 1
2 3 A 3
3 4 A 6
4 5 A 6

"""
ds = data_sets.loc[data_sets["dataset"] == data_set_name]
@@ -38,11 +43,22 @@ def read_dataset(data_set_name: str) -> pd.DataFrame:
def list_datasets():
r"""List test datasets available in package.

Args:
None
Returns:
table listing available datasets

"""
dataframe listing available datasets

Examples:
>>> list_datasets()
fname ... description
dataset ...
statology statology.csv ... icc sample from web page
hallgren-table5 Hallgren-Table-05.csv ... icc table from publication
hallgren-table3 Hallgren-Table-03.csv ... kappa table from publication
HolzingerSwineford1939 HolzingerSwineford1939.csv ... lavaan
Shrout_Fleiss Shrout_Fleiss_1979.csv ... Dataset from paper
wine wine.csv ... online source
<BLANKLINE>
[6 rows x 4 columns]

""" # noqa: E501
df_data_sets = data_sets.set_index("dataset")
return df_data_sets
5 changes: 5 additions & 0 deletions audpsychometric/core/reliability.py
@@ -126,6 +126,9 @@ def intra_class_correlation(
The model is based on analysis of variance,
and ratings must at least be ordinally scaled.

CCC_ is conceptually and numerically related to the ICC.
For an implementation see :func:`audmetric.concordance_cc`.

Args:
ratings: ratings.
When given as a 1-dimensional array,
@@ -137,6 +140,8 @@ def intra_class_correlation(
anova_method: method for ANOVA calculation,
can be ``"pingouin"`` or ``"statsmodels"``

.. _CCC: https://en.wikipedia.org/wiki/Concordance_correlation_coefficient

Returns:
icc and additional results lumped into dict

75 changes: 4 additions & 71 deletions docs/api-src/audpsychometric.rst
@@ -3,84 +3,17 @@ audpsychometric

.. automodule:: audpsychometric

Library to facilitate evaluation and processing of annotated speech.

Psychometric Analysis
---------------------

.. autosummary::
:toctree:
:nosignatures:

cronbachs_alpha
congeneric_reliability
intra_class_correlation

The module currently contains two reliability coefficients
from the family of structural equation model (SEM)-based
reliability coefficients.
One of them is Cronbach's alpha
in the function :func:`audpsychometric.cronbachs_alpha`.
This classical coefficient assumes *tau equivalence*
which requires factor loadings to be homogeneous.
The second coefficient
in the function :func:`audpsychometric.congeneric_reliability`
relaxes this assumption
and only assumes a `one-dimensional congeneric reliability`_ model:
congeneric measurement models are characterized by the fact
that the factor loadings of the indicators
do not have to be homogeneous,
i.e. they can differ.
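
As a rough illustration of the tau-equivalent case described above, Cronbach's alpha can be computed from the per-rater variances and the variance of the summed score. This is a minimal sketch in plain Python, not the implementation behind :func:`audpsychometric.cronbachs_alpha`; the function name `cronbachs_alpha_sketch`, the example ratings, and the items-by-raters layout are assumptions made for this example.

```python
from statistics import variance


def cronbachs_alpha_sketch(ratings):
    """Cronbach's alpha for rows = rated items, columns = raters.

    Hypothetical helper for illustration only.
    """
    n_raters = len(ratings[0])
    # Sample variance of each rater's column of ratings
    columns = list(zip(*ratings))
    rater_variances = [variance(column) for column in columns]
    # Sample variance of the per-item sum over all raters
    total_variance = variance([sum(row) for row in ratings])
    return n_raters / (n_raters - 1) * (1 - sum(rater_variances) / total_variance)


# Three hypothetical raters who differ only by a constant offset
ratings = [
    [1, 2, 1],
    [2, 3, 2],
    [3, 4, 3],
    [4, 5, 4],
]
alpha = cronbachs_alpha_sketch(ratings)
```

Because the raters in this made-up example differ only by a constant offset, their ratings covary perfectly and alpha comes out at (numerically) 1.0.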

In addition,
the module implements *Intraclass Correlation (ICC)* analysis.
ICC is a class of coefficients
based on analysis of variance (ANOVA)
with ratings as the dependent variable,
in which terms for targets
(e.g. rated audio chunks),
raters, and their interaction are estimated.
Different flavors of ICC are then computed
based on these sum-of-squares terms.
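
To make the link between sum-of-squares terms and an ICC value concrete, the sketch below computes the simplest flavor, the one-way ICC(1), from between-target and within-target mean squares. It is a simplification under stated assumptions: it ignores the separate rater and interaction terms mentioned above, it is not the code behind :func:`audpsychometric.intra_class_correlation`, and the name `icc1_sketch` and the example ratings are hypothetical.

```python
from statistics import fmean


def icc1_sketch(ratings):
    """One-way ICC(1) for rows = targets, columns = raters.

    Hypothetical helper for illustration only.
    """
    n_targets = len(ratings)
    n_raters = len(ratings[0])
    grand_mean = fmean([value for row in ratings for value in row])
    target_means = [fmean(row) for row in ratings]
    # Sum of squares between targets and within targets
    ss_between = n_raters * sum((m - grand_mean) ** 2 for m in target_means)
    ss_within = sum(
        (value - m) ** 2
        for row, m in zip(ratings, target_means)
        for value in row
    )
    # Mean squares: sums of squares divided by their degrees of freedom
    ms_between = ss_between / (n_targets - 1)
    ms_within = ss_within / (n_targets * (n_raters - 1))
    return (ms_between - ms_within) / (ms_between + (n_raters - 1) * ms_within)


# Two hypothetical raters, the second always one point higher
icc_offset = icc1_sketch([[1, 2], [2, 3], [3, 4]])
# Two hypothetical raters in perfect agreement
icc_perfect = icc1_sketch([[1, 1], [2, 2], [3, 3]])
```

For the first example the mean squares are 2.0 (between) and 0.5 (within), giving an ICC(1) of 0.6; perfect agreement gives 1.0.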

Note that the CCC_ is conceptually and numerically related to the ICC.
We do not implement it here,
as there are other implementations available,
e.g. :func:`audmetric.concordance_cc`.
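
For readers unfamiliar with the CCC, the following plain-Python sketch shows how it is formed from the covariance, the two variances, and the squared difference of the means. It is not the code of :func:`audmetric.concordance_cc`; the name `concordance_cc_sketch`, the example data, and the use of population (divide by N) moments are assumptions of this example.

```python
from statistics import fmean, pvariance


def concordance_cc_sketch(x, y):
    """Concordance correlation coefficient of two equally long sequences.

    Hypothetical helper for illustration only.
    """
    mean_x, mean_y = fmean(x), fmean(y)
    # Population covariance of the paired values
    covariance = fmean([(a - mean_x) * (b - mean_y) for a, b in zip(x, y)])
    return 2 * covariance / (pvariance(x) + pvariance(y) + (mean_x - mean_y) ** 2)


rater_a = [1.0, 2.0, 3.0, 4.0]
rater_b = [2.0, 3.0, 4.0, 5.0]  # same pattern, shifted by one point

ccc_identical = concordance_cc_sketch(rater_a, rater_a)
ccc_shifted = concordance_cc_sketch(rater_a, rater_b)
```

Identical ratings give a CCC of 1.0, whereas the shifted ratings give 5/7 (about 0.71) even though their Pearson correlation is 1: the squared mean difference in the denominator penalizes the constant offset.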


Gold Standard Calculation
-------------------------

.. autosummary::
:toctree:
:nosignatures:

agreement_categorical
agreement_numerical
cronbachs_alpha
congeneric_reliability
evaluator_weighted_estimator
intra_class_correlation
list_datasets
mode
rater_agreement_pearson


Demo Datasets
-------------

.. autosummary::
:toctree:
:nosignatures:

list_datasets
read_dataset

Currently these datasets are defined:

.. jupyter-execute::

from audpsychometric import datasets
df_datasets = datasets.list_datasets()
print(df_datasets)


.. _one-dimensional congeneric reliability: https://en.wikipedia.org/wiki/Congeneric_reliability
.. _CCC: https://en.wikipedia.org/wiki/Concordance_correlation_coefficient
1 change: 0 additions & 1 deletion docs/conf.py
@@ -26,7 +26,6 @@
]
pygments_style = None
extensions = [
"jupyter_sphinx", # executing code blocks
"sphinx.ext.autodoc",
"sphinx.ext.napoleon", # support for Google-style docstrings
"sphinx.ext.viewcode",
3 changes: 0 additions & 3 deletions docs/requirements.txt
@@ -1,11 +1,8 @@
audeer
ipykernel
jupyter-sphinx
sphinx
sphinx-apipages >=0.1.2
sphinx-audeering-theme >=1.2.1
sphinx-autodoc-typehints
sphinx-copybutton
sphinxcontrib-programoutput
sphinxcontrib-bibtex
toml
4 changes: 2 additions & 2 deletions tests/conftest.py
@@ -1,12 +1,12 @@
import numpy as np
import pytest

from audpsychometric import datasets
import audpsychometric


@pytest.fixture(scope="function")
def df_holzinger_swineford():
df_dataset = datasets.read_dataset("HolzingerSwineford1939")
df_dataset = audpsychometric.read_dataset("HolzingerSwineford1939")
cols_use = [col for col in df_dataset.columns if col.startswith("x")]
df = df_dataset[cols_use].astype(np.float32)
return df
4 changes: 2 additions & 2 deletions tests/test_dataset.py
@@ -7,7 +7,7 @@

def test_list_datasets():
"""First basic dataset is available in dataset list."""
df_datasets = audpsychometric.datasets.list_datasets()
df_datasets = audpsychometric.list_datasets()
assert "statology" in df_datasets.index


@@ -24,5 +24,5 @@ def test_list_datasets():
)
def test_read_dataset(dataset):
"""Test functional requirement that a dataset can be read into dataframe."""
df_dataset = audpsychometric.datasets.read_dataset(dataset)
df_dataset = audpsychometric.read_dataset(dataset)
assert isinstance(df_dataset, pd.DataFrame)
6 changes: 3 additions & 3 deletions tests/test_reliability.py
@@ -7,7 +7,7 @@

def test_icc():
"""Test icc basic result validity."""
df_dataset = audpsychometric.datasets.read_dataset("wine")
df_dataset = audpsychometric.read_dataset("wine")

data_wide = df_dataset.pivot_table(index="Wine", columns="Judge", values="Scores")

@@ -24,7 +24,7 @@ def test_icc():

def test_cronbachs_alpha():
"""Test cronbach's alpha return values for three raters."""
df_dataset = audpsychometric.datasets.read_dataset("hallgren-table3")
df_dataset = audpsychometric.read_dataset("hallgren-table3")
df = df_dataset[["Dep_Rater1", "Dep_Rater2", "Dep_Rater3"]]
for ratings in [df, df.values]:
alpha, result = audpsychometric.cronbachs_alpha(ratings)
@@ -56,7 +56,7 @@ def test_anova_helper():

def test_icc_nanremoval():
"""Cover nan removal if statement."""
df_dataset = audpsychometric.datasets.read_dataset("HolzingerSwineford1939")
df_dataset = audpsychometric.read_dataset("HolzingerSwineford1939")
df_dataset = df_dataset[[x for x in df_dataset.columns if x.startswith("x")]]
nan_mat = np.random.random(df_dataset.shape) < 0.1
audpsychometric.intra_class_correlation(df_dataset.mask(nan_mat))