Commit

Merge branch 'develop' into 259-implement-class-wise-shapley
Markus Semmler committed Sep 22, 2023
2 parents 6deaea3 + 43690b0 commit 1691281
Showing 37 changed files with 1,007 additions and 502 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -139,6 +139,7 @@ pylint.html
# Saved data
runs/
data/models/
*.pkl

# Docs
docs_build
13 changes: 0 additions & 13 deletions .readthedocs.yaml

This file was deleted.

8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -3,7 +3,8 @@
## Unreleased

- Implementation of Data-OOB by @BastienZim
[PR #426](https://github.com/aai-institute/pyDVL/pull/426)
[PR #426](https://github.com/aai-institute/pyDVL/pull/426),
[PR #431](https://github.com/aai-institute/pyDVL/pull/431)
- Refactoring of parallel module. Old imports will stop working in v0.9.0
[PR #421](https://github.com/aai-institute/pyDVL/pull/421)

@@ -26,6 +27,10 @@ randomness.
`pydvl.value.semivalues`. Introduced new type `Seed` and conversion function
`ensure_seed_sequence`.
[PR #396](https://github.com/aai-institute/pyDVL/pull/396)
- Added `batch_size` parameter to `compute_banzhaf_semivalues`,
`compute_beta_shapley_semivalues`, `compute_shapley_semivalues` and
`compute_generic_semivalues`.
[PR #428](https://github.com/aai-institute/pyDVL/pull/428)

### Changed

@@ -247,3 +252,4 @@ It contains:
- Parallelization of computations with Ray
- Documentation
- Notebooks containing examples of different use cases

13 changes: 7 additions & 6 deletions CONTRIBUTING.md
@@ -35,15 +35,15 @@ library. E.g. with venv:
```shell script
python -m venv ./venv
. venv/bin/activate # `venv\Scripts\activate` in windows
pip install -r requirements-dev.txt
pip install -r requirements-dev.txt -r requirements-docs.txt
```

With conda:

```shell script
conda create -n pydvl python=3.8
conda activate pydvl
pip install -r requirements-dev.txt
pip install -r requirements-dev.txt -r requirements-docs.txt
```

A very convenient way of working with your library during development is to
@@ -54,11 +54,12 @@

```shell script
pip install -e .
```

In order to build the documentation locally (which is done as part of the tox
suite) you will need [pandoc](https://pandoc.org/). Under Ubuntu it can be
installed with:
suite) [pandoc](https://pandoc.org/) is required. Except for OSX, it should be installed
automatically as a dependency with `requirements-docs.txt`. Under OSX you can
install pandoc (you'll need at least version 2.11) with:

```shell script
sudo apt-get update -yq && apt-get install -yq pandoc
brew install pandoc
```

Remember to mark all autogenerated directories as excluded in your IDE. In
@@ -151,7 +152,7 @@ cells which are then hidden in the documentation.

In order to do this, cells are marked with tags understood by the mkdocs
plugin [`mkdocs-jupyter`](https://github.com/danielfrg/mkdocs-jupyter#readme),
namely adding the following to the relevant cells:
namely adding the following to the metadata of the relevant cells:

```yaml
"tags": [
```
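For orientation, a complete cell metadata block carrying such a tag could look like the following. This is an illustrative sketch only: the tag name `hide` is a placeholder, and the tags actually understood depend on how `mkdocs-jupyter` is configured for this project.

```yaml
{
  "cell_type": "code",
  "metadata": {
    "tags": [
      "hide"
    ]
  },
  "source": []
}
```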
2 changes: 1 addition & 1 deletion README.md
@@ -111,7 +111,7 @@ documentation.

For influence computation, follow these steps:

1. Wrap your model and loss in a `TorchTwiceDifferential` object
1. Wrap your model and loss in a `TorchTwiceDifferentiable` object
2. Compute influence factors by providing training data and inversion method

Using the conjugate gradient algorithm, this would look like:
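The example code itself is elided in this view, but as background on what the conjugate gradient inversion method does: computing influence factors requires solving a linear system $Hx = g$ with the Hessian $H$, and CG does this using only Hessian-vector products, never forming $H^{-1}$. A minimal NumPy sketch of the idea (illustrative only, not pyDVL's API):

```python
import numpy as np

def conjugate_gradient(hvp, g, tol=1e-10, max_iter=100):
    """Solve H x = g for symmetric positive-definite H, given only
    Hessian-vector products hvp(v) = H @ v."""
    x = np.zeros_like(g)
    r = g - hvp(x)        # residual
    p = r.copy()          # search direction
    rs = r @ r
    for _ in range(max_iter):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy example with an explicit SPD matrix standing in for the Hessian
H = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
x = conjugate_gradient(lambda v: H @ v, g)
```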
2 changes: 0 additions & 2 deletions apt-cache/.gitignore

This file was deleted.

1 change: 0 additions & 1 deletion build_scripts/copy_changelog.py
@@ -1,6 +1,5 @@
import logging
import os
import shutil
from pathlib import Path

import mkdocs.plugins
1 change: 0 additions & 1 deletion build_scripts/copy_notebooks.py
@@ -1,6 +1,5 @@
import logging
import os
import shutil
from pathlib import Path

import mkdocs.plugins
13 changes: 6 additions & 7 deletions docs/css/extra.css
@@ -49,7 +49,12 @@ a.autorefs-external:hover::after {
}

.md-typeset h2 {
font-size: 1.7em;
font-size: 1.3em;
font-weight: 300;
}

.md-typeset h3 {
font-size: 1.1em;
font-weight: 300;
}

@@ -77,12 +82,6 @@ a.autorefs-external:hover::after {
user-select: none;
}

/* Nicer style of headers in generated API */
h2 code {
font-size: large!important;
background-color: inherit!important;
}

/* Remove cell input and output prompt */
.jp-InputArea-prompt, .jp-OutputArea-prompt {
display: none !important;
24 changes: 13 additions & 11 deletions docs/value/index.md
@@ -15,21 +15,23 @@ alias:
training set which reflects its contribution to the final performance of some
model trained on it. Some methods attempt to be model-agnostic, but in most
cases the model is an integral part of the method. In these cases, this number
not an intrinsic property of the element of interest, but typically a function
of three factors:
is not an intrinsic property of the element of interest, but typically a
function of three factors:

1. The dataset $D$, or more generally, the distribution it was sampled
from (with this we mean that *value* would ideally be the (expected)
contribution of a data point to any random set $D$ sampled from the same
distribution).
1. The dataset $D$, or more generally, the distribution it was sampled from: in
   some cases one only cares about values w.r.t. a given dataset; in others,
   value would ideally be the (expected) contribution of a data point to any
   random set $D$ sampled from the same distribution. pyDVL implements methods
   of the first kind.

2. The algorithm $\mathcal{A}$ mapping the data $D$ to some estimator $f$
in a model class $\mathcal{F}$. E.g. MSE minimization to find the parameters
of a linear model.
2. The algorithm $\mathcal{A}$ mapping the data $D$ to some estimator $f$ in a
model class $\mathcal{F}$. E.g. MSE minimization to find the parameters of a
linear model.

3. The performance metric of interest $u$ for the problem. When value depends on
a model, it must be measured in some way which uses it. E.g. the $R^2$ score or
the negative MSE over a test set.
a model, it must be measured in some way which uses it. E.g. the $R^2$ score
or the negative MSE over a test set. This metric will be computed over a
held-out valuation set.

pyDVL collects algorithms for the computation of data values in this sense,
mostly those derived from cooperative game theory. The methods can be found in
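The three factors above can be made concrete with a toy utility function. In this sketch (illustrative only, not pyDVL's interface), the algorithm $\mathcal{A}$ is least-squares fitting, and the metric $u$ is the negative MSE computed on a held-out valuation set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: (x_train, y_train) is the dataset D, (x_val, y_val) a held-out
# valuation set on which the performance metric is computed.
true_w = np.array([1.0, -2.0, 0.5])
x_train = rng.normal(size=(20, 3))
y_train = x_train @ true_w + 0.1 * rng.normal(size=20)
x_val = rng.normal(size=(50, 3))
y_val = x_val @ true_w

def utility(indices):
    """u(S): fit least squares (the algorithm A) on the subset S of D and
    return the negative MSE (the metric u) over the valuation set."""
    indices = list(indices)
    if len(indices) < 3:  # too few points to fit 3 parameters
        return -np.mean(y_val ** 2)
    w, *_ = np.linalg.lstsq(x_train[indices], y_train[indices], rcond=None)
    return -np.mean((x_val @ w - y_val) ** 2)

full = utility(range(20))  # utility of the full training set
```

Game-theoretic values such as Shapley then aggregate `utility(S)` over many subsets `S` to assign each point its contribution.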