MSR method for Banzhaf #520

Merged
merged 47 commits into from
Apr 12, 2024
Changes from 17 commits
Commits
837cbeb
WIP: MSR method for Banzhaf
mdbenito May 18, 2023
800a36a
wip: adapt to new structure of semivalue computation
jakobkruse1 Mar 19, 2024
f579644
linting
jakobkruse1 Mar 19, 2024
699e9cb
wip: refactor to future processor
jakobkruse1 Mar 21, 2024
1f913c8
linting
jakobkruse1 Mar 21, 2024
f657b88
fix type hints
jakobkruse1 Mar 21, 2024
3564ea6
fix type hints
jakobkruse1 Mar 21, 2024
c97821a
fix some bugs to achieve correct banzhaf semivalues
jakobkruse1 Mar 26, 2024
7e7286c
chore and documentation tasks
jakobkruse1 Mar 27, 2024
a34f3aa
Merge branch 'develop' into feature/msr-banzhaf
jakobkruse1 Mar 27, 2024
bfe5637
test msr banzhaf methods
jakobkruse1 Mar 27, 2024
4542263
fix batch size update unintentional change
jakobkruse1 Mar 27, 2024
22c5bdb
update changelog with banzhaf MSR
jakobkruse1 Mar 27, 2024
8f8e187
update list type hint
jakobkruse1 Mar 27, 2024
3a27dfd
test additional functionality added
jakobkruse1 Mar 28, 2024
ea47b24
update documentation with MSR banzhaf
jakobkruse1 Mar 28, 2024
0c83af4
fix documentation links
jakobkruse1 Mar 28, 2024
ec343af
disable social plugin in mkdocs to repair ci
jakobkruse1 Apr 2, 2024
8ba78c3
docs updates from review
jakobkruse1 Apr 2, 2024
de2f2c8
Merge branch 'develop' into feature/msr-banzhaf
jakobkruse1 Apr 2, 2024
c2adbdc
check order of elements in test
jakobkruse1 Apr 2, 2024
57aa335
rename bibliography entry for wang_data_2023
jakobkruse1 Apr 3, 2024
737c0d3
minor changes in stopping formatting
jakobkruse1 Apr 3, 2024
ac2a71d
rewrite banzhaf notebook
jakobkruse1 Apr 4, 2024
7934056
run banzhaf notebook to get outputs for documentation
jakobkruse1 Apr 4, 2024
c6eb063
linter for banzhaf notebook
jakobkruse1 Apr 4, 2024
57eb45a
reenable mkdocs social plugin
jakobkruse1 Apr 4, 2024
0d2c4e0
add consistency check to msr notebook
jakobkruse1 Apr 5, 2024
30807c6
Merge branch 'develop' into feature/msr-banzhaf
jakobkruse1 Apr 5, 2024
be76dc5
speedup test in ci by using another utility
jakobkruse1 Apr 5, 2024
3b7dbb6
reduced runtime of notebook even more
jakobkruse1 Apr 8, 2024
bccb3ca
adapt to new parallel backend structure
jakobkruse1 Apr 8, 2024
637074b
finish notebook for msr banzhaf
jakobkruse1 Apr 8, 2024
7140215
Merge branch 'develop' into feature/msr-banzhaf
jakobkruse1 Apr 8, 2024
9c1509b
Merge branch 'develop' into feature/msr-banzhaf
jakobkruse1 Apr 8, 2024
dca88a7
Some tweaks to notebook text and supporting code
mdbenito Apr 8, 2024
2a395ee
Add number of updates to results dataframe
mdbenito Apr 8, 2024
cb50644
Add notebook to gallery
mdbenito Apr 11, 2024
23532ce
Fix imports and other minor stuff
mdbenito Apr 11, 2024
fdad917
run notebook complete
jakobkruse1 Apr 12, 2024
61fb081
satisfy linters
jakobkruse1 Apr 12, 2024
90ad81b
fix bug in least core caused by new updates column in results
jakobkruse1 Apr 12, 2024
1f7d7f2
Rename rank stability and homogeneize interface
mdbenito Apr 12, 2024
8f25b53
Require explicit stopping criteria (impossible to provide good defaults)
mdbenito Apr 12, 2024
b3b2ec2
Missing exports
mdbenito Apr 12, 2024
f202f88
Further cleanup of the notebook (hiding cells, text changes)
mdbenito Apr 12, 2024
426b867
[skip ci] Update CHANGELOG.md
mdbenito Apr 12, 2024
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,8 @@

- New method: `NystroemSketchInfluence`
[PR #504](https://github.com/aai-institute/pyDVL/pull/504)
- New method: `MSR Banzhaf Semivalues`
[PR #520](https://github.com/aai-institute/pyDVL/pull/520)
- New preconditioned block variant of conjugate gradient
[PR #507](https://github.com/aai-institute/pyDVL/pull/507)
- Improvements to documentation: fixes, links, text, example gallery, LFS and
27 changes: 27 additions & 0 deletions docs/value/semi-values.md
@@ -98,6 +98,33 @@ values = compute_banzhaf_semivalues(
)
```

### Banzhaf semi-values with MSR sampling
Wang et al. propose a more sample-efficient method for computing Banzhaf semivalues in their paper
*Data Banzhaf: A Robust Data Valuation Framework for Machine Learning* [@wang_data_2022].
The method, called Maximum Sample Reuse (MSR), uses every evaluation of the utility (i.e. every model trained)
to update the semivalues of all data points: each sampled subset contributes to one of two averages per point,
depending on whether the subset contains that point or not.
The resulting estimator is

$$\hat{\phi}_{MSR}(i) = \frac{1}{|\mathbf{S}_{\ni i}|} \sum_{S \in \mathbf{S}_{\ni i}} U(S)
- \frac{1}{|\mathbf{S}_{\not{\ni} i}|} \sum_{S \in \mathbf{S}_{\not{\ni} i}} U(S)$$

where $\mathbf{S}_{\ni i}$ are the sampled subsets that contain the index $i$ and $\mathbf{S}_{\not{\ni} i}$ are the
sampled subsets that do not contain the index $i$.
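
To make the formula concrete, here is a minimal NumPy sketch of the estimator. It is not pyDVL's implementation: the function name `msr_banzhaf`, the mask-based `utility` callable and the fixed sample budget `n_samples` are assumptions made only for this illustration. Subsets are drawn by including each index independently with probability 1/2, which corresponds to the uniform distribution over subsets used for Banzhaf values.

```python
import numpy as np


def msr_banzhaf(utility, n, n_samples, seed=0):
    """Hypothetical MSR estimator, written only to illustrate the formula.

    utility: callable that takes a boolean mask of length n (True = index
        is in the subset S) and returns U(S) as a float.
    """
    rng = np.random.default_rng(seed)
    sum_in = np.zeros(n)     # sum of U(S) over sampled S containing i
    sum_out = np.zeros(n)    # sum of U(S) over sampled S not containing i
    count_in = np.zeros(n)   # number of sampled S containing i
    count_out = np.zeros(n)  # number of sampled S not containing i

    for _ in range(n_samples):
        # Include every index independently with probability 1/2.
        mask = rng.random(n) < 0.5
        u = utility(mask)
        # A single utility evaluation updates the statistics of all indices.
        sum_in[mask] += u
        count_in[mask] += 1
        sum_out[~mask] += u
        count_out[~mask] += 1

    # An index that never (or always) appeared has an empty average,
    # so its estimate is NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return sum_in / count_in - sum_out / count_out


# Toy additive utility: the exact Banzhaf value of index i is weights[i].
weights = np.array([1.0, 2.0, 3.0, 4.0])
estimates = msr_banzhaf(lambda m: float(weights[m].sum()), n=4, n_samples=20_000)
print(estimates)
```

In this sketch every utility evaluation contributes to the estimate of every index, which is the source of the sample efficiency; a marginal-contribution sampler would need two evaluations to update a single index.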

The function implementing this method is
[compute_msr_banzhaf_semivalues][pydvl.value.semivalues.compute_msr_banzhaf_semivalues].
```python
from pydvl.value import *

utility = Utility(model, data)
values = compute_msr_banzhaf_semivalues(
    u=utility, done=RankStability(rtol=0.001),
)
```
For further details on how to use this method and a comparison of its sample efficiency, see the example
notebook [msr_banzhaf_spotify](../../examples/msr_banzhaf_spotify).


## General semi-values

As explained above, both Beta Shapley and Banzhaf indices are special cases of
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -34,6 +34,7 @@ nav:
- Data utility learning: examples/shapley_utility_learning.ipynb
- Least Core: examples/least_core_basic.ipynb
- Data OOB: examples/data_oob.ipynb
- Banzhaf Semivalues: examples/msr_banzhaf_spotify.ipynb
- Influence Function:
- For CNNs: examples/influence_imagenet.ipynb
- For mislabeled data: examples/influence_synthetic.ipynb