Commit

Add short documentation.
Markus Semmler committed Sep 1, 2023
1 parent 5869e6e commit e64c637
Showing 1 changed file with 44 additions and 0 deletions.
44 changes: 44 additions & 0 deletions docs/value/shapley.md
@@ -78,6 +78,50 @@ stop condition. This is an instance of a
[MaxTime][pydvl.value.stopping.MaxTime] and
[AbsoluteStandardError][pydvl.value.stopping.AbsoluteStandardError].

### Classwise Shapley

**Classwise Shapley** [@schoch_csshapley_2022] is a different Shapley scheme designed
for classification problems. The key insight is that a sample can be beneficial
for overall performance while being detrimental to its own class. CS-Shapley changes
the utility to account for this effect by decomposing it into a product of two
functions: the in-class accuracy multiplied by a discounted out-of-class accuracy.
The value is defined as:

$$
v_u(x_i) \approx \frac{1}{K \cdot L}
\sum_{S^{(k)}_{-y_i} \subseteq T_{-y_i} \setminus \{i\}}
\sum_{\sigma^{(l)} \in \Pi(T_{y_i} \setminus \{i\})}
[u( \sigma^{(l)}_{\colon i} \cup \{i\} \mid S^{(k)}_{-y_i} )
- u( \sigma^{(l)}_{\colon i} \mid S^{(k)}_{-y_i})]
$$

where $K$ is the number of subsets $S^{(k)}_{-y_i}$ sampled from the class complement
set $T_{-y_i}$ (the training points whose label differs from $y_i$) and $L$ is the
number of permutations sampled from the class indices set $T_{y_i}$. The scoring
function used has the form

$$u(S_{y_i}|S_{-y_i}) = a_S(D_{y_i}) \exp\{a_S(D_{-y_i})\}.$$

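Here $a_S(D)$ denotes the accuracy, evaluated on the data $D$, of the model trained on
$S = S_{y_i} \cup S_{-y_i}$. The scoring function can be further customised, but the
authors show that this form has certain desirable properties.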
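
To make the scoring function concrete, here is a minimal pure-Python sketch of the product of in-class accuracy and exponentially discounted out-of-class accuracy. The function name `classwise_score` is hypothetical and is not part of pyDVL. The sketch also assumes that both accuracies are normalised by the total number of evaluation points, so that they add up to the overall accuracy; other normalisations are possible.

```python
import math


def classwise_score(y_true, y_pred, label):
    """Illustrative only: in-class accuracy times exp(out-of-class accuracy).

    Assumption: both accuracies are divided by the total number of points,
    so in-class + out-of-class accuracy equals the overall accuracy.
    """
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    # Correct predictions on points that belong to the class of interest
    in_correct = sum(1 for t, p in pairs if t == p and t == label)
    # Correct predictions on points of every other class
    out_correct = sum(1 for t, p in pairs if t == p and t != label)
    return (in_correct / n) * math.exp(out_correct / n)


y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
# For class 0: in-class accuracy is 1/4, out-of-class accuracy is 2/4
score = classwise_score(y_true, y_pred, label=0)
```

Because the out-of-class term only enters through the exponential, a point cannot obtain a high score for its class from out-of-class accuracy alone: if the in-class accuracy is zero, the score is zero.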


```python
from pydvl.utils import Dataset, Utility
from pydvl.value import compute_classwise_shapley_values, ClasswiseScorer, HistoryDeviation
from pydvl.value.shapley.truncated import RelativeTruncation

model = ...  # the classifier to be valued, left unspecified here
scoring = ClasswiseScorer("accuracy")
data = Dataset(...)
utility = Utility(model, data, scoring)
values = compute_classwise_shapley_values(
    utility,
    # Stopping criterion for the computation
    done=HistoryDeviation(n_steps=500, rtol=1e-3),
    # Truncation policy applied while processing permutations
    truncation=RelativeTruncation(utility, rtol=0.01),
    n_resample_complement_sets=10,
    normalize_values=True,
)
```
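
The double sum in the value definition above can also be sketched without pyDVL. The following is a minimal illustration under stated assumptions, not the library's implementation: the name `classwise_shapley_estimate` and its parameters are hypothetical, complement subsets are drawn by including each out-of-class point with probability 1/2, and the prefix $\sigma_{\colon i}$ is obtained by shuffling the other in-class points and cutting at a uniformly random position.

```python
import random


def classwise_shapley_estimate(i, labels, utility,
                               n_subsets=10, n_permutations=10, seed=0):
    """Monte Carlo sketch of the classwise value of point i (illustrative).

    utility(s_in, s_out) must return u(s_in | s_out) for two sets of indices.
    """
    rng = random.Random(seed)
    y_i = labels[i]
    in_class = [j for j, y in enumerate(labels) if y == y_i and j != i]
    out_class = [j for j, y in enumerate(labels) if y != y_i]

    total = 0.0
    for _ in range(n_subsets):  # K sampled complement subsets
        # Assumption: uniform sampling from the powerset of the complement
        s_out = frozenset(j for j in out_class if rng.random() < 0.5)
        for _ in range(n_permutations):  # L sampled permutations
            perm = in_class[:]
            rng.shuffle(perm)
            # Points that precede i: a uniformly random prefix of the shuffle
            cut = rng.randint(0, len(perm))
            prefix = frozenset(perm[:cut])
            # Marginal contribution of i given the prefix and the complement
            total += utility(prefix | {i}, s_out) - utility(prefix, s_out)
    return total / (n_subsets * n_permutations)


labels = [0, 0, 0, 1, 1]
weights = [1.0, 2.0, 3.0, 4.0, 5.0]


def toy_utility(s_in, s_out):
    # Additive toy utility that ignores the complement set
    return sum(weights[j] for j in s_in)


# With an additive utility every marginal contribution of point 1 is its weight
estimate = classwise_shapley_estimate(1, labels, toy_utility)
```

The additive toy utility is only a sanity check: since each marginal contribution equals the weight of the point, the estimate must recover that weight exactly. In practice the utility would retrain the model on the union of both sets and evaluate a classwise score.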

### Owen sampling
