Here are some thoughts concerning fairness evaluation:
Protected attributes
Extraction: As methodologies may implement pre-, in-, or post-processing to enhance algorithmic fairness, I think we need to extract sensitive attributes at the task-extraction stage, not only at evaluation time.
Definition: Sensitive attributes differ across datasets (for instance, a patient's recorded ethnicity can change between visits in MIMIC; Canada and France do not record ethnicity; and datasets use different levels of granularity). As an initial step, we could focus on age and sex.
Save: Attributes could be saved in the target file and extracted at the same time. Alternatively, we could gather them in a separate file produced by a dedicated script (a sketch follows this list).
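To make the "separate file" option concrete, here is a minimal sketch assuming the extracted cohort is available as a CSV with hypothetical column names patient_id, age, and sex; none of these names reflect the project's actual file layout:

```python
import pandas as pd

def extract_protected_attributes(cohort: pd.DataFrame) -> pd.DataFrame:
    """Return one row per patient with the sensitive attributes agreed on so far (age, sex)."""
    # Keep only the columns treated as protected; drop duplicates so each patient
    # appears once even if the cohort has one row per visit.
    return cohort[["patient_id", "age", "sex"]].drop_duplicates("patient_id")

if __name__ == "__main__":
    cohort = pd.read_csv("cohort.csv")  # hypothetical output of task extraction
    attrs = extract_protected_attributes(cohort)
    # Option A would instead merge `attrs` into the target file at extraction time.
    attrs.to_csv("protected_attributes.csv", index=False)
```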
Metrics
We can distinguish three fairness definitions:
Group fairness: A model is fair if performance is equal across groups defined by protected attributes.
Causal fairness: A model is fair if the prediction remains unchanged when group membership changes.
Individual fairness: A model is fair if similar individuals are treated similarly.
We need the causal graph to estimate causal fairness and a meaningful distance metric for individual fairness. As a first step, I think group fairness is the simplest to implement (and is widely used in medical ML). So we could stratify performance by the identified protected groups and report the difference as a fairness measure, as sketched below.
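A minimal sketch of the stratified-performance idea, using AUROC as the example metric; the function name and inputs are illustrative, not an existing API in this repo:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def group_performance_gap(y_true, y_score, groups, metric=roc_auc_score):
    """Compute the metric per protected group and the largest between-group difference."""
    y_true, y_score, groups = map(np.asarray, (y_true, y_score, groups))
    per_group = {}
    for g in np.unique(groups):
        mask = groups == g
        per_group[g] = metric(y_true[mask], y_score[mask])
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap

# Toy usage with random predictions and a binary sex attribute.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
y_score = rng.random(500)
sex = rng.choice(["F", "M"], size=500)
per_group, gap = group_performance_gap(y_true, y_score, sex)
print(per_group, gap)
```

The same helper would work for any per-group metric (accuracy, calibration error, etc.), and the gap could be replaced by a ratio if we prefer relative rather than absolute disparities.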
Let me know what you think!