
qualiMLpy

This is a repository collecting metrics, algorithms, and pieces of code related to data and model quality for Machine Learning, developed by me and others at the MUDI lab (https://www.mudilab.net/mudi/) of the DISCo dept. @ University of Milano-Bicocca.

The code dependencies (which vary depending on which files you want to use) are: scikit-learn, numpy, scipy, and matplotlib.
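
Assuming you install packages from PyPI with pip (all four dependencies are published there under these names), a typical environment can be set up with:

```
pip install scikit-learn numpy scipy matplotlib
```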

Some of our code has also been provisionally deployed in the form of web sandbox tools.

You can find more information on most of these metrics and pieces of code in the following articles (a brief usage sketch follows the list):

  • Degree of correspondence (correspondence.py) and degree of reliability (reliability.py)

    • Cabitza, F., Campagner, A., & Sconfienza, L. M. (2020). As if sand were stone. New concepts and metrics to probe the ground on which to build trustable AI. BMC Medical Informatics and Decision Making, 20(1), 1-21.

    • Cabitza, F., Campagner, A., Albano, D., et al. (2020). The elephant in the machine: Proposing a new metric of data reliability and its application to a medical case to assess classification reliability. Applied Sciences, 10(11), 4014.

  • H-accuracy (ha.py)

    • Campagner, A., Sconfienza, L., & Cabitza, F. (2020). H-accuracy, an alternative metric to assess classification models in medicine. In Digital Personalized Health and Medicine (Studies in Health Technology and Informatics, Vol. 270). IOS Press, Amsterdam, The Netherlands.

  • Meta-validation methodology plots (step_one.py + step_two.py)

    • Cabitza, F., Campagner, A., Soares, F., de Guardiana Romualdo, L. G., Challa, F., Sulejmani, A., Seghezzi, M., & Carobene, A. (2021). The importance of being external. Lessons learnt from the external validation of a machine learning model for COVID-19 diagnosis across 3 continents. Computer Methods and Programs in Biomedicine (accepted).
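
As a purely illustrative sketch of how these modules are typically invoked (the import and function names below are assumptions inferred from the file names above, not a verified API; check each file for the actual signatures):

```python
import numpy as np

# Hypothetical usage sketch: the module and function names are guesses based
# on the file names listed above (e.g., ha.py); verify against the code.
try:
    from ha import h_accuracy  # hypothetical import
except ImportError:
    h_accuracy = None

y_true = np.array([0, 1, 1, 0, 1, 1])  # ground-truth labels
y_pred = np.array([0, 1, 0, 0, 1, 1])  # model predictions

if h_accuracy is not None:
    print("H-accuracy:", h_accuracy(y_true, y_pred))
else:
    print("ha.py not found; see the repository files for the actual API.")
```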

In this repository you can also find some general utilities that were not directly authored by me (or anyone @ MUDI lab), for which we provide an easy-to-use Python implementation (mostly compatible with the Python data science stack). For more info, please refer to the following publications (a small worked Brier-score example follows the list):

  • Riley, R. D., Debray, T. P., Collins, G. S., Archer, L., Ensor, J., van Smeden, M., & Snell, K. I. (2021). Minimum sample size for external validation of a clinical prediction model with a binary outcome. Statistics in Medicine.
  • Bradley, A. A., Schwartz, S. S., & Hashino, T. (2008). Sampling uncertainty and confidence intervals for the Brier score and Brier skill score. Weather and Forecasting, 23(5), 992-1006.
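
As a point of reference, the Brier score itself is just the mean squared difference between predicted probabilities and binary outcomes. The sketch below computes it and attaches a simple percentile-bootstrap confidence interval; note that the bootstrap is a generic illustration, not the analytical sampling-uncertainty intervals derived by Bradley et al. (2008):

```python
import numpy as np

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and outcomes."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return np.mean((p_pred - y_true) ** 2)

# Toy data: binary outcomes and predicted probabilities of the positive class.
y = np.array([0, 1, 1, 0, 1, 0, 1, 1])
p = np.array([0.2, 0.9, 0.6, 0.3, 0.8, 0.4, 0.7, 0.5])

bs = brier_score(y, p)

# Generic percentile bootstrap CI (illustrative only; Bradley et al. derive
# analytical intervals instead of resampling).
rng = np.random.default_rng(0)
n = len(y)
boots = []
for _ in range(2000):
    idx = rng.integers(0, n, n)  # resample cases with replacement
    boots.append(brier_score(y[idx], p[idx]))
lo, hi = np.percentile(boots, [2.5, 97.5])

print(f"Brier score: {bs:.3f} (95% bootstrap CI: [{lo:.3f}, {hi:.3f}])")
```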
