The data and code found in this repository correspond to the following paper:
Matthew Smith, Francisco Alvarez,
Identifying mortality factors from Machine Learning using Shapley values - a case of COVID19,
Expert Systems with Applications,
2021,
114832,
ISSN 0957-4174,
https://doi.org/10.1016/j.eswa.2021.114832.
(https://www.sciencedirect.com/science/article/pii/S0957417421002736)
Abstract: In this paper we apply a series of Machine Learning models to a recently published unique dataset on the mortality of COVID19 patients. We use a dataset consisting of blood samples of 375 patients admitted to a hospital in the region of Wuhan, China. There are 201 patients who survived hospitalisation and 174 patients who died whilst in hospital. The focus of the paper is not only on seeing which Machine Learning model is able to obtain the absolute highest accuracy but more on the interpretation of what the Machine Learning models provide. We find that age, days in hospital, Lymphocyte and Neutrophils are important and robust predictors when predicting a patient's mortality. Furthermore, the algorithms we use allow us to observe the marginal impact of each variable on a case-by-case patient level, which might help practitioners to easily detect anomalous patterns. This paper analyses the global and local interpretation of the Machine Learning models on patients with COVID19.
Keywords: Machine Learning; Shapley Values; Coronavirus; COVID19
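
To illustrate what "the marginal impact of each variable on a case-by-case patient level" means, the sketch below computes exact Shapley values by brute force for a single hypothetical patient. It is not the paper's implementation: the function names (`shapley_values`, `toy_risk_score`), the three feature values, the baseline and the model coefficients are all made up for illustration, and a simple linear score stands in for the trained models.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features, so this is
    only practical for a handful of them.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


def toy_risk_score(features):
    # Hypothetical linear score; the coefficients are invented, not fitted.
    age, lymphocyte, neutrophils = features
    return 0.02 * age - 0.5 * lymphocyte + 0.01 * neutrophils


# Hypothetical patient and reference ("average") patient.
patient = [70.0, 0.8, 85.0]
baseline = [50.0, 1.5, 60.0]

phi = shapley_values(toy_risk_score, patient, baseline)
for name, value in zip(["age", "lymphocyte", "neutrophils"], phi):
    print(f"{name}: {value:+.3f}")
```

The per-feature values always sum to the difference between the patient's score and the baseline score, which is what makes them readable as a local, patient-level explanation. For models with many features, an approximation (for example from a dedicated Shapley library) would be needed instead of this exhaustive enumeration.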