Imputing all missing values on a selection of variables, without excluding predictors with missings #343
Replies: 2 comments
-
The main difficulty with not propagating missing values is the fact that many algorithms, including PMM, (polytomous) logistic regression, and proportional odds regression, which are the respective defaults in [...]

If you suspect that patients with missing values in variables that you do NOT wish to impute are systematically different from other patients (e.g. because those values are intentionally missing), you may include an additional indicator variable (variable observed = 0, variable missing = 1) for each of those predictor variables. This would allow you to model different relationships, such as different slopes, when regressing an imputed variable on the predictor variable.

Note: I haven't done this myself yet, so I am not 100% sure whether such an approach would be statistically valid; I am merely suggesting it as a possible route for further inquiry.
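To make the indicator idea concrete, here is a minimal sketch in plain Python rather than R. It is not code from `mice`; the helper name `add_missing_indicator`, the fill value of 0 and the toy numbers are all assumptions made up for illustration. It only shows how a predictor with gaps could be turned into a filled column plus a 0/1 indicator, and says nothing about whether the resulting model is statistically valid.

```python
def add_missing_indicator(values, fill=0.0):
    """Split a predictor with gaps into a filled column and a 0/1 indicator.

    None marks a missing value. The fill value is an arbitrary placeholder
    (an assumption of this sketch), not an imputation.
    """
    indicator = [1 if v is None else 0 for v in values]
    filled = [fill if v is None else v for v in values]
    return filled, indicator


# Hypothetical toy predictor with two missing entries.
predictor = [3.5, None, 4.2, None, 5.0]
filled, missing = add_missing_indicator(predictor)

# One design-matrix row per patient: intercept, filled predictor, indicator.
design = [[1.0, x, float(m)] for x, m in zip(filled, missing)]
```

With a constant fill value, the coefficient of the filled column is driven by the observed rows only, while the indicator gives rows with a missing predictor their own offset. Letting the slopes of other predictors differ between the two groups would additionally require interaction terms with the indicator.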
-
Is it possible to impute all the missing values on a selection of variables that contain missings, whilst keeping variables with missings for the prediction model?

No. You cannot do that, at least not in current [...]

Conceivably, you could build imputation procedures that are "insensitive" to missing values, so that NA-propagation is not needed. That strategy could work, but it might introduce new issues, for example problems related to varying n's for different variables, bias introduced by ad-hoc fixes, or undercoverage because of single imputation. The MICE algorithm avoids these.
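As a rough illustration of why missing values propagate, the following plain-Python sketch imputes a target from a single predictor with a deterministic regression line. This is a deliberately simplified stand-in, not the MICE algorithm and not how `mice` is implemented: the function names and the toy data are assumptions, and a single deterministic fill like this is exactly the kind of single imputation that tends to undercover.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x on paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def regression_impute(target, predictor):
    """Fill missing target values from the predictor where that is possible.

    None marks a missing value. A row whose predictor is also missing has
    no defined prediction, so its target stays None.
    """
    complete = [(x, y) for x, y in zip(predictor, target)
                if x is not None and y is not None]
    intercept, slope = fit_line([x for x, _ in complete],
                                [y for _, y in complete])
    imputed = []
    for x, y in zip(predictor, target):
        if y is not None:
            imputed.append(y)                      # observed, keep as is
        elif x is not None:
            imputed.append(intercept + slope * x)  # predictor available
        else:
            imputed.append(None)                   # nothing to predict from
    return imputed


# Hypothetical toy data: row 4 lacks both values, row 5 lacks only the target.
predictor = [1.0, 2.0, 3.0, None, 5.0, None]
target = [2.0, 4.0, 6.0, None, None, 12.0]
imputed = regression_impute(target, predictor)
```

In this toy run the fifth row receives a value because its predictor is observed, while the fourth row stays missing. The model is also fitted on only three complete cases, which is the "varying n" problem in miniature.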
-
I have a dataset with multiple variables that contain missing values, yet I only want to impute the missings on 4 of these variables. Therefore, I left the method argument empty for all variables except these 4. As a result, values were imputed only for rows that had complete values on all the predictor variables. Since I have a large dataset with a lot of missings (including intentional missings), only ca. 5% of the missing values were imputed for each variable. I replicated this issue with the nhanes dataset:
From issue #75 and van Buuren 2018 (section 4.7.1), I understand that this is how mice works, and that removing predictors with missings will solve the issue. However, this would not be desirable in my case, as it would mean removing 98 of the 118 variables. Furthermore, many of these variables are related to the missingness of the values that I want to impute, so I think they are very useful in the prediction model.
Is it possible to impute all the missing values on a selection of variables that contain missings, whilst keeping variables with missings for the prediction model? In other words, is there an option NOT to propagate the missings within the algorithm?