Imputing all missing values on a selection of variables, without excluding predictors with missings #343

verakorenblik · 2021-03-19T09:51:54Z

I have a dataset in which many variables contain missing values, but I only want to impute the missing values in 4 of them. I therefore set the method argument to empty ("") for all variables except these 4. As a result, values were imputed only for rows that were complete on all predictor variables. Because my dataset is large and has many missing values (including intentional ones), only about 5% of the missing values were imputed for each variable. I replicated the issue with the nhanes dataset:

# load mice package and the nhanes data
library(mice)
nhanes <- nhanes

# initiate mice, adjust method argument
ini <- mice(nhanes, maxit = 0)
meth <- ini$method
meth[] <- ""
meth["bmi"] <- "pmm"

# perform imputation
imp_01 <- mice(nhanes, method = meth, m = 1, maxit = 5)

# assess how many NAs are left, and if this "person" had missings on the predictor variables
sum(is.na(imp_01$imp$bmi))
who <- rownames(imp_01$imp$bmi)[which(is.na(imp_01$imp$bmi))]
where <- nhanes[unique(who),]

From issue #75 and van Buuren 2018 (section 4.7.1), I understand that this is how mice works, and that removing predictors with missings would solve the issue. However, that is not desirable in my case, as it would mean removing 98 of 118 variables. Furthermore, many of these variables are related to the missingness of the values that I want to impute, so I think they are very useful in the prediction model.

Is it possible to impute all the missing values on a selection of variables which contain missings, whilst keeping variables with missings in the prediction model? In other words, is there an option NOT to propagate the missings within the algorithm?

prockenschaub · 2021-03-22T08:57:18Z

The main difficulty with not propagating missing values is that many algorithms are unable to deal with missing data in the predictors. This includes PMM, (polytomous) logistic regression, and proportional odds regression, which are the respective defaults in mice. A possible (and indeed the default) way in mice to get around this issue is to also include those predictors in the imputation. Temporary values are imputed for all variables, which allows e.g. logistic regression to run. You can later reset those to missing in the imputed datasets if desired.
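A minimal sketch of the "impute everything, then reset" idea on the nhanes data, assuming bmi is the only variable whose imputations should be kept. The names target, others, and completed are made up for this illustration, and the seed is arbitrary.

```r
library(mice)

# Variable(s) whose imputations we want to keep (assumption for this sketch)
target <- "bmi"

# Impute every incomplete variable with the default methods, so that no
# predictor contains NA while the imputation model for bmi is fitted
imp <- mice(nhanes, m = 1, maxit = 5, seed = 123, printFlag = FALSE)

# Extract the completed dataset, then restore the original values
# (including the NAs) of all variables that should stay unimputed
completed <- complete(imp, 1)
others <- setdiff(names(nhanes), target)
completed[others] <- nhanes[others]

# bmi should now be complete; the other variables keep their original NAs
sum(is.na(completed$bmi))
colSums(is.na(completed[others]))
```

With m > 1 the same reset would have to be applied to each completed dataset, for example by looping over complete(imp, i).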

If you suspect that patients with missing values in variables that you do NOT wish to impute are systematically different from other patients (e.g. because those values are intentionally missing), you may include an additional indicator variable (variable observed = 0, variable missing = 1) for each of those predictor variables. This would allow you to model different relationships, such as different slopes, when regressing an imputed variable on the predictor variable. Note: I haven't done this myself yet, so I am not 100% sure whether such an approach would be statistically valid; I am merely suggesting it as a possible route for further inquiry.
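A rough sketch of how such an indicator could be set up on nhanes, with the same caveat about statistical validity. The column name chl_mis is hypothetical, and treating chl as the variable that should not be imputed is an assumption for illustration. This only adds the indicator as a main effect; different slopes would additionally need an interaction term, which is not shown.

```r
library(mice)

dat <- nhanes

# Hypothetical indicator column: 1 if chl is missing, 0 if it is observed
dat$chl_mis <- as.integer(is.na(dat$chl))

# Start from the default predictor matrix (rows = targets, columns = predictors)
pred <- make.predictorMatrix(dat)

# Do not use the indicator when imputing chl itself: among the rows with
# observed chl it is constant (0) and carries no information
pred["chl", "chl_mis"] <- 0

# chl is still imputed temporarily so that the model for bmi can be fitted
imp <- mice(dat, predictorMatrix = pred, m = 1, maxit = 5,
            seed = 123, printFlag = FALSE)

# Reset the temporary chl imputations to missing afterwards
completed <- complete(imp, 1)
completed$chl <- dat$chl

sum(is.na(completed$bmi))
```

It is worth inspecting imp$loggedEvents after a run like this, since mice may drop constant or collinear predictors.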

stefvanbuuren · 2021-03-22T10:59:16Z

Maintainer

> Is it possible to impute all the missing values on a selection of variables which contain missings, whilst keeping variables with missings for the prediction model?

No. You cannot do that, at least not in current mice(). It would require fitting the imputation model on incomplete data, which is only possible if we have some way of dealing with the missing data problem in the predictors. The MICE algorithm does this by imputing the missing values in the predictors. If it finds a missing value in a predictor (e.g. because of the where argument), then it sets the imputed value in the outcome for that case to NA.

Conceivably, you could build imputation procedures that are "insensitive" to missing values, so that NA-propagation is not needed. That strategy could work, but it might introduce new issues, for example problems related to varying n's for different variables, bias introduced by ad-hoc fixes, or undercoverage because of single imputation. The MICE algorithm evades these.
