An extra column with constant number leads to different imputed result #337
Replies: 4 comments
-
I wouldn't expect the results to be identical.
# Build the default predictor matrix for the data that include the ID column
pred <- make.predictorMatrix(df.with.id.imputed)
# Zero out column 1 so that the ID is not used as a predictor for any variable
pred[, 1] <- 0
df.with.id.imputed2 <- complete(mice(df.with.id.na, predictorMatrix = pred, m=1, maxit = 10, method = 'pmm', seed = 500))
compare(df.origin.imputed, df.with.id.imputed2[, -1])
# TRUE
compare(df.origin.imputed, df.with.constant.imputed[, -1])
# TRUE
I assume this answers your questions. Thanks for your interest in
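To make the effect of `pred[, 1] <- 0` easier to see, here is a minimal sketch. It assumes the `nhanes` example data that ships with mice and a made-up `id` column placed in front of it; neither is part of the original test code.

```r
library(mice)

# Hypothetical data: the built-in nhanes set with a made-up id column in front.
dat <- cbind(id = seq_len(nrow(nhanes)), nhanes)

# Default matrix: 1 everywhere except the diagonal, so by default
# every variable is used as a predictor for every other variable.
pred <- make.predictorMatrix(dat)

# Rows are the variables being imputed, columns are the predictors.
# Zeroing the id column removes id as a predictor for every target.
pred[, "id"] <- 0

print(pred)
```

Passing this matrix as `predictorMatrix = pred` to `mice()` keeps the `id` column in the data but leaves it out of every imputation model.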
-
Thank you! I'm so sorry that the
Now, as I understand it, adding a constant column won't change the imputation result, but adding an "ID" column will, so what you did with
But I'm still wondering: when doing imputation, shouldn't the columns (variables/attributes/features) be treated as independent of each other, with the missing values inferred from the values in other rows of the same column (supposing that each row is a sample and each column is a feature)?
-
To create high-quality imputations we need to account for both the values in the same row and the values in the same column as each missing cell. If we focused on column values only (e.g. by imputing the mean or by taking a random sample from the observed values in that column), the correlations between the variables would be systematically weaker after imputation. The role of the imputation model is to preserve the relations in the data as well as the uncertainty about these relations. See the discussion in sections 1.3.3-1.3.5 in https://stefvanbuuren.name/fimd/sec-simplesolutions.html#sec:meanimp for background.
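The point about correlations can be checked with a small simulation. The sketch below uses base R only and simulated data, so every name and number in it is an assumption made for illustration. It compares mean imputation with a simple regression-plus-noise imputation, which is a simplified stand-in and not the `pmm` method used by mice.

```r
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- 0.7 * x + rnorm(n, sd = sqrt(1 - 0.7^2))  # true correlation about 0.7

# Delete 40% of y completely at random.
miss <- sample(n, size = 0.4 * n)
y_obs <- y
y_obs[miss] <- NA

# Column-only approach: fill every gap with the mean of the observed y values.
y_mean <- y_obs
y_mean[miss] <- mean(y_obs, na.rm = TRUE)

# Row-aware approach: regress y on x using the observed rows (lm drops the
# NA rows), then impute the prediction plus random residual noise.
fit <- lm(y_obs ~ x)
y_reg <- y_obs
y_reg[miss] <- predict(fit, newdata = data.frame(x = x[miss])) +
  rnorm(length(miss), sd = sigma(fit))

cor_full <- cor(x, y)
cor_mean <- cor(x, y_mean)
cor_reg <- cor(x, y_reg)
print(c(full = cor_full, mean_imputed = cor_mean, regression_imputed = cor_reg))
```

In this setup the mean-imputed correlation comes out clearly below the complete-data value, while the regression-based version stays close to it. The regression version here ignores the uncertainty in the fitted coefficients, which a proper multiple imputation procedure would also propagate.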
-
Thank you Stef! That perfectly answers my question.
-
I accidentally added an extra (useless) column with the sample ID to my dataset, and "mice" then gave me a totally different imputed result. I also tested adding a column holding a constant number, which had the same issue. Why does this kind of extra column affect the imputation result?
Here is the test code: