How to use 'with' for complex analyses other than lm? #332
-
If you'd like to combine models that have no parameters to pool, or that use different predictor sets across the imputed data sets, you need ensemble techniques. See, e.g., this vignette for an example that uses majority voting on stepwise-selected models.
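Majority voting can also be applied directly across the imputed data sets. The sketch below assumes the `boys` data shipped with mice and the `e1071` package. The outcome (`reg`, region) and the predictors are an illustrative choice, not taken from the vignette. One SVM classifier is fitted per completed data set, and the most frequent predicted class is taken for each child.

```r
library(mice)   # Multiple imputation; provides the boys data
library(e1071)  # svm()

set.seed(123)
imp <- mice(boys, maxit = 2, print = FALSE)  # few iterations for brevity

# Illustrative choice: classify region (reg) from age, height and weight,
# fitting one SVM per completed data set
fits <- lapply(complete(imp, "all"),
               function(d) svm(reg ~ age + hgt + wgt, data = d))

# predict() without newdata returns the fitted classes for the training rows;
# collect them into an n x m character matrix (one column per imputation)
votes <- sapply(fits, function(f) as.character(predict(f)))

# Majority vote per row; ties go to whichever class table() lists first
majority <- apply(votes, 1, function(v) names(which.max(table(v))))
pred_reg <- factor(majority, levels = levels(boys$reg))

table(observed = boys$reg, predicted = pred_reg)
```

Voting pools the predictions rather than the models, so it works even when the m fits have nothing comparable to average.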
-
If your aim is to obtain predictions, it is more efficient to pool the predicted values rather than pool the parameters/models first. See below for a reproducible example:

```r
library(mice)     # Multiple Imputation
library(dplyr)    # Data manipulation
library(tidyr)    # Tidy data
library(magrittr) # Pipes
library(purrr)    # Functional programming - map()

set.seed(123)     # Fix RNG seed
imp <- mice(boys,
            maxit = 2,       # for reasons of brevity
            print = FALSE)   # no iteration history

pred <- complete(imp, "all") %>%
  map(lm, formula = hgt ~ age + wgt + tv) %>% # model
  map(predict) %>%                            # list of predicted values per imputed set
  Reduce("+", .) / imp$m                      # average corresponding list elements

cor(boys$hgt, pred, use = "pairwise.complete.obs")^2 # R-squared
#> [1] 0.9595445
```

Created on 2020-12-15 by the reprex package (v0.3.0)
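The same averaging carries over to models without poolable parameters, such as the random forests asked about in this thread. The sketch below assumes the `randomForest` package and reuses the `boys` example with the same formula. For `randomForest`, `predict()` without `newdata` returns out-of-bag predictions, so the resulting R-squared is less optimistic than one based on in-sample fits. The code was not run to produce the reprex output above.

```r
library(mice)          # Multiple imputation; provides the boys data
library(randomForest)  # randomForest()

set.seed(123)
imp <- mice(boys, maxit = 2, print = FALSE)  # few iterations for brevity

# Fit one random forest per completed data set
fits <- lapply(complete(imp, "all"),
               function(d) randomForest(hgt ~ age + wgt + tv, data = d))

# predict() without newdata returns the out-of-bag prediction for each row
preds <- lapply(fits, predict)

# Pool the predictions (not the models): element-wise mean over the m sets
pred_rf <- Reduce("+", preds) / imp$m

cor(boys$hgt, pred_rf, use = "pairwise.complete.obs")^2  # R-squared
```

Swapping `e1071::svm()` in for `randomForest()` should work the same way for a numeric outcome, because `predict()` on an `svm` fit without `newdata` returns the fitted values.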
-
Thank you for your answer! It's helpful!
-
Yes; otherwise you'd make your prediction procedure uncongenial to the imputation model (see, e.g., Meng (1994) and Bartlett et al. (2015)).
-
Section 4.5 of FIMD (Flexible Imputation of Missing Data) covers the relevant topics.
-
Great discussion. Two remarks:
-
Thank you very much for your reply!
-
I want to fit an SVM (package 'e1071') and a random forest (package 'randomForest') on imputed data, but I don't know how to apply these methods with the 'with' function.
Can any analysis be performed and pooled on imputed data, or only certain methods? What should I do if I want to apply and pool methods such as SVM, random forests or cluster analysis on imputed data?
Thanks!