How to perform 'with' in some complex analysis instead of lm? #332

Bigsealion · 2020-12-15T08:25:48Z

Bigsealion
Dec 15, 2020

I want to run an SVM (package 'e1071') and a random forest (package 'randomForest') on imputed data, but I don't know how to apply these methods with the 'with' function.
Can any analysis be performed and pooled on imputed data? Or can only certain methods be used? What should I do if I want to apply and pool methods such as SVM, random forest, or cluster analysis on imputed data?
Thanks!

gerkovink · 2020-12-15T16:40:29Z

gerkovink
Dec 15, 2020
Maintainer

If you'd like to combine models without parameters or with different predictor sets across the imputed data sets, you'd need ensemble techniques.

See e.g. this vignette for an example that uses majority voting on stepwise selected models.
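As a rough illustration of the ensemble idea (not the vignette's stepwise example), here is a minimal sketch that fits a random forest on each completed data set and takes a majority vote over the predicted classes per row. The choice of the nhanes2 data, the outcome hyp, and the predictors are assumptions made only for illustration.

```r
library(mice)          # multiple imputation
library(randomForest)  # random forest classifier

set.seed(123)

# nhanes2 ships with mice; hyp is a two-level factor (assumed outcome here)
imp <- mice(nhanes2, m = 5, maxit = 2, print = FALSE)

# One column of predicted classes per completed data set
votes <- sapply(complete(imp, "all"), function(d) {
  fit <- randomForest(hyp ~ age + bmi + chl, data = d)
  as.character(predict(fit, newdata = d))
})

# Majority vote across the m predictions for each row
majority <- apply(votes, 1, function(v) names(which.max(table(v))))

table(majority, observed = nhanes2$hyp, useNA = "ifany")
```

With an odd m and a binary outcome there are no ties; for other settings a tie-breaking rule would have to be chosen.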


gerkovink · 2020-12-15T17:05:54Z

gerkovink
Dec 15, 2020
Maintainer

If your aim is to obtain predictions, it is more efficient to pool the predicted values and not pool the parameters/models first. See below for a reprex that demonstrates pooling predicted values.

library(mice)     # Multiple Imputation
library(dplyr)    # Data manipulation
library(tidyr)    # Tidy data
library(magrittr) # Pipes
library(purrr)    # Functional programming - map()

set.seed(123) # Fix RNG seed

imp <- mice(boys, 
            maxit = 2,     # for reasons of brevity
            print = FALSE) # no iteration history

pred <- complete(imp, "all") %>% 
  map(lm, formula = hgt ~ age + wgt + tv) %>% # model
  map(predict) %>% # list of predicted values per imputed set
  Reduce("+", .) / imp$m # average corresponding list elements


cor(boys$hgt, pred, use = "pairwise.complete.obs")^2 #R-squared
#> [1] 0.9595445

Created on 2020-12-15 by the reprex package (v0.3.0)
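The same pattern should carry over to models without poolable parameters. A minimal sketch, assuming e1071's svm() in its default regression mode and the same predictors as the reprex, written with base R lapply() instead of purrr:

```r
library(mice)   # multiple imputation
library(e1071)  # svm()

set.seed(123)

imp <- mice(boys, maxit = 2, print = FALSE)

# Fit an SVM on each completed data set and predict for every row
preds <- lapply(complete(imp, "all"), function(d) {
  fit <- svm(hgt ~ age + wgt + tv, data = d)
  predict(fit, newdata = d)
})

# Average the predictions row by row across the m completed data sets
pooled <- Reduce(`+`, preds) / length(preds)

cor(boys$hgt, pooled, use = "pairwise.complete.obs")^2  # R-squared
```

Only the predictions are averaged here; the fitted SVMs themselves are never combined.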


gerkovink · 2020-12-15T17:08:28Z

gerkovink
Dec 15, 2020
Maintainer

mice's pool() function is aimed at pooling parametric models according to Rubin's rules; see e.g. Section 2.3.2 in FIMD v.2.
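For the parametric case, the usual with() / pool() workflow looks roughly like the sketch below. The model formula is borrowed from the reprex above and is only an example.

```r
library(mice)

set.seed(123)

imp <- mice(boys, maxit = 2, print = FALSE)

# Fit the same linear model on each completed data set
fit <- with(imp, lm(hgt ~ age + wgt + tv))

# Combine estimates and variances across imputations (Rubin's rules)
pooled <- pool(fit)
summary(pooled)
```

This works because lm() returns coefficients and standard errors that can be combined; models without such parameters need the prediction-level or ensemble approaches instead.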


Bigsealion · 2020-12-16T01:38:12Z

Bigsealion
Dec 16, 2020
Author

Thank you for your answer! It's helpful!
Besides, if I want to build a prediction model that uses X to predict y, should I add y as a predictor of imputation?
I'm worried that this will leak label information into the prediction model, but the examples in the book 'Flexible Imputation of Missing Data' use all variables, including the variable of interest, as predictors in the imputation. Why is the method in the book reasonable?
Thanks!


gerkovink · 2020-12-16T10:40:15Z

gerkovink
Dec 16, 2020
Maintainer

Yes, otherwise you'll render your prediction procedure uncongenial to the imputation model - see e.g. Meng (1994) and Bartlett et al. (2015).


gerkovink · 2020-12-16T10:41:35Z

gerkovink
Dec 16, 2020
Maintainer

FIMD Section 4.5 considers the relevant topics.


stefvanbuuren · 2020-12-16T11:17:43Z

stefvanbuuren
Dec 16, 2020
Maintainer

Great discussion. Two remarks:

OP: "Can any analysis be performed and pooled on imputed data?": performed YES (easy), pooled YES (but may be difficult, depending on the nature of the model)
OP: "should I add y as a predictor of imputation?": YES, see https://stefvanbuuren.name/fimd/sec-modelform.html#sec:predictors for a recipe and reasons. Nevertheless, Little (1992) section 4.2 mentions a (not widely known) special case assuming MCAR where you should not include y and adapt case weights, so read that for the nitty-gritty.
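To check whether the outcome is actually used when imputing the other variables, one can inspect the predictor matrix. A minimal sketch, assuming hgt plays the role of y in the boys data:

```r
library(mice)

# Default predictor matrix: rows are variables to be imputed,
# columns are the predictors used for them (1 = used, 0 = not used)
pred <- make.predictorMatrix(boys)

# The hgt column shows which variables are imputed using hgt
pred[, "hgt"]

set.seed(123)
imp <- mice(boys, predictorMatrix = pred, maxit = 2, print = FALSE)
```

By default every other variable uses hgt as a predictor, which matches the recommendation to keep y in the imputation model.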


Bigsealion · 2020-12-17T01:33:25Z

Bigsealion
Dec 17, 2020
Author

Thank you very much for your reply!
I will read these materials and try to solve my problem with mice!
Thanks!

