Calculating descriptive estimates (mean, variance and SD) on multiply imputed datasets #597
IvoRollmann
started this conversation in
Ideas
Replies: 0 comments
After searching the internet and several packages, it seems that no function has been implemented that calculates the descriptives of the original data as well as the grand estimates of the imputed datasets and their standard errors using Rubin's rules.
As I work in psychology, reviewers tend to ask for such descriptives, and I would like to be fully correct by providing not only the estimates of the original data but also the grand estimates of the imputed data and their uncertainty. This would also help in better understanding the missing values.
Thus, I have created such a function.
It returns a 3x3 data frame with one column each for the mean, variance and SD, and one row each for the estimate from the original data, the grand estimate from the imputed data, and the standard error of the grand estimate using Rubin's rules.
Code is below.
Standard errors for the variance and SD are calculated using the formulas of Sangtae Ahn and Jeffrey A. Fessler (2003).
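For reference, the standard errors in question can be sketched as follows. These are the usual normal-theory approximations for a sample of size n with sample variance s^2; the notation is an assumption made here for illustration and is not copied from the paper.

```latex
\[
\operatorname{SE}(\bar{x}) = \frac{s}{\sqrt{n}}, \qquad
\operatorname{SE}(s^2) \approx s^2 \sqrt{\frac{2}{n-1}}, \qquad
\operatorname{SE}(s) \approx \frac{s}{\sqrt{2(n-1)}}
\]
% Squaring each standard error gives the within-imputation variance
% that enters Rubin's rules:
\[
U_{\bar{x}} = \frac{s^2}{n}, \qquad
U_{s^2} \approx \frac{2 s^4}{n-1}, \qquad
U_{s} \approx \frac{s^2}{2(n-1)}
\]
```

Both approximations for the variance and SD assume roughly normally distributed data, so they may be off for strongly skewed variables.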
I actually have two questions:
Here is the code:
calculate_mi_descriptives <- function(mi_data, column){
  # mi_data needs to be a mids object.
  # Calculates the mean, var and sd of a variable from a multiply imputed dataset
  # using Rubin's rules. Output is a table with the mean, var and sd of the original
  # data, the grand estimate of the imputed datasets and the standard error of the
  # grand estimates of the imputed datasets (calculated with Rubin's rules).
  # Standard errors of var and sd are calculated using the formulas of
  # Sangtae Ahn and Jeffrey A. Fessler (2003).

  # load necessary packages
  library(mice)
  library(dplyr)
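  # Stack the original data (.imp == 0) and all imputed datasets in long format
  long_mids <- complete(mi_data, action = "long", include = TRUE)
  # Descriptives per dataset; the first row (.imp == 0) is the original data
  imp_descriptive <- long_mids %>%
    group_by(.imp) %>%
    select({{column}}) %>%
    summarise(mean = mean({{column}}, na.rm = TRUE),
              var = var({{column}}, na.rm = TRUE),
              sd = sd({{column}}, na.rm = TRUE))
  orig_descriptive <- imp_descriptive[1,]
  # Number of rows per dataset
  n <- long_mids %>%
    filter(.imp == 0) %>%
    nrow()
  # Number of imputations
  m <- mi_data$m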
  # Calculate grand estimate of imputed data
  Q <- imp_descriptive %>%
    filter(.imp != 0) %>%
    summarise(grand_mean = mean(mean),
              grand_var = mean(var),
              grand_sd = mean(sd))
  colnames(Q) <- c("mean", "var", "sd")
  # Calculate within-imputation variance (squared standard errors;
  # normal-theory approximations for var and sd)
  U <- imp_descriptive %>%
    filter(.imp != 0) %>%
    mutate(mean_se = var / n,
           var_se = 2 * var^2 / (n - 1),
           sd_se = var / (2 * (n - 1)),
           .keep = "none") %>%
    summarise(grand_mean_se = mean(mean_se),
              grand_var_se = mean(var_se),
              grand_sd_se = mean(sd_se))
  # Calculate between-imputation variance
  B <- imp_descriptive %>%
    filter(.imp != 0) %>%
    summarise(mean_var = var(mean),
              var_var = var(var),
              sd_var = var(sd))
  # Combine within and between variance using Rubin's rules
  T <- U + (1 + 1/m)*B
  colnames(T) <- c("mean", "var", "sd")
  # Combine into one table
  res_table <- rbind(orig_descriptive[,2:4], Q, sqrt(T)) %>% as.data.frame()
  rownames(res_table) <- c("Original Data Estimates",
                           "Imputation Grand Estimates",
                           "Imputation Grand Estimate SE")
  return(res_table)
}
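The pooling step can also be checked in isolation. The following is a minimal base-R sketch of Rubin's rules for a single scalar estimate; `pool_rubin` is a hypothetical helper written for illustration (it is not part of mice), and the numbers are made up.

```r
# Hypothetical helper (not part of mice): pool m per-imputation estimates
# and their within-imputation variances using Rubin's rules.
pool_rubin <- function(estimates, within_var) {
  m <- length(estimates)
  q_bar <- mean(estimates)               # grand estimate
  u_bar <- mean(within_var)              # mean within-imputation variance
  b <- var(estimates)                    # between-imputation variance
  total_var <- u_bar + (1 + 1 / m) * b   # total variance
  c(estimate = q_bar, se = sqrt(total_var))
}

# Made-up example: means of one variable in m = 3 imputed datasets,
# each with n = 100 rows and a sample variance of 4.
means <- c(10, 11, 12)
vars <- c(4, 4, 4)
n <- 100
pooled <- pool_rubin(means, vars / n)    # within variance of a mean is var / n
print(pooled)
```

The same helper could be applied to the variance and SD by passing the corresponding within-imputation variances instead of `vars / n`.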