Calculating descriptive estimates (Mean, Variance and SD) on multiple imputed datasets #597

IvoRollmann · 2023-10-12T08:13:16Z


After searching the internet and several packages, it seems that there is no implemented function that calculates the descriptives of the original data as well as the grand estimates of the imputed datasets and their standard errors using Rubin's rules.

As I work in psychology, reviewers tend to ask for such descriptives, and I would like to be completely correct by providing not only the estimates from the original data but also the grand estimates from the imputed data and their uncertainty. This would also help in better understanding the missing values.

Thus, I have created such a function. It returns a 3x3 data frame with one column each for the mean, var and sd, and one row each for the estimate from the original data, the grand estimate from the imputed data, and the standard error of that grand estimate using Rubin's rules.
The code is below.

Standard error estimates for Var and SD are calculated using the formulas of Sangtae Ahn and Jeffrey A. Fessler (2003).
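Written out, and assuming approximately normal data (the setting of the Ahn and Fessler formulas), the per-imputation squared standard errors $U_\ell$ and their combination by Rubin's rules look like this. Here $\hat{Q}_\ell$ is the statistic (mean, variance or SD) computed in imputed dataset $\ell = 1, \dots, m$, $s_\ell^2$ is that dataset's sample variance, and $n$ is the sample size:

```math
\begin{aligned}
U_\ell^{\text{mean}} &= \frac{s_\ell^2}{n}, \qquad
U_\ell^{\text{var}} = \frac{2\, s_\ell^4}{n-1}, \qquad
U_\ell^{\text{sd}} \approx \frac{s_\ell^2}{2(n-1)}, \\
\bar{Q} &= \frac{1}{m}\sum_{\ell=1}^{m} \hat{Q}_\ell, \qquad
\bar{U} = \frac{1}{m}\sum_{\ell=1}^{m} U_\ell, \qquad
B = \frac{1}{m-1}\sum_{\ell=1}^{m} \bigl(\hat{Q}_\ell - \bar{Q}\bigr)^2, \\
T &= \bar{U} + \left(1 + \frac{1}{m}\right) B, \qquad
\widehat{\mathrm{SE}}(\bar{Q}) = \sqrt{T}.
\end{aligned}
```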

I actually have two questions:

1. For Rubin's rules I only used the one-dimensional calculation, forgoing the covariance matrix. Is this procedure correct in this case?
2. If the function is okay, could it be added to the mice package? That would make programming a lot easier.
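Regarding the covariance matrix: in the multivariate form of Rubin's rules, the total variance is the matrix $\bar{U} + (1 + 1/m)B$, but the standard error of each individual statistic only uses its diagonal, and that diagonal is exactly what the one-dimensional rule gives when applied to each statistic separately. The off-diagonal terms would matter only for joint inference (e.g. a test involving several statistics at once). The minimal base-R sketch below checks this with made-up numbers; `est`, `u_list` and all values are hypothetical.

```r
# Hypothetical numbers: m = 3 imputations, p = 2 statistics (e.g. mean and sd)
est <- rbind(c(10.0, 2.0),
             c(11.0, 2.2),
             c(12.0, 2.1))                       # m x p matrix of point estimates
u_list <- list(matrix(c(0.50, 0.01, 0.01, 0.020), 2, 2),
               matrix(c(0.55, 0.02, 0.02, 0.030), 2, 2),
               matrix(c(0.45, 0.00, 0.00, 0.025), 2, 2))  # within-imputation covariance matrices
m <- nrow(est)

# Multivariate Rubin's rules: every quantity is a p x p matrix
u_bar <- Reduce(`+`, u_list) / m                 # average within-imputation covariance
b_mat <- cov(est)                                # between-imputation covariance
t_mat <- u_bar + (1 + 1 / m) * b_mat             # total covariance

# One-dimensional Rubin's rules, applied to each statistic on its own
t_scalar <- sapply(seq_len(ncol(est)), function(j) {
  u_j <- mean(sapply(u_list, function(u) u[j, j]))  # average squared SE of statistic j
  u_j + (1 + 1 / m) * var(est[, j])                 # total variance of statistic j
})

# The diagonal of the matrix version equals the scalar results
all.equal(diag(t_mat), t_scalar)
```

mice also exports `pool.scalar()` for this one-dimensional case; its help page documents the exact arguments and the additional degrees-of-freedom output.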

Here is the Code:

# Load necessary packages
library(mice)
library(dplyr)

calculate_mi_descriptives <- function(mi_data, column) {
  # mi_data needs to be a mids object; column is an unquoted column name.
  # Calculates the mean, var and sd of a variable from a multiply imputed
  # dataset using Rubin's rules. Output is a table with the mean, var and sd of
  # the original data, the grand estimate of the imputed datasets and the
  # standard error of the grand estimates of the imputed datasets (calculated
  # with Rubin's rules).
  # Standard errors of var and sd are calculated using the formulas of
  # Sangtae Ahn and Jeffrey A. Fessler (2003).

  # Stack the original data (.imp == 0) and all m imputed datasets
  long_mids <- complete(mi_data, action = "long", include = TRUE)

  # Mean, var and sd of the column within each dataset (.imp = 0, 1, ..., m)
  imp_descriptive <- long_mids %>%
    group_by(.imp) %>%
    select({{ column }}) %>%
    summarise(mean = mean({{ column }}, na.rm = TRUE),
              var  = var({{ column }}, na.rm = TRUE),
              sd   = sd({{ column }}, na.rm = TRUE))

  # First row (.imp == 0) holds the estimates of the original, incomplete data
  orig_descriptive <- imp_descriptive[1, ]

  # Sample size of each completed dataset
  n <- long_mids %>%
    filter(.imp == 0) %>%
    nrow()

  m <- mi_data$m

  # Calculate grand estimate of imputed data (average over the m imputations)
  Q <- imp_descriptive %>%
    filter(.imp != 0) %>%
    summarise(grand_mean = mean(mean),
              grand_var  = mean(var),
              grand_sd   = mean(sd))
  colnames(Q) <- c("mean", "var", "sd")

  # Calculate within-imputation variance, i.e. the average squared SE:
  # SE^2(mean) = var / n, SE^2(var) = 2 * var^2 / (n - 1),
  # SE^2(sd) = sd^2 / (2 * (n - 1))
  U <- imp_descriptive %>%
    filter(.imp != 0) %>%
    mutate(mean_se = var / n,
           var_se  = 2 * var^2 / (n - 1),
           sd_se   = sd^2 / (2 * (n - 1)),
           .keep = "none") %>%
    summarise(grand_mean_se = mean(mean_se),
              grand_var_se  = mean(var_se),
              grand_sd_se   = mean(sd_se))

  # Calculate between-imputation variance
  B <- imp_descriptive %>%
    filter(.imp != 0) %>%
    summarise(mean_var = var(mean),
              var_var  = var(var),
              sd_var   = var(sd))

  # Combine within and between variance using Rubin's rules
  # (named total_var rather than T, which would mask the TRUE shorthand)
  total_var <- U + (1 + 1 / m) * B
  colnames(total_var) <- c("mean", "var", "sd")

  # Combine into one table; the SE is the square root of the total variance
  res_table <- rbind(orig_descriptive[, 2:4], Q, sqrt(total_var)) %>% as.data.frame()
  rownames(res_table) <- c("Original Data Estimates",
                           "Imputation Grand Estimates",
                           "Imputation Grand Estimate SE")

  return(res_table)
}
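
As a cross-check that does not need mice, the same computation can be sketched in base R on a hypothetical variable `x_orig` with two missing values and three made-up completed copies `x_imp` (in practice these would come from `complete()`). The helpers `describe()` and `within_var()` are illustrative names, and the SE formulas assume approximately normal data, following the Ahn and Fessler normal-theory results.

```r
# Hypothetical input: original variable (with NAs) and m = 3 completed copies
x_orig <- c(2, 4, NA, 4, 5, NA, 7, 9)
x_imp <- list(
  c(2, 4, 4, 4, 5, 5, 7, 9),
  c(2, 4, 3, 4, 5, 6, 7, 9),
  c(2, 4, 5, 4, 5, 4, 7, 9)
)

# Hypothetical helper: mean, var and sd of one vector
describe <- function(x) {
  c(mean = mean(x, na.rm = TRUE), var = var(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
}

# Hypothetical helper: squared SE of each statistic (normal-theory formulas)
within_var <- function(x) {
  n <- length(x)
  s2 <- var(x)
  c(mean = s2 / n, var = 2 * s2^2 / (n - 1), sd = s2 / (2 * (n - 1)))
}

m <- length(x_imp)
est <- t(sapply(x_imp, describe))    # m x 3 matrix of per-imputation estimates
u <- t(sapply(x_imp, within_var))    # m x 3 matrix of squared SEs

q_bar <- colMeans(est)               # pooled (grand) estimates
u_bar <- colMeans(u)                 # average within-imputation variance
b <- apply(est, 2, var)              # between-imputation variance
total <- u_bar + (1 + 1 / m) * b     # Rubin's total variance

res_table <- rbind(
  "Original Data Estimates" = describe(x_orig),
  "Imputation Grand Estimates" = q_bar,
  "Imputation Grand Estimate SE" = sqrt(total)
)
res_table
```

Because every completed copy here has the same mean, the between-imputation variance of the mean is zero and its SE reduces to $\sqrt{\bar{U}}$. For the mean alone, pooling an intercept-only model, e.g. `summary(pool(with(imp, lm(bmi ~ 1))))` for a hypothetical `mids` object `imp` containing a variable `bmi`, should give the same pooled estimate and SE, since the intercept's squared SE in each completed dataset is $s^2/n$.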
