Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad (non-robust) standard-errors using "bootstrap" => CI's and p-values do not match #347

Open
LGraz opened this issue May 16, 2024 · 2 comments

Comments

@LGraz
Copy link

LGraz commented May 16, 2024

Dear Team,
I appreciate this great product, THANKS!

I encountered a case where the confidence interval of or_fraction_mediated does not include 0, but the p-value is 0.9

I noticed that the SE estimate from the bootstrap is corrupted by outlier from the bootstraps (Hence the p-value might be too big). If thats so, I suggest using some more robust version (below I compare the SE with the MAD).

Here my minimal reproducible example using the data_WV_scaled.csv :

library(lavaan)

D <- read.csv("./data_WV_scaled.csv")

sem_mod_def <- {
"# Mediator
    F ~ df*D + of*O
    A ~ oa*O + da*D
    R ~ fr*F + ar*A
# direct effect
    R ~ dr*D + or*O
# Mediated effects
    dr_mediated := df*fr + da*ar
    or_mediated := of*fr + oa*ar
# total effect (direct + mediated)
    dr_total := dr + dr_mediated
    or_total := or + or_mediated
# mediated_effect / total_effect:
    dr_fraction_mediated := dr_mediated / dr_total
    or_fraction_mediated := or_mediated / or_total
"
}


# fit SEM and return parameters
set.seed(123)
object <- sem(sem_mod_def, D, se = "boot", # else "robust.sem"?
bootstrap = 2000)

BOOT <- lavaan:::lav_object_inspect_boot(object)
BOOT.def <- apply(BOOT, 1, object@[email protected])
or_fraction_mediated <- BOOT.def[6,]
mad(or_fraction_mediated)
#> [1] 0.38009
sd(or_fraction_mediated)
#> [1] 11.44
# ----> THATS A BIG DIFFERENCE !!!

parameterEstimates(object)
#>                     lhs op                  rhs                label    est     se      z pvalue ci.lower ci.upper
#> 1                     F  ~                    D                   df  0.411  0.023 17.665  0.000    0.362    0.453
#> 2                     F  ~                    O                   of -0.101  0.025 -4.021  0.000   -0.150   -0.051
#> 3                     A  ~                    O                   oa  0.479  0.025 19.148  0.000    0.431    0.528
#> 4                     A  ~                    D                   da -0.044  0.024 -1.839  0.066   -0.091    0.006
#> 5                     R  ~                    F                   fr  0.573  0.029 19.600  0.000    0.514    0.632
#> 6                     R  ~                    A                   ar -0.140  0.025 -5.534  0.000   -0.188   -0.090
#> 7                     R  ~                    D                   dr -0.024  0.024 -1.015  0.310   -0.069    0.023
#> 8                     R  ~                    O                   or  0.037  0.025  1.492  0.136   -0.011    0.085
#> 9                     F ~~                    F                       0.803  0.033 24.295  0.000    0.737    0.867
#> 10                    A ~~                    A                       0.759  0.025 30.706  0.000    0.708    0.806
#> 11                    R ~~                    R                       0.624  0.026 23.794  0.000    0.572    0.674
#> 12                    D ~~                    D                       0.999  0.000     NA     NA    0.999    0.999
#> 13                    D ~~                    O                      -0.206  0.000     NA     NA   -0.206   -0.206
#> 14                    O ~~                    O                       0.999  0.000     NA     NA    0.999    0.999
#> 15          dr_mediated :=          df*fr+da*ar          dr_mediated  0.242  0.019 12.966  0.000    0.204    0.278
#> 16          or_mediated :=          of*fr+oa*ar          or_mediated -0.125  0.020 -6.231  0.000   -0.163   -0.085
#> 17             dr_total :=       dr+dr_mediated             dr_total  0.218  0.025  8.742  0.000    0.168    0.267
#> 18             or_total :=       or+or_mediated             or_total -0.088  0.028 -3.161  0.002   -0.141   -0.033
#> 19 dr_fraction_mediated := dr_mediated/dr_total dr_fraction_mediated  1.110  0.122  9.118  0.000    0.907    1.380
#> 20 or_fraction_mediated := or_mediated/or_total or_fraction_mediated  1.422 11.440  0.124  0.901    0.909    3.317

Unrelated to this robustness issue: Would you recommend to me using se = "robust.sem" instead?

@yrosseel
Copy link
Owner

yrosseel commented Jun 2, 2024

I would not recommend se = "robust.sem" here, as you define new variables that are clearly nonlinear functions of the original parameters.
I also hesitate to just replace 'sd' by 'mad' while calculating the bootstrap-based standard errors. Surely, sd() is very sensitive to outliers, but if an outlier effectively occurs every now and then, then this should also be reflected in the standard errors. I would have to dive into the literature to better understand the theoretical implications if mad() is used instead of sd() while computing bootstrap based standard errors.
I would rather try to pinpoint when/why you often get outlying parameter estimates. Perhaps using bounded estimation may prevent this? (Try running this with the argument bounds = "standard").

@LGraz
Copy link
Author

LGraz commented Jun 4, 2024

bounds = "standard" does not have an effect.

pinpoint why outlying parameter:
or_fraction_mediated := or_mediated/or_total so if or_total is very close to 0 in one sample, then or_fraction_mediated explodes and hence also sd breaks down.

Simple exchanging sd() with mad() is not advisable.

What do you think about a warning if the sd and the mad are far away from each other? like:

# BOOT <- lavaan:::lav_object_inspect_boot(object)
# BOOT.def <- apply(BOOT, 1, object@[email protected])
sd_mad_ratio <- apply(BOOT.def, 1, sd) / apply(BOOT.def, 1, mad)
if(any(5 > sd_mad_ratio)){
    params_w_outliers <- names(sd_mad_ratio)[5 < sd_mad_ratio]
    warning(paste("The following bootstrap-parameters have a high
      ratio of standard deviation to median absolute deviation:", 
      params_w_outliers, 
      "\nP-values and Confidence Intervals might not match."))
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants