Cluster-robust standard errors for ordinal data (and WLSMV) #254

isaactpetersen · 2022-12-22T16:24:11Z

It'd be nice to have the option for cluster-robust standard errors for ordinal data (when using WLSMV). Much of my data are ordinal and nested, so this is the primary issue that prevents me from making a wholesale shift from Mplus to lavaan (and I much prefer R/lavaan).

This topic was discussed in the context of a broader issue on fitting multilevel models with ordinal data (#225). However, I thought it'd be worth separating the two issues because (a) cluster-robust SEs for ordinal data is a much more focused issue, and (b) based on the discussion in that thread, it sounds like multilevel models with ordinal data won't be implemented in lavaan for a long time (if ever), so I didn't want cluster-robust SEs for ordinal data to get lost in the shuffle. Let me know if you think this issue is too redundant and that I should close it.

In case it's helpful for implementation, I provide an Mplus code example for cluster-robust standard errors for ordinal data here:
#225 (comment)

FWIW, in that thread, Yves noted (#225 (comment)): "I have not found papers/literature about this yet, but I do have some ideas how this can be done: when we compute the 'cluster robust' standard errors, we must recompute the 'WLS.W' matrix (see lav_muthen1984) taking the clustering into account, in order to compute a cluster-robust 'Gamma' matrix, which is then used in lav_model_nvcov_robust_sem(). At least, that is my current hypothesis."

joh4nd · 2023-06-17T07:17:37Z

This is possible to specify this on my end. It does not raise a warning or error.

isaactpetersen · 2023-06-17T14:39:35Z

Yes, I received an error in version 0.6-11, but haven't received an error since version 0.6-12 and higher (reprex below). Has this been fixed, or has the error merely been suppressed? In other words, is lavaan successfully accounting for clustering with ordinal data and WLSMV?

devtools::install_version(
  "lavaan",
  version = "0.6-11",
  repos = "http://cran.us.r-project.org",
  lib = "C:/R/Old")
#> Downloading package from url: http://cran.us.r-project.org/src/contrib/Archive/lavaan/lavaan_0.6-11.tar.gz
library("lavaan", lib.loc = "C:/R/Old")
#> This is lavaan 0.6-11
#> lavaan is FREE software! Please report any bugs.

nSubjects <- 100
datapointsPerSubject <- 10
rho <- .70

mydata <- data.frame(
  ID = rep(1:nSubjects, each = datapointsPerSubject),
  predictor = NA,
  outcome = NA)

simulateOrdinalData <- function(n, rho){
  v1 <- sample(c(0, 1, 2), n, replace = TRUE, prob = c(1/3, 1/3, 1/3))
  
  index <- sample(c(0, 1), n, replace = TRUE, prob = c(1 - rho, rho))
  index <- index == 1
  v2 <- sample(c(0, 1, 2), n, replace = TRUE, prob = c(1/3, 1/3, 1/3))
  
  v2[index] <- v1[index]
  simdata <- data.frame(v1, v2)
  
  return(simdata)
}

for(i in 1:nSubjects){
  simdata <- simulateOrdinalData(n = datapointsPerSubject, rho = rho)
  mydata[which(mydata$ID == i), c("predictor","outcome")] <- simdata[,c("v1","v2")]
}

mydata$predictor[sample(1:nrow(mydata), size = 15)] <- NA
mydata$outcome[sample(1:nrow(mydata), size = 15)] <- NA

clusteredModel <- '
  outcome ~ predictor
'

clusteredModel_fit <- sem(
  model = clusteredModel,
  data = mydata,
  ordered = TRUE,
  std.lv = TRUE,
  cluster = "ID",
  estimator = "WLSMV",
  missing = "pairwise"
)
#> Warning in lav_options_set(opt): lavaan WARNING: information will be set to
#> "expected" for estimator = "DWLS"
#> Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: 15 cases were deleted due to missing values in 
#>        exogenous variable(s), while fixed.x = TRUE.
#> Error in th.start.idx[i]:th.end.idx[i]: NA/NaN argument

summary(clusteredModel_fit, standardized = TRUE)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': object 'clusteredModel_fit' not found

sessionInfo()
#> R version 4.2.0 (2022-04-22 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.utf8 
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] lavaan_0.6-11
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.10       urlchecker_1.0.1  compiler_4.2.0    later_1.3.1      
#>  [5] remotes_2.4.2     profvis_0.3.8     R.methodsS3_1.8.2 prettyunits_1.1.1
#>  [9] R.utils_2.12.2    tools_4.2.0       pkgload_1.3.2     digest_0.6.31    
#> [13] pkgbuild_1.4.1    evaluate_0.21     memoise_2.0.1     lifecycle_1.0.3  
#> [17] R.cache_0.16.0    rlang_1.1.1       reprex_2.0.2      shiny_1.7.4      
#> [21] cli_3.6.1         yaml_2.3.7        pbivnorm_0.6.0    xfun_0.39        
#> [25] fastmap_1.1.1     stringr_1.5.0     withr_2.5.0       styler_1.10.1    
#> [29] knitr_1.43        htmlwidgets_1.6.2 fs_1.6.2          vctrs_0.6.3      
#> [33] devtools_2.4.5    stats4_4.2.0      glue_1.6.2        R6_2.5.1         
#> [37] processx_3.8.1    rmarkdown_2.22    sessioninfo_1.2.2 purrr_1.0.1      
#> [41] callr_3.7.3       magrittr_2.0.3    usethis_2.2.0     promises_1.2.0.1 
#> [45] ps_1.7.5          ellipsis_0.3.2    htmltools_0.5.5   mnormt_2.1.1     
#> [49] mime_0.12         xtable_1.8-4      httpuv_1.6.11     stringi_1.7.12   
#> [53] miniUI_0.1.1.1    cachem_1.0.8      crayon_1.5.2      R.oo_1.25.0

detach("package:lavaan", unload = TRUE)
install.packages("lavaan")
#> Installing package into 'C:/R/Packages'
#> (as 'lib' is unspecified)
#> package 'lavaan' successfully unpacked and MD5 sums checked
#> 
#> The downloaded binary packages are in
#>  C:\Users\Isaac\AppData\Local\Temp\Rtmp6BqtjZ\downloaded_packages

library("lavaan")
#> Warning: package 'lavaan' was built under R version 4.2.3
#> This is lavaan 0.6-15
#> lavaan is FREE software! Please report any bugs.

mydata <- data.frame(
  ID = rep(1:nSubjects, each = datapointsPerSubject),
  predictor = NA,
  outcome = NA)

for(i in 1:nSubjects){
  simdata <- simulateOrdinalData(n = datapointsPerSubject, rho = rho)
  mydata[which(mydata$ID == i), c("predictor","outcome")] <- simdata[,c("v1","v2")]
}

mydata$predictor[sample(1:nrow(mydata), size = 15)] <- NA
mydata$outcome[sample(1:nrow(mydata), size = 15)] <- NA

clusteredModel_fit <- sem(
  model = clusteredModel,
  data = mydata,
  ordered = TRUE,
  cluster = "ID",
  estimator = "WLSMV",
  missing = "pairwise"
)
#> Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: 15 cases were deleted due to missing values in 
#>        exogenous variable(s), while fixed.x = TRUE.
#> Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: some cases are empty and will be ignored:
#>   20 79 200 263 333 384 419 508 551 620 734 792 920 925 960

summary(clusteredModel_fit, standardized = TRUE)
#> lavaan 0.6.15 ended normally after 2 iterations
#> 
#>   Estimator                                       DWLS
#>   Optimization method                           NLMINB
#>   Number of model parameters                         3
#> 
#>                                                   Used       Total
#>   Number of observations                           970        1000
#>   Number of clusters [ID]                          100            
#>   Number of missing patterns                         1            
#> 
#> Model Test User Model:
#>                                               Standard      Scaled
#>   Test Statistic                                 0.000       0.000
#>   Degrees of freedom                                 0           0
#>                                                                   
#>   Test Statistic                                 0.000       0.000
#>   Degrees of freedom                                 0           0
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                        Robust.cluster.sem
#>   Information                                      Expected
#>   Information saturated (h1) model             Unstructured
#> 
#> Regressions:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>   outcome ~                                                             
#>     predictor         1.411    0.045   31.230    0.000    1.411    0.761
#> 
#> Intercepts:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>    .outcome           0.000                               0.000    0.000
#> 
#> Thresholds:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>     outcome|t1        0.735    0.070   10.567    0.000    0.735    0.477
#>     outcome|t2        2.054    0.065   31.522    0.000    2.054    1.332
#> 
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>    .outcome           1.000                               1.000    0.421
#> 
#> Scales y*:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>     outcome           1.000                               1.000    1.000

sessionInfo()
#> R version 4.2.0 (2022-04-22 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.utf8 
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] lavaan_0.6-15
#> 
#> loaded via a namespace (and not attached):
#>  [1] styler_1.10.1     xfun_0.39         remotes_2.4.2     purrr_1.0.1      
#>  [5] vctrs_0.6.3       miniUI_0.1.1.1    htmltools_0.5.5   usethis_2.2.0    
#>  [9] stats4_4.2.0      yaml_2.3.7        rlang_1.1.1       pkgbuild_1.4.1   
#> [13] R.oo_1.25.0       later_1.3.1       urlchecker_1.0.1  glue_1.6.2       
#> [17] withr_2.5.0       R.utils_2.12.2    sessioninfo_1.2.2 R.cache_0.16.0   
#> [21] lifecycle_1.0.3   stringr_1.5.0     R.methodsS3_1.8.2 devtools_2.4.5   
#> [25] htmlwidgets_1.6.2 memoise_2.0.1     evaluate_0.21     knitr_1.43       
#> [29] callr_3.7.3       fastmap_1.1.1     httpuv_1.6.11     ps_1.7.5         
#> [33] parallel_4.2.0    Rcpp_1.0.10       xtable_1.8-4      promises_1.2.0.1 
#> [37] cachem_1.0.8      pkgload_1.3.2     mime_0.12         fs_1.6.2         
#> [41] mnormt_2.1.1      digest_0.6.31     stringi_1.7.12    processx_3.8.1   
#> [45] shiny_1.7.4       quadprog_1.5-8    cli_3.6.1         tools_4.2.0      
#> [49] magrittr_2.0.3    profvis_0.3.8     crayon_1.5.2      pbivnorm_0.6.0   
#> [53] ellipsis_0.3.2    prettyunits_1.1.1 reprex_2.0.2      rmarkdown_2.22   
#> [57] R6_2.5.1          compiler_4.2.0

^{Created on 2023-06-17 with reprex v2.0.2}

joh4nd · 2023-06-18T08:31:07Z

Yet that's also the question I feel. Terrence replied on the google group that it is not possible, referring to this issue, suggesting that although a result is returned, it is difficult to interpret. Would be nice to hear from the developers. But they surely don't have all the time in the world to answer questions.

yrosseel · 2023-07-18T14:13:32Z

This is NOT supported by lavaan yet. Unfortunately, no warning/error message was printed (in lavaan <= 0.6-15), but the output is simply ignoring the clustering structure. The github version of lavaan (pre 0.6-16) will produce a hard stop.

yrosseel added the feature_request label Feb 7, 2023

yrosseel closed this as completed Jul 18, 2023

yrosseel reopened this Jul 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cluster-robust standard errors for ordinal data (and WLSMV) #254

Cluster-robust standard errors for ordinal data (and WLSMV) #254

isaactpetersen commented Dec 22, 2022

joh4nd commented Jun 17, 2023

isaactpetersen commented Jun 17, 2023

joh4nd commented Jun 18, 2023

yrosseel commented Jul 18, 2023

Cluster-robust standard errors for ordinal data (and WLSMV) #254

Cluster-robust standard errors for ordinal data (and WLSMV) #254

Comments

isaactpetersen commented Dec 22, 2022

joh4nd commented Jun 17, 2023

isaactpetersen commented Jun 17, 2023

joh4nd commented Jun 18, 2023

yrosseel commented Jul 18, 2023