SEM.qmd

---
title: "Ch. 5: Structural Equation Models"
format: html
author: Moritz Fischer
execute:
  cache: true
project-type: website
---

*Reading/working time: ~50 min.*

```{r install packages, message=FALSE, warning=FALSE}
#install.packages(c("lavaan", "ggplot2", "Rfast", "MBESS", "future.apply"), dependencies = TRUE)

library("lavaan")
library("Rfast")
library("ggplot2")
library("dplyr")
library("MBESS")
library(future.apply)
# this initializes parallel processing, for faster code execution
# By default, it uses all available cores
plan(multisession)
```

In this chapter, we will focus on some rather simple Structural Equation Models (SEM). The goal is to illustrate how simulations can be used to estimate statistical power to detect a given effect in a SEM. In the context of SEMs, the focal effect may be for instance a fit index (e.g., Chi-Square, Root Mean Square Error of Approximation, etc.) or a model coefficient (e.g., a regression coefficient for the association between two latent factors). In this chapter, we only focus on the latter, that is power analyses for regression coefficients in the context of SEM. Please see the bonus chapter titled "[Power analysis for fit indices in SEM](SEM_fit_index.qmd)" if you're interested in power analyses for fit indices.

::: {.callout-note}
In this chapter, we'll use the parallelization techniques described in the Chapter ["Bonus: Optimizing R code for speed"](optimizing_code.qmd), otherwise the simulations are too slow. If you wonder about the unknown commands in the code, please read the bonus chapter to learn how to considerably speed up your code! But nonetheless, running some of the chunks in this chapter will require quite some time. We therefore decided to use 500 iterations per simulation only in this chapter; please keep in mind that you should use more iterations in order to obtain more precise estimates! 
:::

# A simulation-based power analysis for a single regression coefficient between latent variables

Let's consider the following example: We are planning to conduct a study investigating whether people's openness to experience (as conceptualized in the big five personality traits) relates negatively to generalized prejudice, that is a latent variable comprising prejudice towards different social groups (e.g., women, foreigners, homosexuals, disabled people). We could plan to scrutinize this prediction with a SEM in which both openness to experience and generalized prejudice are modeled as latent variables. 

::: {.callout-note}

Please note that the predictions used as examples in this chapter are not necessarily theoretically meaningful hypotheses, we merely use them to illustrate the mechanics of simulation-based power analyses in the context of SEM.

:::


## Let's get some real data as starting point

Just like for any other simulation-based power analysis, we first need to come up with plausible estimates of the distribution of the (manifest) variables. For the sake of simplicity, let's assume that there is a published study that measured manifestations of our two latent variables and that the corresponding data set is publicly available. For the purpose of this tutorial, we will draw on a publication by Bergh et al. ([2016](https://www.researchgate.net/publication/306939541_Is_group_membership_necessary_for_understanding_generalized_prejudice_A_re-evaluation_of_why_prejudices_are_interrelated)) and the corresponding data set which has been made accessible as part of the `MPsychoR` package. Let's take a look at this data set.


```{r load data, message=FALSE}

#install.packages("MPsychoR")
library(MPsychoR)

data("Bergh")
attach(Bergh)

#let's take a look
head(Bergh)
tail(Bergh)

```

This data set comprises `r ncol(Bergh)` variables measured in `r nrow(Bergh)` participants. For now, we will focus on the following measured variables: 

-   `EP` is a continuous variable measuring ethnic prejudice. 
-   `SP` is a continuous variable measuring sexism. 
-   `HP` is a continuous variable measuring sexual prejudice towards gays and lesbians. 
-   `DP` is a continuous variable measuring prejudice toward people with disabilities. 
-   `O1`, `O2`, and `O3` are three items measuring openness to experience.

To get an impression of this data, we look at the correlations of the variables we're interested in.

```{r correlations SEM}

cor(cbind(EP, SP, HP, DP, O1, O2, O3)) |> round(2)

```

As we have discussed in the previous chapters, the starting point of every simulation-based power analysis is to specify the population parameters of the variables of interest. In our example, we can estimate the population parameters from the study by Bergh et al. (2016). We start by calculating the means of the  variables, rounding them generously, and storing them in a vector called `means_vector`. 

```{r mean vector SEM}

#store means
means_vector <- c(mean(EP), mean(SP), mean(HP), mean(DP), mean(O1), mean(O2), mean(O3)) |> round(2)

#Let's take a look
means_vector

```

We also need the variance-covariance matrix of our variables in order to simulate data. Luckily, we can estimate this from the Bergh et al. data as well. There are two ways to do this. First, we can use the `cov` function to obtain the variance-covariance matrix. This matrix incorporates the variance of each variable on the diagonal and the covariances in the remaining cells. 

```{r cov matrix SEM}

#store covariances
cov_mat <- cov(cbind(EP, SP, HP, DP, O1, O2, O3)) |> round(2)

#Let's take a look
cov_mat

```

This works well as long as we have a data set (e.g., from a pilot study or published work) to estimate the variances and covariances. In other cases, however, we might not have access to such a data set. In this case, we might only have a correlation table that was provided in a published paper. But that's no problem either, as we can transform the correlations and standard deviations of the variables of interest into a variance-covariance matrix. The following chunk shows how this works by using the `cor2cov` function from the `MBESS` package. 

```{r sd vector and correlations SEM}

#store correlation matrix 
cor_mat <- cor(cbind(EP, SP, HP, DP, O1, O2, O3)) 

#store standard deviations
sd_vector <- c(sd(EP), sd(SP), sd(HP), sd(DP), sd(O1), sd(O2), sd(O3))

#transform correlations and standard deviations into variance-covariance matrix
cov_mat2 <- MBESS::cor2cov(cor.mat = cor_mat, sd = sd_vector) |> as.data.frame() |> round(2)

#Let's take a look
cov_mat2

```

Let's do a plausibility check: Did the two ways to estimate the variance-covariance matrix lead to the same results?

```{r plausibility check SEM}

cov_mat == cov_mat2

```

Indeed, this worked! Both procedures lead to the exact same variance-covariance matrix. Now that we have an approximation of the variance-covariance matrix, we use the `rmvnorm` function from the `Rfast` package to simulate data from a multivariate normal distribution. The following code simulates `n = 50` observations from the specified population. 

```{r simulate data SEM}

#Set seed to make results reproducible
set.seed(21364)

#simulate data
my_first_simulated_data <- Rfast::rmvnorm(n = 50, mu=means_vector, sigma = cov_mat) |> as.data.frame()

#Let's take a look
head(my_first_simulated_data)

```

We could now fit a SEM to this simulated data set and check whether the regression coefficient modelling the association between openness to experience and generalized prejudice is significant at an $\alpha$-level of .005. We will work with the `lavaan` package to fit SEMs.

```{r analyze data SEM}

#specify SEM
model_sem <- "generalized_prejudice =~ EP + DP + SP + HP
              openness =~ O1 + O2 + O3
              generalized_prejudice ~ openness"

#fit the SEM to the simulated data set
fit_sem <- sem(model_sem, data = my_first_simulated_data)

#display the results
summary(fit_sem)

```

The results show that in this case, the regression coefficient is `r lavaan::parameterestimates(fit_sem)[8,]$est |> round(2)` which is significant with p = `r lavaan::parameterestimates(fit_sem)[8,]$pvalue |> round(3)`. But, actually, it is not our primary interest to see whether this particular simulated data set results in a significant regression coefficient. Rather, we want to know how many of a theoretically infinite number of simulations yield a significant p-value of the focal regression coefficient. Thus, as in the previous chapters, we now repeatedly simulate data sets of a certain size (say, 50 observations) from the specified population and store the results of the focal test (here: the p-value of the regression coefficient) in a vector called `p_values`. 

```{r multiple iterations SEM, warning=FALSE}

#Set seed to make results reproducible
set.seed(21364)

#let's do 500 iterations
iterations <- 500

#prepare an empty NA vector with 500 slots
p_values <- rep(NA, iterations)

#sample size per iteration
n <- 50


#simulate data
for(i in 1:iterations){

  simulated_data <- Rfast::rmvnorm(n = n, mu = means_vector, sigma = cov_mat) |> as.data.frame()
  fit_sem_simulated <- sem(model_sem, data = simulated_data)
  
  p_values[i] <- parameterestimates(fit_sem_simulated)[8,]$pvalue
  
}

```

How many of our 500 virtual samples would have found a significant p-value (i.e., p < .005)?

```{r results SEM}

#frequency table
table(p_values < .005)

#percentage of significant results
sum(p_values < .005)/iterations*100

```

Only `r round((sum(p_values < .005)*100/iterations),2)`% of samples with the same size of $n=50$ result in a significant p-value. We conclude that $n=50$ observations seems to be insufficient, as the power with these parameters is lower than 80%. 

## Sample size planning: Find the necessary sample size

But how many observations do we need to find the presumed effect with a power of 80%? Like before, we can now systematically vary certain parameters (e.g., sample size) of our simulation and see how that affects power. We could, for example, vary the sample size in a range from 30 to 200. Running these simulations typically requires quite some computing time.


```{r power analysis SEM, warning=FALSE}

#Set seed to make results reproducible
set.seed(21364)

#test ns between 30 and 200
ns_sem <- seq(30, 200, by=10) 

#prepare empty vector to store results
result_sem <- data.frame()

#set number of iterations
iterations_sem <- 500

#write function
sim_sem <- function(n, model, mu, sigma) {
  

  simulated_data <- Rfast::rmvnorm(n = n, mu = mu, sigma = sigma) |> as.data.frame()
  fit_sem_simulated <- sem(model, data = simulated_data)
  p_value_sem <- parameterestimates(fit_sem_simulated)[8,]$pvalue
  return(p_value_sem)
  
    }


#replicate function with varying ns
for (n in ns_sem) {  
  
p_values_sem <- future_replicate(iterations_sem, sim_sem(n = n, model = model_sem, mu = means_vector, sigma = cov_mat), future.seed=TRUE)  
result_sem <- rbind(result_sem, data.frame(
    n = n,
    power = sum(p_values_sem < .005)/iterations_sem)
  )

#The following line of code can be used to track the progress of the simulations 
#This can be helpful for simulations with a high number of iterations and/or a large parameter space which require a lot of time
#I have deactivated this here; to enable it, just remove the "#" sign at the beginning of the next line
#message(paste("Progress info: Simulations completed for n =", n))


}

```

Let's plot the results:

```{r plot power curve SEM, warning=FALSE}

ggplot(result_sem, aes(x=n, y=power)) + geom_point() + geom_line() + scale_x_continuous(n.breaks = 18, limits = c(30,200)) + scale_y_continuous(n.breaks = 10, limits = c(0,1)) + geom_hline(yintercept= 0.8, color = "red")

```

This graph suggests that we need a sample size of approximately 50 participants to reach a power of 80% with the given population estimates. That's all it takes to run a power analysis for a SEM! 

In the following two sub-paragraphs, we would like to present two alternatives and/or extensions to this first way of doing a power analysis for a SEM. First, we would like to present an alternative way to simulate data for SEMs, i.e., by using a built-in function in the `lavaan` package. Second, we would like to show how a "safeguard" approach to power analysis can be used within the context of SEM.

## Using the lavaan syntax to simulate data

In our opening example in this chapter, we used the `rmvnorm` function from the `Rfast`package to simulate data based on the means of the manifest variables as well as their variance-covariance matrix. An alternative to this procedure is to use a built-in function in the `lavaan` package, that is, the `simulateData()` function. The main idea here is to provide this function with a lavaan model that specifies all relevant population parameters and then to use this function to directly simulate data. 

More specifically, we need to incorporate the factor loadings, regression coefficients and (residual) variances of all latent and manifest variables in the lavaan syntax. As these parameters are hardly ever known without a previous study, we will again draw on the results from the data by Bergh et al. (2016). Let's again take a look at the results of our SEM when applying it to this data set.

```{r fit Bergh}

#fit the SEM to the pilot data set
fit_bergh <- sem(model_sem, data = Bergh)

#display the results
summary(fit_bergh)

```

In order to use the `simulateData()` function, we can take the estimates from this previous study and plug them into the lavaan syntax in the following chunk. 

```{r specify lavaan syntax}

#Set seed to make results reproducible
set.seed(21364)

#specify SEM
model_fully_specified <- 

"generalized_prejudice =~ 1*EP + 0.71*DP + 0.91*SP + 1*HP
openness =~ 1*O1 + 0.93*O2 + 1.14*O3
generalized_prejudice ~ -0.77*openness


generalized_prejudice ~~ 0.19*generalized_prejudice
openness ~~ 0.16*openness
EP ~~ 0.21*EP
DP ~~ 0.14*DP
SP ~~ 0.23*SP
HP ~~ 2.12*HP
O1 ~~ 0.07*O1
O2 ~~ 0.08*O2
O3 ~~ 0.05*O3

"

#lets try this
sim_lavaan <- simulateData(model_fully_specified, sample.nobs=100)
head(sim_lavaan)


```

The next step is to integrate this code into our first simulation-based power analysis in this chapter, that is, to replace the data simulation process using the `rmvnorm` function from the `Rfast` package with this new method using `simulateData()` from the `lavaan` package. To this end, I am adapting the `power analysis SEM` chunk from above accordingly.

```{r simulateData, warning=FALSE}

#Set seed to make results reproducible
set.seed(21364)

#test ns between 30 and 200
ns_sem <- seq(30, 200, by=10) 

#prepare empty vector to store results
result_sem <- data.frame()

#set number of iterations
iterations_sem <- 500

#write function
sim_sem_lavaan <- function(n, model) {
  
  
  sim_lavaan <- simulateData(model = model, sample.nobs=n)
  fit_sem_simulated <- sem(model_sem, data = sim_lavaan)
  p_value_sem <- parameterestimates(fit_sem_simulated)[8,]$pvalue
  return(p_value_sem)
  
}


#replicate function with varying ns
for (n in ns_sem) {  
  
  p_values_sem <- future_replicate(iterations_sem, sim_sem_lavaan(n = n, model = model_fully_specified), future.seed=TRUE)  
  result_sem <- rbind(result_sem, data.frame(
    n = n,
    power = sum(p_values_sem < .005, na.rm = TRUE)/iterations_sem)
  )
  
#The following line of code can be used to track the progress of the simulations 
#This can be helpful for simulations with a high number of iterations and/or a large parameter space which require a lot of time
#I have deactivated this here; to enable it, just remove the "#" sign at the beginning of the next line
#message(paste("Progress info: Simulations completed for n =", n))

}


```

Let's plot this again. 

```{r plot power curve SEM lavaan, warning=FALSE}

ggplot(result_sem, aes(x=n, y=power)) + geom_point() + geom_line() + scale_x_continuous(n.breaks = 18, limits = c(30,200)) + scale_y_continuous(n.breaks = 10, limits = c(0,1)) + geom_hline(yintercept= 0.8, color = "red")

```

Here, we conclude that approx. 55 participants would be needed to achieve 80% power. The slight difference compared to the previous power analysis should be explained by the fact that we rounded the numbers that define our statistical populations and that we only used 500 Monte Carlo iterations -- these differences should decrease with an increasing number of iterations.  

::: {.callout-tip}

Wang and Rhemtulla ([2021](https://doi.org/10.1177/2515245920918253)) developed a shiny app that can do power analyses for SEMs in a similar fashion, but that additionally provides a point-and-click interface. You can use it here, for instance, to replicate the results of this simulation-based power analysis: [https://yilinandrewang.shinyapps.io/pwrSEM/](https://yilinandrewang.shinyapps.io/pwrSEM/). 

:::

## A safeguard power approach for SEMs

Of note, the two power analyses for our SEM we have conducted so far used the observed effect size from the Bergh et al. (2016) data set as an estimate of the "true" effect size. But, as this point estimate may be imprecise (e.g., because of publication bias), it seems reasonable to use a more conservative estimate of the true effect size. One more conservative approach in this context is the safeguard power approach (Perugini et al., [2014](https://doi.org/10.1177/1745691614528519)), which we have already applied in Chapter 1 ([Linear Model 1: A single dichotomous predictor](LM1.qmd)).

Basically, all need to do in order to account for variability of observed effect sizes is to calculate a 60%-confidence interval around the point estimate of the observed effect size from our pilot data and to use the more conservative bound of this confidence interval (here: the upper bound) as our new effect size estimate. This can be easily done with the `parameterestimates` function from the `lavaan` package which takes the level of the confidence interval as an input parameter. Let's use this function on our object `fit_bergh` which stores the results of our SEM in the `Bergh` data set.  

```{r safeguard estimation}

parameterestimates(fit_bergh, level = .60)

```

This output shows that the upper bound of the 60% confidence interval around the focal regression coefficient is `r lavaan::parameterestimates(fit_bergh, level = .60)[8,]$ci.upper |> round(2)`. We can now use this as our new and more conservative effect size estimate. We can for example insert this value into our simulation using the lavaan syntax. For this purpose, we copy the chunk from above and simply replace the previous effect size estimate (`r lavaan::parameterestimates(fit_bergh)[8,]$est |> round(2)`) with the new estimate (`r lavaan::parameterestimates(fit_bergh, level = .60)[8,]$ci.upper |> round(2)`), while keeping all other parameters that define this data set. 


```{r lavaan model safeguard power}

#Set seed to make results reproducible
set.seed(21364)

#specify SEM
model_fully_specified_safeguard <- 

"generalized_prejudice =~ 1*EP + 0.71*DP + 0.91*SP + 1*HP
openness =~ 1*O1 + 0.93*O2 + 1.14*O3
generalized_prejudice ~ -0.72*openness


generalized_prejudice ~~ 0.19*generalized_prejudice
openness ~~ 0.16*openness
EP ~~ 0.21*EP
DP ~~ 0.14*DP
SP ~~ 0.23*SP
HP ~~ 2.12*HP
O1 ~~ 0.07*O1
O2 ~~ 0.08*O2
O3 ~~ 0.05*O3

"

#lets try this
sim_lavaan_safeguard <- simulateData(model_fully_specified_safeguard, sample.nobs=100)
head(sim_lavaan_safeguard)

```

Now, everything is ready for the actual safeguard power analysis. We can re-use the `sim_sem_lavaan` function we have defined above. Let's see what we get here! 

```{r safeguard power analysis, warning=FALSE}

#Set seed to make results reproducible
set.seed(21364)

#prepare empty vector to store results
result_sem_safeguard <- data.frame()

#replicate function with varying ns
for (n in ns_sem) {  
  
  p_values_sem_safeguard <- future_replicate(iterations_sem, sim_sem_lavaan(n = n, model = model_fully_specified_safeguard), future.seed=TRUE)  
  result_sem_safeguard <- rbind(result_sem_safeguard, data.frame(
    n = n,
    power = sum(p_values_sem_safeguard < .005, na.rm = TRUE)/iterations_sem)
  )
  
#The following line of code can be used to track the progress of the simulations 
#This can be helpful for simulations with a high number of iterations and/or a large parameter space which require a lot of time
#I have deactivated this here; to enable it, just remove the "#" sign at the beginning of the next line
#message(paste("Progress info: Simulations completed for n =", n))
  
}

ggplot(result_sem_safeguard, aes(x=n, y=power)) + geom_point() + geom_line() + scale_x_continuous(n.breaks = 18, limits = c(30,200)) + scale_y_continuous(n.breaks = 10, limits = c(0,1)) + geom_hline(yintercept= 0.8, color = "red")

```
This safeguard power analysis yields a required sample size of ca. 63 participants.

::: {.callout-note}

In addition to this safeguard power approach, we would also have liked to derive a smallest effect size of interest (SESOI). However, no prior studies on SESOI in the context of personality/stereotypes were available and the measures/response scales used by Bergh et al. (2016) were only vaguely reported in their manuscript, thereby making it difficult to derive a meaningful SESOI. We therefore only report a safeguard approach but no SESOI approach here.

:::

# A simulation-based power analysis for a mediation model with latent variables

Sometimes, researchers not only wish to investigate whether and how two (latent) variables related to each other, but also whether the association between two (manifest or latent) variables is mediated by a third variable. We will run a power analysis for such a latent mediation model, investigating whether gender affects generalized prejudice (fully or partially) through openness to experience, while using the Bergh et al. (2016) data set as a pilot study. We can repeat the same steps as in our previous power analysis, while incorporating gender into our analysis. Specifically, we will follow these steps:

1. find plausible estimates of the population parameters
2. specify the statistical model
3. simulate data from this population
4. compute the index of interest (e.g., the p-value) and store the results
5. repeat steps 2) and 3) multiple times
6. count how many samples would have detected the specified effect (i.e., compute the statistical power)
7. vary your simulation parameters until the desired level of power (e.g., 80%) is achieved

We first draw on the Bergh et al. (2016) data set to estimate the means and the variance-covariance matrix. In this data set, gender is a factor with two levels, male and female. We first need to transform this categorical variable into an integer variable, for instance coding "male" with 0 and "female" with 1. 

```{r means and cov mediation}

Bergh_int <- Bergh 
Bergh_int$gender <- ifelse(Bergh_int$gender == "male", 0, ifelse(Bergh_int$gender == "female", 1, NA))

attach(Bergh_int)

#store means
means_mediation <- c(mean(gender), mean(EP), mean(SP), mean(HP), mean(DP), mean(O1), mean(O2), mean(O3)) |> round(2)

#store covarainces
cov_mediation <- cov(cbind(gender, EP, SP, HP, DP, O1, O2, O3)) |> round(2)


```

::: {.callout-tip}

If we were  interested in a mediation model with only manifest variables, we could also use a useful shiny app developed by Schoemann et al. (2017), which can be accessed via [https://schoemanna.shinyapps.io/mc_power_med/](https://schoemanna.shinyapps.io/mc_power_med/). This app currently enables power analyses for some manifest mediation models: mediation with (i) one mediator, (ii) two parallel mediators, (iii) two serial mediators, and (iv) three parallel mediators. But as we want to incorporate `generalized prejudice` and `openness to experience` as latent variables, we need to write a simulation-based power analyses ourselves.   

:::


Now, we specify the statistical mediation model in `lavaan`. 

```{r specify mediation model}

#specify mediation model
model_mediation <- '

              # measurement model
              generalized_prejudice =~ EP + DP + SP + HP
              openness =~ O1 + O2 + O3
              
              # direct effect
              generalized_prejudice ~ c*gender
              
              # mediator
              openness ~ a*gender
              generalized_prejudice ~ b*openness


              # indirect effect (a*b)
              ab := a*b
           
              # total effect
              total := c + (a*b)

'

```

To verify that this model syntax works properly, we can fit this model to the data set provided by Bergh et al. (2016).

```{r test mediation model with the pilot data}

#fit the SEM to the simulated data set
fit_mediation <- sem(model_mediation, data = Bergh_int)

#display the results
summary(fit_mediation)

```

Now, everything is set up to run the actual power analysis. In the following chunk, we repeatedly simulate data from the specified population and store the p-value of the indirect effect while varying the sample size in a range from 500 to 1500. 

```{r simulate data mediation, cache=TRUE, warning=FALSE}
  
#Set seed to make results reproducible
set.seed(21364)

#test ns between 100 and 1500
ns_mediation <- seq(500, 1500, by=50) 

#prepare empty vector to store results
result_mediation <- data.frame()

#iterations
iterations_mediation <- 500

#write function
sim_mediation <- function(n, model, mu, sigma) {
  

      simulated_data_mediation <- Rfast::rmvnorm(n = n, mu = mu, sigma = sigma) |> as.data.frame()
      fit_mediation_simulated <- sem(model, data = simulated_data_mediation)
  
      p_value_mediation <- parameterestimates(fit_mediation_simulated)[21,]$pvalue
      return(p_value_mediation)
    }


#replicate function with varying ns
for (n in ns_mediation) {  
  
p_values_mediation <- future_replicate(iterations_mediation, sim_mediation(n = n, model = model_mediation, mu = means_mediation, sigma = cov_mediation), future.seed=TRUE)  
result_mediation <- rbind(result_mediation, data.frame(
    n = n,
    power = sum(p_values_mediation < .005)/iterations_mediation)
  )

#The following line of code can be used to track the progress of the simulations 
#This can be helpful for simulations with a high number of iterations and/or a large parameter space which require a lot of time
#I have deactivated this here; to enable it, just remove the "#" sign at the beginning of the next line
#message(paste("Progress info: Simulations completed for n =", n))

}
```

Let's plot the results:

```{r plot power curve mediation}

ggplot(result_mediation, aes(x=n, y=power)) + geom_point() + geom_line() + scale_y_continuous(n.breaks = 10) + scale_x_continuous(n.breaks = 20) + geom_hline(yintercept= 0.8, color = "red")

```

This shows that roughly 1,300 to 1,400 participants will be needed to obtain sufficient power under the assumptions we specified. To achieve a more precise estimate, just increase the number of iterations (and get a cup of coffee while you wait for the results 😅). 

# References

Bergh, R., Akrami, N., Sidanius, J., & Sibley, C. G. (2016). Is group membership necessary for understanding generalized prejudice? A re-evaluation of why prejudices are interrelated. Journal of Personality and Social Psychology, 111(3), 367–395. [https://doi.org/10.1037/pspi0000064](https://www.researchgate.net/publication/306939541_Is_group_membership_necessary_for_understanding_generalized_prejudice_A_re-evaluation_of_why_prejudices_are_interrelated)

Perugini, M., Gallucci, M., & Costantini, G. (2014). Safeguard power as a protection against imprecise power estimates. Perspectives on Psychological Science, 9(3), 319--332. <https://doi.org/10.1177/1745691614528519>

Schoemann, A. M., Boulton, A. J., & Short, S. D. (2017). Determining power and sample size for simple and complex mediation models. Social Psychological and Personality Science, 8(4), 379–386. [https://doi.org/10.1177/1948550617715068](https://www.researchgate.net/publication/317631067_Determining_Power_and_Sample_Size_for_Simple_and_Complex_Mediation_Models)

Wang, Y. A., & Rhemtulla, M. (2021). Power analysis for parameter estimation in structural equation modeling: A discussion and tutorial. Advances in Methods and Practices in Psychological Science, 4(1), 1–17. [https://doi.org/10.1177/2515245920918253](https://doi.org/10.1177/2515245920918253)