Hierarchical_LME_models.qmd

---
title: "Hierarchical models (LMEs)"
format: html
editor: visual
---

# Introduction

Experimental data often contain 'clusters'; these are natural groups (e.g. testing sites, subjects, countries) where observations within each cluster tend to be more similar to one another than those in other clusters. Ignoring this grouped structure of the data violates the assumption of independent observations of simple linear regression, which can lead to biased estimates and unreliable statistical inferences.

This week we will explore hierarchical linear models using data from 10 different schools. The goal is to understand how **study hours** affect **test scores**, accounting for both school-specific differences and overall trends. We will fit models using three approaches: **complete pooling, no pooling**, and **hierarchical modeling** with **random intercepts** and **slopes**.

## Simulating Hierarchical School Data

We'll simulate test scores from 10 different schools, where each school has its own intercept (baseline test score) and slope (effect of study hours on test scores). Each student's test score is influenced by how many hours they studied, with some noise added to the data.

```{r}
# Load necessary libraries
library(ggplot2)
library(dplyr)
library(tidyr)
library(lme4)
library(tibble)
library(truncnorm)
library(patchwork)

# Set seed for reproducibility
set.seed(123)

# Parameters
n_schools <- 10  # Number of schools
n_students <- 100  # Students per school

mu_study_hours <- seq(8, 2, length.out = n_schools)  # Schools with higher average study hours
mu_test_scores <- seq(50, 120, length.out = n_schools)  # Schools with higher average test scores

sigma_beta_0 <- 10  # Std dev of intercepts within schools
mu_beta_1 <- 3  # Average slope for study hours
sigma_beta_1 <- 1  # Std dev of slopes (to ensure positive slopes within schools)
sigma_y <- 10  # Std dev of test scores

# Simulate data for each school
data <- data.frame()
for (i in 1:n_schools) {
  # Simulate study hours for each student in the school
  study_hours <- rtruncnorm(n_students, a = 0, b = Inf, mean = mu_study_hours[i], sd = 1)
  
  # Simulate school-specific intercept (baseline test score)
  beta_0 <- rnorm(1, mean = mu_test_scores[i], sd = sigma_beta_0)
  
  # Simulate school-specific slope (effect of study hours)
  beta_1 <- rnorm(1, mean = mu_beta_1, sd = sigma_beta_1)
  
  # Generate test scores
  test_scores <- beta_0 + beta_1 * study_hours + rnorm(n_students, sd = sigma_y)
  
  # Combine school data into the main data frame
  school_data <- data.frame(school = factor(i), study_hours = study_hours, test_scores = test_scores)
  data <- rbind(data, school_data)
}
```

### Explanation

In this dataset:

-   We have 10 schools, each with 100 students.

-   The intercept (`beta_0`) for each school represents the baseline test score (with 0 study hours), varying by school.

-   The slope (`beta_1`) represents the effect of study hours on test scores, which may differ across schools.

-   Each student's test score is influenced by their study hours and school-specific factors, with some added noise to represent unobserved variability.

## Complete Pooling Model

In the complete pooling model, we assume that all schools have the same intercept and slope, effectively ignoring the hierarchical structure.

$$test\_scores_{i}​=β_0​+β_1​study\_hours_{i}​+ϵ_{i}​ $$

where,

-   $β_0​$ is the common or **fixed intercept** for all schools.

-   $β_1​$ is the common or **fixed slope** for all schools.

-   $ϵ_i​∼N(0,σ^2)$ represents the residual error.

**Assumptions of the complete pooling model:**

1.  The observations (test_scores) are normally distributed.

2.  Observations are independent.

3.  There is a linear relationship between the outcome and predictor variables.

We can already see that this model is inappropriate as assumption 2 is being violated: the test scores of students in the same school are not independent.

![](complete_pooling_model.png)

```{r}
complete_pooling_model_lm <- lm(test_scores ~ study_hours,
  data = data)

summary(complete_pooling_model_lm)
# Extract the fixed effects (coefficients)
fixed_effects <- coef(complete_pooling_model_lm)

# Extract the effect of 'study_hours' on 'test_scores'
study_hours_effect <- fixed_effects["study_hours"]

# Visualize complete pooling model
ggplot(data, aes(x = study_hours, y = test_scores)) +
  geom_point(alpha = 0.5, color = "blue") +
  # Plot the regression line using the lm coefficients
  geom_abline(intercept = fixed_effects["(Intercept)"],
              slope = fixed_effects["study_hours"],
              color = "red") +
  labs(title = "Complete Pooling Model", x = "Study Hours", y = "Test Scores") +
  theme_minimal() +
  # Annotate the correlation using the slope of the regression line (study_hours_effect)
  annotate("text", x = Inf, y = -Inf, 
           label = sprintf("Effect of Study Hours: %.2f", study_hours_effect),
           hjust = 1.1, vjust = -0.5, color = "black", size = 5, fontface = "italic") 


# Vizualize the relationship per school 
ggplot(data, aes(x = study_hours, y = test_scores, color = school)) +
  geom_point(alpha = 0.5) +
  ggtitle("Relationship Between Study Hours and Test Scores")

```

### Explanation

The complete pooling model assumes all schools have the same relationship between study hours and test scores. The red line represents the single fitted linear relationship across all students, ignoring school-specific differences.

It is quite unexpected the there would be a negative relationship between study_hours and test_scores.

From the second plot we can begin to understand what is going on. The data is clustered in groups (i.e. schools), and therefore observations from different schools should not be pooled together. Let's see what happens when we fit a separate linear regression model for each school.

## No Pooling Model

The no pooling model fits a separate linear model for each school, treating the schools independently.

$$test\_scores_{ij}​=β_{0j}​+β_{1j}​study\_hours_{ij}​+ϵ_{ij}$$

where:

-   $β_{0j}$ and $β_{1j}$ ​ are the intercept and slope for school j.

-   $ϵ_{ij}​∼N(0,σ^2)$ represents the residual error.

**Assumptions of the no pooling model:**

1.  The observations (test_scores) are normally distributed.

2.  Every school's model is fit independently of the other schools; there are no common parameters between schools (except from the standard deviation which is the same for all schools).

3.  There is a linear relationship between the outcome and predictor variables. ![](no_pooling_model.png)

```{r}
# Initialize a list to store the models
school_lm_models <- list()

# Get unique schools
unique_schools <- unique(data$school)


# Loop over each school
for (school in unique_schools) {
  # Subset the data for the current school
  school_data <- data[data$school == school, ]
   
  # Run the  linear regression for this school's data
  model <- lm(test_scores ~ study_hours, data = school_data)
  
  # Store the model in the list, using the school name as the key
  school_lm_models[[school]] <- model
}

# Access the models for each school
school_lm_models[['School A']]  # Example of accessing a specific school's model

# To view summaries for each model
for (school in unique_schools) {
  cat("\n\nSummary for", school, ":\n")
  print(summary(school_lm_models[[school]]))
}

# Plot the results
ggplot(data, aes(x = study_hours, y = test_scores, color = factor(school))) +
  geom_point(alpha = 0.5) +
  facet_wrap(~school) +
  geom_smooth(method = "lm", se = FALSE) +  # Adds a regression line without confidence interval shading
  labs(title = "No Pooling LME Model", x = "Study Hours", y = "Test Scores") +
  theme_minimal()
```

### Explanation

The no pooling model fits separate linear regressions for each school, capturing individual school trends. The plot shows distinct linear fits for each school, but without sharing any information between them.

Here we see that when we fit separate linear models for each school, there is a positive relationship between study_hours and test_scores, as expected. Compare this with the negative relationship we got from the overall model. This is called the **Simpson's paradox:** the phenomenon whereby a relationship seen in the overall model disappears or reverses when the data is divided into groups.

One problem with the no pooling model is that it cannot be generalized to a group outside of our sample. It assumes, in other words, that the relationship between study hours and test scores in one school can tell us nothing about this relationship in another school which is 'under-utilizing' the data.

### Note

Before we fit any other model, lets see what happens to our overall model when we add school as a co-variate.

```{r}
model_school_covariate <- lm(test_scores ~ study_hours + factor(school), data = data) 
summary(model_school_covariate)
```

**Interpretation:**

**Intercept:** the average test scores of students in school 1 when study hours = 0.

**study_hours:** the effect of study_hours on test_scores when controlling for differences in mean test_scores across schools.

**schoolX**: the difference in average test_scores between school 1 and schoolX, when controlling for study hours.

Lets see what happens when we add school as an interaction term.

```{r}
model_school_interaction<- lm(test_scores ~ study_hours*factor(school), data = data)
summary(model_school_interaction)
```

**Interpretation:**

**Intercept:** the average test scores of students in school 1 when study hours = 0.

**study_hours:** the effect of study_hours on test_scores in school 1.

**schoolX**: the difference in baseline test_scores when study_hours = 0 between school 1 and school X.

**study_hours:schoolX** :the difference in the slope of the relationship between study_hours and test_scores between school 1 and school X.

## 

## Hierarchical Model with Random Intercepts and Slopes (Partial Pooling model)

Now we'll fit a hierarchical LME model that allows each school to have its own intercept and slope, but the group-level parameters are drawn from a common distribution. This model strikes a balance between the complete pooling and no pooling approaches.

$$test\_scores_{ij}​=β_{0j}​+β_{1j}​study\_hours_{ij}​+ϵ_{ij}​$$

where:

-   $β_{0j}​∼N(μ_{β_0}​​,σ_{β_0}​​)$

-   $β_{1j}​∼N(μ_{β_1}​​,σ_{β_1​}​)$

-   $ϵ_{ij​}∼N(0,σ^2)$

#### **Or another way to think about it,**

$$test\_scores_{ij}​=(β_0 + b_{0j})​+(β_1+ b_{ij})​study\_hours_{ij}​+ϵ_{ij}​ $$

-   $β_0​$ is the common or **fixed intercept** for all schools.

-   $b_{0j}$ is the **random intercept** for school *j*. It represents the deviation of the *j*-th school's intercept from $β_0$.

-   $β_1​$ is the common or **fixed slope** for all schools.

-   $b_{1j}$ is the **random slope** for school *j*. It represents the deviation of the *j*-th school's slope from $β_1$.

-   $ϵij​∼N(0,σ^2)$ represents the residual error.

**Assumptions of the hierarchical model:**

1.  The observations (test_scores) are normally distributed.

2.  Every school's model deviates to some extent from the grand mean and from the mean effect of study_hours. This implies that there is some between-school variability in the individual-level intercept and slope adjustments by school.

3.  There is a linear relationship between the outcome and predictor variables.

![](partial_pooling_model.png)

```{r}
# Fit hierarchical model with random intercepts and slopes using `lme4`
hierarchical_model <- lmer(
  test_scores ~ study_hours + (1  + study_hours | school),
  data = data
)

# Summary of the model
summary(hierarchical_model)

# Extract fixed effects
fixed_effects <- fixef(hierarchical_model)
fixed_slope <- fixed_effects[2]
fixed_intercept <- fixed_effects[1]

# Extract fitted values from the hierarchical model
data$partial_pooling_predicted <- predict(hierarchical_model, newdata = data)
 # Example of creating fixed effect predictions
fixed_effects <- data.frame(
  study_hours = seq(min(data$study_hours), max(data$study_hours), length.out = 100),
  test_scores = fixed_intercept + fixed_slope * seq(min(data$study_hours), max(data$study_hours), length.out = 100)
)


# Visualization of hierarchical model with overall line
ggplot(data, aes(x = study_hours, y = test_scores, color = school)) +
  geom_point(alpha = 0.5) +
  geom_line(aes(y = partial_pooling_predicted), size = 1) +  # Shrinkage lines from random effects model
  geom_abline(intercept=fixed_intercept, slope=fixed_slope, color="black", linetype="dashed", size=1) +
  labs(title = "Hierarchical LME Model with Random Intercepts and Slopes",
       x = "Study Hours",
       y = "Test Scores") +
  theme_minimal()+ annotate("text", x = Inf, y = Inf, label = paste("Fixed Effect Coefficient for Study Hours: ", round(fixed_slope, 2)),
           hjust = 1.1, vjust = 1.5, size = 5, color = "black")
```

```{r}
# Separate linear models for each school (No Pooling)
data <- data %>%
  group_by(school) %>%
  mutate(no_pooling_predicted = predict(lm(test_scores ~ study_hours, data = cur_data())))


# No Pooling Plot: Separate regression lines for each school (no influence from other schools)
ggplot(data, aes(x = study_hours, y = test_scores, color = factor(school))) +
  geom_point(alpha = 0.5) +
  geom_line(aes(y = no_pooling_predicted), size = 1) +  
  # Independent school lines
  labs(title = "No Pooling: Separate Regressions per School", 
       x = "Study Hours", 
       y = "Test Scores") +
  facet_wrap(~school) + 
   theme_minimal() 


# Extract fitted values from the hierarchical model
data$partial_pooling_predicted <- predict(hierarchical_model, newdata = data)
 # Example of creating fixed effect predictions
fixed_effects <- data.frame(
  study_hours = seq(min(data$study_hours), max(data$study_hours), length.out = 100),
  test_scores = fixed_intercept + fixed_slope * seq(min(data$study_hours), max(data$study_hours), length.out = 100)
)

# Plot with fixed effects line
 ggplot(data, aes(x = study_hours, y = test_scores, color = factor(school))) +
  geom_point(alpha = 0.5) +  # Observed data points
  geom_line(aes(y = partial_pooling_predicted), size = 1) +  # Shrinkage lines from random effects model
  #geom_line(data = fixed_effects, aes(y = test_scores), linetype = "dashed", size = 1, color = "black") +  # Fixed effect line
  labs(title = "Partial Pooling: Random Effects Model",
       x = "Study Hours",
       y = "Test Scores") +
  facet_wrap(~school) +  # Separate plot per school
  theme_minimal()
 

```

### Shrinkage

When we compare the individual plots for the no pooling and partial pooling models we can see that the individual slopes are slightly different. This is due to '**Shrinkage':** a phenomenon whereby individual-level estimates are shrunk towards the mean estimates over the entire data. The more extreme the data from a given subject relative to the mean estimates from the data set, or the less data we have from a given subject, the greater the shrinkage.

# Week 13: LME on time-series data

This week we will look at how to fit an LME on time-series data (aka repeated measures or longitudinal data).

Let's start by loading the Cherry blossom dataset from the bayesrules() package. This dataset contains the running times (in minutes) for 36 runners in the annual 10-mile Cherry Blossom race in Wadhington, D.C. The runners are in their 50s or 60s and have competed multiple times.

```{r}
#install.packages("bayesrules")
library(bayesrules)

# Load data
data(cherry_blossom_sample)
running <- cherry_blossom_sample %>% 
  select(runner, age, net)%>% 
  na.omit()

head(running)

ggplot(running, aes(x = runner, y = net)) + 
  geom_boxplot()

```

### Complete pooling model

In the complete pooling model, we assume that all runners have the same intercept and slope, effectively ignoring the hierarchical structure.

$$running\_times_{i}​=β_0​+β_1​age_{i}​+ϵ_{i}​ $$

where,

-   $β_0​$ is the common or **fixed intercept** for all runners.

-   $β_1​$ is the common or **fixed slope** for all runners.

-   $ϵ_i​∼N(0,σ^2)$ represents the residual error.

**Assumptions of the complete pooling model:**

1.  The observations (running_times) are normally distributed.

2.  Observations are independent.

3.  There is a linear relationship between the outcome and predictor variables.

```{r}
# Complete pooling model: treating all runners as a single group
complete_pooling_model <- lm(net ~ age, data = running)

# Summary of the complete pooling model
summary(complete_pooling_model)


ggplot(running, aes(y = net, x = age)) + 
  geom_point()+
  geom_smooth(method = "lm", se = FALSE, color= 'red')

```

**Interpretation**:

-   **Intercept:** The average running time at age 0 (clearly not a useful estimate).

-   **age:** The average difference in running times each year.

This estimate of the age coefficient β1 suggests that running times tend to increase by a mere 0.27 minutes for each year in age. Further, this is not a statistically significant increase. This is unlikely to be true, as our intuition tells us that as people age, they get slower at running.

### No Pooling Model

The no pooling model fits a separate linear model for each runner, treating the runners independently.

$$running\_times_{ij}​=β_{0j}​+β_{1j}​age_{ij}​+ϵ_{ij}$$

where:

-   $β_{0j}$ and $β_{1j}$ ​ are the intercept and slope for runner j.

-   $ϵ_{ij}​∼N(0,σ^2)$ represents the residual error.

**Assumptions of the no pooling model:**

1.  The observations (running_times) are normally distributed.

2.  Every runner's model is fit independently of the other runners; there are no common parameters between runners (except from the standard deviation which is the same for all runners).

3.  There is a linear relationship between the outcome and predictor variables.

```{r}
# Initialize a list to store the modelsr
runner_lm_models <- list()

# Get unique runners
unique_runners <- unique(running$runner)


# Loop over each runner
for (runner in unique_runners) {
  # Subset the data for the current runner
  running_data <- running[running$runner == runner, ]
   
  # Run the  linear regression for this runner's data
  model <- lm(net ~ age, data = running_data)
  
  # Store the model in the list, using the runner name as the key
  runner_lm_models[[runner]] <- model
}

# Access the models for each runner
runner_lm_models[['Runner 1']]  # Example of accessing a specific runner's model

# To view summaries for each model
for (runner in unique_runners) {
  cat("\n\nSummary for", runner, ":\n")
  print(summary(runner_lm_models[[runner]]))
}

# Plot the results
ggplot(running, aes(x = age, y = net, color = factor(runner))) +
  geom_point(alpha = 0.5) +
  facet_wrap(~runner) +
  geom_smooth(method = "lm", se = FALSE) +  # Adds a regression line without confidence interval shading
  labs(title = "No Pooling LME Model", x = "Age", y = "Running Time") +
  theme_minimal()
```

**Interpretation**:

-   **Intercept:** The predicted running time for each runner at age 0.

-   **age:** The difference in running times each year for each runner.

**Problems with the no pooling method:**

1.  We cannot reliably generalize to groups runners those in our sample.

2.  It assumes that data from one runner cannot tell us anything useful about another runner and thus ignores potentially valuable information. This is especially problematic when we have a small number of observations per runner.

## Hierarchical Model with Random Intercepts and Slopes (Partial Pooling model)

$$running\_times_{ij}​=β_{0j}​+β_{1j}​age_{ij}​+ϵ_{ij}​$$

where:

-   $β_{0j}​∼N(μ_{β_0}​​,σ_{β_0}​​)$

-   $β_{1j}​∼N(μ_{β_1}​​,σ_{β_1​}​)$

-   $ϵ_{ij​}∼N(0,σ^2)$

#### **Or another way to think about it,**

$$running\_times_{ij}​=(β_0 + b_{0j})​+(β_1+ b_{ij})​age_{ij}​+ϵ_{ij}​ $$

-   $β_0​$ is the common or **fixed intercept** for all runners.

-   $b_{0j}$ is the **random intercept** for runner *j*. It represents the deviation of the *j*-th runner's intercept from $β_0$.

-   $β_1​$ is the common or **fixed slope** for all runners.

-   $b_{1j}$ is the **random slope** for runner *j*. It represents the deviation of the *j*-th runner's slope from $β_1$.

-   $ϵij​∼N(0,σ^2)$ represents the residual error.

**Assumptions of the hierarchical model:**

1.  The observations (running_times) are normally distributed.

2.  Every runner's model deviates to some extent from the grand mean and from the mean effect of age. This implies that there is some between-runner variability in the individual-level intercept and slope adjustments by runner.

3.  There is a linear relationship between the outcome and predictor variables.

```{r}
library(lmerTest)

# Mixed effects model: allowing individual random effects for intercepts and slopes
mixed_effects_model <- lmer(net ~ age + (1+ age| runner), data = running)

# Summary of the mixed effects model
summary(mixed_effects_model)

# Extract fitted values from the hierarchical model
running$partial_pooling_predicted <- predict(mixed_effects_model, newdata = running)
# Extract fixed effects
fixed_effects <- fixef(mixed_effects_model)
fixed_slope <- fixed_effects[2]
fixed_intercept <- fixed_effects[1]


# Visualization of hierarchical model with overall line
ggplot(running, aes(x = age, y = net, color = runner)) +
  geom_point(alpha = 0.5) +
  geom_line(aes(y = partial_pooling_predicted), size = 1) +  # Shrinkage lines from random effects model
  geom_abline(intercept=fixed_intercept, slope=fixed_slope, color="black", linetype="dashed", size=1) +
  labs(title = "Hierarchical LME Model with Random Intercepts and Slopes",
       x = "Age",
       y = "Running times") +
  theme_minimal()+ annotate("text", x = Inf, y = Inf, label = paste("Fixed Effect Coefficient for age: ", round(fixed_slope, 2)),
           hjust = 1.1, vjust = 1.5, size = 5, color = "black")

```

**Interpretation:**

**Fixed effects**

-   **Intercept:** The predicted average running time when age is 0.

-   **age:** The average difference in running times each year.

**Random effects**

-   **Groups (runner)**: The random effects are grouped by runner.

-   **Intercept**: This indicates the degree to which each runner's baseline performance (when age is 0) varies from the overall average intercept.

-   **age**: This tells us how much the effect of age on running times differs for each runner.

-   **Corr**: This shows the correlation between the random intercepts and slopes for runners. A positive correlation suggests that runners with higher baseline running times (higher intercepts) tend to show less decline in performance with age (smaller negative slope), while a negative correlation would suggest the opposite.

**Residual Variance**

-   **Residual**: This is the variance in running times that remains unexplained by the model. The standard deviation tells us the typical size of the residuals, indicating how much the observed running times deviate from the predicted running times on average.

#### Visualize the shirnkage

```{r}
# Plot the no pooled graphs 
ggplot(running, aes(x = age, y = net, color = factor(runner))) +
  geom_point(alpha = 0.5) +
  facet_wrap(~runner) +
  geom_smooth(method = "lm", se = FALSE) +  # Adds a regression line without confidence interval shading
  labs(title = "No Pooling Model", x = "Age", y = "Running Times") +
  facet_wrap(~runner)+
  theme_minimal()

# Plot the partially polled graphs 
 ggplot(running, aes(x = age, y = net, color = factor(runner))) +
  geom_point(alpha = 0.5) +  # Observed data points
  geom_line(aes(y = partial_pooling_predicted), size = 1) +  # Shrinkage lines from random effects model
  labs(title = "Partial Pooling: Random Effects Model",
       x = "Age",
       y = "Running Times") +
  facet_wrap(~runner) +  # Separate plot per school
  theme_minimal()
 

```

## Week 14: Bayesian LME 

### Load surprise pilot 21 data and get a random sample of 10 people

```{r}
# Load necessary libraries
library(tidyverse)
library(ggplot2)
library(dplyr)
library(tidyr)
library(lme4)
library(tibble)
library(truncnorm)
library(brms)
library(rstan)

set.seed(123)
pilot_21_data <-read.csv("pilot_21_variables.csv")

#Identify unique participants
unique_participants <- unique(pilot_21_data$Random_ID)

#Randomly sample 10 unique participants
set.seed(123)  # Setting seed for reproducibility
sampled_participant_ids <- sample(unique_participants, 10)

#Extract trials corresponding to the sampled participants
sampled_pilot_21_data <- pilot_21_data %>% 
  filter(Random_ID %in% sampled_participant_ids)

# Print sampled data
print(sampled_pilot_21_data)

```

### Complete Pooling Model

In the complete pooling model, we assume that all participants have the same intercept and slope, ignoring the hierarchical structure.

$$
Y_{i}|β_{0},β_1,σ ∼N(μ_i,σ^2)\ with\ μ_i=β_0 + β_1X_{i}
$$

$$Anxiety_{i}​=β_0​+β_1​PE_{i}​+ϵ_{i}​ $$

where,

-   $Y_i$ and $X_i$ represent participant $i$'s outcome (anxiety) and predictor variable (PE), respectively.

-   $β_0​$ is the common or **fixed intercept** for all participants.

-   $β_1​$ is the common or **fixed slope** for all participants.

-   $ϵ_i​∼N(0,σ^2)$ represents the residual error.

    **Priors**:

-   $β_0​∼N(50,10)$

-   $β_1∼N(0,5)$

-   $σ∼Cauchy(0,2.5)$

**Assumptions of the complete pooling model:**

1.  The observations (anxiety ratings) are normally distributed.

2.  Observations are independent.

3.  There is a linear relationship between the outcome and predictor variables.

![](complete_pooling_model.png)

#### brms

```{r}

# Fit complete pooling model using `brms`
complete_pooling_model <- brm(
  Response_Ax ~ Response_SubjPE,
  data = sampled_pilot_21_data,
  family = gaussian(),
  prior = c(
    prior(normal(50, 10), class = Intercept),
    prior(normal(0, 5.5), class = b),
    prior(cauchy(0, 2.5), class = sigma)
  ),
  iter = 2000, warmup = 1000, chains = 4, cores = 2
)

# Summary of the complete pooling model
summary(complete_pooling_model)
fixed_effects <- fixef(complete_pooling_model)
PE_effect <- fixed_effects["Response_SubjPE", "Estimate"]


# Visualize complete pooling model
ggplot(sampled_pilot_21_data, aes(x = Response_SubjPE, y = Response_Ax)) +
  geom_point(alpha = 0.5, color = "blue") +
  geom_abline(intercept = fixef(complete_pooling_model)["Intercept", "Estimate"],
              slope = fixef(complete_pooling_model)["Response_SubjPE", "Estimate"],
              color = "red") +
  labs(title = "Complete Pooling Bayesian Model", x = "Prediction Error", y = "Anxiety") +
  theme_minimal()+
  annotate("text", x = Inf, y = -Inf, label = sprintf("Correlation: %.2f", PE_effect),
           hjust = 1.1, vjust = -0.5, color = "black", size = 5, fontface = "italic") 

```

This positive correlation between PE and Anxiety suggests that the more positive the PE (i.e. the more positively surprising the outcome) the more anxious people get. This is highly counter-intuitive, I think. Let's see what happens when we model the hierarchical structure of the data.

### Partial Pooling/Hierarchical Bayesian Model with Random Intercepts

Now we'll fit a hierarchical Bayesian model that allows each participant to have their own intercept but the group-level parameters are drawn from a common distribution.

$$Anxiety_{ij}​=β_{0j}​+β_{1}​PE_{ij}​+ϵ_{ij}​$$

where:

-   $β_{0j}​∼N(μ_{β_0}​​,σ_{β_0}​​)$

-   $β_1​$ is the common or fixed slope for all participants.

-   $ϵ_{ij​}∼N(0,σ^2)$

**Priors**:

$μ_{β_0}​​∼N(50,10)$

$σ_{β0}​​∼Cauchy(0,2.5)$

$β_1∼N(0,5)$

$σ∼Cauchy(0,2.5)$

### **Or another way to think about it,**

$$
Y_{ij}|β_{0j},β_{1},σ_y ∼N(μ_{ij},σ^2_y)\ with\ μ_{ij}=(β_0 + b_{0j})​+β_1​X_{ij}
$$

$$Anxiety_{ij}​=(β_0 + b_{0j})​+β_1​study\_hours_{ij}​+ϵ_{ij}​ $$

-   $Y_i$ and $X_i$ represent the outcome (anxiety) and predictor variable (prediction error), respectively, for trial $i$ from participant $j$.

-   $β_0​$ is the common or **fixed intercept** for all participants.

-   $b_{0j}$ is the **random intercept** for participant *j*. It represents the deviation of the *j*-th participant's intercept from $β_0$.

-   $β_1​$ is the common or **fixed slope** for all participants.

-   $ϵ_{ij}​∼N(0,σ^2)$ represents the residual error.

    **Priors**:

-   $β_0​∼N(50,10)$

-   $b_{0j}∼N(0,σ_{0j})$

-   $β_1∼N(0,5)$

-   $σ_{0j} ∼Cauchy(0,2.5)$

-   $σ_y∼Cauchy(0,2.5)$

**Assumptions of the hierarchical model:**

1.  The observations (anxiety) are normally distributed.

2.  Every participant's model deviates to some extent from the grand mean. This implies that there is some between-subject variability in the individual-level intercept adjustments.

3.  There is a linear relationship between the outcome and predictor variables.

![](partial_pooling_model.png)

#### brms

```{r}
#brms
library(brms)

fit_brms_lme <- brm(
  Response_Ax ~ Response_SubjPE + (1 | Random_ID),
  data = sampled_pilot_21_data,
  family = gaussian(),
  prior = c(
    prior(normal(0, 1), class = "b", coef = "Response_SubjPE"),  # Using normal priors throughout here
    prior(cauchy(0, 2.5), class = "sigma"),# Prior for the error term, the Cauchy distribution has some great properties with heavy tails and is quite robust
    prior(cauchy(0, 2.5), class = "sd" , group= "Random_ID")
  ),
  iter = 2000,  # This is for the MCMC
  chains = 4,
  seed = 123
)


fit_brms_lme

# These are now posterior distributions
plot(fit_brms_lme)


# Extracting the random effects
random_effects <- ranef(fit_brms_lme)
print(random_effects)

# Optional: Extracting the random effects for Random_ID specifically
random_effects_random_id <- random_effects$Random_ID
print(random_effects_random_id)


# Create new data for predictions based only on fixed effects
new_data_fixed <- sampled_pilot_21_data %>%
  distinct(Response_SubjPE) %>%
  mutate(Random_ID = NA)  # For fixed effects, we set Random_ID to NA to ignore random effects

# Get predictions for fixed effects only
pred_fixed <- fitted(fit_brms_lme, newdata = new_data_fixed, re_formula = NA)  # re_formula = NA for fixed effects
new_data_fixed$Response_Ax <- pred_fixed[, "Estimate"]

# Get predictions including random effects
pred_random <- fitted(fit_brms_lme, newdata = sampled_pilot_21_data)

# Add these predictions back to the original data
sampled_pilot_21_data$predicted_random <- pred_random[, "Estimate"]

# Plot with ggplot
ggplot() +
  # Random effects: one line per Random_ID, colored by Random_ID
  geom_line(data = sampled_pilot_21_data, aes(x = Response_SubjPE, y = predicted_random, group = Random_ID, color = as.factor(Random_ID)), 
            alpha = 0.7) +
  # Scatter plot of the raw data, points colored by Random_ID
  geom_point(data = sampled_pilot_21_data, aes(x = Response_SubjPE, y = Response_Ax, color = as.factor(Random_ID)), 
             alpha = 0.8) +
  # Fixed effects: a single line (not colored by Random_ID)
  geom_line(data = new_data_fixed, aes(x = Response_SubjPE, y = Response_Ax), 
            color = "black", size = 1.2, linetype = "dashed") +
  labs(
    title = "Fixed and Random Effects Plot Colored by Random_ID",
    x = "Response_SubjPE",
    y = "Response_Ax",
    color = "Random_ID"
  ) +
  theme_minimal() +
  theme(legend.position = "right")


```

Plot the individual regressions to compare with the LME

```{r}

# Fit individual linear models (lm) for each Random_ID
lm_fits <- sampled_pilot_21_data %>%
  group_by(Random_ID) %>%
  do(model = lm(Response_Ax ~ Response_SubjPE, data = .))

# Extract the predicted values for each Random_ID
predictions <- lm_fits %>%
  rowwise() %>%
  do({
    data.frame(
      Response_SubjPE = .$model$model$Response_SubjPE,
      Random_ID = .$Random_ID,
      predicted = predict(.$model),
      Response_Ax = .$model$model$Response_Ax
    )
  }) %>%
  ungroup()

# Now, plot the individual regression lines and points for each Random_ID
ggplot(predictions, aes(x = Response_SubjPE, y = Response_Ax, color = as.factor(Random_ID))) +
  # Add individual regression lines for each Random_ID
  geom_line(aes(y = predicted, group = Random_ID), alpha = 0.7) +
  # Add the actual data points for each Random_ID
  geom_point(aes(y = Response_Ax), alpha = 0.8) +
  labs(
    title = "Individual Linear Models by Random_ID",
    x = "Response_SubjPE",
    y = "Response_Ax",
    color = "Random_ID"
  ) +
  theme_minimal() +
  theme(legend.position = "right")

```

#### stan

```{r}
#stan
library(dplyr)
library(rstan)
# Prepare data for Stan
stan_data <- list(
  N = nrow(sampled_pilot_21_data),
  K = 1,  # Number of predictors
  J = length(unique(sampled_pilot_21_data$Random_ID)),
  y = sampled_pilot_21_data$Response_Ax,
  x = sampled_pilot_21_data$Response_SubjPE,
  participant = as.numeric(factor(sampled_pilot_21_data$Random_ID))
)

stan_model_code <- "
data {
  int<lower=0> N;         // number of observations
  int<lower=0> K;         // number of predictors
  int<lower=1> J;         // number of participants
  vector[N] y;            // response variable
  vector[N] x;            // predictor variable
  int<lower=1,upper=J> participant[N];  // participant ID for each observation
}

parameters {
  real alpha;             // intercept
  real beta;              // slope
  real<lower=0> sigma;    // error scale
  vector[J] u;            // random effects
  real<lower=0> tau;      // random effect standard deviation
}

model {
  // Priors
  alpha ~ normal(50, 10);
  beta ~ normal(0, 1);
  sigma ~ cauchy(0, 2.5);
  tau ~ cauchy(0, 2.5);

  // Likelihood
  for (n in 1:N) {
    y[n] ~ normal(alpha + beta * x[n] + u[participant[n]], sigma);
  }

  // Random effects
  u ~ normal(0, tau);
}
"
# Compile the model
stan_model <- stan_model(model_code = stan_model_code)

# Fit the model
fit <- sampling(stan_model, data = stan_data, iter = 2000, chains = 4, seed = 123)

# Print the fit summary
print(fit)

# Extract posterior samples
posterior_samples <- rstan::extract(fit)

# Calculate mean and 95% credible intervals for random effects (deviations)
u_means <- colMeans(posterior_samples$u)
u_lower <- apply(posterior_samples$u, 2, quantile, probs = 0.025)
u_upper <- apply(posterior_samples$u, 2, quantile, probs = 0.975)

# Calculate the intercept mean
alpha_mean <- mean(posterior_samples$alpha)

# Create a data frame for plotting
deviations_df <- data.frame(
  participant = unique(sampled_pilot_21_data$Random_ID),
  mean_deviation = u_means + alpha_mean,  # Adjust deviations
  lower_bound = u_lower + alpha_mean,     # Adjust lower bound
  upper_bound = u_upper + alpha_mean      # Adjust upper bound
)

# Plot the deviations with intercept line
ggplot(deviations_df, aes(x = participant, y = mean_deviation)) +
  geom_point() +
  geom_errorbar(aes(ymin = lower_bound, ymax = upper_bound), width = 0.2) +
  geom_hline(yintercept = alpha_mean, linetype = "dashed", color = "red") +  # Intercept line
  labs(title = "Individual Deviations from Intercept",
       x = "Participant ID",
       y = "Deviation from Intercept") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  annotate("text", x = 1, y = alpha_mean + 1, label = paste("Intercept =", round(alpha_mean, 2)), color = "red", hjust = 0)
```

**Interpretation**

-   **alpha:** The fixed intercept; the average anxiety across participants when PE=0

-   **beta:** The fixed slope; the average drop in anxiety which every point increase in PE.

-   **sigma:** The standard deviation of anxiety ratings across participants.

-   **u\[X\]:** Participant X's deviation from the fixed intercept.

-   **tau:** The standard deviation in intercepts across participants.

-   **lp\_\_** : log_posterior = log_likelihood + log_prior

#### LME

Let's compare our results from the Bayesian hierarchical model to the LME model.

```{r}

lme_model <- lmer(Response_Ax ~Response_SubjPE + (1 | Random_ID), data = sampled_pilot_21_data)
summary(lme_model)
```

We see that the fixed slope for PE in the hierarchical model is now negative. Therefore the more negatively surprising the feedback, the more anxious participants report to feel. This is another example of the **Simpson's paradox.**