-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathMLM_Assignment.Rmd
705 lines (467 loc) · 33.1 KB
/
MLM_Assignment.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
---
title: "Assessing the Impact of a Stress Reduction Treatment on Nurses Using a 3-Level (Longitudinal) Multisite Randomised Control Trial"
author: "fmcv76"
date: "02/04/24"
output:
pdf_document: default
html_document: default
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r include = FALSE}
# Model section titles
title_1 = "3-Level Intercept-Only Model for Time within Nurses within Hospitals"
title_2 = "3-Level Random Intercept Model with Time Stamp Time Level Covariate"
title_3 = "Comparison of 3-Level Intercept-Only Model with 3-Level Random Intercept Model with Time Stamp Time Level Covariate"
title_4 = "3-Level Random Intercept Model with Time Stamp Time Level Covariate and Nurse Level Covariates"
title_5 = "Comparison of 3-Level Random Intercept Model with Time Stamp Time Level Covariate and 3-Level Random intercept Model with Time Stamp Time Level Covariate and Treatment, Experience, and Gender Nurse Level Covariates"
title_6 = "3-Level Random Intercept Model with Time Stamp Time Level Covariate, Nurse Level Covariates, and Hospital Size Hospital Level Covariate"
title_7 = "Comparison of 3-Level Random Intercept Model with Time Stamp Time Level Covariate and Nurse Level Covariates with 3-Level Random Intercept Model with Time Stamp Time Level Covariate, Nurse Level Covariates, and Hospital Size Hospital Level Covariate"
title_8 = "3-Level Random Intercept Model with Time Stamp Time Level Covariate (Both Fixed Effect and Random Slope), Nurse Level Covariates, and Hospital Size Hospital Level Covariate"
title_9 = "3-Level Random Intercept Model with Time Stamp Time Level Covariate, Nurse Level Covariates, and Hospital Size Hospital Level Covariate with Interaction Between Treatment and Time"
title_10 = "Comparison of 3-Level Random Intercept Model with Time Stamp Time Level Covariate, Nurse Level Covariates, and Hospital Size Hospital Level Covariate with 3-Level Random Intercept Model with Time Stamp Time Level Covariate, Nurse Level Covariates, and Hospital Size Hospital Level Covariate with Interaction Between Treatment and Time"
```
## Initial Data Loading and Checks
```{r include = FALSE}
library(lme4)
library(lmerTest)
library(performance)
library(ggplot2)
library(scales)
library(sjPlot)
library(tidyverse)
library(tinytex)
library(wordcountaddin)
library(pbkrtest)
library(emmeans)
```
```{r}
# Load data
filepath = "https://andygolightly.github.io/teaching/MATH43515/summative/alasdair.csv"
mst_data_wide = read.csv(filepath, header = TRUE)
# Preprocess long data
mst_data = mst_data_wide %>%
tidyr::pivot_longer(cols = c(Responset1, Responset2, Responset3),
names_to = "Time",
values_to = "Response") %>% # Reshape data to long format
dplyr::mutate(Time = case_when(Time == "Responset1" ~ "1",
Time == "Responset2" ~ "2",
Time == "Responset3" ~ "3",
TRUE ~ NA)) %>%
dplyr::rename(Treatment = Trt, Nurse = ID) %>%
dplyr::mutate(Nurse = as.factor(Nurse),
Hospital = as.factor(Hospital),
Treatment = as.factor(Treatment),
Gender = as.factor(Gender),
Size = as.factor(Size),
Time = as.factor(Time))
# Preprocess wide data
mst_data_wide = mst_data_wide %>%
dplyr::rename(Treatment = Trt, Nurse = ID) %>%
dplyr::mutate(Nurse = as.factor(Nurse),
Hospital = as.factor(Hospital),
Treatment = as.factor(Treatment),
Gender = as.factor(Gender),
Size = as.factor(Size))
# Check for NAs
colSums(is.na(mst_data))
```
## Introduction
Randomised Control Trials (RCTs) are used to assess the effectiveness of an intervention. Individuals are randomly assigned to either the treatment or control group and this aims to prevent systematic differences between intervention groups with regards to both observables and unobservables, preventing selection into treatment bias. Multisite Randomised Control Trials (MSRCTs) are RCTs which take place across multiple locations with individuals still the unit of randomisation. MSRCTs can allow for a more diverse demographic of individuals studied by including multiple locations, increasing the generalisability of findings. In a healthcare setting this can help ensure that interventions are effective across different demographic groups and geographic locations. However, MSRCTs can be practically and ethically challenging given the intervention randomisation at the individual level.
This report will assess the results of a 3-level (longitudinal) MSRCT. Nurses from 20 Hospitals were randomly assigned to either the control or treatment group, with the treatment group receiving a stress reduction training intervention. Nurse Stress Scores were collected for 10 Nurses per Hospital, monthly for 3 months post intervention.
The variables collected as part of the MSRCT were:
- __Time__ - Months post treatment.
- __Nurse__ - Anonymised Nurse ID.
- __Hospital__ - Anonymised Hospital ID.
- __Treatment__ - Flag for whether a Nurse is in the control (0) or treatment (1) group.
- __Experience__ - Nurse experience in years.
- __Gender__ - Flag for gender, male (0) or female (1).
- __Size__ - Flag for whether the A&E department is small (0) or large (1).
- __Response__ - Stress Score (0-100).
__Research Question:__ Does the stress reduction treatment have a significant impact on Nurse Stress Scores and does this impact vary over Time?
__Summary Statistics__
```{r echo = FALSE}
# Summary statistics
mst_data_wide %>%
dplyr::select(-c(Nurse, Hospital)) %>%
summary()
```
```{r echo = FALSE, fig.height = 4.25}
# Distribution of stress scores at month 1 by intervention group histogram
ggplot(data = mst_data_wide, aes(x = Responset1, fill = Treatment)) +
geom_histogram(aes(y = after_stat(count)), binwidth = 1, colour = "black", position = "identity", alpha = 0.3) +
scale_fill_manual(values = c("1" = "red", "0" = "blue"), labels = c("1" = "Treatment Group", "0" = "Control Group"), guide_legend(title = "")) +
scale_x_continuous(breaks = breaks_width(1), minor_breaks = NULL) +
scale_y_continuous(breaks = breaks_width(2)) +
labs(x = "Stress Score", y = "Count", title = "Distribution of Stress Scores Overlaid by Intervention Group - Month 1") +
theme(panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.5), axis.text = element_text(color = "black", size = 10),
axis.title = element_text(size = 12), plot.title = element_text(size = 12, face = "bold"))
```
```{r echo = FALSE, fig.height = 4.25}
# Distribution of stress scores at month 2 by intervention group histogram
ggplot(data = mst_data_wide, aes(x = Responset2, fill = Treatment)) +
geom_histogram(aes(y = after_stat(count)), binwidth = 1, colour = "black", position = "identity", alpha = 0.3) +
scale_fill_manual(values = c("1" = "red", "0" = "blue"), labels = c("1" = "Treatment Group", "0" = "Control Group"), guide_legend(title = "")) +
scale_x_continuous(breaks = breaks_width(1), minor_breaks = NULL) +
scale_y_continuous(breaks = breaks_width(2)) +
labs(x = "Stress Score", y = "Count", title = "Distribution of Stress Scores Overlaid by Intervention Group - Month 2") +
theme(panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.5), axis.text = element_text(color = "black", size = 10),
axis.title = element_text(size = 12), plot.title = element_text(size = 12, face = "bold"))
```
```{r echo = FALSE, fig.height = 4.25}
# Distribution of stress scores at month 3 by intervention group histogram
ggplot(data = mst_data_wide, aes(x = Responset3, fill = Treatment)) +
geom_histogram(aes(y = after_stat(count)), binwidth = 1, colour = "black", position = "identity", alpha = 0.3) +
scale_fill_manual(values = c("1" = "red", "0" = "blue"), labels = c("1" = "Treatment Group", "0" = "Control Group"), guide_legend(title = "")) +
scale_x_continuous(breaks = breaks_width(1), minor_breaks = NULL) +
scale_y_continuous(breaks = breaks_width(2)) +
labs(x = "Stress Score", y = "Count", title = "Distribution of Stress Scores Overlaid by Intervention Group - Month 3") +
theme(panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.5), axis.text = element_text(color = "black", size = 10),
axis.title = element_text(size = 12), plot.title = element_text(size = 12, face = "bold"))
```
A naive analysis of the above histograms indicates that there is an initial treatment effect and that it reduces over time. The Stress Score distributions of the control and treatment groups start with little overlap, and with each time step the treatment group distribution converges towards the control group distribution i.e. the treatment group Stress Scores increase over time post treatment with the control group remaining similar.
```{r echo = FALSE}
# Distribution of stress scores over time by intervention group
ggplot(data = mst_data, aes(x = Time, y = Response, colour = Treatment)) +
stat_boxplot(geom = "errorbar", width = 0.4, coef = 1.5, position = position_dodge(width = 0.75)) +
geom_boxplot() +
scale_colour_manual(values = c("1" = "#EC4646", "0" = "#4646EC"), labels = c("1" = "Treatment Group", "0" = "Control Group"), guide_legend(title = "")) +
scale_y_continuous(breaks = breaks_width(2)) +
labs(x = "Time (Months Post Treatment)", y = "Stress Score", title = "Distribution of Stress Scores Over Time by Intervention Group") +
theme(panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.5), axis.text = element_text(color = "black", size = 10),
axis.title = element_text(size = 12), plot.title = element_text(size = 12, face = "bold"))
```
A naive analysis of the above plot indicates that the median Stress Score in the treatment group is ~4.5 lower than the control group one month after treatment, with this reducing to ~4 and ~2.5 lower after two and three months respectively. The treatment group Stress Scores increase over time post treatment converging towards the control group at an increase in median Stress Score of ~1 per month.
```{r echo = FALSE}
# Make time numeric for modelling
mst_data = mst_data %>%
dplyr::mutate(Time = as.numeric(Time))
```
\newpage
## Methods
Multilevel models help solve the problem when analysing hierarchical data of lower-level-units belonging to the same upper-level-item being correlated. Multilevel models are appropriate for use in this case given the hierarchical structure of the data from the MSRCT. In this context multilevel models can provide more accurate estimates of treatment effects than models which ignore the hierarchical data structure by accounting for the correlation within clusters. The Intraclass Correlation Coefficient (ICC) is a measure of how much variance in the response can be explained by the hierarchical grouping structure of the data. The Variance Partition Coefficient (VPC) is a measure of how much of the variance in the response can be explained by each level in the grouping structure. Both ICC and VPC can be used to assess whether a grouping structure should be included in modelling.
ICC and VPC values can be calculated as follows:
$$ICC=\frac{\sigma^2_{Hospitals}+\sigma^2_{Nurses}}{\sigma^2_{Hospitals}+\sigma^2_{Nurses}+\sigma^2_{Time}}$$
$$VPC_{Hospitals}=\frac{\sigma^2_{Hospitals}}{\sigma^2_{Hospitals}+\sigma^2_{Nurses}+\sigma^2_{Time}}$$
$$VPC_{Nurses}=\frac{\sigma^2_{Nurses}}{\sigma^2_{Hospitals}+\sigma^2_{Nurses}+\sigma^2_{Time}}$$
Where:
- $\sigma^2_{\text{Time}}$: Variance within Nurses.
- $\sigma^2_{\text{Nurses}}$: Variance between Nurses within Hospitals.
- $\sigma^2_{\text{Hospitals}}$: Variance between Hospitals.
The 3-level structure of the data is as follows: Time nested within Nurses nested within Hospitals.
- The covariate at the Time level is: Time Stamp.
- The covariates at the Nurse level are: Treatment, Experience, Gender.
- The covariate at the Hospital level is: Hospital Size.
A bottom-up modelling approach will be taken as follows:
- Fit the empty model checking ICC/VPC values for whether to include hierarchical structure.
- Include lower level covariates and hypothesis test for their inclusion.
- Include higher level covariates and hypothesis test for their inclusion.
- Include random slopes and hypothesis test for their inclusion.
- Include interaction terms and hypothesis test for their inclusion.
\newpage
## Analysis
__`r title_1`__
```{r}
# Fit intercept-only model for time nested within nurses nested within hospitals
hospital_nurse_intercept_only_model = lmer(formula = Response ~
1 + (1 | Hospital) +
(1 | Hospital : Nurse),
data = mst_data)
summary(hospital_nurse_intercept_only_model)
# Calculate ICC
icc(hospital_nurse_intercept_only_model)
```
__Calculating ICC and VPC Values__
```{r}
# Extract summary of variances
var_summary = as.data.frame(VarCorr(hospital_nurse_intercept_only_model))
sig = var_summary$vcov[3] # Residual variance
sigv = var_summary$vcov[2] # RE variance for Hospital
sigu = var_summary$vcov[1] # RE variance for Nurse
totalvar = sum(var_summary$vcov) # Total variance
vpc_hospital = round(100 * sigv / totalvar, 2)
vpc_nurse = round(100 * sigu / totalvar, 2)
icc = round(100 * (sigu + sigv) / totalvar, 2)
```
The ICC is `r icc`% meaning `r icc`% of the variation in Stress Scores can be explained by the Nurses nested within Hospitals grouping structure. The VPC at the Nurse level is `r vpc_nurse`% and the VPC at the Hospital level is `r vpc_hospital`%, meaning `r vpc_nurse`% of the variation in Stress Scores can be explained by the Nurses grouping structure and `r vpc_hospital`% by the Hospitals grouping structure. The high within Nurse correlation makes sense given that measurements pertaining to an individual Nurse should exhibit larger correlation than measurements from different Nurses. Therefore, the Nurse grouping structure should be include in modelling. The `r vpc_hospital`% variability between Hospitals is a relatively large VPC value in a healthcare setting and therefore the Hospital grouping structure should be included in modelling.
__`r title_2`__
```{r}
# Fit random intercept model for time within nurses within hospitals
# with time level covariate
ri_Time_model = lmer(formula = Response ~
1 + Time +
(1 | Hospital) +
(1 | Hospital : Nurse),
data = mst_data)
summary(ri_Time_model)
```
Time Stamp is statistically significant.
__`r title_3`__
Test the null hypothesis that the fixed effect of Time Stamp is 0 against an alternative that it is not 0.
```{r}
anova(hospital_nurse_intercept_only_model, ri_Time_model)
```
The null hypothesis is rejected and Time Stamp is needed.
__`r title_4`__
```{r}
# Fit random intercept model for time within nurses within hospitals
# with time level covariate and nurse level covariates
ri_Time_nurseCovs_model = lmer(formula = Response ~
1 + Time + Treatment + Experience + Gender +
(1 | Hospital) +
(1 | Hospital : Nurse),
data = mst_data)
summary(ri_Time_nurseCovs_model)
```
Treatment is statistically significant.
Experience is statistically significant.
Gender is statistically significant.
__`r title_5`__
1. Test the null hypothesis that the fixed effect of Treatment is 0 against an alternative it is not 0.
2. Test the null hypothesis that the fixed effect of Experience is 0 against an alternative it is not 0.
3. Test the null hypothesis that the fixed effect of Gender is 0 against an alternative it is not 0.
```{r}
# Fit random intercept model for time within nurses within hospitals
# with Time time level covariate and Treatment nurse level covariate
ri_Time_Treatment_model = lmer(formula = Response ~
1 + Time + Treatment +
(1 | Hospital) +
(1 | Hospital : Nurse),
data = mst_data)
# Fit random intercept model for time within nurses within hospitals
# with Time time level covariate and Treatment and Experience nurse level covariates
ri_Time_Treatment_Experience_model = lmer(formula = Response ~
1 + Time + Treatment + Experience +
(1 | Hospital) +
(1 | Hospital : Nurse),
data = mst_data)
# Fit random intercept model for time within nurses within
# hospitals with Time time level covariate and Treatment,
# Experience, and Gender nurse level covariates
ri_Time_nurseCovs_model = lmer(formula = Response ~
1 + Time + Treatment + Experience + Gender +
(1 | Hospital) +
(1 | Hospital : Nurse),
data = mst_data)
anova(ri_Time_model, ri_Time_Treatment_model,
ri_Time_Treatment_Experience_model, ri_Time_nurseCovs_model)
```
1. The null hypothesis is rejected and Treatment is needed, and should be included regardless given it is the effect being estimated.
2. The null hypothesis is rejected and Experience is needed.
3. The null hypothesis is rejected and Gender is needed.
\newpage
__`r title_6`__
```{r}
# Fit random intercept model for time within nurses within hospitals
# with Time time level covariate, nurse level covariates, and Hospital Size
# hospital level covariate
ri_Time_nurseCovs_Size_model = lmer(formula = Response ~
1 + Time + Treatment + Experience + Gender + Size +
(1 | Hospital) +
(1 | Hospital : Nurse),
data = mst_data)
summary(ri_Time_nurseCovs_Size_model)
```
Hospital Size is statistically significant.
__`r title_7`__
Test the null hypothesis that the fixed effect of Hospital Size is 0 against an alternative that it is not 0.
```{r}
anova(ri_Time_nurseCovs_model, ri_Time_nurseCovs_Size_model)
```
The null hypothesis is rejected and Hospital Size is needed.
__`r title_8`__
```{r}
# Fit random intercept model for time within nurses within hospitals
# with Time time level covariate (Both Fixed Effect and Random Slope),
# nurse level covariates, and Hospital Size hospital level covariate
ri_Time_slope_nurseCovs_Size_model = lmer(formula = Response ~
1 + Time + Treatment + Experience + Gender + Size +
(1 | Hospital) +
(1 + Time | Hospital : Nurse),
data = mst_data)
summary(ri_Time_slope_nurseCovs_Size_model)
```
Test the null hypothesis that the random slope for Time is 0 against an alternative that it is not 0.
```{r}
ranova(ri_Time_slope_nurseCovs_Size_model)
```
The null hypothesis is retained and the random slope for Time is not needed.
__`r title_9`__
```{r}
# Fit random intercept model for time within nurses within hospitals
# with Time time level covariate, nurse level covariates, and Hospital Size
# hospital level covariate with interaction between Treatment and Time
final_model = lmer(formula = Response ~
1 + Treatment + Time + Treatment:Time + Experience + Gender + Size +
(1 | Hospital) +
(1 | Hospital : Nurse),
data = mst_data)
summary(final_model)
```
The interaction between Treatment and Time is statistically significant.
Time is not statistically significant but should be retained given the statistically significant Treatment Time interaction.
__`r title_10`__
Test the null hypothesis that the interaction between Treatment and Time is 0 against an alternative that it is not 0.
```{r}
anova(ri_Time_nurseCovs_Size_model, final_model)
```
The null hypothesis is rejected and the interaction between Treatment and Time is needed.
\newpage
__Model Diagnostics__
```{r}
plot(final_model)
```
- __Linearity Assumption__ - Points spread roughly evenly above and below the 0 line with some visible lines assessed to be as a result of the discrete nature of Response, therefore the assumption of a linear relationship between predictor variables and the response variable plausibly holds.
- __Homoscedasticity Assumption__ - There is not an obvious change in spread for the residuals over the range of fitted values, therefore the assumption of residuals having constant variance across different values of the response variable plausibly holds.
```{r}
residuals_1_to_n_minus_1 = residuals(final_model)[1:(length(residuals(final_model)) - 1)]
residuals_2_to_n = residuals(final_model)[2:length(residuals(final_model))]
plot(residuals_1_to_n_minus_1, residuals_2_to_n)
cor(residuals_1_to_n_minus_1, residuals_2_to_n)
```
There appears to be a small association between adjascent residuals so residuals were randomised and reanalysed.
```{r}
# Check if randomising rows reduces correlation of adjascent residuals
set.seed(123)
indices = sample(seq(1, nrow(mst_data), by = 1), replace = FALSE,
prob = rep(1 / nrow(mst_data), nrow(mst_data)))
mst_data_randomised = mst_data[order(indices), ]
# Refit final model with randomised row order
model_rand = lmer(formula = Response ~
1 + Treatment + Time + Treatment:Time + Experience + Gender + Size +
(1 | Hospital) +
(1 | Hospital : Nurse),
data = mst_data_randomised)
residuals_1_to_n_minus_1 = residuals(model_rand)[1:(length(residuals(model_rand)) - 1)]
residuals_2_to_n = residuals(model_rand)[2:length(residuals(model_rand))]
plot(residuals_1_to_n_minus_1, residuals_2_to_n)
cor(residuals_1_to_n_minus_1, residuals_2_to_n)
```
- __Independence Assumption__ - The Pearson's correlation between the first 599 residuals (all except the last) and the last 599 residuals (all except the first) is low when the row order of the data is randomised, therefore the assumption of indepentent residuals plausibly holds.
```{r}
qqnorm(resid(final_model))
qqline(resid(final_model), col = "red")
```
- __Normality Assumption - Residuals__ - The residuals mostly fit well with the theoretical quantiles of a normal distribution with only a slight deviation at each tail, therefore the assumption of normally distributed residuals plausibly holds.
```{r}
qqnorm(ranef(final_model)$Hospital[, 1])
qqline(ranef(final_model)$Hospital[, 1], col = "red")
```
- __Normality Assumption - Random Effects - Hospitals__ - The random effects for Hospitals mostly fit well with the theoretical quantiles of a normal distribution with a small deviation at each tail, therefore the assumption of normally distributed random effects plausibly holds.
```{r}
qqnorm(ranef(final_model)$`Hospital:Nurse`[, 1])
qqline(ranef(final_model)$`Hospital:Nurse`[, 1], col = "red")
```
- __Normality Assumption - Random Effects - Nurses within Hospitals__ - The random effects for Nurses within Hospitals mostly fit well with the theoretical quantiles of a normal distribution with a small deviation at each tail, therefore the assumption of normally distributed random effects plausibly holds.
__Final Model__
```{r}
summary(final_model)
# Calculate the confidence interval for the average treatment effect at Time 0
# given the interaction between Treatment and Time
t0 = emmeans(final_model, specs = ~ Treatment | Time, at = list(Time = 0))
t0_estimates = confint(contrast(t0, method = "trt.vs.ctrl", ref = 1))
t0_mean_treatment_effect = sprintf("%.2f", round(t0_estimates$estimate, 2))
t0_lower_CI = sprintf("%.2f", round(t0_estimates$lower.CL, 2))
t0_upper_CI = sprintf("%.2f", round(t0_estimates$upper.CL, 2))
# Calculate the confidence interval for the average treatment effect at Time 1
# given the interaction between Treatment and Time
t1 = emmeans(final_model, specs = ~ Treatment | Time, at = list(Time = 1))
t1_estimates = confint(contrast(t1, method = "trt.vs.ctrl", ref = 1))
t1_mean_treatment_effect = sprintf("%.2f", round(t1_estimates$estimate, 2))
t1_lower_CI = sprintf("%.2f", round(t1_estimates$lower.CL, 2))
t1_upper_CI = sprintf("%.2f", round(t1_estimates$upper.CL, 2))
# Calculate the confidence interval for the average treatment effect at Time 2
# given the interaction between Treatment and Time
t2 = emmeans(final_model, specs = ~ Treatment | Time, at = list(Time = 2))
t2_estimates = confint(contrast(t2, method = "trt.vs.ctrl", ref = 1))
t2_mean_treatment_effect = sprintf("%.2f", round(t2_estimates$estimate, 2))
t2_lower_CI = sprintf("%.2f", round(t2_estimates$lower.CL, 2))
t2_upper_CI = sprintf("%.2f", round(t2_estimates$upper.CL, 2))
# Calculate the confidence interval for the average treatment effect at Time 3
# given the interaction between Treatment and Time
t3 = emmeans(final_model, specs = ~ Treatment | Time, at = list(Time = 3))
t3_estimates = confint(contrast(t3, method = "trt.vs.ctrl", ref = 1))
t3_mean_treatment_effect = sprintf("%.2f", round(t3_estimates$estimate, 2))
t3_lower_CI = sprintf("%.2f", round(t3_estimates$lower.CL, 2))
t3_upper_CI = sprintf("%.2f", round(t3_estimates$upper.CL, 2))
```
The mean treatment effect is estimated to be -5.41 at Time 0, meaning that Stress Scores are estimated to be on average 5.41 points lower for Nurses who have received the treatment than those who have not, immediately after treatment. Given that the data does not cover Stress Scores at Time 0, this estimate is an extrapolation and should be used with caution. The estimated 95% confidence interval ranges from -6.22 to -4.59 meaning that we are 95% confident that the true effect of the intervention on Stress Scores lies within this range, after controlling for Time, the interaction between Treatment and Time, Experience, Gender, Hospital Size, random effects by Hospital and by Nurses within Hospitals. The treatment effect is statistically significant with a p value of < 0.0001. The below table details the estimated treatment effect at each Time given the interaction between Treatment and Time.
| Time (Months Post Treatment) | Time 0 | Time 1 | Time 2 | Time 3 |
|:-------|-------:|-------:|-------:|-------:|
| Average Treatment Effect | `r t0_mean_treatment_effect` | `r t1_mean_treatment_effect` | `r t2_mean_treatment_effect` | `r t3_mean_treatment_effect` |
| Confidence Interval Upper Bound | `r t0_lower_CI` | `r t1_lower_CI` | `r t2_lower_CI` | `r t3_lower_CI` |
| Confidence Interval Lower Bound | `r t0_upper_CI` | `r t1_upper_CI` | `r t2_upper_CI` | `r t3_upper_CI` |
| p-value | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 |
The interaction with Time on the mean treatment effect is estimated to be 0.93, meaning that for each unit increase in Time, the mean treatment effect is estimated to decrease by 0.93, i.e. Stress Scores are estimated to increase by 0.93 on average every month which passes post treatment for Nurses in the treatment group. The estimated mean effect of Time on Stress Scores is not statistically significant, with Time included in modelling because of the statistically significant Treatment Time interaction.
```{r echo = FALSE, fig.height = 4}
# mst_data = mst_data %>%
# dplyr::select(-Prediction)
mst_data$Prediction = predict(final_model)
# Subset to only include hospital 9 data
mst_data_visualisation_subset = mst_data %>%
dplyr::filter(Hospital == "9")
intervention_group_names = c("1" = "Treatment Group", "0" = "Control Group")
ggplot(mst_data_visualisation_subset, aes(Time, Response)) +
geom_line(aes(y = Prediction, group = Nurse, colour = Nurse)) +
geom_point(aes(Time, Response, colour = Nurse), size = 1.5, alpha = 0.5, position = position_jitter(width = 0.1, height = 0.25)) +
geom_smooth(formula = y ~ x, method = "lm", color = "red", se = FALSE, alpha = 0.5) +
facet_grid(~ Treatment, labeller = as_labeller(intervention_group_names)) +
scale_x_continuous(breaks = breaks_width(0.5)) +
scale_y_continuous(breaks = breaks_width(1)) +
labs(x = "Time (Months Post Treatment)", y = "Stress Score", title = "Final Model - Hospital 9") +
theme(panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.5), axis.text = element_text(color = "black", size = 10),
axis.title = element_text(size = 12), plot.title = element_text(size = 12, face = "bold"), legend.position = "none")
```
```{r echo = FALSE, fig.height = 4}
# Subset to only include hospital 16 data
mst_data_visualisation_subset = mst_data %>%
dplyr::filter(Hospital == "16")
intervention_group_names = c("1" = "Treatment Group", "0" = "Control Group")
ggplot(mst_data_visualisation_subset, aes(Time, Response)) +
geom_line(aes(y = Prediction, group = Nurse, colour = Nurse)) +
geom_point(aes(Time, Response, colour = Nurse), size = 1.5, alpha = 0.5, position = position_jitter(width = 0.1, height = 0.25)) +
geom_smooth(formula = y ~ x, method = "lm", color = "red", se = FALSE, alpha = 0.5) +
facet_grid(~ Treatment, labeller = as_labeller(intervention_group_names)) +
scale_x_continuous(breaks = breaks_width(0.5)) +
scale_y_continuous(breaks = breaks_width(1)) +
labs(x = "Time (Months Post Treatment)", y = "Stress Score", title = "Final Model - Hospital 16") +
theme(panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.5), axis.text = element_text(color = "black", size = 10),
axis.title = element_text(size = 12), plot.title = element_text(size = 12, face = "bold"), legend.position = "none")
```
The above figures visualise the random effects of Nurses within Hospitals 9 and 16, with red lines indicating naive models fit only on the data shown without considering the hierarchical data structure. In the treatment groups Stress Scores are lower and increase with Time in contrast to the control groups. Hospital 9 is small and Hospital 16 is big and the overall increased Stress Scores in big Hospitals can be seen between the two figures.
## Discussion
__Research Question__
In regards to the research question: Does the stress reduction treatment have a significant impact on Nurse Stress Scores and does this impact vary over Time?, the estimated mean intervention effect is -5.41 at Time 0, meaning that Stress Scores are estimated to be on average 5.41 points lower for Nurses who have received the treatment than those who have not, immediately after treatment. This effect is statistically significant. Regarding the impact over time, for every month which passes, the mean intervention effect is estimated to decrease by 0.93 points, meaning Stress Scores are estimated to increase by 0.93 on average every month which passes post treatment for Nurses in the treatment group. This effect is statistically significant. Both the impact of the treatment and the impact of the treatment over time can be be seen in the two example figures above and the two tables below.
| Time (Months Post Treatment) | Time 0 | Time 1 | Time 2 | Time 3 |
|:-------|-------:|-------:|-------:|-------:|
| Average Treatment Effect | `r t0_mean_treatment_effect` | `r t1_mean_treatment_effect` | `r t2_mean_treatment_effect` | `r t3_mean_treatment_effect` |
| Confidence Interval Upper Bound | `r t0_lower_CI` | `r t1_lower_CI` | `r t2_lower_CI` | `r t3_lower_CI` |
| Confidence Interval Lower Bound | `r t0_upper_CI` | `r t1_upper_CI` | `r t2_upper_CI` | `r t3_upper_CI` |
| p-value | < 0.0001 | < 0.0001 | < 0.0001 | < 0.0001 |
| Group | Time | Mean Stress Score |
|:-------|--------:|-------:|
| Control | 1 | 41.2 |
| Control | 2 | 41.6 |
| Control | 3 | 41.4 |
| Treatment | 1 | 36.9 |
| Treatment | 2 | 38.2 |
| Treatment | 3 | 38.9 |
The estimated mean treatment effects are relatively large as a proportion of mean Stress Scores initially, however the estimated mean treatment effect reduces quickly over time. A full Cost-Benefit Analysis will need to be undertaken to assess whether the cost of the treatment is worth the short-term Stress Score reduction, given that treatment might need to be repeated frequently to reduce Stress Scores in the long-term.
__Fixed Effects__
The fixed effect of Experience is estimated to be -0.11, meaning that for every year of Experience a Nurse has, the estimated mean Stress Score decreases by 0.11.
The fixed effect of Gender is estimated to be 2.09, meaning that the estimated mean Stress Score increases by 2.09 for female Nurses when compared to male Nurses.
The fixed effect of Hospital Size is estimated to be 1.98, meaning that the estimated mean Stress Score increases by 1.98 for Nurses in large Hospitals when compared to small Hospitals.
__Limitations__
There are several limitations to the study, including:
- It is questionable whether it is practical or ethical to administer the treatment to individual Nurses in Hospitals but not others.
- It is not clear how the initial 20 Hospitals were selected from the wider population of Hospitals in the country so there may be selection into experiment bias present.
- The study is only as reliable as the measurement technique used to capture Nurse Stress Scores, for which the reliability is unknown.
## Word Count
```{r warning = FALSE, include = FALSE}
# Minus table words cause they are figures
n_words = paste0("Word Count: ", as.character(word_count() - 195))
```
```{r echo = FALSE}
print(n_words)
```