-
Notifications
You must be signed in to change notification settings - Fork 191
/
MixedModeling-GKHajduk-script.R
146 lines (78 loc) · 4.01 KB
/
MixedModeling-GKHajduk-script.R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
######################################
# #
# Mixed effects modeling in R #
# #
######################################
## authors: Gabriela K Hajduk, based on workshop developed by Liam Bailey
## contact details: gkhajduk.github.io; email: [email protected]
## date: 2017-03-09
##
###---- Explore the data -----###
## load the data and have a look at it
load("dragons.RData")
head(dragons)
## Let's say we want to know how the body length affects test scores.
## Have a look at the data distribution:
hist(dragons$testScore) # seems close to normal distribution - good!
## It is good practice to standardise your explanatory variables before proceeding - you can use scale() to do that:
dragons$bodyLength2 <- scale(dragons$bodyLength)
## Back to our question: is test score affected by body length?
###---- Fit all data in one analysis -----###
## One way to analyse this data would be to try fitting a linear model to all our data, ignoring the sites and the mountain ranges for now.
library(lme4)
library(dplyr)
basic.lm <- lm(testScore ~ bodyLength2, data = dragons)
summary(basic.lm)
## Let's plot the data with ggplot2
library(ggplot2)
ggplot(dragons, aes(x = bodyLength, y = testScore)) +
geom_point()+
geom_smooth(method = "lm")
### Assumptions?
## Plot the residuals - the red line should be close to being flat, like the dashed grey line
plot(basic.lm, which = 1) # not perfect, but look alright
## Have a quick look at the qqplot too - point should ideally fall onto the diagonal dashed line
plot(basic.lm, which = 2) # a bit off at the extremes, but that's often the case; again doesn't look too bad
## However, what about observation independence? Are our data independent?
## We collected multiple samples from eight mountain ranges
## It's perfectly plausible that the data from within each mountain range are more similar to each other than the data from different mountain ranges - they are correlated. Pseudoreplication isn't our friend.
## Have a look at the data to see if above is true
boxplot(testScore ~ mountainRange, data = dragons) # certainly looks like something is going on here
## We could also plot it colouring points by mountain range
ggplot(dragons, aes(x = bodyLength, y = testScore, colour = mountainRange))+
geom_point(size = 2)+
theme_classic()+
theme(legend.position = "none")
## From the above plots it looks like our mountain ranges vary both in the dragon body length and in their test scores. This confirms that our observations from within each of the ranges aren't independent. We can't ignore that.
## So what do we do?
###----- Run multiple analyses -----###
## We could run many separate analyses and fit a regression for each of the mountain ranges.
## Lets have a quick look at the data split by mountain range
## We use the facet_wrap to do that
ggplot(aes(bodyLength, testScore), data = dragons) + geom_point() +
facet_wrap(~ mountainRange) +
xlab("length") + ylab("test score")
##----- Modify the model -----###
## We want to use all the data, but account for the data coming from different mountain ranges
## let's add mountain range as a fixed effect to our basic.lm
mountain.lm <- lm(testScore ~ bodyLength2 + mountainRange, data = dragons)
summary(mountain.lm)
## now body length is not significant
###----- Mixed effects models -----###
##----- First mixed model -----##
### model
### plots
### summary
### variance accounted for by mountain ranges
##-- implicit vs explicit nesting --##
head(dragons) # we have site and mountainRange
str(dragons) # we took samples from three sites per mountain range and eight mountain ranges in total
### create new "sample" variable
##----- Second mixed model -----##
### model
### summary
### plot
##----- Model selection for the keen -----##
### full model
### reduced model
### comparison