-
-
Notifications
You must be signed in to change notification settings - Fork 55
/
README.Rmd
398 lines (275 loc) · 20.6 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
---
output: github_document
bibliography: paper/paper.bib
csl: vignettes/apa.csl
---
# bayestestR <img src='man/figures/logo.png' align="right" height="139" />
```{r, echo = FALSE, warning=FALSE, message=FALSE}
library(ggplot2)
library(bayestestR)
options(digits = 2)
figheight <- 6
figwidth <- 6 * 1.618034
knitr::opts_chunk$set(
collapse = TRUE,
dpi = 300,
fig.height = figheight,
fig.width = figwidth,
fig.path = "man/figures/"
)
```
[![DOI](https://joss.theoj.org/papers/10.21105/joss.01541/status.svg)](https://doi.org/10.21105/joss.01541) [![status](https://tinyverse.netlify.com/badge/bayestestR)](https://CRAN.R-project.org/package=bayestestR) [![lifecycle](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html)
***Become a Bayesian master you will***
---
:warning: We changed the default the CI width! Please make an [informed decision](https://easystats.github.io/bayestestR/articles/credible_interval.html) and set it explicitly (`ci = 0.89`, `ci = 0.95` or anything else that you decide) :warning:
---
Existing R packages allow users to easily fit a large variety of models and extract and visualize the posterior draws. However, most of these packages only return a limited set of indices (e.g., point-estimates and CIs). **bayestestR** provides a comprehensive and consistent set of functions to analyze and describe posterior distributions generated by a variety of models objects, including popular modeling packages such as **rstanarm**, **brms** or **BayesFactor**.
You can reference the package and its documentation as follows:
- Makowski, D., Ben-Shachar, M. S., \& Lüdecke, D. (2019). *bayestestR: Describing Effects and their Uncertainty, Existence and Significance within the Bayesian Framework*. Journal of Open Source Software, 4(40), 1541. [10.21105/joss.01541](https://doi.org/10.21105/joss.01541)
- Makowski, D., Ben-Shachar, M. S., Chen, S. H. A., \& Lüdecke, D. (2019). *Indices of Effect Existence and Significance in the Bayesian Framework*. Frontiers in Psychology 2019;10:2767. [10.3389/fpsyg.2019.02767](https://doi.org/10.3389/fpsyg.2019.02767)
## Installation
[![CRAN](http://www.r-pkg.org/badges/version/bayestestR)](https://cran.r-project.org/package=bayestestR) [![insight status badge](https://easystats.r-universe.dev/badges/bayestestR)](https://easystats.r-universe.dev) [![R-CMD-check](https://github.com/easystats/bayestestR/workflows/R-CMD-check/badge.svg?branch=main)](https://github.com/easystats/bayestestR/actions)
The *bayestestR* package is available on CRAN, while its latest development version is available on R-universe (from _rOpenSci_).
Type | Source | Command
---|---|---
Release | CRAN | `install.packages("bayestestR")`
Development | R-universe | `install.packages("bayestestR", repos = "https://easystats.r-universe.dev")`
Once you have downloaded the package, you can then load it using:
```{r, eval=FALSE}
library("bayestestR")
```
> **Tip**
>
> **Instead of `library(bayestestR)`, use `library(easystats)`.**
> **This will make all features of the easystats-ecosystem available.**
>
> **To stay updated, use `easystats::install_latest()`.**
## Documentation
Access the package [documentation](https://easystats.github.io/bayestestR/) and check-out these vignettes:
### Tutorials
- [Get Started with Bayesian Analysis](https://easystats.github.io/bayestestR/articles/bayestestR.html)
- [Example 1: Initiation to Bayesian models](https://easystats.github.io/bayestestR/articles/example1.html)
- [Example 2: Confirmation of Bayesian skills](https://easystats.github.io/bayestestR/articles/example2.html)
- [Example 3: Become a Bayesian master](https://easystats.github.io/bayestestR/articles/example3.html)
### Articles
- [Credible Intervals (CI)](https://easystats.github.io/bayestestR/articles/credible_interval.html)
- [Probability of Direction (pd)](https://easystats.github.io/bayestestR/articles/probability_of_direction.html)
- [Region of Practical Equivalence (ROPE)](https://easystats.github.io/bayestestR/articles/region_of_practical_equivalence.html)
- [Bayes Factors (BF)](https://easystats.github.io/bayestestR/articles/bayes_factors.html)
- [Comparison of Point-Estimates](https://easystats.github.io/bayestestR/articles/web_only/indicesEstimationComparison.html)
- [Comparison of Indices of Effect Existence](https://doi.org/10.3389/fpsyg.2019.02767)
- [Reporting Guidelines](https://easystats.github.io/bayestestR/articles/guidelines.html)
# Features
In the Bayesian framework, parameters are estimated in a probabilistic fashion as *distributions*. These distributions can be summarised and described by reporting four types of indices:
- [**Centrality**](https://easystats.github.io/bayestestR/articles/web_only/indicesEstimationComparison.html)
- `mean()`, `median()` or [`map_estimate()`](https://easystats.github.io/bayestestR/reference/map_estimate.html) for an estimation of the mode.
- [`point_estimate()`](https://easystats.github.io/bayestestR/reference/point_estimate.html) can be used to get them at once and can be run directly on models.
- [**Uncertainty**](https://easystats.github.io/bayestestR/articles/credible_interval.html)
- [`hdi()`](https://easystats.github.io/bayestestR/reference/hdi.html) for *Highest Density Intervals (HDI)*, [`spi()`](https://easystats.github.io/bayestestR/reference/spi.html) for *Shortest Probability Intervals (SPI)* or [`eti()`](https://easystats.github.io/bayestestR/reference/eti.html) for *Equal-Tailed Intervals (ETI)*.
- [`ci()`](https://easystats.github.io/bayestestR/reference/ci.html) can be used as a general method for Confidence and Credible Intervals (CI).
- [**Effect Existence**](https://easystats.github.io/bayestestR/articles/indicesExistenceComparison.html): whether an effect is different from 0.
- [`p_direction()`](https://easystats.github.io/bayestestR/reference/p_direction.html) for a Bayesian equivalent of the frequentist *p*-value (see [Makowski et al., 2019](https://doi.org/10.3389/fpsyg.2019.02767))
- [`p_pointnull()`](https://easystats.github.io/bayestestR/reference/p_map.html) represents the odds of null hypothesis (*h0 = 0*) compared to the most likely hypothesis (the MAP).
- [`bf_pointnull()`](https://easystats.github.io/bayestestR/reference/bayesfactor_parameters.html) for a classic *Bayes Factor (BF)* assessing the likelihood of effect presence against its absence (*h0 = 0*).
- [**Effect Significance**](https://easystats.github.io/bayestestR/articles/indicesExistenceComparison.html): whether the effect size can be considered as non-negligible.
- [`p_rope()`](https://easystats.github.io/bayestestR/reference/p_rope.html) is the probability of the effect falling inside a [*Region of Practical Equivalence (ROPE)*](https://easystats.github.io/bayestestR/articles/region_of_practical_equivalence.html).
- [`bf_rope()`](https://easystats.github.io/bayestestR/reference/bayesfactor_parameters.html) computes a Bayes factor against the null as defined by a region (the ROPE).
- [`p_significance()`](https://easystats.github.io/bayestestR/reference/p_significance.html) that combines a region of equivalence with the probability of direction.
[`describe_posterior()`](https://easystats.github.io/bayestestR/reference/describe_posterior.html) is the master function with which you can compute all of the indices cited below at once.
```{r message=FALSE, warning=FALSE}
describe_posterior(
rnorm(10000),
centrality = "median",
test = c("p_direction", "p_significance"),
verbose = FALSE
)
```
`describe_posterior()` works for many objects, including more complex *brmsfit*-models. For better readability, the output is separated by model components:
```{r message=FALSE, warning=FALSE, eval=FALSE}
zinb <- read.csv("http://stats.idre.ucla.edu/stat/data/fish.csv")
set.seed(123)
model <- brm(
bf(
count ~ child + camper + (1 | persons),
zi ~ child + camper + (1 | persons)
),
data = zinb,
family = zero_inflated_poisson(),
chains = 1,
iter = 500
)
describe_posterior(
model,
effects = "all",
component = "all",
test = c("p_direction", "p_significance"),
centrality = "all"
)
```
```{r message=FALSE, warning=FALSE, echo=FALSE}
model <- insight::download_model("brms_zi_3")
describe_posterior(
model,
effects = "all",
component = "all",
test = c("p_direction", "p_significance"),
centrality = "all",
verbose = FALSE
)
```
*bayestestR* also includes [**many other features**](https://easystats.github.io/bayestestR/reference/index.html) useful for your Bayesian analyses. Here are some more examples:
## Point-estimates
```{r message=FALSE, warning=FALSE}
library(bayestestR)
posterior <- distribution_gamma(10000, 1.5) # Generate a skewed distribution
centrality <- point_estimate(posterior) # Get indices of centrality
centrality
```
As for other [**easystats**](https://github.com/easystats) packages, `plot()` methods are available from the [**see**](https://easystats.github.io/see/) package for many functions:
```{r message=FALSE, warning=FALSE, include=FALSE}
library(see)
plot(centrality)
```
```{r message=FALSE, warning=FALSE, echo=FALSE}
library(ggplot2)
library(see)
plot(centrality) +
xlab("nParameter Value") +
theme(axis.text.y = element_blank()) +
scale_x_continuous(limits = c(NA, 8)) +
see::theme_modern()
```
While the **median** and the **mean** are available through base R functions, [`map_estimate()`](https://easystats.github.io/bayestestR/reference/map_estimate.html) in *bayestestR* can be used to directly find the **Highest Maximum A Posteriori (MAP)** estimate of a posterior, *i.e.*, the value associated with the highest probability density (the "peak" of the posterior distribution). In other words, it is an estimation of the *mode* for continuous parameters.
## Uncertainty (CI)
[`hdi()`](https://easystats.github.io/bayestestR/reference/hdi.html) computes the **Highest Density Interval (HDI)** of a posterior distribution, i.e., the interval which contains all points within the interval have a higher probability density than points outside the interval. The HDI can be used in the context of Bayesian posterior characterization as **Credible Interval (CI)**.
Unlike equal-tailed intervals (see [`eti()`](https://easystats.github.io/bayestestR/reference/eti.html)) that typically exclude 2.5\% from each tail of the distribution, the HDI is *not* equal-tailed and therefore always includes the mode(s) of posterior distributions.
```{r message=FALSE, warning=FALSE}
posterior <- distribution_chisquared(10000, 4)
hdi(posterior, ci = 0.89)
eti(posterior, ci = 0.89)
```
```{r message=FALSE, warning=FALSE, echo=FALSE}
posterior <- distribution_chisquared(10000, 4)
dat <- as.data.frame(density(posterior))
dat1 <- dat
dat1$fill <- ifelse(dat1$x < hdi(posterior)$CI_low, "low",
ifelse(dat1$x > hdi(posterior)$CI_high, "high", "middle")
)
p1 <- ggplot(dat1, aes(x = x, y = y, fill = fill)) +
geom_ribbon(aes(ymin = 0, ymax = y)) +
geom_vline(xintercept = 0, linetype = "dotted") +
theme_classic(base_size = 12) +
scale_y_continuous(expand = c(0, 0), limits = c(0, 0.25)) +
scale_fill_manual(values = c("high" = "#FFC107", "low" = "#FFC107", "middle" = "#E91E63"), guide = "none") +
annotate("text", x = 2.5, y = 0.05, label = "The 95% HDI", color = "white", size = 5) +
xlab(NULL) +
ylab("Probability Density\n")
# ggsave("paper/Figure2.png", width = 13, height = 8, units = "in", dpi = 300)
dat2 <- dat
dat2$fill <- ifelse(dat2$x < ci(posterior)$CI_low, "low",
ifelse(dat2$x > ci(posterior)$CI_high, "high", "middle")
)
p2 <- ggplot(dat2, aes(x = x, y = y, fill = fill)) +
geom_ribbon(aes(ymin = 0, ymax = y)) +
geom_vline(xintercept = 0, linetype = "dotted") +
theme_classic(base_size = 12) +
scale_y_continuous(expand = c(0, 0), limits = c(0, 0.25)) +
scale_fill_manual(values = c("high" = "#FFC107", "low" = "#FFC107", "middle" = "#2196F3"), guide = "none") +
annotate("text", x = 2.9, y = 0.05, label = "The 95% ETI", color = "white", size = 5) +
xlab("\nParameter Value") +
ylab("Probability Density\n")
see::plots(p1, p2, tags = TRUE)
```
## Existence and Significance Testing
### Probability of Direction (*pd*)
[`p_direction()`](https://easystats.github.io/bayestestR/reference/p_direction.html) computes the *Probability of Direction* (*p*d, also known as the Maximum Probability of Effect - *MPE*). It varies between 50\% and 100\% (*i.e.*, `0.5` and `1`) and can be interpreted as the probability (expressed in percentage) that a parameter (described by its posterior distribution) is strictly positive or negative (whichever is the most probable). It is mathematically defined as the proportion of the posterior distribution that is of the median's sign. Although differently expressed, this index is fairly similar (*i.e.*, is strongly correlated) to the frequentist *p*-value.
**Relationship with the p-value**: In most cases, it seems that the *pd* corresponds to the frequentist one-sided *p*-value through the formula `p-value = (1-pd/100)` and to the two-sided *p*-value (the most commonly reported) through the formula `p-value = 2*(1-pd/100)`. Thus, a `pd` of `95%`, `97.5%` `99.5%` and `99.95%` corresponds approximately to a two-sided *p*-value of respectively `.1`, `.05`, `.01` and `.001`. See the [*reporting guidelines*](https://easystats.github.io/bayestestR/articles/guidelines.html).
```{r message=FALSE, warning=FALSE}
posterior <- distribution_normal(10000, 0.4, 0.2)
p_direction(posterior)
```
```{r message=FALSE, warning=FALSE, echo=FALSE}
posterior <- distribution_normal(10000, 0.4, 0.2)
dat3 <- as.data.frame(density(posterior))
dat3$fill <- ifelse(dat3$x < 0, "low", "high")
ggplot(dat3, aes(x = x, y = y, fill = fill)) +
geom_ribbon(aes(ymin = 0, ymax = y)) +
geom_vline(xintercept = 0, linetype = "dotted") +
theme_classic(base_size = 12) +
scale_y_continuous(expand = c(0, 0), limits = c(0, 2)) +
scale_fill_manual(values = c("high" = "#FFC107", "low" = "#E91E63"), guide = "none") +
xlab("\nParameter Value") +
ylab("Probability Density\n")
```
### ROPE
[`rope()`](https://easystats.github.io/bayestestR/reference/rope.html) computes the proportion (in percentage) of the HDI (default to the 89\% HDI) of a posterior distribution that lies within a region of practical equivalence.
Statistically, the probability of a posterior distribution of being different from 0 does not make much sense (the probability of it being different from a single point being infinite). Therefore, the idea underlining ROPE is to let the user define an area around the null value enclosing values that are *equivalent to the null* value for practical purposes [@kruschke2018bayesian, @kruschke2018rejecting].
Kruschke suggests that such null value could be set, by default, to the -0.1 to 0.1 range of a standardized parameter (negligible effect size according to Cohen, 1988). This could be generalized: For instance, for linear models, the ROPE could be set as `0 +/- .1 * sd(y)`. This ROPE range can be automatically computed for models using the [rope_range](https://easystats.github.io/bayestestR/reference/rope_range.html) function.
Kruschke suggests using the proportion of the 95\% (or 90\%, considered more stable) HDI that falls within the ROPE as an index for "null-hypothesis" testing (as understood under the Bayesian framework, see [equivalence_test](https://easystats.github.io/bayestestR/reference/equivalence_test.html)).
```{r message=FALSE, warning=FALSE}
posterior <- distribution_normal(10000, 0.4, 0.2)
rope(posterior, range = c(-0.1, 0.1))
```
```{r message=FALSE, warning=FALSE, echo=FALSE}
posterior <- distribution_normal(10000, 0.4, 0.2)
dat4 <- as.data.frame(density(posterior))
dat4$fill <- ifelse(dat4$x < -0.1, "low",
ifelse(dat4$x > 0.1, "high", "middle")
)
ggplot(dat4, aes(x = x, y = y, fill = fill)) +
geom_ribbon(aes(ymin = 0, ymax = y)) +
geom_vline(xintercept = 0, linetype = "dotted") +
theme_classic(base_size = 12) +
scale_y_continuous(expand = c(0, 0), limits = c(0, 2)) +
scale_x_continuous(breaks = c(-0.1, 0.1, 0.4)) +
scale_fill_manual(values = c("high" = "#FFC107", "low" = "#FFC107", "middle" = "#E91E63"), guide = "none") +
xlab("\nParameter Value") +
ylab("Probability Density\n")
```
### Bayes Factor
[`bayesfactor_parameters()`](https://easystats.github.io/bayestestR/reference/bayesfactor_parameters.html) computes Bayes factors against the null (either a point or an interval), bases on prior and posterior samples of a single parameter. This Bayes factor indicates the degree by which the mass of the posterior distribution has shifted further away from or closer to the null value(s) (relative to the prior distribution), thus indicating if the null value has become less or more likely given the observed data.
When the null is an interval, the Bayes factor is computed by comparing the prior and posterior odds of the parameter falling within or outside the null; When the null is a point, a Savage-Dickey density ratio is computed, which is also an approximation of a Bayes factor comparing the marginal likelihoods of the model against a model in which the tested parameter has been restricted to the point null [@wagenmakers2010bayesian].
```{r message=FALSE, warning=FALSE}
prior <- distribution_normal(10000, mean = 0, sd = 1)
posterior <- distribution_normal(10000, mean = 1, sd = 0.7)
bayesfactor_parameters(posterior, prior, direction = "two-sided", null = 0, verbose = FALSE)
```
```{r message=FALSE, warning=FALSE, echo=FALSE}
posterior <- as.data.frame(density(distribution_normal(10000, mean = 1, sd = 0.7)))
prior <- as.data.frame(density(distribution_normal(10000, mean = 0, sd = 1)))
ggplot(posterior, aes(x = x, y = y)) +
geom_ribbon(aes(ymin = 0, ymax = y), fill = "#FFC107") +
geom_line(data = prior, linewidth = 1, linetype = "dotted") +
geom_segment(x = 0, xend = 0, y = 0, yend = max(prior$y), color = "#2196F3", linewidth = 1) +
geom_point(x = 0, y = max(prior$y), color = "#2196F3", size = 5) +
geom_segment(x = 0, xend = 0, y = 0, yend = density_at(posterior$x, 0, bw = "nrd0"), color = "#E91E63", linewidth = 1) +
geom_point(x = 0, y = density_at(posterior$x, 0, bw = "nrd0"), color = "#E91E63", size = 5) +
theme_classic(base_size = 12) +
scale_y_continuous(expand = c(0, 0), limits = c(0, 0.65)) +
scale_fill_manual(values = c("high" = "#FFC107", "low" = "#E91E63"), guide = "none") +
xlab("\nParameter Value") +
ylab("Probability Density\n")
```
<sup>*The lollipops represent the density of a point-null on the prior distribution (the blue lollipop on the dotted distribution) and on the posterior distribution (the red lollipop on the yellow distribution). The ratio between the two - the Savage-Dickey ratio - indicates the degree by which the mass of the parameter distribution has shifted away from or closer to the null.*</sup>
For more info, see [the Bayes factors vignette](https://easystats.github.io/bayestestR/articles/bayes_factors.html).
## Utilities
### Find ROPE's appropriate range
[`rope_range()`](https://easystats.github.io/bayestestR/reference/rope_range.html): This function attempts at automatically finding suitable "default" values for the Region Of Practical Equivalence (ROPE). Kruschke (2018) suggests that such null value could be set, by default, to a range from `-0.1` to `0.1` of a standardized parameter (negligible effect size according to Cohen, 1988), which can be generalised for linear models to `-0.1 * sd(y), 0.1 * sd(y)`. For logistic models, the parameters expressed in log odds ratio can be converted to standardized difference through the formula `sqrt(3)/pi`, resulting in a range of `-0.05` to `0.05`.
```{r message=FALSE, warning=FALSE, results='hide', eval=FALSE}
rope_range(model)
```
### Density Estimation
[`estimate_density()`](https://easystats.github.io/bayestestR/reference/estimate_density.html): This function is a wrapper over different methods of density estimation. By default, it uses the base R `density` with by default uses a different smoothing bandwidth (`"SJ"`) from the legacy default implemented the base R `density` function (`"nrd0"`). However, Deng \& Wickham suggest that `method = "KernSmooth"` is the fastest and the most accurate.
### Perfect Distributions
[`distribution()`](https://easystats.github.io/bayestestR/reference/distribution.html): Generate a sample of size n with near-perfect distributions.
```{r message=FALSE, warning=FALSE}
distribution(n = 10)
```
### Probability of a Value
[`density_at()`](https://easystats.github.io/bayestestR/reference/density_at.html): Compute the density of a given point of a distribution.
```{r message=FALSE, warning=FALSE}
density_at(rnorm(1000, 1, 1), 1)
```
## Code of Conduct
Please note that the bayestestR project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
# References