diff --git a/demos_rstan/brms_demo.html b/demos_rstan/brms_demo.html index 0cadc01..be11404 100644 --- a/demos_rstan/brms_demo.html +++ b/demos_rstan/brms_demo.html @@ -1615,7 +1615,7 @@

Bayesian data analysis - BRMS demos

Aki Vehtari

-

First version 2023-12-05. Last modified 2023-12-05.

+

First version 2023-12-05. Last modified 2024-01-03.

@@ -1625,11 +1625,15 @@

Setup

Load packages

library(tidyr)
 library(dplyr)
+library(tibble)
+library(pillar)
+library(stringr)
 library(brms)
 options(brms.backend = "cmdstanr", mc.cores = 2)
 library(posterior)
 options(pillar.negative = FALSE)
 library(loo)
+library(priorsense)
 library(ggplot2)
 library(bayesplot)
 theme_set(bayesplot::theme_default(base_family = "sans"))
@@ -1647,9 +1651,27 @@ 

1 Introduction

2 Bernoulli model

Toy data with a sequence of failures (0) and successes (1). We would like to learn about the unknown probability of success.

data_bern <- data.frame(y = c(1, 1, 1, 0, 1, 1, 1, 0, 1, 0))
-

brms uses by default student_t(3, 0, 2.5), bu we can assign uniform prior (beta(1,1)).

+

As usual in the case of generalized linear models (GLMs), brms defines the priors on the latent model parameters. With the Bernoulli family the default link function is logit, and thus the prior is set on logit(theta). As there are no covariates, logit(theta)=Intercept. The brms default prior for Intercept is student_t(3, 0, 2.5), but we use student_t(7, 0, 1.5), which is close to the logistic distribution and thus makes the prior near-uniform for theta. We visualize the implied priors on theta by sampling from these priors; below we also compare to a normal prior on the logit probability.

+
data.frame(theta = plogis(ggdist::rstudent_t(n=20000, df=3, mu=0, sigma=2.5))) |>
+  mcmc_hist() +
+  xlim(c(0,1)) +
+  labs(title='Default brms student_t(3, 0, 2.5) prior on Intercept')
+

+
data.frame(theta = plogis(ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5))) |>
+  mcmc_hist() +
+  xlim(c(0,1)) +
+  labs(title='student_t(7, 0, 1.5) prior on Intercept')
+

+

An almost uniform prior on theta could also be obtained with normal(0, 1.5)

+
data.frame(theta = plogis(rnorm(n=20000, mean=0, sd=1.5))) |>
+  mcmc_hist() +
+  xlim(c(0,1)) +
+  labs(title='normal(0, 1.5) prior on Intercept')
+

+

Formula y ~ 1 corresponds to a model \(\mathrm{logit}(\theta) = \alpha\times 1 = \alpha\). brms denotes the \(\alpha\) as Intercept.

fit_bern <- brm(y ~ 1, family = bernoulli(), data = data_bern,
-                prior = prior("", class='Intercept'),
+                prior = prior(student_t(7, 0, 1.5), class='Intercept'),
                 seed = SEED, refresh = 0)

Check the summary of the posterior and convergence

fit_bern
@@ -1662,7 +1684,7 @@

2 Bernoulli model

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
-Intercept     0.95      0.73    -0.43     2.49 1.00     1624     1768
+Intercept     0.76      0.64    -0.43     2.09 1.00     1734     1726

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential

@@ -1676,7 +1698,7 @@

2 Bernoulli model

# A tibble: 1 × 10
   variable     mean median    sd   mad     q5   q95  rhat ess_bulk ess_tail
   <chr>       <dbl>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
-1 b_Intercept 0.947  0.919 0.733 0.712 -0.202  2.18  1.00    1624.    1768.
+1 b_Intercept 0.763 0.746 0.641 0.636 -0.242 1.90 1.00 1734. 1726.

We can compute the probability of success by using plogis, which is the inverse-logit function

draws <- draws |>
   mutate_variables(theta=plogis(b_Intercept))
@@ -1687,79 +1709,32 @@

2 Bernoulli model

# A tibble: 1 × 10
   variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
   <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
-1 theta    0.700  0.715 0.138 0.142 0.450 0.898  1.00    1624.    1768.
+1 theta 0.668 0.678 0.130 0.134 0.440 0.870 1.00 1734. 1726.

Histogram of theta

mcmc_hist(draws, pars='theta') +
   xlab('theta') +
   xlim(c(0,1))
-

-

We next compare the result to using normal(0, 1) prior on logit probability. Visualize the prior by drawing samples from it

-
prior_mean <- 0
-prior_sd <- 1
-prior_draws <- data.frame(
-                 theta = plogis(rnorm(20000, prior_mean, prior_sd)))
-mcmc_hist(prior_draws) +
-  xlim(c(0,1))
-

-
fit_bern <- brm(y ~ 1, family = bernoulli(), data = data_bern,
-                prior = prior(normal(0, 1), class='Intercept'),
-                seed = SEED, refresh = 0)
-

Check the summary of the posterior and convergence

-
fit_bern
-
 Family: bernoulli 
-  Links: mu = logit 
-Formula: y ~ 1 
-   Data: data_bern (Number of observations: 10) 
-  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
-         total post-warmup draws = 4000
-
-Population-Level Effects: 
-          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
-Intercept     0.61      0.52    -0.45     1.65 1.00     1688     2211
-
-Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
-and Tail_ESS are effective sample size measures, and Rhat is the potential
-scale reduction factor on split chains (at convergence, Rhat = 1).
-

We can examine the latent parameter

-
draws <- as_draws_df(fit_bern)
-draws |>
-  subset_draws(variable='b_Intercept') |>
-  summarise_draws()
-
# A tibble: 1 × 10
-  variable     mean median    sd   mad     q5   q95  rhat ess_bulk ess_tail
-  <chr>       <dbl>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
-1 b_Intercept 0.605  0.600 0.525 0.501 -0.245  1.48  1.00    1688.    2211.
-

We can compute the probability of success by using plogis which is equal to inverse-logit function

-
draws <- draws |>
-  mutate_variables(theta=plogis(b_Intercept))
-

Summary of theta by using summarise_draws()

-
draws |>
-  subset_draws(variable='theta') |>
-  summarise_draws()
-
# A tibble: 1 × 10
-  variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
-  <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
-1 theta    0.639  0.646 0.115 0.115 0.439 0.815  1.00    1688.    2211.
-

Histogram of theta

-
mcmc_hist(draws, pars='theta') +
-  xlab('theta') +
-  xlim(c(0,1))
-

-

As the number of observations is small, there is small change in the posterior mean when the prior is changed. You can experiment with different priors and varying the number of observations.

+

+

Make prior sensitivity analysis by powerscaling both prior and likelihood. Focus on theta which is the quantity of interest.

+
theta <- draws |>
+  subset_draws(variable='theta')
+powerscale_sensitivity(fit_bern, prediction = \(x, ...) theta, num_args=list(digits=2)
+                       )$sensitivity |>
+                         filter(variable=='theta') |>
+                         mutate(across(where(is.double),  ~num(.x, digits=2)))
+
# A tibble: 1 × 4
+  variable     prior likelihood diagnosis
+  <chr>    <num:.2!>  <num:.2!> <chr>    
+1 theta         0.04       0.11 -        

3 Binomial model

Instead of a sequence of 0’s and 1’s, we can summarize the data with the number of trials and the number of successes, and use a Binomial model. The prior is specified in the ‘latent space’. The actual probability of success is theta = plogis(alpha), where plogis is the inverse of the logistic function.

-

Visualize the prior by drawing samples from it

-
prior_mean <- 0
-prior_sd <- 1
-prior_draws <- data.frame(theta = plogis(rnorm(20000, prior_mean, prior_sd)))
-mcmc_hist(prior_draws)
-

-

Binomial model with the same data

+

Binomial model with the same data and prior

data_bin <- data.frame(N = c(10), y = c(7))
+

Formula y | trials(N) ~ 1 corresponds to a model \(\mathrm{logit}(\theta) = \alpha\), and the number of trials for each observation is provided by | trials(N)

fit_bin <- brm(y | trials(N) ~ 1, family = binomial(), data = data_bin,
-               prior = prior(normal(0,1), class='Intercept'),
+               prior = prior(student_t(7, 0,1.5), class='Intercept'),
                seed = SEED, refresh = 0)

Check the summary of the posterior and convergence

fit_bin
@@ -1772,11 +1747,12 @@

3 Binomial model

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
-Intercept     0.59      0.54    -0.50     1.64 1.00     1770     2034
+Intercept     0.77      0.64    -0.46     2.09 1.00     1660     1769

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
+

The diagnostic indicates prior-data conflict, that is, both the prior and the likelihood are informative. If there is truly strong prior information that would justify the normal(0,1) prior, then this is fine, but otherwise more thinking is required (the goal is not to adjust the prior to remove diagnostic warnings without thinking). In this toy example, we proceed with this prior.
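A minimal sketch (assuming the priorsense package loaded in the setup): the same powerscaling diagnostic used elsewhere in this notebook can be run directly on the binomial fit to check for prior sensitivity or prior-data conflict.

powerscale_sensitivity(fit_bin)$sensitivity |>
  mutate(across(where(is.double),  ~num(.x, digits=2)))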

Extract the posterior draws

draws <- as_draws_df(fit_bin)

We can get summary information using summarise_draws()

@@ -1786,7 +1762,7 @@

3 Binomial model

# A tibble: 1 × 10
   variable     mean median    sd   mad     q5   q95  rhat ess_bulk ess_tail
   <chr>       <dbl>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
-1 b_Intercept 0.586  0.592 0.540 0.519 -0.323  1.48  1.00    1770.    2034.
+1 b_Intercept 0.767 0.758 0.636 0.622 -0.249 1.88 1.00 1660. 1769.

We can compute the probability of success by using plogis, which is the inverse-logit function

draws <- draws |>
   mutate_variables(theta=plogis(b_Intercept))
@@ -1797,12 +1773,12 @@

3 Binomial model

# A tibble: 1 × 10
   variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
   <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
-1 theta    0.634  0.644 0.119 0.117 0.420 0.815  1.00    1770.    2034.
+1 theta 0.669 0.681 0.130 0.132 0.438 0.868 1.00 1660. 1769.

Histogram of theta

mcmc_hist(draws, pars='theta') +
   xlab('theta') +
   xlim(c(0,1))
-

+

Re-run the model with a new dataset without recompiling

data_bin <- data.frame(N = c(5), y = c(4))
 fit_bin <- update(fit_bin, newdata = data_bin)
@@ -1817,7 +1793,7 @@

3 Binomial model

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
-Intercept     0.76      0.71    -0.58     2.21 1.00     1325     1957
+Intercept     1.08      0.94    -0.63     3.07 1.00     1384     1655

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential

@@ -1831,7 +1807,7 @@

3 Binomial model

# A tibble: 1 × 10
   variable     mean median    sd   mad     q5   q95  rhat ess_bulk ess_tail
   <chr>       <dbl>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
-1 b_Intercept 0.758  0.745 0.715 0.705 -0.373  1.92  1.00    1325.    1957.
+1 b_Intercept 1.08 0.997 0.941 0.903 -0.319 2.72 1.00 1384. 1655.

We can compute the probability of success by using plogis, which is the inverse-logit function

draws <- draws |>
   mutate_variables(theta=plogis(b_Intercept))
@@ -1842,12 +1818,12 @@

3 Binomial model

# A tibble: 1 × 10
   variable  mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
   <chr>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
-1 theta    0.664  0.678 0.144 0.152 0.408 0.872  1.00    1325.    1957.
+1 theta 0.712 0.730 0.161 0.171 0.421 0.938 1.00 1384. 1655.

Histogram of theta

mcmc_hist(draws, pars='theta') +
   xlab('theta') +
   xlim(c(0,1))
-

+

4 Comparison of two groups with Binomial

@@ -1858,37 +1834,37 @@

4 Comparison of two gr

Data, where grp2 is an indicator variable defined as a factor type, which is useful for categorical variables.

data_bin2 <- data.frame(N = c(674, 680), y = c(39,22), grp2 = factor(c('control','treatment')))
-

To analyse whether the treatment is useful, we can use Binomial model for both groups and compute odds-ratio.

-
fit_bin2 <- brm(y | trials(N) ~ grp2, family = binomial(), data = data_bin2,
-                prior = prior(normal(0,1), class='Intercept'),
+

To analyse whether the treatment is useful, we can use a Binomial model for both groups and compute the odds-ratio. To recreate the model as two independent (separate) binomial models, we use formula y | trials(N) ~ 0 + grp2, which corresponds to a model \(\mathrm{logit}(\theta) = \alpha \times 0 + \beta_\mathrm{control}\times x_\mathrm{control} + \beta_\mathrm{treatment}\times x_\mathrm{treatment} = \beta_\mathrm{control}\times x_\mathrm{control} + \beta_\mathrm{treatment}\times x_\mathrm{treatment}\), where \(x_\mathrm{control}\) is a vector with 1 for control and 0 for treatment, and \(x_\mathrm{treatment}\) is a vector with 1 for treatment and 0 for control. As only one of the vectors has 1 for each observation, this corresponds to separate models \(\mathrm{logit}(\theta_\mathrm{control}) = \beta_\mathrm{control}\) and \(\mathrm{logit}(\theta_\mathrm{treatment}) = \beta_\mathrm{treatment}\). We can provide the same prior for all \(\beta\)’s by setting the prior with class='b'. With prior student_t(7, 0, 1.5), both \(\beta\)’s are shrunk towards 0, but independently.

+
fit_bin2 <- brm(y | trials(N) ~ 0 + grp2, family = binomial(), data = data_bin2,
+                prior = prior(student_t(7, 0,1.5), class='b'),
                 seed = SEED, refresh = 0)
-

Check the summary of the posterior and convergence. brms is using the first factor level control as the baseline and thus reports the coefficient (population-level effect) for treatment (shown s grp2treatment)

+

Check the summary of the posterior and convergence. With ~ 0 + grp2 there is no Intercept, and \(\beta_\mathrm{control}\) and \(\beta_\mathrm{treatment}\) are presented as grp2control and grp2treatment.

fit_bin2
 Family: binomial 
   Links: mu = logit 
-Formula: y | trials(N) ~ grp2 
+Formula: y | trials(N) ~ 0 + grp2 
    Data: data_bin2 (Number of observations: 2) 
   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
          total post-warmup draws = 4000
 
 Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
-Intercept        -2.76      0.16    -3.09    -2.46 1.00     4528     2752
-grp2treatment    -0.58      0.27    -1.13    -0.07 1.00     2445     2245
+grp2control      -2.77      0.16    -3.10    -2.48 1.00     3563     3085
+grp2treatment    -3.37      0.22    -3.81    -2.93 1.00     3824     1939
 
 Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
 and Tail_ESS are effective sample size measures, and Rhat is the potential
 scale reduction factor on split chains (at convergence, Rhat = 1).
-

Compute theta for each group and the odds-ratio

+

Compute theta for each group and the odds-ratio. brms uses variable names b_grp2control and b_grp2treatment for \(\beta_\mathrm{control}\) and \(\beta_\mathrm{treatment}\), respectively.

draws_bin2 <- as_draws_df(fit_bin2) |>
-  mutate(theta_control = plogis(b_Intercept),
-         theta_treatment = plogis(b_Intercept + b_grp2treatment),
+  mutate(theta_control = plogis(b_grp2control),
+         theta_treatment = plogis(b_grp2treatment),
          oddsratio = (theta_treatment/(1-theta_treatment))/(theta_control/(1-theta_control)))

Plot oddsratio

mcmc_hist(draws_bin2, pars='oddsratio') +
   scale_x_continuous(breaks=seq(0.2,1.6,by=0.2))+
   geom_vline(xintercept=1, linetype='dashed')
-

+

Probability that the oddsratio<1

draws_bin2 |>
   mutate(poddsratio = oddsratio<1) |>
@@ -1897,15 +1873,105 @@ 

4 Comparison of two gr
# A tibble: 1 × 3
   variable    mean mcse_mean
   <chr>      <dbl>     <dbl>
-1 poddsratio 0.988   0.00200
-

oddratio 95% posterior interval

+1 poddsratio 0.986 0.00230

+

oddsratio 95% posterior interval

draws_bin2 |>
   subset(variable='oddsratio') |>
   summarise_draws(~quantile(.x, probs = c(0.025, 0.975)), ~mcse_quantile(.x, probs = c(0.025, 0.975)))
# A tibble: 1 × 5
   variable  `2.5%` `97.5%` mcse_q2.5 mcse_q97.5
   <chr>      <dbl>   <dbl>     <dbl>      <dbl>
-1 oddsratio  0.322   0.936   0.00547     0.0134
+1 oddsratio 0.317 0.931 0.00586 0.0134
+

Make prior sensitivity analysis by powerscaling both prior and likelihood. Focus on oddsratio which is the quantity of interest. We see that the likelihood is much more informative than the prior, and we would expect to see a different posterior only with a highly informative prior (possibly based on previous similar experiments).

+
oddsratio <- draws_bin2 |>
+  subset_draws(variable='oddsratio')
+powerscale_sensitivity(fit_bin2, prediction = \(x, ...) oddsratio, num_args=list(digits=2)
+                       )$sensitivity |>
+                         filter(variable=='oddsratio') |>
+                         mutate(across(where(is.double),  ~num(.x, digits=2)))
+
# A tibble: 1 × 4
+  variable      prior likelihood diagnosis
+  <chr>     <num:.2!>  <num:.2!> <chr>    
+1 oddsratio      0.01       0.14 -        
+

Above we used formula y | trials(N) ~ 0 + grp2 to have a separate model for the control and treatment groups. An alternative model y | trials(N) ~ grp2, which is equal to y | trials(N) ~ 1 + grp2, would correspond to a model \(\mathrm{logit}(\theta) = \alpha \times 1 + \beta_\mathrm{treatment}\times x_\mathrm{treatment} = \alpha + \beta_\mathrm{treatment}\times x_\mathrm{treatment}\). Now \(\alpha\) models the probability of death (via logistic link) in the control group and \(\alpha + \beta_\mathrm{treatment}\) models the probability of death (via logistic link) in the treatment group. Now the models for the groups are connected. Furthermore, if we set independent student_t(7, 0, 1.5) priors on \(\alpha\) and \(\beta_\mathrm{treatment}\), the implied priors on \(\theta_\mathrm{control}\) and \(\theta_\mathrm{treatment}\) are different. We can verify this with a prior simulation.

+
data.frame(theta_control = plogis(ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5))) |>
+  mcmc_hist() +
+  xlim(c(0,1)) +
+  labs(title='student_t(7, 0, 1.5) prior on Intercept') +
+data.frame(theta_treatment = plogis(ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5) +
+             ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5))) |>
+  mcmc_hist() +
+  xlim(c(0,1)) +
+  labs(title='student_t(7, 0, 1.5) prior on Intercept and b_grp2treatment')
+

+

In this case, with relatively big treatment and control groups, the likelihood is informative, and the difference between using y | trials(N) ~ 0 + grp2 or y | trials(N) ~ grp2 is negligible.
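For illustration, a minimal sketch of fitting this alternative formulation (fit_bin2_alt is an illustrative name, and the fit is not used later), with the same weakly informative prior on both the Intercept and the treatment coefficient, and theta for each group computed from b_Intercept and b_grp2treatment:

fit_bin2_alt <- brm(y | trials(N) ~ grp2, family = binomial(), data = data_bin2,
                    prior = c(prior(student_t(7, 0, 1.5), class='Intercept'),
                              prior(student_t(7, 0, 1.5), class='b')),
                    seed = SEED, refresh = 0)
# control group uses the Intercept only, treatment adds the grp2treatment coefficient
as_draws_df(fit_bin2_alt) |>
  mutate_variables(theta_control = plogis(b_Intercept),
                   theta_treatment = plogis(b_Intercept + b_grp2treatment)) |>
  subset_draws(variable=c('theta_control','theta_treatment')) |>
  summarise_draws()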

+

A third option would be a hierarchical model with formula y | trials(N) ~ 1 + (1 | grp2), which corresponds to a model \(\mathrm{logit}(\theta) = \alpha \times 1 + \beta_\mathrm{control}\times x_\mathrm{control} + \beta_\mathrm{treatment}\times x_\mathrm{treatment}\), but now the prior on \(\beta_\mathrm{control}\) and \(\beta_\mathrm{treatment}\) is \(\mathrm{normal}(0, \sigma_\mathrm{grp})\). The default brms prior for \(\sigma_\mathrm{grp}\) is student_t(3, 0, 2.5). Now \(\alpha\) models the overall probability of death (via logistic link), and \(\beta_\mathrm{control}\) and \(\beta_\mathrm{treatment}\) model the differences from that, sharing the same prior. The prior for \(\beta_\mathrm{control}\) and \(\beta_\mathrm{treatment}\) includes the unknown scale \(\sigma_\mathrm{grp}\). If there is no difference between the control and treatment groups, the posterior of \(\sigma_\mathrm{grp}\) has more mass near 0, and the bigger the difference between the groups is, the more mass there is away from 0. With just two groups, there is not much information about \(\sigma_\mathrm{grp}\), and unless there is an informative prior on \(\sigma_\mathrm{grp}\), the two-group hierarchical model is not that useful. Hierarchical models are more useful with more than two groups. In the following, we use the previously used student_t(7, 0, 1.5) prior on the intercept and the default brms prior student_t(3, 0, 2.5) on \(\sigma_\mathrm{grp}\).

+
fit_bin2 <- brm(y | trials(N) ~ 1 + (1 | grp2), family = binomial(), data = data_bin2,
+                prior = prior(student_t(7, 0,1.5), class='Intercept'),
+                seed = SEED, refresh = 0, control=list(adapt_delta=0.99))
+

Check the summary of the posterior and convergence. The summary reports that there are Group-Level Effects: ~grp2 with 2 levels (control and treatment), with sd(Intercept) denoting \(\sigma_\mathrm{grp}\). In addition, the summary lists Population-Level Effects: Intercept (\(\alpha\)) as in the previous non-hierarchical models.

+
fit_bin2
+
Warning: There were 1 divergent transitions after warmup. Increasing
+adapt_delta above 0.99 may help. See
+http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
+
 Family: binomial 
+  Links: mu = logit 
+Formula: y | trials(N) ~ 1 + (1 | grp2) 
+   Data: data_bin2 (Number of observations: 2) 
+  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
+         total post-warmup draws = 4000
+
+Group-Level Effects: 
+~grp2 (Number of levels: 2) 
+              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
+sd(Intercept)     1.69      1.57     0.15     5.69 1.01      538     1113
+
+Population-Level Effects: 
+          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
+Intercept    -2.18      1.28    -3.85     1.01 1.01      569     1027
+
+Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
+and Tail_ESS are effective sample size measures, and Rhat is the potential
+scale reduction factor on split chains (at convergence, Rhat = 1).
+

We can also look at the variable names brms uses internally

+
as_draws_rvars(fit_bin2)
+
# A draws_rvars: 1000 iterations, 4 chains, and 5 variables
+$b_Intercept: rvar<1000,4>[1] mean ± sd:
+[1] -2.2 ± 1.3 
+
+$sd_grp2__Intercept: rvar<1000,4>[1] mean ± sd:
+[1] 1.7 ± 1.6 
+
+$r_grp2: rvar<1000,4>[2,1] mean ± sd:
+          Intercept   
+control   -0.63 ± 1.3 
+treatment -1.19 ± 1.3 
+
+$lprior: rvar<1000,4>[1] mean ± sd:
+[1] -4.3 ± 0.74 
+
+$lp__: rvar<1000,4>[1] mean ± sd:
+[1] -13 ± 1.8 
+

Although there is no difference, we illustrate how to compute the oddsratio from the hierarchical model

+
draws_bin2 <- as_draws_df(fit_bin2)
+oddsratio <- draws_bin2 |>
+  mutate_variables(theta_control = plogis(b_Intercept + `r_grp2[control,Intercept]`),
+                   theta_treatment = plogis(b_Intercept + `r_grp2[treatment,Intercept]`),
+                   oddsratio = (theta_treatment/(1-theta_treatment))/(theta_control/(1-theta_control))) |>
+  subset_draws(variable='oddsratio')
+oddsratio |> mcmc_hist() +
+  scale_x_continuous(breaks=seq(0.2,1.6,by=0.2))+
+  geom_vline(xintercept=1, linetype='dashed')
+

+

Make also prior sensitivity analysis with focus on oddsratio.

+
powerscale_sensitivity(fit_bin2, prediction = \(x, ...) oddsratio, num_args=list(digits=2)
+                       )$sensitivity |>
+                         filter(variable=='oddsratio') |>
+                         mutate(across(where(is.double),  ~num(.x, digits=2)))
+
# A tibble: 1 × 4
+  variable      prior likelihood diagnosis
+  <chr>     <num:.2!>  <num:.2!> <chr>    
+1 oddsratio      0.00       0.16 -        

5 Linear Gaussian model

@@ -1921,16 +1987,46 @@

5 Linear Gaussian mode

  guides(linetype = "none")

To analyse whether there has been a change in the average summer month temperature, we use a linear model with a Gaussian model for the unexplained variation. By default brms uses a uniform prior for the coefficients.

-

temp ~ year means temp depends on the intercept and temp. The model could also be defined as temp ~ 1 + year which explicitly shows the intercept part. The corresponding regression model is temp ~ normal(b_Intercept1 + b_yearyear, sigma)

+

Formula temp ~ year corresponds to a model \(\mathrm{temp} \sim \mathrm{normal}(\alpha + \beta \times \mathrm{year}, \sigma)\). The model could also be defined as temp ~ 1 + year which explicitly shows the intercept (\(\alpha\)) part. Using the variable names brms uses, the model can be written also as temp ~ normal(b_Intercept*1 + b_year*year, sigma). We start with the default priors to see some tricks that brms does behind the curtain.

fit_lin <- brm(temp ~ year, data = data_lin, family = gaussian(),
                seed = SEED, refresh = 0)
-

We can check the all the priors used. In general it is good to use proper priors, but sometimes flat priors are fine and produce proper posterior.

+

Check the summary of the posterior and convergence.

+
fit_lin
+
 Family: gaussian 
+  Links: mu = identity; sigma = identity 
+Formula: temp ~ year 
+   Data: data_lin (Number of observations: 71) 
+  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
+         total post-warmup draws = 4000
+
+Population-Level Effects: 
+          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
+Intercept   -34.69     12.49   -58.73   -10.19 1.00     3995     3035
+year          0.02      0.01     0.01     0.03 1.00     3996     3035
+
+Family Specific Parameters: 
+      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
+sigma     1.08      0.09     0.91     1.28 1.00     3057     3011
+
+Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
+and Tail_ESS are effective sample size measures, and Rhat is the potential
+scale reduction factor on split chains (at convergence, Rhat = 1).
+

Convergence diagnostics look good. We see that the posterior mean of Intercept is -34.7, which may sound strange, but that is the intercept at year 0, that is, very far from the data range, and thus it doesn’t have a meaningful interpretation directly. The posterior mean of the year coefficient is 0.02, that is, we estimate that the summer temperature is increasing 0.02°C per year (which would make 1°C in 50 years).
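As a small illustration (the year 2022 is just a reference point within the data range, not part of the original analysis), we can compute the posterior of the linear predictor at a specific year directly from the draws, which is easier to interpret than the Intercept at year 0:

as_draws_df(fit_lin) |>
  mutate_variables(mu_2022 = b_Intercept + b_year * 2022) |>
  subset_draws(variable='mu_2022') |>
  summarise_draws()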

+

We can check \(R^2\), which corresponds to the proportion of variance explained by the model. The linear model explains 0.16=16% of the total data variance.

+
bayes_R2(fit_lin) |> round(2)
+
   Estimate Est.Error Q2.5 Q97.5
+R2     0.16      0.07 0.03   0.3
+

We can check all the priors used.

prior_summary(fit_lin)
                  prior     class coef group resp dpar nlpar lb ub       source
                  (flat)         b                                       default
                  (flat)         b year                             (vectorized)
  student_t(3, 9.5, 2.5) Intercept                                       default
    student_t(3, 0, 2.5)     sigma                             0         default
+

We see that class=b and coef=year have a flat, that is, improper uniform prior, Intercept has a student_t(3, 9.5, 2.5) prior, and sigma has a student_t(3, 0, 2.5) prior. In general it is good to use proper priors, but sometimes flat priors are fine and produce a proper posterior (like in this case). An important part here is that by default, brms sets the prior on Intercept after centering the covariate values (design matrix). In this case, brms uses year - mean(year) = year - 1987 instead of the original years. This in general improves the sampling efficiency. As the Intercept is now defined at the middle of the data, the default Intercept prior is centered on the median of the target (here the target is temp). If we would like to set informative priors, we need to set the informative prior on the Intercept given the centered covariate values. We can turn off the centering by setting the argument center=FALSE, and we can set the prior on the original intercept by using the formula temp ~ 0 + Intercept + year. In this case, we are happy with the default prior for the intercept. In this specific case, the flat prior on the coefficient is also fine, but we add a weakly informative prior just for illustration. Let’s assume we expect the temperature to change less than 1°C in 10 years. With student_t(3, 0, 0.03) about 95% of the prior mass implies less than 0.1°C change per year, and with low degrees of freedom (3) we have thick tails, making the likelihood dominate in case of prior-data conflict. In real life, we do have much more information about the temperature change, and naturally a hierarchical spatio-temporal model with all temperature measurement locations would be even better.
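As a sketch of the mechanics described above (illustration only; the prior values are arbitrary and this fit is not used later), the intercept can be turned into an ordinary coefficient with the formula temp ~ 0 + Intercept + year, so that a prior can be set on it on the original, non-centered scale:

# sketch only: Intercept as an ordinary coefficient on the non-centered scale
fit_lin_nc <- brm(temp ~ 0 + Intercept + year, data = data_lin, family = gaussian(),
                  prior = c(prior(normal(-30, 30), class='b', coef='Intercept'),
                            prior(student_t(3, 0, 0.03), class='b', coef='year')),
                  seed = SEED, refresh = 0)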

+
fit_lin <- brm(temp ~ year, data = data_lin, family = gaussian(),
+               prior = prior(student_t(3, 0, 0.03), class='b'),
+               seed = SEED, refresh = 0)

Check the summary of the posterior and convergence

fit_lin
 Family: gaussian 
@@ -1942,50 +2038,59 @@ 

5 Linear Gaussian mode

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
-Intercept   -34.69     12.49   -58.73   -10.19 1.00     3995     3035
-year          0.02      0.01     0.01     0.03 1.00     3996     3035
+Intercept   -32.54     12.28   -56.70    -9.01 1.00     4183     3259
+year          0.02      0.01     0.01     0.03 1.00     4182     3259

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
-sigma     1.08      0.09     0.91     1.28 1.00     3057     3011
+sigma     1.08      0.09     0.92     1.27 1.00     3494     2709

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

-

Extract the posterior draws and check the summaries

+

Make prior sensitivity analysis by powerscaling both prior and likelihood.

+
powerscale_sensitivity(fit_lin)$sensitivity |>
+                                mutate(across(where(is.double),  ~num(.x, digits=2)))
+
# A tibble: 3 × 4
+  variable        prior likelihood diagnosis
+  <chr>       <num:.2!>  <num:.2!> <chr>    
+1 b_Intercept      0.03       0.09 -        
+2 b_year           0.03       0.09 -        
+3 sigma            0.00       0.13 -        
+

Our weakly informative proper prior has negligible sensitivity, and the likelihood is informative. Extract the posterior draws and check the summaries

draws_lin <- as_draws_df(fit_lin) 
 draws_lin |> summarise_draws()
# A tibble: 5 × 10
   variable        mean   median      sd     mad       q5      q95  rhat ess_bulk
   <chr>          <dbl>    <dbl>   <dbl>   <dbl>    <dbl>    <dbl> <dbl>    <dbl>
-1 b_Intercept -3.47e+1 -3.47e+1 1.25e+1 1.23e+1 -5.49e+1 -1.39e+1  1.00    3995.
-2 b_year       2.22e-2  2.22e-2 6.29e-3 6.20e-3  1.17e-2  3.24e-2  1.00    3996.
-3 sigma        1.08e+0  1.08e+0 9.49e-2 9.45e-2  9.33e-1  1.25e+0  1.00    3057.
-4 lprior      -3.27e+0 -3.26e+0 2.16e-2 2.11e-2 -3.30e+0 -3.23e+0  1.00    2908.
-5 lp__        -1.09e+2 -1.08e+2 1.31e+0 1.06e+0 -1.12e+2 -1.07e+2  1.00    2014.
+1 b_Intercept -3.25e+1 -3.24e+1 1.23e+1 1.24e+1 -5.29e+1 -1.29e+1  1.00    4183.
+2 b_year       2.11e-2  2.11e-2 6.18e-3 6.22e-3  1.12e-2  3.14e-2  1.00    4182.
+3 sigma        1.08e+0  1.07e+0 9.14e-2 9.08e-2  9.43e-1  1.24e+0  1.00    3494.
+4 lprior      -1.08e+0 -1.06e+0 1.65e-1 1.65e-1 -1.38e+0 -8.51e-1  1.00    4173.
+5 lp__        -1.07e+2 -1.06e+2 1.21e+0 9.72e-1 -1.09e+2 -1.05e+2  1.00    1899.
 # ℹ 1 more variable: ess_tail <dbl>

If one of the columns is hidden we can force printing all columns

draws_lin |> summarise_draws() |> print(width=Inf)
# A tibble: 5 × 10
   variable         mean    median       sd      mad        q5       q95  rhat
   <chr>           <dbl>     <dbl>    <dbl>    <dbl>     <dbl>     <dbl> <dbl>
-1 b_Intercept  -34.7     -34.7    12.5     12.3      -54.9     -13.9     1.00
-2 b_year         0.0222    0.0222  0.00629  0.00620    0.0117    0.0324  1.00
-3 sigma          1.08      1.08    0.0949   0.0945     0.933     1.25    1.00
-4 lprior        -3.27     -3.26    0.0216   0.0211    -3.30     -3.23    1.00
-5 lp__        -109.     -108.      1.31     1.06    -112.     -107.      1.00
+1 b_Intercept  -32.5     -32.4    12.3     12.4      -52.9     -12.9     1.00
+2 b_year         0.0211    0.0211  0.00618  0.00622    0.0112    0.0314  1.00
+3 sigma          1.08      1.07    0.0914   0.0908     0.943     1.24    1.00
+4 lprior        -1.08     -1.06    0.165    0.165     -1.38     -0.851   1.00
+5 lp__        -107.     -106.      1.21     0.972   -109.     -105.      1.00
   ess_bulk ess_tail
      <dbl>    <dbl>
-1    3995.    3035.
-2    3996.    3035.
-3    3057.    3011.
-4    2908.    2802.
-5    2014.    2714.
+1    4183.    3259.
+2    4182.    3259.
+3    3494.    2709.
+4    4173.    3285.
+5    1899.    2576.

Histogram of b_year

draws_lin |>
   mcmc_hist(pars='b_year') +
   xlab('Average temperature increase per year')
-

+

Probability that the coefficient b_year > 0 and the corresponding MCSE

draws_lin |>
   mutate(I_b_year_gt_0 = b_year>0) |>
@@ -1994,7 +2099,8 @@ 

5 Linear Gaussian mode
# A tibble: 1 × 3
   variable       mean mcse_mean
   <chr>         <dbl>     <dbl>
-1 I_b_year_gt_0  1.00  0.000353
+1 I_b_year_gt_0 1 NA

+

All posterior draws have b_year>0, the probability gets rounded to 1, and MCSE is not available as the observed posterior variance is 0.

95% posterior interval for temperature increase per 100 years

draws_lin |>
   mutate(b_year_100 = b_year*100) |>
@@ -2005,7 +2111,7 @@ 

5 Linear Gaussian mode
# A tibble: 1 × 5
   variable   `2.5%` `97.5%` mcse_q2.5 mcse_q97.5
   <chr>       <dbl>   <dbl>     <dbl>      <dbl>
-1 b_year_100   0.99    3.44      0.03       0.03
+1 b_year_100 0.93 3.33 0.03 0.03

Plot posterior draws of the linear function values at each year. add_linpred_draws() takes the years from the data and uses fit_lin to make the predictions.

data_lin |>
   add_linpred_draws(fit_lin) |>
@@ -2019,7 +2125,7 @@ 

5 Linear Gaussian mode

  labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
  theme(legend.position="none")+
  scale_x_continuous(breaks=seq(1950,2020,by=10))

-

+

Alternatively, plot a spaghetti plot for 100 draws

data_lin |>
   add_linpred_draws(fit_lin, ndraws=100) |>
@@ -2033,7 +2139,7 @@ 

5 Linear Gaussian mode

  labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
  theme(legend.position="none")+
  scale_x_continuous(breaks=seq(1950,2020,by=10))

-

+

Plot the posterior predictive distribution at each year until 2030. add_predicted_draws() takes the years from the data and uses fit_lin to make the predictions.

data_lin |>
   add_row(year=2023:2030) |>
@@ -2049,14 +2155,15 @@ 

5 Linear Gaussian mode

  theme(legend.position="none")+
  scale_x_continuous(breaks=seq(1950,2030,by=10))

Warning: Removed 32000 rows containing missing values (`geom_point()`).
-

+

-

6 Linear Student’s t model

-

The temperatures used in the above analyses are averages over three months, which makes it more likely that they are normally distributed, but there can be extreme events in the feather and we can check whether more robust Student’s t observation model woul give different results.

+

6 Linear Student’s \(t\) model

+

The temperatures used in the above analyses are averages over three months, which makes it more likely that they are normally distributed, but there can be extreme events in the weather, and we can check whether the more robust Student’s \(t\) observation model would give different results.

fit_lin_t <- brm(temp ~ year, data = data_lin, family = student(),
+                 prior = prior(student_t(3, 0, 0.03), class='b'),
                  seed = SEED, refresh = 0)
-

Check the summary of the posterior and convergence. The b_year posterior looks similar as before and the posterior for degrees of freedom nu has most of the posterior mas for quite large values indicating there is no strong support for thick tailed variation in temperature.

+

Check the summary of the posterior and convergence. The b_year posterior looks similar to before, and the posterior for the degrees of freedom nu has most of its mass at quite large values, indicating there is no strong support for thick-tailed variation in average summer temperatures.

fit_lin_t
 Family: student 
   Links: mu = identity; sigma = identity; nu = identity 
@@ -2067,13 +2174,13 @@ 

6 Linear Student’s t

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
-Intercept   -35.80     12.58   -59.61   -10.87 1.00     4023     2679
-year          0.02      0.01     0.01     0.03 1.00     4021     2678
+Intercept   -34.01     12.27   -58.50    -9.31 1.00     3979     2893
+year          0.02      0.01     0.01     0.03 1.00     3979     2923

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
-sigma     1.03      0.10     0.85     1.25 1.00     3454     3159
-nu       24.53     14.46     6.24    60.95 1.00     3450     2698
+sigma     1.03      0.10     0.86     1.24 1.00     3209     2302
+nu       24.54     14.36     6.36    60.80 1.00     2972     2325

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential

@@ -2082,11 +2189,11 @@

6 Linear Student’s t

7 Pareto-smoothed importance-sampling leave-one-out cross-validation (PSIS-LOO)

We can use leave-one-out cross-validation to compare the expected predictive performance.

-

LOO comparison shows normal and Student’s t model have similar performance.

+

LOO comparison shows normal and Student’s \(t\) model have similar performance.

loo_compare(loo(fit_lin), loo(fit_lin_t))
          elpd_diff se_diff
 fit_lin    0.0       0.0   
-fit_lin_t -0.3       0.4   
+fit_lin_t -0.4 0.3

8 Heteroskedastic linear model

@@ -2094,6 +2201,7 @@

8 Heteroskedastic line
fit_lin_h <- brm(bf(temp ~ year,
                     sigma ~ year),
                  data = data_lin, family = gaussian(),
+                 prior = prior(student_t(3, 0, 0.03), class='b'),
                  seed = SEED, refresh = 0)

Check the summary of the posterior and convergence. The b_year posterior looks similar to before. The posterior for sigma_year has most of its mass for negative values, indicating a decrease in temperature variation around the mean.

fit_lin_h
@@ -2107,10 +2215,10 @@

8 Heteroskedastic line

Population-Level Effects: 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
-Intercept        -38.66     12.65   -64.25   -13.47 1.00     3622     3040
-sigma_Intercept   19.17      8.99     0.99    36.23 1.00     4113     3133
-year               0.02      0.01     0.01     0.04 1.00     3642     3040
-sigma_year        -0.01      0.00    -0.02    -0.00 1.00     4111     3056
+Intercept        -36.37     12.49   -61.25   -10.49 1.00     3412     2842
+sigma_Intercept   19.10      8.69     1.56    35.80 1.00     3818     2899
+year               0.02      0.01     0.01     0.04 1.00     3426     2885
+sigma_year        -0.01      0.00    -0.02    -0.00 1.00     3810     2855

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential

@@ -2118,7 +2226,7 @@

8 Heteroskedastic line

Histogram of b_year and b_sigma_year

as_draws_df(fit_lin_h) |>
   mcmc_areas(pars=c('b_year', 'b_sigma_year'))
-

+

As exp(x) is approximately 1+x when x is close to zero, we can see that sigma is decreasing about 1% per year (95% interval from 0% to 2%).
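To make the 1% interpretation concrete, a small sketch (not part of the original analysis) computing the implied multiplicative change in sigma per year from the draws:

as_draws_df(fit_lin_h) |>
  mutate_variables(sigma_ratio_per_year = exp(b_sigma_year)) |>
  subset_draws(variable='sigma_ratio_per_year') |>
  summarise_draws()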

Plot the posterior predictive distribution at each year until 2030. add_predicted_draws() takes the years from the data and uses fit_lin_h to make the predictions.

data_lin |>
@@ -2135,28 +2243,35 @@ 

8 Heteroskedastic line

  theme(legend.position="none")+
  scale_x_continuous(breaks=seq(1950,2030,by=10))

Warning: Removed 32000 rows containing missing values (`geom_point()`).
-

+

+

Make prior sensitivity analysis by powerscaling both prior and likelihood.

+
powerscale_sensitivity(fit_lin_h)$sensitivity |>
+                                mutate(across(where(is.double),  ~num(.x, digits=2)))
+
# A tibble: 4 × 4
+  variable              prior likelihood diagnosis
+  <chr>             <num:.2!>  <num:.2!> <chr>    
+1 b_Intercept            0.03       0.11 -        
+2 b_sigma_Intercept      0.00       0.10 -        
+3 b_year                 0.03       0.11 -        
+4 b_sigma_year           0.00       0.11 -        

We can use leave-one-out cross-validation to compare the expected predictive performance.

LOO comparison shows homoskedastic normal and heteroskedastic normal models have similar performances.

loo_compare(loo(fit_lin), loo(fit_lin_h))
          elpd_diff se_diff
 fit_lin_h  0.0       0.0   
-fit_lin   -1.7       1.6   
+fit_lin -1.6 1.6

9 Heteroskedastic non-linear model

We can test the linearity assumption by using non-linear spline functions, by using s(year) terms. Sampling is slower as the posterior gets more complex.

-
fit_lin_hs <- brm(bf(temp ~ s(year),
+
fit_spline_h <- brm(bf(temp ~ s(year),
                      sigma ~ s(year)),
                   data = data_lin, family = gaussian(),
                   seed = SEED, refresh = 0)

We get warnings about divergences, and try rerunning with higher adapt_delta, which leads to using smaller step sizes. Often adapt_delta=0.999 leads to very slow sampling, but with this small data, this is not an issue.

-
fit_lin_hs <- update(fit_lin_hs, control = list(adapt_delta=0.999))
-

Check the summary of the posterior and convergence. The b_year posterior looks similar as before. The posterior for sigma_year looks like having mosst of the ma for negative values, indicating decrease in temperature variation around the mean.

-
fit_lin_hs
-
Warning: There were 4 divergent transitions after warmup. Increasing
-adapt_delta above 0.999 may help. See
-http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
+
fit_spline_h <- update(fit_spline_h, control = list(adapt_delta=0.999))
+

Check the summary of the posterior and convergence. We’re no longer able to interpret the temperature increase directly from this summary. For splines, we see the prior scales sds for the spline coefficients.

+
fit_spline_h
 Family: gaussian 
   Links: mu = identity; sigma = log 
 Formula: temp ~ s(year) 
@@ -2167,23 +2282,23 @@ 

9 Heteroskedastic non-

Smooth Terms: 
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
-sds(syear_1)           1.10      1.25     0.04     3.87 1.00     1450     1845
-sds(sigma_syear_1)     0.94      0.91     0.03     3.33 1.00     1358     2175
+sds(syear_1)           1.00      0.91     0.04     3.37 1.00     1463     1648
+sds(sigma_syear_1)     0.96      0.95     0.02     3.60 1.00     1225     1585

Population-Level Effects: 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
-Intercept           9.42      0.13     9.16     9.68 1.00     5244     2923
-sigma_Intercept     0.04      0.09    -0.13     0.23 1.00     4473     2645
-syear_1             2.93      2.80    -2.82     9.12 1.00     1509     1068
-sigma_syear_1      -1.14      2.49    -6.99     3.80 1.00     1684     1312
+Intercept           9.42      0.13     9.15     9.67 1.00     4559     2617
+sigma_Intercept     0.04      0.09    -0.13     0.22 1.00     4601     2718
+syear_1             2.86      2.54    -2.82     8.08 1.00     2050     1906
+sigma_syear_1      -1.16      2.39    -7.17     3.49 1.00     1634     1042

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

-

Plot posterior predictive distribution at each year until 2030 add_predicted_draws() takes the years from the data and uses fit_lin_h to make the predictions.

+

We can still plot the posterior predictive distribution at each year until 2030. add_predicted_draws() takes the years from the data and uses fit_spline_h to make the predictions.

data_lin |>
   add_row(year=2023:2030) |>
-  add_predicted_draws(fit_lin_hs) |>
+  add_predicted_draws(fit_spline_h) |>
   # plot data
   ggplot(aes(x=year, y=temp)) +
   geom_point(color=2) +
@@ -2195,20 +2310,68 @@ 

9 Heteroskedastic non-

  theme(legend.position="none")+
  scale_x_continuous(breaks=seq(1950,2030,by=10))

Warning: Removed 32000 rows containing missing values (`geom_point()`).
-

-

We can use leave-one-out cross-validation to compare the expected predictive performance.

+

+

And we can use leave-one-out cross-validation to compare the expected predictive performance.

LOO comparison shows the homoskedastic normal linear and heteroskedastic normal spline models have similar performance. There are not enough observations to make a clear difference between the models.

-
loo_compare(loo(fit_lin), loo(fit_lin_hs))
-
Warning: Found 1 observations with a pareto_k > 0.7 in model 'fit_lin_hs'. It
+
loo_compare(loo(fit_lin), loo(fit_spline_h))
+
Warning: Found 1 observations with a pareto_k > 0.7 in model 'fit_spline_h'. It
 is recommended to set 'moment_match = TRUE' in order to perform moment matching
 for problematic observations.
-
           elpd_diff se_diff
-fit_lin_hs  0.0       0.0   
-fit_lin    -0.7       1.8   
+
             elpd_diff se_diff
+fit_spline_h  0.0       0.0   
+fit_lin      -0.5       1.8   
+

For splines and other non-parametric models, we can use predictive estimates and predictions to get interpretable quantities. Let’s examine the difference of the estimated average temperature in years 1952 and 2022.

+
temp_diff <- posterior_epred(fit_spline_h, newdata=filter(data_lin,year==1952|year==2022)) |>
+  rvar() |>
+  diff() |>
+  as_draws_df() |>
+  set_variables('temp_diff')
+
+temp_diff <- data_lin |>
+  filter(year==1952|year==2022) |>
+  add_epred_draws(fit_spline_h) |>
+  pivot_wider(id_cols=.draw, names_from = year, values_from = .epred) |>
+  mutate(temp_diff = `2022`-`1952`,
+         .chain = (.draw - 1) %/% 1000 + 1,
+         .iteration = (.draw - 1) %% 1000 + 1) |>
+  as_draws_df() |>
+  subset_draws(variable='temp_diff')
+

Posterior distribution for average summer temperature increase from 1952 to 2022

+
temp_diff |>
+  mcmc_hist()
+

+

95% posterior interval for average summer temperature increase from 1952 to 2022

+
temp_diff |>
+  summarise_draws(~quantile(.x, probs = c(0.025, 0.975)),
+                  ~mcse_quantile(.x, probs = c(0.025, 0.975)),
+                  .num_args = list(digits = 2, notation = "dec"))
+
# A tibble: 1 × 5
+  variable  `2.5%` `97.5%` mcse_q2.5 mcse_q97.5
+  <chr>      <dbl>   <dbl>     <dbl>      <dbl>
+1 temp_diff   0.56    2.57      0.03       0.02
+

Make prior sensitivity analysis by powerscaling both prior and likelihood with focus on average summer temperature increase from 1952 to 2022.

+
powerscale_sensitivity(fit_spline_h, prediction = \(x, ...) temp_diff, num_args=list(digits=2)
+                       )$sensitivity |>
+                         filter(variable=='temp_diff') |>
+                         mutate(across(where(is.double),  ~num(.x, digits=2)))
+
# A tibble: 1 × 4
+  variable      prior likelihood diagnosis
+  <chr>     <num:.2!>  <num:.2!> <chr>    
+1 temp_diff      0.01       0.08 -        
+

Probability that the average summer temperature has increased from 1952 to 2022 is 99.8%.

+
temp_diff |>
+  mutate(I_temp_diff_gt_0 = temp_diff>0,
+         temp_diff = NULL) |>
+  subset_draws(variable='I_temp_diff_gt_0') |>
+  summarise_draws(mean, mcse_mean)
+
# A tibble: 1 × 3
+  variable          mean mcse_mean
+  <chr>            <dbl>     <dbl>
+1 I_temp_diff_gt_0 0.998  0.000787

10 Comparison of k groups with hierarchical normal models

-

Load factory data, which contain 5 quality measurements for each of 6 machines. We’re interested in analying are the quality differences between the machines.

+

Load the factory data, which contain 5 quality measurements for each of 6 machines. We’re interested in analysing whether there are quality differences between the machines.

factory <- read.table(url('https://raw.githubusercontent.com/avehtari/BDA_course_Aalto/master/rpackage/data-raw/factory.txt'))
 colnames(factory) <- 1:6
 factory
@@ -2242,13 +2405,61 @@

10 Comparison of k gr

10.1 Pooled model

As a comparison, make also a pooled model

fit_pooled <- brm(quality ~ 1, data = factory, refresh=0)
+

Check the summary of the posterior and convergence.

+
fit_pooled
+
 Family: gaussian 
+  Links: mu = identity; sigma = identity 
+Formula: quality ~ 1 
+   Data: factory (Number of observations: 30) 
+  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
+         total post-warmup draws = 4000
+
+Population-Level Effects: 
+          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
+Intercept    92.95      3.28    86.54    99.40 1.00     2644     2350
+
+Family Specific Parameters: 
+      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
+sigma    18.46      2.60    14.25    24.36 1.00     2771     2129
+
+Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
+and Tail_ESS are effective sample size measures, and Rhat is the potential
+scale reduction factor on split chains (at convergence, Rhat = 1).

10.2 Separate model

As a comparison, make also a separate model. To make it completely separate, we need to have a different sigma for each machine, too.

-
fit_separate <- brm(bf(quality ~ machine,
-                       sigma ~ machine),
+
fit_separate <- brm(bf(quality ~ 0 + machine,
+                       sigma ~ 0 + machine),
                     data = factory, refresh=0)
+

Check the summary of the posterior and convergence.

+
fit_separate
+
 Family: gaussian 
+  Links: mu = identity; sigma = log 
+Formula: quality ~ 0 + machine 
+         sigma ~ 0 + machine
+   Data: factory (Number of observations: 30) 
+  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
+         total post-warmup draws = 4000
+
+Population-Level Effects: 
+               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
+machine1          75.53     12.30    50.60    98.69 1.01     2164     1545
+machine2         106.31      7.28    91.55   121.14 1.00     2775     2052
+machine3          87.28      9.04    69.54   104.23 1.00     1502     1144
+machine4         111.54      4.37   102.46   120.46 1.00     2445     1803
+machine5          89.86      6.61    76.45   102.65 1.00     1915     1225
+machine6          86.02     11.82    61.70   108.54 1.00     2109     1822
+sigma_machine1     3.12      0.41     2.47     4.07 1.00     2352     1483
+sigma_machine2     2.61      0.41     1.94     3.53 1.00     2510     1829
+sigma_machine3     2.70      0.42     2.06     3.68 1.00     1838     1355
+sigma_machine4     2.15      0.39     1.51     3.03 1.00     2498     1795
+sigma_machine5     2.52      0.39     1.89     3.43 1.00     1833     1669
+sigma_machine6     3.09      0.39     2.46     3.96 1.00     2357     1717
+
+Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
+and Tail_ESS are effective sample size measures, and Rhat is the potential
+scale reduction factor on split chains (at convergence, Rhat = 1).
@@ -2267,37 +2478,79 @@

11 Common variance hi

Group-Level Effects: 
~machine (Number of levels: 6) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
-sd(Intercept)    12.65      5.88     3.02    27.05 1.01      773      605
+sd(Intercept)    12.78      6.02     3.71    27.57 1.00     1038     1397

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
-Intercept    92.84      5.68    81.06   104.18 1.00     1637     1814
+Intercept    92.79      5.70    81.60   104.46 1.00     1400     1299

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
-sigma    15.05      2.26    11.48    20.24 1.00     1541     2260
+sigma    15.09      2.28    11.40    20.37 1.00     2195     2348

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
-

LOO comparison shows the hierarchical model is the best

+

LOO comparison shows the hierarchical model is the best. The differences are small as the number of observations is small and there is a considerable prediction (aleatoric) uncertainty.

loo_compare(loo(fit_pooled), loo(fit_separate), loo(fit_hier))
-
Warning: Found 3 observations with a pareto_k > 0.7 in model 'fit_separate'. It
+
Warning: Found 2 observations with a pareto_k > 0.7 in model 'fit_separate'. It
 is recommended to set 'moment_match = TRUE' in order to perform moment matching
 for problematic observations.
             elpd_diff se_diff
 fit_hier      0.0       0.0   
-fit_separate -3.5       2.7   
+fit_separate -3.2       2.9   
 fit_pooled   -4.0       2.0   
-

Distributions of quality differences from the mean quality

-
mcmc_areas(as_draws_df(fit_hier), regex_pars='r_machine')
-

-

Posterior predictive distributions for 6 old and 1 new machines

-
posterior_predict(fit_hier, newdata=data.frame(machine=1:7, quality=rep(NA,7)),
-                  allow_new_levels=TRUE) |>
+

Different model posterior distributions for the mean quality. The pooled model ignores the variation between machines. The separate model doesn’t benefit from the similarity of the machines and has higher uncertainty.

+
ph <- fit_hier |>
+  spread_rvars(b_Intercept, r_machine[machine,]) |>
+  mutate(machine_mean = b_Intercept + r_machine) |>
+  ggplot(aes(xdist=machine_mean, y=machine)) +
+  stat_halfeye() +
+  scale_y_continuous(breaks=1:6) +
+  labs(x='Quality', y='Machine', title='Hierarchical')
+
+ps <- fit_separate |>
   as_draws_df() |>
-  mcmc_areas()
-

+  subset_draws(variable='b_machine', regex=TRUE) |>
+  set_variables(paste0('b_machine[', 1:6, ']')) |>
+  as_draws_rvars() |>
+  spread_rvars(b_machine[machine]) |>
+  mutate(machine_mean = b_machine) |>
+  ggplot(aes(xdist=machine_mean, y=machine)) +
+  stat_halfeye() +
+  scale_y_continuous(breaks=1:6) +
+  labs(x='Quality', y='Machine', title='Separate')
+
+pp <- fit_pooled |>
+  spread_rvars(b_Intercept) |>
+  mutate(machine_mean = b_Intercept) |>
+  ggplot(aes(xdist=machine_mean, y=0)) +
+  stat_halfeye() +
+  scale_y_continuous(breaks=NULL) +
+  labs(x='Quality', y='All machines', title='Pooled')
+
+(pp / ps / ph) * xlim(c(50,140))
+
Warning: Removed 865 rows containing missing values (`geom_slabinterval()`).
+

+

Make prior sensitivity analysis by powerscaling both prior and likelihood with focus on mean quality of each machine. We see no prior sensitivity.

+
machine_mean <- fit_hier |>
+  as_draws_df() |>
+  mutate(across(matches('r_machine'), ~ .x + b_Intercept)) |>
+  subset_draws(variable='r_machine', regex=TRUE) |>
+  set_variables(paste0('machine_mean[', 1:6, ']'))
+powerscale_sensitivity(fit_hier, prediction = \(x, ...) machine_mean, num_args=list(digits=2)
+                       )$sensitivity |>
+                         filter(str_detect(variable,'machine_mean')) |>
+                         mutate(across(where(is.double),  ~num(.x, digits=2)))
+
# A tibble: 6 × 4
+  variable            prior likelihood diagnosis
+  <chr>           <num:.2!>  <num:.2!> <chr>    
+1 machine_mean[1]      0.03       0.08 -        
+2 machine_mean[2]      0.02       0.07 -        
+3 machine_mean[3]      0.03       0.03 -        
+4 machine_mean[4]      0.03       0.10 -        
+5 machine_mean[5]      0.03       0.02 -        
+6 machine_mean[6]      0.03       0.03 -        

12 Hierarchical binomial model

@@ -2312,17 +2565,141 @@

12 Hierarchical binom

4    Awada 2005  400      1    10
5    Awada 2005  600      7    12
6    Awada 2005  800      1     3
-

Pooled model assumes all studies have the same dose effect

+

Pooled model assumes all studies have the same dose effect (reminder: ~ dose is equivalent to ~ 1 + dose)

fit_pooled <- brm(events | trials(total) ~ dose,
+                  prior = c(prior(student_t(7, 0, 1.5), class='Intercept'),
+                            prior(normal(0, 1), class='b')),
+                  family=binomial(), data=dat.ursino2021)
+

Check the summary of the posterior and convergence

+
fit_pooled
+
 Family: binomial 
+  Links: mu = logit 
+Formula: events | trials(total) ~ dose 
+   Data: dat.ursino2021 (Number of observations: 49) 
+  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
+         total post-warmup draws = 4000
+
+Population-Level Effects: 
+          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
+Intercept    -3.18      0.38    -3.93    -2.44 1.00     1261     1782
+dose          0.00      0.00     0.00     0.01 1.00     2464     2333
+
+Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
+and Tail_ESS are effective sample size measures, and Rhat is the potential
+scale reduction factor on split chains (at convergence, Rhat = 1).
+

The dose coefficient seems to be very small. Looking at the posterior, we see that it is positive with high probability.

+
fit_pooled |>
+  as_draws() |>
+  subset_draws(variable='b_dose') |>
+  summarise_draws(~quantile(.x, probs = c(0.025, 0.975)), ~mcse_quantile(.x, probs = c(0.025, 0.975)))
+
# A tibble: 1 × 5
+  variable  `2.5%` `97.5%` mcse_q2.5 mcse_q97.5
+  <chr>      <dbl>   <dbl>     <dbl>      <dbl>
+1 b_dose   0.00228 0.00525 0.0000358  0.0000365
+

The dose was reported in mg, and most values are in hundreds. It is often sensible to switch to a scale in which the range of values is closer to unit range. In this case it is natural to use g instead of mg.

+
dat.ursino2021 <- dat.ursino2021 |>
+  mutate(doseg = dose/100)
+

Fit the pooled model again using doseg

+
fit_pooled <- brm(events | trials(total) ~ doseg,
+                  prior = c(prior(student_t(7, 0, 1.5), class='Intercept'),
+                            prior(normal(0, 1), class='b')),
                   family=binomial(), data=dat.ursino2021)
-

Separate model assumes all studies have different dose effect

-
fit_separate <- brm(events | trials(total) ~ dose:study,
-                    family=binomial(), data=dat.ursino2021)
-fit_separate <- update(fit_separate, control=list(init=0.1))
-

Hierarchical model assumes common mean effect and variation round with normal population prior

-
fit_hier <- brm(events | trials(total) ~ dose + (dose | study),
-                family=binomial(), data=dat.ursino2021)
-fit_hier <- update(fit_hier, control=list(adapt_delta=0.99))
+

Check the summary of the posterior and convergence.

+
fit_pooled
+
 Family: binomial 
+  Links: mu = logit 
+Formula: events | trials(total) ~ doseg 
+   Data: dat.ursino2021 (Number of observations: 49) 
+  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
+         total post-warmup draws = 4000
+
+Population-Level Effects: 
+          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
+Intercept    -3.15      0.37    -3.92    -2.46 1.00     2009     2360
+doseg         0.37      0.08     0.22     0.51 1.00     2374     2519
+
+Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
+and Tail_ESS are effective sample size measures, and Rhat is the potential
+scale reduction factor on split chains (at convergence, Rhat = 1).
+

Now it is easier to interpret the presented values. The separate model assumes all studies have a different dose effect. It would be a bit complicated to set a different prior on study-specific intercepts and other coefficients, so we use the same prior for all.

+
fit_separate <- brm(events | trials(total) ~ 0 + study + doseg:study,
+                    prior=prior(student_t(7, 0, 1.5), class='b'),
+                    family=binomial(), data=dat.ursino2021)
+

Check the summary of the posterior and convergence.

+
fit_separate
+
 Family: binomial 
+  Links: mu = logit 
+Formula: events | trials(total) ~ 0 + study + doseg:study 
+   Data: dat.ursino2021 (Number of observations: 49) 
+  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
+         total post-warmup draws = 4000
+
+Population-Level Effects: 
+                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
+studyAwada                -2.64      1.04    -4.92    -0.86 1.00     5046
+studyBorthakurMA          -2.41      1.76    -6.53     0.49 1.00     6379
+studyBorthakurMB          -1.64      1.42    -4.82     0.86 1.00     5944
+studyChen                 -0.80      1.57    -4.14     2.01 1.00     6776
+studyClark                -3.03      1.44    -6.33    -0.76 1.00     5432
+studyCrumpMA              -1.51      1.24    -4.29     0.73 1.00     5751
+studyCrumpMB              -1.88      1.17    -4.44     0.12 1.00     5871
+studyFuruse               -1.68      1.65    -5.41     1.13 1.00     5569
+studyMiller               -1.12      0.81    -2.75     0.37 1.00     6231
+studyMinami               -1.97      1.17    -4.54     0.12 1.00     5900
+studyMoore                -2.33      1.19    -4.91    -0.28 1.00     5140
+studyNabors               -2.92      1.44    -6.20    -0.49 1.00     5632
+studyStrumberg            -1.96      0.85    -3.75    -0.45 1.00     6403
+studyAwada:doseg           0.37      0.20     0.01     0.80 1.00     4815
+studyBorthakurMA:doseg     0.00      0.39    -0.71     0.85 1.00     5594
+studyBorthakurMB:doseg     0.06      0.32    -0.53     0.73 1.00     6176
+studyChen:doseg           -0.65      0.52    -1.68     0.38 1.00     6028
+studyClark:doseg           0.44      0.25     0.01     0.99 1.00     5658
+studyCrumpMA:doseg        -0.31      0.49    -1.27     0.68 1.00     5526
+studyCrumpMB:doseg         0.10      0.29    -0.45     0.67 1.00     5536
+studyFuruse:doseg         -0.53      0.61    -1.77     0.68 1.00     5238
+studyMiller:doseg          0.03      0.28    -0.50     0.57 1.00     6497
+studyMinami:doseg         -0.17      0.36    -0.94     0.49 1.00     5627
+studyMoore:doseg           0.20      0.27    -0.30     0.77 1.00     5079
+studyNabors:doseg          0.31      0.20    -0.04     0.76 1.00     5668
+studyStrumberg:doseg       0.09      0.16    -0.23     0.42 1.00     6673
+                       Tail_ESS
+studyAwada                 2397
+studyBorthakurMA           2218
+studyBorthakurMB           2775
+studyChen                  2492
+studyClark                 2376
+studyCrumpMA               2159
+studyCrumpMB               2693
+studyFuruse                2199
+studyMiller                2690
+studyMinami                2466
+studyMoore                 2732
+studyNabors                2369
+studyStrumberg             2879
+studyAwada:doseg           2153
+studyBorthakurMA:doseg     2410
+studyBorthakurMB:doseg     2712
+studyChen:doseg            2948
+studyClark:doseg           2487
+studyCrumpMA:doseg         2565
+studyCrumpMB:doseg         2904
+studyFuruse:doseg          2360
+studyMiller:doseg          2616
+studyMinami:doseg          2447
+studyMoore:doseg           2561
+studyNabors:doseg          2217
+studyStrumberg:doseg       2602
+
+Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
+and Tail_ESS are effective sample size measures, and Rhat is the potential
+scale reduction factor on split chains (at convergence, Rhat = 1).
+

Hierarchical model assumes common mean effect and variation around with normal population prior (reminder: ~ doseg + (doseg | study) is equivalent to ~ 1 + doseg + (1 + doseg | study))

+
fit_hier <- brm(events | trials(total) ~ doseg + (doseg | study),
+                    prior=c(prior(student_t(7, 0, 1.5), class='Intercept'),
+                            prior(normal(0, 1), class='b')),
+                family=binomial(), data=dat.ursino2021)
+

We see some divergences and re-run with a higher adapt_delta

+
fit_hier <- update(fit_hier, control=list(adapt_delta=0.99))

LOO-CV comparison

loo_compare(loo(fit_pooled), loo(fit_separate), loo(fit_hier))
Warning: Found 15 observations with a pareto_k > 0.7 in model 'fit_separate'.
@@ -2332,49 +2709,70 @@ 12 Hierarchical binom

recommended to set 'moment_match = TRUE' in order to perform moment matching for problematic observations.

             elpd_diff se_diff
-fit_hier      0.0       0.0   
-fit_pooled   -1.4       2.8   
-fit_separate -9.2       3.1   
+fit_hier      0.0       0.0   
+fit_pooled   -1.3       2.9   
+fit_separate -23.8      5.3

We get warnings about Pareto k’s > 0.7 in PSIS-LOO for the separate model, but as the LOO-CV estimate is in that case usually overoptimistic and the separate model is already the worst, there is no need for a more accurate computation.

The hierarchical model has a better elpd than the pooled one, but the difference is negligible. However, when we look at the study-specific parameters, we see that the Miller study has a higher intercept (more events).

mcmc_areas(as_draws_df(fit_hier), regex_pars='r_study\\[.*Intercept')
-

+

There are no differences in slopes.

-
mcmc_areas(as_draws_df(fit_hier), regex_pars='r_study\\[.*dose')
-

-

The coefficient for the dose is clearly larger than 0

-
mcmc_areas(as_draws_df(fit_hier), regex_pars='b_dose') +
+
mcmc_areas(as_draws_df(fit_hier), regex_pars='r_study\\[.*doseg')
+

+

The population level coefficient for the dose is clearly larger than 0

+
mcmc_areas(as_draws_df(fit_hier), regex_pars='b_doseg') +
   geom_vline(xintercept=0, linetype='dashed') +
-  xlim(c(0,0.01))
+ xlim(c(0,1))
Warning: Removed 1 rows containing missing values (`geom_segment()`).
-

-

The posterior for the probability of event given certain dose and a new study

+

+

Make prior sensitivity analysis by powerscaling both prior and likelihood.

+
powerscale_sensitivity(fit_hier, variable='b_doseg'
+                       )$sensitivity |>
+                         mutate(across(where(is.double),  ~num(.x, digits=2)))
+
# A tibble: 1 × 4
+  variable     prior likelihood diagnosis
+  <chr>    <num:.2!>  <num:.2!> <chr>    
+1 b_doseg       0.03       0.08 -        
+

The posterior for the probability of event given certain dose and a new study.

data.frame(study='new',
-           dose=seq(100,1000,by=100),
+           doseg=seq(0.1,1,by=0.1),
            total=1) |>
   add_linpred_draws(fit_hier, transform=TRUE, allow_new_levels=TRUE) |>
-  ggplot(aes(x=dose, y=.linpred)) +
+  ggplot(aes(x=doseg, y=.linpred)) +
   stat_lineribbon(.width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
   scale_fill_brewer()+
-  labs(x= "Dose", y = 'Probability of event') +
+  labs(x= "Dose (g)", y = 'Probability of event') +
+  theme(legend.position="none") +
+  geom_hline(yintercept=0) +
+  scale_x_continuous(breaks=seq(0.1,1,by=0.1))
+

+

If we plot individual posterior draws, we see that there is a lot of uncertainty about the overall probability (explained by the variation of the Intercept across studies), but less uncertainty about the slope.

+
data.frame(study='new',
+           doseg=seq(0.1,1,by=0.1),
+           total=1) |>
+  add_linpred_draws(fit_hier, transform=TRUE, allow_new_levels=TRUE, ndraws=100) |>
+  ggplot(aes(x=doseg, y=.linpred)) +
+  geom_line(aes(group=.draw), alpha = 1/2, color = brewer.pal(5, "Blues")[[3]])+
+  scale_fill_brewer()+
+  labs(x= "Dose (g)", y = 'Probability of event') +
   theme(legend.position="none") +
   geom_hline(yintercept=0) +
-  scale_x_continuous(breaks=seq(100,1000,by=100))
-

-

Posterior predictive checking

+ scale_x_continuous(breaks=seq(0.1,1,by=0.1))

+

Posterior predictive checking showing the observed and predicted number of events.

pp_check(fit_hier, type = "ribbon_grouped", group="study")
-

+


Licenses

-  • Code © 2017-2023, Aki Vehtari, licensed under BSD-3.
-  • Text © 2017-2023, Aki Vehtari, licensed under CC-BY-NC 4.0.
+  • Code © 2017-2024, Aki Vehtari, licensed under BSD-3.
+  • Text © 2017-2024, Aki Vehtari, licensed under CC-BY-NC 4.0.
-
---
title: "Bayesian data analysis - BRMS demos"
author: "Aki Vehtari"
date: "First version 2023-12-05. Last modified `r format(Sys.Date())`."
output:
  html_document:
    fig_caption: yes
    toc: TRUE
    toc_depth: 2
    number_sections: TRUE
    toc_float:
      smooth_scroll: FALSE
    theme: readable
    code_download: true
---
# Setup  {.unnumbered}

```{r setup, include=FALSE}
knitr::opts_chunk$set(cache=FALSE, message=FALSE, error=FALSE, warning=TRUE, comment=NA, out.width='95%')
```

**Load packages**

```{r}
library(tidyr)
library(dplyr)
library(brms)
options(brms.backend = "cmdstanr", mc.cores = 2)
library(posterior)
options(pillar.negative = FALSE)
library(loo)
library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default(base_family = "sans"))
library(tidybayes)
library(ggdist)
library(patchwork)
library(RColorBrewer)
SEED <- 48927 # set random seed for reproducibility
```

# Introduction

This notebook contains several examples of how to use [Stan](https://mc-stan.org) in R with [__brms__](https://paul-buerkner.github.io/brms/). This notebook assumes basic knowledge of Bayesian inference and MCMC. The examples are related to [Bayesian data analysis course](https://avehtari.github.io/BDA_course_Aalto/).

# Bernoulli model

Toy data with sequence of failures (0) and successes (1). We would
like to learn about the unknown probability of success.

```{r}
data_bern <- data.frame(y = c(1, 1, 1, 0, 1, 1, 1, 0, 1, 0))
```

brms uses by default student_t(3, 0, 2.5), but we can assign a uniform
prior (beta(1,1)).

```{r results='hide'}
fit_bern <- brm(y ~ 1, family = bernoulli(), data = data_bern,
                prior = prior("", class='Intercept'),
                seed = SEED, refresh = 0)
```

Check the summary of the posterior and convergence

```{r}
fit_bern
```

Extract the posterior draws

```{r}
draws <- as_draws_df(fit_bern)
```

We can get summary information using summarise_draws()

```{r}
draws |>
  subset_draws(variable='b_Intercept') |>
  summarise_draws()
```

We can compute the probability of success by using plogis which is
equal to inverse-logit function

```{r}
draws <- draws |>
  mutate_variables(theta=plogis(b_Intercept))
```

Summary of theta by using summarise_draws()

```{r}
draws |>
  subset_draws(variable='theta') |>
  summarise_draws()
```

Histogram of theta

```{r}
mcmc_hist(draws, pars='theta') +
  xlab('theta') +
  xlim(c(0,1))
```

We next compare the result to using normal(0, 1) prior on logit
probability. Visualize the prior by drawing samples from it

```{r}
prior_mean <- 0
prior_sd <- 1
prior_draws <- data.frame(
                 theta = plogis(rnorm(20000, prior_mean, prior_sd)))
mcmc_hist(prior_draws) +
  xlim(c(0,1))

```
```{r results='hide'}
fit_bern <- brm(y ~ 1, family = bernoulli(), data = data_bern,
                prior = prior(normal(0, 1), class='Intercept'),
                seed = SEED, refresh = 0)
```

Check the summary of the posterior and convergence

```{r}
fit_bern
```

We can examine the latent parameter

```{r}
draws <- as_draws_df(fit_bern)
draws |>
  subset_draws(variable='b_Intercept') |>
  summarise_draws()
```

We can compute the probability of success by using plogis which is
equal to inverse-logit function

```{r}
draws <- draws |>
  mutate_variables(theta=plogis(b_Intercept))
```

Summary of theta by using summarise_draws()

```{r}
draws |>
  subset_draws(variable='theta') |>
  summarise_draws()
```

Histogram of theta

```{r}
mcmc_hist(draws, pars='theta') +
  xlab('theta') +
  xlim(c(0,1))
```

As the number of observations is small, there is only a small change in
the posterior mean when the prior is changed. You can experiment
with different priors and vary the number of observations.


# Binomial model

Instead of sequence of 0's and 1's, we can summarize the data with
the number of trials and the number successes and use Binomial
model. The prior is specified in the 'latent space'. The actual
probability of success, theta = plogis(alpha), where plogis is the
inverse of the logistic function.

Visualize the prior by drawing samples from it

```{r}
prior_mean <- 0
prior_sd <- 1
prior_draws <- data.frame(theta = plogis(rnorm(20000, prior_mean, prior_sd)))
mcmc_hist(prior_draws)
```

Binomial model with the same data

```{r}
data_bin <- data.frame(N = c(10), y = c(7))
```
```{r results='hide'}
fit_bin <- brm(y | trials(N) ~ 1, family = binomial(), data = data_bin,
               prior = prior(normal(0,1), class='Intercept'),
               seed = SEED, refresh = 0)
```

Check the summary of the posterior and convergence

```{r}
fit_bin
```

Extract the posterior draws

```{r}
draws <- as_draws_df(fit_bin)
```

We can get summary information using summarise_draws()

```{r}
draws |>
  subset_draws(variable='b_Intercept') |>
  summarise_draws()
```

We can compute the probability of success by using plogis which is
equal to inverse-logit function

```{r}
draws <- draws |>
  mutate_variables(theta=plogis(b_Intercept))
```

Summary of theta by using summarise_draws()

```{r}
draws |>
  subset_draws(variable='theta') |>
  summarise_draws()
```

Histogram of theta

```{r}
mcmc_hist(draws, pars='theta') +
  xlab('theta') +
  xlim(c(0,1))
```

Re-run the model with a new dataset without recompiling

```{r}
data_bin <- data.frame(N = c(5), y = c(4))
fit_bin <- update(fit_bin, newdata = data_bin)
```

Check the summary of the posterior and convergence

```{r}
fit_bin
```

Extract the posterior draws

```{r}
draws <- as_draws_df(fit_bin)
```

We can get summary information using summarise_draws()

```{r}
draws |>
  subset_draws(variable='b_Intercept') |>
  summarise_draws()
```

We can compute the probability of success by using plogis which is
equal to inverse-logit function

```{r}
draws <- draws |>
  mutate_variables(theta=plogis(b_Intercept))
```

Summary of theta by using summarise_draws()

```{r}
draws |>
  subset_draws(variable='theta') |>
  summarise_draws()
```

Histogram of theta

```{r}
mcmc_hist(draws, pars='theta') +
  xlab('theta') +
  xlim(c(0,1))
```

# Comparison of two groups with Binomial 

An experiment was performed to estimate the effect of beta-blockers
on mortality of cardiac patients. A group of patients were randomly
assigned to treatment and control groups:

- out of 674 patients receiving the control, 39 died
- out of 680 receiving the treatment, 22 died

Data, where `grp2` is an indicator variable defined as a factor
type, which is useful for categorical variables.

```{r}
data_bin2 <- data.frame(N = c(674, 680), y = c(39,22), grp2 = factor(c('control','treatment')))
```

To analyse whether the treatment is useful, we can use Binomial
model for both groups and compute odds-ratio.

```{r}
fit_bin2 <- brm(y | trials(N) ~ grp2, family = binomial(), data = data_bin2,
                prior = prior(normal(0,1), class='Intercept'),
                seed = SEED, refresh = 0)
```

Check the summary of the posterior and convergence. brms is using
the first factor level `control` as the baseline and thus reports
the coefficient (population-level effect) for `treatment` (shown as
`grp2treatment`).

```{r}
fit_bin2
```

Compute theta for each group and the odds-ratio

```{r}
draws_bin2 <- as_draws_df(fit_bin2) |>
  mutate(theta_control = plogis(b_Intercept),
         theta_treatment = plogis(b_Intercept + b_grp2treatment),
         oddsratio = (theta_treatment/(1-theta_treatment))/(theta_control/(1-theta_control)))
```

Plot oddsratio

```{r}
mcmc_hist(draws_bin2, pars='oddsratio') +
  scale_x_continuous(breaks=seq(0.2,1.6,by=0.2))+
  geom_vline(xintercept=1, linetype='dashed')
```

Probability that the oddsratio<1

```{r}
draws_bin2 |>
  mutate(poddsratio = oddsratio<1) |>
  subset(variable='poddsratio') |>
  summarise_draws(mean, mcse_mean)
```

oddsratio 95% posterior interval

```{r}
draws_bin2 |>
  subset(variable='oddsratio') |>
  summarise_draws(~quantile(.x, probs = c(0.025, 0.975)), ~mcse_quantile(.x, probs = c(0.025, 0.975)))
```

# Linear Gaussian model

Use the Kilpisjärvi summer month temperatures 1952--2022 data from `aaltobda` package

```{r}
load(url('https://github.com/avehtari/BDA_course_Aalto/raw/master/rpackage/data/kilpisjarvi2022.rda'))
data_lin <- data.frame(year = kilpisjarvi2022$year,
                       temp = kilpisjarvi2022$temp.summer)
```

Plot the data

```{r}
data_lin |>
  ggplot(aes(year, temp)) +
  geom_point(color=2) +
  labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
  guides(linetype = "none")
```

To analyse whether there has been change in the average summer month
temperature, we use a linear model with a Gaussian model for the
unexplained variation. By default brms uses a uniform prior for the
coefficients.

`temp ~ year` means temp depends on the intercept and `year`.
The model could also be defined as `temp ~ 1 + year`, which explicitly shows the
intercept part. The corresponding regression model is
temp ~ normal(b_Intercept*1 + b_year*year, sigma)

```{r}
fit_lin <- brm(temp ~ year, data = data_lin, family = gaussian(),
               seed = SEED, refresh = 0)
```

We can check the all the priors used. In general it is good to use
proper priors, but sometimes flat priors are fine and produce
proper posterior.

```{r}
prior_summary(fit_lin)
```

Check the summary of the posterior and convergence

```{r}
fit_lin
```

Extract the posterior draws and check the summaries

```{r}
draws_lin <- as_draws_df(fit_lin) 
draws_lin |> summarise_draws()
```

If one of the columns is hidden we can force printing all columns

```{r}
draws_lin |> summarise_draws() |> print(width=Inf)
```

Histogram of b_year

```{r}
draws_lin |>
  mcmc_hist(pars='b_year') +
  xlab('Average temperature increase per year')
```

Probability that the coefficient b_year > 0 and the corresponding MCSE

```{r}
draws_lin |>
  mutate(I_b_year_gt_0 = b_year>0) |>
  subset_draws(variable='I_b_year_gt_0') |>
  summarise_draws(mean, mcse_mean)
```

95% posterior interval for temperature increase per 100 years

```{r}
draws_lin |>
  mutate(b_year_100 = b_year*100) |>
  subset_draws(variable='b_year_100') |>
  summarise_draws(~quantile(.x, probs = c(0.025, 0.975)),
                  ~mcse_quantile(.x, probs = c(0.025, 0.975)),
                  .num_args = list(digits = 2, notation = "dec"))
```

Plot posterior draws of the linear function values at each year.
`add_linpred_draws()` takes the years from the data and uses `fit_lin` to make
the predictions.

```{r}
data_lin |>
  add_linpred_draws(fit_lin) |>
  # plot data
  ggplot(aes(x=year, y=temp)) +
  geom_point(color=2) +
  # plot lineribbon for the linear model
  stat_lineribbon(aes(y = .linpred), .width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
  # decoration
  scale_fill_brewer()+
  labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
  theme(legend.position="none")+
  scale_x_continuous(breaks=seq(1950,2020,by=10))
```

Alternatively, plot a spaghetti plot for 100 draws

```{r}
data_lin |>
  add_linpred_draws(fit_lin, ndraws=100) |>
  # plot data
  ggplot(aes(x=year, y=temp)) +
  geom_point(color=2) +
  # plot a line for each posterior draw
  geom_line(aes(y=.linpred, group=.draw), alpha = 1/2, color = brewer.pal(5, "Blues")[[3]])+
  # decoration
  scale_fill_brewer()+
  labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
  theme(legend.position="none")+
  scale_x_continuous(breaks=seq(1950,2020,by=10))
```

Plot posterior predictive distribution at each year until 2030
`add_predicted_draws()` takes the years from the data and uses
`fit_lin` to make the predictions.

```{r}
data_lin |>
  add_row(year=2023:2030) |>
  add_predicted_draws(fit_lin) |>
  # plot data
  ggplot(aes(x=year, y=temp)) +
  geom_point(color=2) +
  # plot lineribbon for the linear model
  stat_lineribbon(aes(y = .prediction), .width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
  # decoration
  scale_fill_brewer()+
  labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
  theme(legend.position="none")+
  scale_x_continuous(breaks=seq(1950,2030,by=10))
```

# Linear Student's t model

The temperatures used in the above analyses are averages over three
months, which makes it more likely that they are normally
distributed, but there can be extreme events in the weather, and we
can check whether a more robust Student's t observation model would
give different results.


```{r results='hide'}
fit_lin_t <- brm(temp ~ year, data = data_lin, family = student(),
                 seed = SEED, refresh = 0)
```

Check the summary of the posterior and convergence. The b_year
posterior looks similar as before and the posterior for degrees of
freedom `nu` has most of the posterior mass at quite large values
indicating there is no strong support for thick tailed variation in
temperature.

```{r}
fit_lin_t
```

# Pareto-smoothed importance-sampling leave-one-out cross-validation (PSIS-LOO)

We can use leave-one-out cross-validation to compare the expected predictive performance.

LOO comparison shows normal and Student's t model have similar performance.

```{r}
loo_compare(loo(fit_lin), loo(fit_lin_t))
```

# Heteroskedastic linear model

Heteroskedasticity assumes that the variation around the linear
mean can also vary. We can allow sigma to depend on year, too.
Although the additional component is written as `sigma ~ year`, the
log link function is used and the model is for log(sigma). `bf()` allows
listing several formulas.


```{r results='hide'}
fit_lin_h <- brm(bf(temp ~ year,
                    sigma ~ year),
                 data = data_lin, family = gaussian(),
                 seed = SEED, refresh = 0)
```

Check the summary of the posterior and convergence. The b_year
posterior looks similar to before. The posterior for sigma_year
has most of its mass on negative values, indicating a
decrease in the temperature variation around the mean.

```{r}
fit_lin_h
```

Histogram of b_year and b_sigma_year

```{r}
as_draws_df(fit_lin_h) |>
  mcmc_areas(pars=c('b_year', 'b_sigma_year'))
```

As log(1+x) ≈ x when x is close to zero, the coefficient can be read as a
relative change: sigma is decreasing about 1% per year (95% interval from 0% to 2%).

Plot posterior predictive distribution at each year until 2030
`add_predicted_draws()` takes the years from the data and uses
`fit_lin_h` to make the predictions.

```{r}
data_lin |>
  add_row(year=2023:2030) |>
  add_predicted_draws(fit_lin_h) |>
  # plot data
  ggplot(aes(x=year, y=temp)) +
  geom_point(color=2) +
  # plot lineribbon for the linear model
  stat_lineribbon(aes(y = .prediction), .width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
  # decoration
  scale_fill_brewer()+
  labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
  theme(legend.position="none")+
  scale_x_continuous(breaks=seq(1950,2030,by=10))
```

We can use leave-one-out cross-validation to compare the expected predictive performance.

LOO comparison shows homoskedastic normal and heteroskedastic
normal models have similar performances.

```{r}
loo_compare(loo(fit_lin), loo(fit_lin_h))
```

# Heteroskedastic non-linear model

We can test the linearity assumption by using non-linear spline
functions, by using `s(year)` terms. Sampling is slower as the
posterior gets more complex.


```{r results='hide'}
fit_lin_hs <- brm(bf(temp ~ s(year),
                     sigma ~ s(year)),
                  data = data_lin, family = gaussian(),
                  seed = SEED, refresh = 0)
```

We get warnings about divergences, and try rerunning with higher
adapt_delta, which leads to using smaller step sizes. Often
`adapt_delta=0.999` leads to very slow sampling, but with this
small data, this is not an issue.

```{r}
fit_lin_hs <- update(fit_lin_hs, control = list(adapt_delta=0.999))
```

Check the summary of the posterior and convergence. The b_year
posterior looks similar to before. The posterior for sigma_year
has most of its mass on negative values, indicating a
decrease in the temperature variation around the mean.

```{r}
fit_lin_hs
```

Plot posterior predictive distribution at each year until 2030
`add_predicted_draws()` takes the years from the data and uses
`fit_lin_hs` to make the predictions.

```{r}
data_lin |>
  add_row(year=2023:2030) |>
  add_predicted_draws(fit_lin_hs) |>
  # plot data
  ggplot(aes(x=year, y=temp)) +
  geom_point(color=2) +
  # plot lineribbon for the linear model
  stat_lineribbon(aes(y = .prediction), .width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
  # decoration
  scale_fill_brewer()+
  labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
  theme(legend.position="none")+
  scale_x_continuous(breaks=seq(1950,2030,by=10))
```

We can use leave-one-out cross-validation to compare the expected predictive performance.

LOO comparison shows homoskedastic normal linear and
heteroskedastic normal spline models have similar
performances. There are not enough observations to make clear
difference between the models.

```{r}
loo_compare(loo(fit_lin), loo(fit_lin_hs))
```


# Comparison of k groups with hierarchical normal models

Load factory data, which contain 5 quality measurements for each of
6 machines. We're interested in analysing whether there are quality
differences between the machines.

```{r}
factory <- read.table(url('https://raw.githubusercontent.com/avehtari/BDA_course_Aalto/master/rpackage/data-raw/factory.txt'))
colnames(factory) <- 1:6
factory
```

We pivot the data to long format

```{r}
factory <- factory |>
  pivot_longer(cols = everything(),
               names_to = 'machine',
               values_to = 'quality')
factory
```

## Pooled model

As a comparison, make also a pooled model

```{r}
fit_pooled <- brm(quality ~ 1, data = factory, refresh=0)
```

## Separate model

As a comparison, make also a separate model. To make it completely
separate, we need to have a different sigma for each machine, too.

```{r}
fit_separate <- brm(bf(quality ~ machine,
                       sigma ~ machine),
                    data = factory, refresh=0)
```

## Common variance hierarchical model (ANOVA)


```{r}
fit_hier <- brm(quality ~ 1 + (1 | machine),
                data = factory, refresh = 0)
```

Check the summary of the posterior and convergence.

```{r}
fit_hier
```

LOO comparison shows the hierarchical model is the best

```{r}
loo_compare(loo(fit_pooled), loo(fit_separate), loo(fit_hier))
```

Distributions of quality differences from the mean quality

```{r}
mcmc_areas(as_draws_df(fit_hier), regex_pars='r_machine')
```

Posterior predictive distributions for 6 old and 1 new machines

```{r}
posterior_predict(fit_hier, newdata=data.frame(machine=1:7, quality=rep(NA,7)),
                  allow_new_levels=TRUE) |>
  as_draws_df() |>
  mcmc_areas()
```


# Hierarchical binomial model

[Sorafenib Toxicity Dataset in `metadat` R package](https://wviechtb.github.io/metadat/reference/dat.ursino2021.html)
includes results from 13 studies investigating the occurrence of
dose limiting toxicities (DLTs) at different doses of Sorafenib.

Load data

```{r}
load(url('https://github.com/wviechtb/metadat/raw/master/data/dat.ursino2021.rda'))
head(dat.ursino2021)
```

Pooled model assumes all studies have the same dose effect

```{r}
fit_pooled <- brm(events | trials(total) ~ dose,
                  family=binomial(), data=dat.ursino2021)
```

Separate model assumes all studies have different dose effect

```{r}
fit_separate <- brm(events | trials(total) ~ dose:study,
                    family=binomial(), data=dat.ursino2021)
fit_separate <- update(fit_separate, control=list(init=0.1))
```

Hierarchical model assumes a common mean effect and variation around it with a normal population prior

```{r}
fit_hier <- brm(events | trials(total) ~ dose + (dose | study),
                family=binomial(), data=dat.ursino2021)
fit_hier <- update(fit_hier, control=list(adapt_delta=0.99))
```

LOO-CV comparison

```{r}
loo_compare(loo(fit_pooled), loo(fit_separate), loo(fit_hier))
```

We get warnings about Pareto k's > 0.7 in PSIS-LOO for the separate
model, but as the LOO-CV estimate is in that case usually
overoptimistic and the separate model is already the worst, there is
no need for a more accurate computation.
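
If a more accurate computation were ever needed, a sketch of how it could be requested is below (not run; `moment_match = TRUE` requires the model to have been fitted with `save_pars = save_pars(all = TRUE)`).

```{r eval=FALSE}
# Not run: moment matching for the observations flagged by Pareto k;
# requires refitting fit_separate with save_pars = save_pars(all = TRUE)
loo(fit_separate, moment_match = TRUE)
```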

The hierarchical model has a better elpd than the pooled one, but the
difference is negligible. However, when we look at the study-specific
parameters, we see that the Miller study has a higher intercept (more
events).

```{r}
mcmc_areas(as_draws_df(fit_hier), regex_pars='r_study\\[.*Intercept')
```


There are no differences in slopes.

```{r}
mcmc_areas(as_draws_df(fit_hier), regex_pars='r_study\\[.*dose')
```


The coefficient for the dose is clearly larger than 0

```{r}
mcmc_areas(as_draws_df(fit_hier), regex_pars='b_dose') +
  geom_vline(xintercept=0, linetype='dashed') +
  xlim(c(0,0.01))
```

The posterior for the probability of event given certain dose and a new study

```{r}
data.frame(study='new',
           dose=seq(100,1000,by=100),
           total=1) |>
  add_linpred_draws(fit_hier, transform=TRUE, allow_new_levels=TRUE) |>
  ggplot(aes(x=dose, y=.linpred)) +
  stat_lineribbon(.width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
  scale_fill_brewer()+
  labs(x= "Dose", y = 'Probability of event') +
  theme(legend.position="none") +
  geom_hline(yintercept=0) +
  scale_x_continuous(breaks=seq(100,1000,by=100))
```

Posterior predictive checking

```{r}
pp_check(fit_hier, type = "ribbon_grouped", group="study")
```

<br />

# Licenses {.unnumbered}

* Code &copy; 2017-2023, Aki Vehtari, licensed under BSD-3.
* Text &copy; 2017-2023, Aki Vehtari, licensed under CC-BY-NC 4.0.

+
---
title: "Bayesian data analysis - BRMS demos"
author: "Aki Vehtari"
date: "First version 2023-12-05. Last modified `r format(Sys.Date())`."
output:
  html_document:
    fig_caption: yes
    toc: TRUE
    toc_depth: 2
    number_sections: TRUE
    toc_float:
      smooth_scroll: FALSE
    theme: readable
    code_download: true
---
# Setup  {.unnumbered}

```{r setup, include=FALSE}
knitr::opts_chunk$set(cache=FALSE, message=FALSE, error=FALSE, warning=TRUE, comment=NA, out.width='95%')
```

**Load packages**

```{r}
library(tidyr)
library(dplyr)
library(tibble)
library(pillar)
library(stringr)
library(brms)
options(brms.backend = "cmdstanr", mc.cores = 2)
library(posterior)
options(pillar.negative = FALSE)
library(loo)
library(priorsense)
library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default(base_family = "sans"))
library(tidybayes)
library(ggdist)
library(patchwork)
library(RColorBrewer)
SEED <- 48927 # set random seed for reproducibility
```

# Introduction

This notebook contains several examples of how to use [Stan](https://mc-stan.org) in R with [__brms__](https://paul-buerkner.github.io/brms/). This notebook assumes basic knowledge of Bayesian inference and MCMC. The examples are related to [Bayesian data analysis course](https://avehtari.github.io/BDA_course_Aalto/).

# Bernoulli model

Toy data with sequence of failures (0) and successes (1). We would
like to learn about the unknown probability of success.

```{r}
data_bern <- data.frame(y = c(1, 1, 1, 0, 1, 1, 1, 0, 1, 0))
```

As usual in the case of generalized linear models (GLMs), brms defines
the priors on the latent model parameters. With Bernoulli the
default link function is logit, and thus the prior is set on
logit(theta). As there are no covariates, logit(theta)=Intercept.
The brms default prior for the Intercept is student_t(3, 0, 2.5), but
we use student_t(7, 0, 1.5), which is close to the logistic
distribution and thus makes the prior near-uniform for theta.
We can simulate from these priors to check the implied prior on theta,
and below we also compare to using a normal prior on the logit
probability. We visualize the implied priors by sampling from the priors.

```{r}
data.frame(theta = plogis(ggdist::rstudent_t(n=20000, df=3, mu=0, sigma=2.5))) |>
  mcmc_hist() +
  xlim(c(0,1)) +
  labs(title='Default brms student_t(3, 0, 2.5) prior on Intercept')
data.frame(theta = plogis(ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5))) |>
  mcmc_hist() +
  xlim(c(0,1)) +
  labs(title='student_t(7, 0, 1.5) prior on Intercept')
```

An almost uniform prior on theta could also be obtained with normal(0, 1.5)

```{r}
data.frame(theta = plogis(rnorm(n=20000, mean=0, sd=1.5))) |>
  mcmc_hist() +
  xlim(c(0,1)) +
  labs(title='normal(0, 1.5) prior on Intercept')
```

Formula `y ~ 1` corresponds to a model $\mathrm{logit}(\theta) =
\alpha\times 1 = \alpha$. `brms` denotes the $\alpha$ as `Intercept`.
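
As an optional sketch (not part of the original workflow), `get_prior()` lists the parameter classes and the default priors `brms` would use for this formula; this is where the `class='Intercept'` used below comes from.

```{r}
# Default priors brms would use for this formula and family (sketch)
get_prior(y ~ 1, data = data_bern, family = bernoulli())
```
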
```{r results='hide'}
fit_bern <- brm(y ~ 1, family = bernoulli(), data = data_bern,
                prior = prior(student_t(7, 0, 1.5), class='Intercept'),
                seed = SEED, refresh = 0)
```

Check the summary of the posterior and convergence

```{r}
fit_bern
```

Extract the posterior draws

```{r}
draws <- as_draws_df(fit_bern)
```

We can get summary information using summarise_draws()

```{r}
draws |>
  subset_draws(variable='b_Intercept') |>
  summarise_draws()
```

We can compute the probability of success by using plogis which is
equal to inverse-logit function

```{r}
draws <- draws |>
  mutate_variables(theta=plogis(b_Intercept))
```

Summary of theta by using summarise_draws()

```{r}
draws |>
  subset_draws(variable='theta') |>
  summarise_draws()
```

Histogram of theta

```{r}
mcmc_hist(draws, pars='theta') +
  xlab('theta') +
  xlim(c(0,1))
```

Make prior sensitivity analysis by powerscaling both prior and
likelihood. Focus on theta which is the quantity of interest.

```{r}
theta <- draws |>
  subset_draws(variable='theta')
powerscale_sensitivity(fit_bern, prediction = \(x, ...) theta, num_args=list(digits=2)
                       )$sensitivity |>
                         filter(variable=='theta') |>
                         mutate(across(where(is.double),  ~num(.x, digits=2)))
```
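
As an additional sketch (the object name `fit_bern_n15` is ours, not from the original notebook), we can refit with the normal(0, 1.5) prior shown above and check that, with only 10 observations, the posterior summary of theta changes little.

```{r results='hide'}
# Sketch: refit with the near-uniform normal(0, 1.5) prior on the Intercept
fit_bern_n15 <- brm(y ~ 1, family = bernoulli(), data = data_bern,
                    prior = prior(normal(0, 1.5), class='Intercept'),
                    seed = SEED, refresh = 0)
```
```{r}
as_draws_df(fit_bern_n15) |>
  mutate_variables(theta=plogis(b_Intercept)) |>
  subset_draws(variable='theta') |>
  summarise_draws()
```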


# Binomial model

Instead of sequence of 0's and 1's, we can summarize the data with
the number of trials and the number successes and use Binomial
model. The prior is specified in the 'latent space'. The actual
probability of success, theta = plogis(alpha), where plogis is the
inverse of the logistic function.
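
A quick sketch of the link-function relationship in plain base R: `plogis()` maps a latent value to a probability and `qlogis()` maps it back.

```{r}
# plogis() is the inverse logit; qlogis() is the logit (sketch)
alpha <- c(-1.5, 0, 1.5)
rbind(alpha = alpha,
      theta = plogis(alpha),
      back  = qlogis(plogis(alpha)))
```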

Binomial model with the same data and prior

```{r}
data_bin <- data.frame(N = c(10), y = c(7))
```

Formula `y | trials(N) ~ 1` corresponds to a model
$\mathrm{logit}(\theta) = \alpha$, and the number of trials for
each observation is provided by `| trials(N)`

```{r results='hide'}
fit_bin <- brm(y | trials(N) ~ 1, family = binomial(), data = data_bin,
               prior = prior(student_t(7, 0,1.5), class='Intercept'),
               seed = SEED, refresh = 0)
```

Check the summary of the posterior and convergence

```{r}
fit_bin
```

The diagnostic indicates prior-data conflict, that is, both the prior
and the likelihood are informative. If there is truly strong prior
information that would justify this prior, then this is fine, but
otherwise more thinking is required (the goal is not to adjust the
prior just to remove diagnostic warnings without thinking). In this
toy example, we proceed with this prior.
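
A sketch of the kind of priorsense check that diagnostic comes from; `powerscale_sensitivity()` is used the same way later in this notebook.

```{r}
powerscale_sensitivity(fit_bin)$sensitivity |>
  mutate(across(where(is.double),  ~num(.x, digits=2)))
```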

Extract the posterior draws

```{r}
draws <- as_draws_df(fit_bin)
```

We can get summary information using summarise_draws()

```{r}
draws |>
  subset_draws(variable='b_Intercept') |>
  summarise_draws()
```

We can compute the probability of success by using plogis which is
equal to inverse-logit function

```{r}
draws <- draws |>
  mutate_variables(theta=plogis(b_Intercept))
```

Summary of theta by using summarise_draws()

```{r}
draws |>
  subset_draws(variable='theta') |>
  summarise_draws()
```

Histogram of theta

```{r}
mcmc_hist(draws, pars='theta') +
  xlab('theta') +
  xlim(c(0,1))
```

Re-run the model with a new dataset without recompiling

```{r}
data_bin <- data.frame(N = c(5), y = c(4))
fit_bin <- update(fit_bin, newdata = data_bin)
```

Check the summary of the posterior and convergence

```{r}
fit_bin
```

Extract the posterior draws

```{r}
draws <- as_draws_df(fit_bin)
```

We can get summary information using summarise_draws()

```{r}
draws |>
  subset_draws(variable='b_Intercept') |>
  summarise_draws()
```

We can compute the probability of success by using plogis which is
equal to inverse-logit function

```{r}
draws <- draws |>
  mutate_variables(theta=plogis(b_Intercept))
```

Summary of theta by using summarise_draws()

```{r}
draws |>
  subset_draws(variable='theta') |>
  summarise_draws()
```

Histogram of theta

```{r}
mcmc_hist(draws, pars='theta') +
  xlab('theta') +
  xlim(c(0,1))
```

# Comparison of two groups with Binomial 

An experiment was performed to estimate the effect of beta-blockers
on mortality of cardiac patients. A group of patients were randomly
assigned to treatment and control groups:

- out of 674 patients receiving the control, 39 died
- out of 680 receiving the treatment, 22 died

Data, where `grp2` is an indicator variable defined as a factor
type, which is useful for categorical variables.

```{r}
data_bin2 <- data.frame(N = c(674, 680), y = c(39,22), grp2 = factor(c('control','treatment')))
```

To analyse whether the treatment is useful, we can use a Binomial
model for both groups and compute the odds-ratio. To formulate the
model as two independent (separate) binomial models, we use formula
`y | trials(N) ~ 0 + grp2`, which corresponds to a model
$\mathrm{logit}(\theta) = \alpha \times 0 +
\beta_\mathrm{control}\times x_\mathrm{control} +
\beta_\mathrm{treatment}\times x_\mathrm{treatment} =
\beta_\mathrm{control}\times x_\mathrm{control} +
\beta_\mathrm{treatment}\times x_\mathrm{treatment}$, where
$x_\mathrm{control}$ is a vector with 1 for control and 0 for
treatment, and $x_\mathrm{treatment}$ is a vector with 1 for
treatment and 0 for control. As only one of the vectors is 1 for a
given observation, this corresponds to separate models
$\mathrm{logit}(\theta_\mathrm{control}) = \beta_\mathrm{control}$
and $\mathrm{logit}(\theta_\mathrm{treatment}) =
\beta_\mathrm{treatment}$.  We can provide the same prior for all
$\beta$'s by setting the prior with `class='b'`. With prior
`student_t(7, 0, 1.5)`, both $\beta$'s are shrunk towards 0, but
independently.

```{r}
fit_bin2 <- brm(y | trials(N) ~ 0 + grp2, family = binomial(), data = data_bin2,
                prior = prior(student_t(7, 0,1.5), class='b'),
                seed = SEED, refresh = 0)
```

Check the summary of the posterior and convergence. With `~ 0 +
grp2` there is no `Intercept`, and $\beta_\mathrm{control}$ and
$\beta_\mathrm{treatment}$ are reported as `grp2control` and
`grp2treatment`.

```{r}
fit_bin2
```

Compute theta for each group and the odds-ratio. `brms` uses
variable names `b_grp2control` and `b_grp2treatment` for
$\beta_\mathrm{control}$ and $\beta_\mathrm{treatment}$
respectively.

```{r}
draws_bin2 <- as_draws_df(fit_bin2) |>
  mutate(theta_control = plogis(b_grp2control),
         theta_treatment = plogis(b_grp2treatment),
         oddsratio = (theta_treatment/(1-theta_treatment))/(theta_control/(1-theta_control)))
```

Plot oddsratio

```{r}
mcmc_hist(draws_bin2, pars='oddsratio') +
  scale_x_continuous(breaks=seq(0.2,1.6,by=0.2))+
  geom_vline(xintercept=1, linetype='dashed')
```

Probability that the oddsratio<1

```{r}
draws_bin2 |>
  mutate(poddsratio = oddsratio<1) |>
  subset(variable='poddsratio') |>
  summarise_draws(mean, mcse_mean)
```

oddsratio 95% posterior interval

```{r}
draws_bin2 |>
  subset(variable='oddsratio') |>
  summarise_draws(~quantile(.x, probs = c(0.025, 0.975)), ~mcse_quantile(.x, probs = c(0.025, 0.975)))
```

Make prior sensitivity analysis by powerscaling both prior and
likelihood.  Focus on oddsratio which is the quantity of
interest. We see that the likelihood is much more informative than
the prior, and we would expect to see a different posterior only
with a highly informative prior (possibly based on previous similar
experiments).

```{r}
oddsratio <- draws_bin2 |>
  subset_draws(variable='oddsratio')
powerscale_sensitivity(fit_bin2, prediction = \(x, ...) oddsratio, num_args=list(digits=2)
                       )$sensitivity |>
                         filter(variable=='oddsratio') |>
                         mutate(across(where(is.double),  ~num(.x, digits=2)))
```

Above we used formula `y | trials(N) ~ 0 + grp2` to have separate
model for control and treatment group. An alternative model `y |
trials(N) ~ grp2` which is equal to `y | trials(N) ~ 1 + grp2`,
would correspond to a model $\mathrm{logit}(\theta) = \alpha \times
1 + \beta_\mathrm{treatment}\times x_\mathrm{treatment} = \alpha +
\beta_\mathrm{treatment}\times x_\mathrm{treatment}$. Now $\alpha$
models the probability of death (via logistic link) in the control
group and $\alpha + \beta_\mathrm{treatment}$ models the
probability of death (via logistic link) in the treatment
group. Now the models for the groups are connected. Furthermore, if
we set independent `student_t(7, 0, 1.5)` priors on $\alpha$ and
$\beta_\mathrm{treatment}$, the implied priors on
$\theta_\mathrm{control}$ and $\theta_\mathrm{treatment}$ are
different. We can verify this with a prior simulation.


```{r}
data.frame(theta_control = plogis(ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5))) |>
  mcmc_hist() +
  xlim(c(0,1)) +
  labs(title='student_t(7, 0, 1.5) prior on Intercept') +
data.frame(theta_treatment = plogis(ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5) +
                                    ggdist::rstudent_t(n=20000, df=7, mu=0, sigma=1.5))) |>
  mcmc_hist() +
  xlim(c(0,1)) +
  labs(title='student_t(7, 0, 1.5) prior on Intercept and b_grp2treatment')
```

In this case, with relatively big treatment and control group, the
likelihood is informative, and the difference between using `y |
trials(N) ~ 0 + grp2` or `y | trials(N) ~ grp2` is negligible.
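
A sketch of that comparison (the object name `fit_bin2_alt` is ours): fit the connected `~ grp2` model with student_t(7, 0, 1.5) priors on both the intercept and the coefficient, and compare the oddsratio interval to the one computed above.

```{r results='hide'}
fit_bin2_alt <- brm(y | trials(N) ~ grp2, family = binomial(), data = data_bin2,
                    prior = c(prior(student_t(7, 0, 1.5), class='Intercept'),
                              prior(student_t(7, 0, 1.5), class='b')),
                    seed = SEED, refresh = 0)
```
```{r}
as_draws_df(fit_bin2_alt) |>
  mutate_variables(theta_control = plogis(b_Intercept),
                   theta_treatment = plogis(b_Intercept + b_grp2treatment),
                   oddsratio = (theta_treatment/(1-theta_treatment))/(theta_control/(1-theta_control))) |>
  subset_draws(variable='oddsratio') |>
  summarise_draws(~quantile(.x, probs = c(0.025, 0.975)))
```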

A third option would be a hierarchical model with formula `y |
trials(N) ~ 1 + (1 | grp2)`, which is equivalent to `y | trials(N)
~ (1 | grp2)`, and corresponds to a model
$\mathrm{logit}(\theta) = \alpha \times 1 +
\beta_\mathrm{control}\times x_\mathrm{control} +
\beta_\mathrm{treatment}\times x_\mathrm{treatment}$, but now the
prior on $\beta_\mathrm{control}$ and $\beta_\mathrm{treatment}$ is
$\mathrm{normal}(0, \sigma_\mathrm{grp})$. The default `brms` prior
for $\sigma_\mathrm{grp}$ is `student_t(3, 0, 2.5)`. Now $\alpha$
models the overall probability of death (via the logistic link), and
$\beta_\mathrm{control}$ and $\beta_\mathrm{treatment}$ model the
differences from that, with a common prior. The prior for
$\beta_\mathrm{control}$ and $\beta_\mathrm{treatment}$ includes the
unknown scale $\sigma_\mathrm{grp}$. If there is no difference
between the control and treatment groups, the posterior of
$\sigma_\mathrm{grp}$ has more mass near 0, and the bigger the
difference between the groups is, the more mass there is away from
0. With just two groups, there is not much information about
$\sigma_\mathrm{grp}$, and unless there is an informative prior on
$\sigma_\mathrm{grp}$, a two-group hierarchical model is not that
useful. Hierarchical models are more useful with more than two
groups. In the following, we use the previously used
`student_t(7, 0, 1.5)` prior on the intercept and the default `brms`
prior `student_t(3, 0, 2.5)` on $\sigma_\mathrm{grp}$.

```{r results='hide'}
fit_bin2 <- brm(y | trials(N) ~ 1 + (1 | grp2), family = binomial(), data = data_bin2,
                prior = prior(student_t(7, 0,1.5), class='Intercept'),
                seed = SEED, refresh = 0, control=list(adapt_delta=0.99))
```

Check the summary of the posterior and convergence. The summary
reports that there are Group-Level Effects: `~grp2` with 2 levels
(control and treatment), with `sd(Intercept)` denoting
$\sigma_\mathrm{grp}$. In addition, the summary lists
Population-Level Effects: `Intercept` ($\alpha$) as in the previous
non-hierarchical models.

```{r}
fit_bin2
```

We can also look at the variable names `brms` uses internally

```{r}
as_draws_rvars(fit_bin2)
```

Although there is no practical difference, we illustrate how to compute the
oddsratio from the hierarchical model

```{r}
draws_bin2 <- as_draws_df(fit_bin2)
oddsratio <- draws_bin2 |>
  mutate_variables(theta_control = plogis(b_Intercept + `r_grp2[control,Intercept]`),
                   theta_treatment = plogis(b_Intercept + `r_grp2[treatment,Intercept]`),
                   oddsratio = (theta_treatment/(1-theta_treatment))/(theta_control/(1-theta_control))) |>
  subset_draws(variable='oddsratio')
oddsratio |> mcmc_hist() +
  scale_x_continuous(breaks=seq(0.2,1.6,by=0.2))+
  geom_vline(xintercept=1, linetype='dashed')
```

Make also prior sensitivity analysis with focus on oddsratio.

```{r}
powerscale_sensitivity(fit_bin2, prediction = \(x, ...) oddsratio, num_args=list(digits=2)
                       )$sensitivity |>
                         filter(variable=='oddsratio') |>
                         mutate(across(where(is.double),  ~num(.x, digits=2)))
```

# Linear Gaussian model

Use the Kilpisjärvi summer month temperatures 1952--2022 data from `aaltobda` package

```{r}
load(url('https://github.com/avehtari/BDA_course_Aalto/raw/master/rpackage/data/kilpisjarvi2022.rda'))
data_lin <- data.frame(year = kilpisjarvi2022$year,
                       temp = kilpisjarvi2022$temp.summer)
```

Plot the data

```{r}
data_lin |>
  ggplot(aes(year, temp)) +
  geom_point(color=2) +
  labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
  guides(linetype = "none")
```

To analyse whether there has been change in the average summer month
temperature, we use a linear model with a Gaussian model for the
unexplained variation. By default brms uses a uniform prior for the
coefficients.

Formula `temp ~ year` corresponds to a model $\mathrm{temp} \sim
\mathrm{normal}(\alpha + \beta \times \mathrm{year}, \sigma)$.  The
model could also be defined as `temp ~ 1 + year`, which explicitly
shows the intercept ($\alpha$) part. Using the variable names
`brms` uses, the model can also be written as `temp ~
normal(b_Intercept*1 + b_year*year, sigma)`. We start with the
default priors to see some tricks that `brms` does behind the
curtain.

```{r}
fit_lin <- brm(temp ~ year, data = data_lin, family = gaussian(),
               seed = SEED, refresh = 0)
```

Check the summary of the posterior and convergence.

```{r}
fit_lin
```

Convergence diagnostics look good. We see that posterior mean of
`Intercept` is -34.7, which may sound strange, but that is the
intercept at year 0, that is, very far from the data range, and
thus doesn't have meaningful interpretation directly. The posterior
mean of `year` coefficient is 0.02, that is, we estimate that the
summer temperature is increasing 0.02°C per year (which would make
1°C in 50 years).
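
A small sketch of that arithmetic using the posterior draws: the implied average change over 50 years.

```{r}
as_draws_df(fit_lin) |>
  mutate_variables(b_year_50 = b_year*50) |>
  subset_draws(variable='b_year_50') |>
  summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975)))
```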

We can check $R^2$, which corresponds to the proportion of variance
explained by the model. The linear model explains 0.16 = 16% of the
total data variance.

```{r}
bayes_R2(fit_lin) |> round(2)
```

We can check the all the priors used. 

```{r}
prior_summary(fit_lin)
```

We see that `class=b` and `coef=year` have `flat`, that is, an
improper uniform prior, `Intercept` has `student_t(3, 9.5, 2.5)`,
and `sigma` has `student_t(3, 0, 2.5)` prior.  In general it is
good to use proper priors, but sometimes flat priors are fine and
produce a proper posterior (like in this case).

An important detail here is that, by default, `brms` sets the prior
on the `Intercept` after centering the covariate values (design
matrix). In this case, `brms` uses `year - mean(year) = year - 1987`
instead of the original years. This in general improves the sampling
efficiency. As the `Intercept` is now defined at the middle of the
data, the default `Intercept` prior is centered on the median of the
target (here the target is `temp`). If we would like to set
informative priors, we need to set the informative prior on the
`Intercept` given the centered covariate values. We can turn off the
centering by setting the argument `center=FALSE`, and we can set the
prior on the original intercept by using a formula `temp ~ 0 +
Intercept + year`.

In this case, we are happy with the default prior for the
intercept. In this specific case, the flat prior on the coefficient
is also fine, but we add a weakly informative prior just for the
illustration. Let's assume we expect the temperature to change less
than 1°C in 10 years. With `student_t(3, 0, 0.03)` about 95% of the
prior mass implies less than 0.1°C change per year, and with low
degrees of freedom (3) we have thick tails, making the likelihood
dominate in case of prior-data conflict. In real life, we do have
much more information about the temperature change, and naturally a
hierarchical spatio-temporal model with all temperature measurement
locations would be even better.
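
A quick sketch checking the stated prior mass: the probability that a student_t(3, 0, 0.03) variable stays within ±0.1°C per year.

```{r}
# P(|b_year| < 0.1) under student_t(3, 0, 0.03); roughly 95%
2*pt(0.1/0.03, df=3) - 1
```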

```{r}
fit_lin <- brm(temp ~ year, data = data_lin, family = gaussian(),
               prior = prior(student_t(3, 0, 0.03), class='b'),
               seed = SEED, refresh = 0)
```

Check the summary of the posterior and convergence

```{r}
fit_lin
```

Make prior sensitivity analysis by powerscaling both prior and likelihood.

```{r}
powerscale_sensitivity(fit_lin)$sensitivity |>
                                mutate(across(where(is.double),  ~num(.x, digits=2)))
```

Our weakly informative proper prior has negligible sensitivity, and
the likelihood is informative.

Extract the posterior draws and check the summaries

```{r}
draws_lin <- as_draws_df(fit_lin) 
draws_lin |> summarise_draws()
```

If one of the columns is hidden we can force printing all columns

```{r}
draws_lin |> summarise_draws() |> print(width=Inf)
```

Histogram of b_year

```{r}
draws_lin |>
  mcmc_hist(pars='b_year') +
  xlab('Average temperature increase per year')
```

Probability that the coefficient b_year > 0 and the corresponding MCSE

```{r}
draws_lin |>
  mutate(I_b_year_gt_0 = b_year>0) |>
  subset_draws(variable='I_b_year_gt_0') |>
  summarise_draws(mean, mcse_mean)
```

All posterior draws have `b_year>0`, the probability gets rounded
to 1, and MCSE is not available as the observed posterior variance
is 0.

95% posterior interval for temperature increase per 100 years

```{r}
draws_lin |>
  mutate(b_year_100 = b_year*100) |>
  subset_draws(variable='b_year_100') |>
  summarise_draws(~quantile(.x, probs = c(0.025, 0.975)),
                  ~mcse_quantile(.x, probs = c(0.025, 0.975)),
                  .num_args = list(digits = 2, notation = "dec"))
```

Plot posterior draws of the linear function values at each year.
`add_linpred_draws()` takes the years from the data and uses `fit_lin` to make
the predictions.

```{r}
data_lin |>
  add_linpred_draws(fit_lin) |>
  # plot data
  ggplot(aes(x=year, y=temp)) +
  geom_point(color=2) +
  # plot lineribbon for the linear model
  stat_lineribbon(aes(y = .linpred), .width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
  # decoration
  scale_fill_brewer()+
  labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
  theme(legend.position="none")+
  scale_x_continuous(breaks=seq(1950,2020,by=10))
```

Alternatively, plot a spaghetti plot for 100 draws

```{r}
data_lin |>
  add_linpred_draws(fit_lin, ndraws=100) |>
  # plot data
  ggplot(aes(x=year, y=temp)) +
  geom_point(color=2) +
  # plot a line for each posterior draw
  geom_line(aes(y=.linpred, group=.draw), alpha = 1/2, color = brewer.pal(5, "Blues")[[3]])+
  # decoration
  scale_fill_brewer()+
  labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
  theme(legend.position="none")+
  scale_x_continuous(breaks=seq(1950,2020,by=10))
```

Plot posterior predictive distribution at each year until 2030
`add_predicted_draws()` takes the years from the data and uses
`fit_lin` to make the predictions.

```{r}
data_lin |>
  add_row(year=2023:2030) |>
  add_predicted_draws(fit_lin) |>
  # plot data
  ggplot(aes(x=year, y=temp)) +
  geom_point(color=2) +
  # plot lineribbon for the linear model
  stat_lineribbon(aes(y = .prediction), .width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
  # decoration
  scale_fill_brewer()+
  labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
  theme(legend.position="none")+
  scale_x_continuous(breaks=seq(1950,2030,by=10))
```

# Linear Student's $t$ model

The temperatures used in the above analyses are averages over three
months, which makes it more likely that they are normally
distributed, but there can be extreme events in the weather, and we
can check whether a more robust Student's $t$ observation model would
give different results.


```{r results='hide'}
fit_lin_t <- brm(temp ~ year, data = data_lin, family = student(),
                 prior = prior(student_t(3, 0, 0.03), class='b'),
                 seed = SEED, refresh = 0)
```

Check the summary of the posterior and convergence. The b_year
posterior looks similar as before and the posterior for degrees of
freedom `nu` has most of the posterior mass for quite large values
indicating there is no strong support for thick tailed variation in
average summer temperatures.

```{r}
fit_lin_t
```

# Pareto-smoothed importance-sampling leave-one-out cross-validation (PSIS-LOO)

We can use leave-one-out cross-validation to compare the expected predictive performance.

LOO comparison shows normal and Student's $t$ model have similar performance.

```{r}
loo_compare(loo(fit_lin), loo(fit_lin_t))
```

# Heteroskedastic linear model

Heteroskedasticity assumes that the variation around the linear
mean can also vary. We can allow sigma to depend on year, too.
Although the additional component is written as `sigma ~ year`, the
log link function is used and the model is for log(sigma). `bf()` allows
listing several formulas.


```{r results='hide'}
fit_lin_h <- brm(bf(temp ~ year,
                    sigma ~ year),
                 data = data_lin, family = gaussian(),
                 prior = prior(student_t(3, 0, 0.03), class='b'),
                 seed = SEED, refresh = 0)
```

Check the summary of the posterior and convergence. The b_year
posterior looks similar to before. The posterior for sigma_year
has most of its mass on negative values, indicating a
decrease in the temperature variation around the mean.

```{r}
fit_lin_h
```

Histogram of b_year and b_sigma_year

```{r}
as_draws_df(fit_lin_h) |>
  mcmc_areas(pars=c('b_year', 'b_sigma_year'))
```

As log(1+x) ≈ x when x is close to zero, the coefficient can be read as a
relative change: sigma is decreasing about 1% per year (95% interval from 0% to 2%).
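
A sketch making that statement concrete by transforming the draws to a percentage change in sigma per year.

```{r}
as_draws_df(fit_lin_h) |>
  mutate_variables(sigma_change_pct = 100*(exp(b_sigma_year)-1)) |>
  subset_draws(variable='sigma_change_pct') |>
  summarise_draws(~quantile(.x, probs = c(0.025, 0.975)),
                  .num_args = list(digits = 2))
```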

Plot posterior predictive distribution at each year until 2030
`add_predicted_draws()` takes the years from the data and uses
`fit_lin_h` to make the predictions.

```{r}
data_lin |>
  add_row(year=2023:2030) |>
  add_predicted_draws(fit_lin_h) |>
  # plot data
  ggplot(aes(x=year, y=temp)) +
  geom_point(color=2) +
  # plot lineribbon for the linear model
  stat_lineribbon(aes(y = .prediction), .width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
  # decoration
  scale_fill_brewer()+
  labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
  theme(legend.position="none")+
  scale_x_continuous(breaks=seq(1950,2030,by=10))
```

Make prior sensitivity analysis by powerscaling both prior and likelihood.

```{r}
powerscale_sensitivity(fit_lin_h)$sensitivity |>
                                mutate(across(where(is.double),  ~num(.x, digits=2)))
```

We can use leave-one-out cross-validation to compare the expected predictive performance.

LOO comparison shows homoskedastic normal and heteroskedastic
normal models have similar performances.

```{r}
loo_compare(loo(fit_lin), loo(fit_lin_h))
```

# Heteroskedastic non-linear model

We can test the linearity assumption by using non-linear spline
functions, that is, `s(year)` terms. Sampling is slower as the
posterior gets more complex.


```{r results='hide'}
fit_spline_h <- brm(bf(temp ~ s(year),
                     sigma ~ s(year)),
                  data = data_lin, family = gaussian(),
                  seed = SEED, refresh = 0)
```

We get warnings about divergences, and try rerunning with a higher
adapt_delta, which leads to smaller step sizes. Often
`adapt_delta=0.999` leads to very slow sampling, but with this small
data set, this is not an issue.

```{r}
fit_spline_h <- update(fit_spline_h, control = list(adapt_delta=0.999))
```

Check the summary of the posterior and convergence. We can no longer
interpret the temperature increase directly from this summary. For
splines, we see the prior scales `sds` for the spline coefficients.

```{r}
fit_spline_h
```
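
As an additional visual check (a sketch, not part of the original
text), brms's `conditional_effects()` plots the estimated smooth
mean as a function of year:

```{r}
# Estimated conditional mean of temperature with 95% intervals
plot(conditional_effects(fit_spline_h), ask = FALSE)
```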

We can still plot the posterior predictive distribution at each year
until 2030. `add_predicted_draws()` takes the years from the data
and uses `fit_spline_h` to make the predictions.

```{r}
data_lin |>
  add_row(year=2023:2030) |>
  add_predicted_draws(fit_spline_h) |>
  # plot data
  ggplot(aes(x=year, y=temp)) +
  geom_point(color=2) +
  # plot lineribbon for the linear model
  stat_lineribbon(aes(y = .prediction), .width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
  # decoration
  scale_fill_brewer()+
  labs(x= "Year", y = 'Summer temp. @Kilpisjärvi') +
  theme(legend.position="none")+
  scale_x_continuous(breaks=seq(1950,2030,by=10))
```

And we can use leave-one-out cross-validation to compare the
expected predictive performance.

LOO comparison shows the homoskedastic normal linear and
heteroskedastic normal spline models have similar performance. There
are not enough observations to make a clear difference between the
models.

```{r}
loo_compare(loo(fit_lin), loo(fit_spline_h))
```

For splines and other non-parametric models, we can use posterior
expectations and predictions to get interpretable quantities. Let's
examine the difference of the estimated average temperatures in
years 1952 and 2022.

```{r}
# Option 1: compute the difference using posterior_epred() and rvars
temp_diff <- posterior_epred(fit_spline_h, newdata=filter(data_lin,year==1952|year==2022)) |>
  rvar() |>
  diff() |>
  as_draws_df() |>
  set_variables('temp_diff')

# Option 2 (equivalent; this overwrites Option 1 and is used below): tidybayes workflow
temp_diff <- data_lin |>
  filter(year==1952|year==2022) |>
  add_epred_draws(fit_spline_h) |>
  pivot_wider(id_cols=.draw, names_from = year, values_from = .epred) |>
  mutate(temp_diff = `2022`-`1952`,
         .chain = (.draw - 1) %/% 1000 + 1,
         .iteration = (.draw - 1) %% 1000 + 1) |>
  as_draws_df() |>
  subset_draws(variable='temp_diff')
```

Posterior distribution for average summer temperature increase from 1952 to 2022

```{r}
temp_diff |>
  mcmc_hist()
```

95% posterior interval for average summer temperature increase from 1952 to 2022

```{r}
temp_diff |>
  summarise_draws(~quantile(.x, probs = c(0.025, 0.975)),
                  ~mcse_quantile(.x, probs = c(0.025, 0.975)),
                  .num_args = list(digits = 2, notation = "dec"))
```

Make prior sensitivity analysis by powerscaling both prior and
likelihood with focus on average summer temperature increase from
1952 to 2022.

```{r}
powerscale_sensitivity(fit_spline_h, prediction = \(x, ...) temp_diff, num_args=list(digits=2)
                       )$sensitivity |>
                         filter(variable=='temp_diff') |>
                         mutate(across(where(is.double),  ~num(.x, digits=2)))
```

Probability that the average summer temperature has increased from
1952 to 2022 is 99.5%.

```{r}
temp_diff |>
  mutate(I_temp_diff_gt_0 = temp_diff>0,
         temp_diff = NULL) |>
  subset_draws(variable='I_temp_diff_gt_0') |>
  summarise_draws(mean, mcse_mean)
```


# Comparison of k groups with hierarchical normal models

Load the factory data, which contain 5 quality measurements for each
of 6 machines. We're interested in analysing whether there are
quality differences between the machines.

```{r}
factory <- read.table(url('https://raw.githubusercontent.com/avehtari/BDA_course_Aalto/master/rpackage/data-raw/factory.txt'))
colnames(factory) <- 1:6
factory
```

We pivot the data to long format

```{r}
factory <- factory |>
  pivot_longer(cols = everything(),
               names_to = 'machine',
               values_to = 'quality')
factory
```

## Pooled model

For comparison, we also fit a pooled model.

```{r results='hide'}
fit_pooled <- brm(quality ~ 1, data = factory, refresh=0)
```

Check the summary of the posterior and convergence.

```{r}
fit_pooled
```

## Separate model

For comparison, we also fit a separate model. To make it completely
separate, we need a different sigma for each machine, too.

```{r results='hide'}
fit_separate <- brm(bf(quality ~ 0 + machine,
                       sigma ~ 0 + machine),
                    data = factory, refresh=0)
```

Check the summary of the posterior and convergence.

```{r}
fit_separate
```

## Common variance hierarchical model (ANOVA)

```{r results='hide'}
fit_hier <- brm(quality ~ 1 + (1 | machine),
                data = factory, refresh = 0)
```

Check the summary of the posterior and convergence.

```{r}
fit_hier
```

LOO comparison shows the hierarchical model is the best. The
differences are small, as the number of observations is small and
there is considerable prediction (aleatoric) uncertainty.

```{r}
loo_compare(loo(fit_pooled), loo(fit_separate), loo(fit_hier))
```

Posterior distributions of the mean quality for the different
models. The pooled model ignores the variation between
machines. The separate model doesn't benefit from the similarity of
the machines and has higher uncertainty.

```{r}
ph <- fit_hier |>
  spread_rvars(b_Intercept, r_machine[machine,]) |>
  mutate(machine_mean = b_Intercept + r_machine) |>
  ggplot(aes(xdist=machine_mean, y=machine)) +
  stat_halfeye() +
  scale_y_continuous(breaks=1:6) +
  labs(x='Quality', y='Machine', title='Hierarchical')

ps <- fit_separate |>
  as_draws_df() |>
  subset_draws(variable='b_machine', regex=TRUE) |>
  set_variables(paste0('b_machine[', 1:6, ']')) |>
  as_draws_rvars() |>
  spread_rvars(b_machine[machine]) |>
  mutate(machine_mean = b_machine) |>
  ggplot(aes(xdist=machine_mean, y=machine)) +
  stat_halfeye() +
  scale_y_continuous(breaks=1:6) +
  labs(x='Quality', y='Machine', title='Separate')

pp <- fit_pooled |>
  spread_rvars(b_Intercept) |>
  mutate(machine_mean = b_Intercept) |>
  ggplot(aes(xdist=machine_mean, y=0)) +
  stat_halfeye() +
  scale_y_continuous(breaks=NULL) +
  labs(x='Quality', y='All machines', title='Pooled')

(pp / ps / ph) * xlim(c(50,140))
```

Make prior sensitivity analysis by powerscaling both prior and
likelihood with focus on mean quality of each machine. We see no
prior sensitivity.

```{r}
machine_mean <- fit_hier |>
  as_draws_df() |>
  # machine mean = population intercept + machine-specific offset
  mutate(across(matches('r_machine'), ~ .x + b_Intercept)) |>
  subset_draws(variable='r_machine', regex=TRUE) |>
  set_variables(paste0('machine_mean[', 1:6, ']'))
powerscale_sensitivity(fit_hier, prediction = \(x, ...) machine_mean, num_args=list(digits=2)
                       )$sensitivity |>
                         filter(str_detect(variable,'machine_mean')) |>
                         mutate(across(where(is.double),  ~num(.x, digits=2)))
```


# Hierarchical binomial model

[Sorafenib Toxicity Dataset in `metadat` R package](https://wviechtb.github.io/metadat/reference/dat.ursino2021.html)
includes results from 13 studies investigating the occurrence of
dose-limiting toxicities (DLTs) at different doses of Sorafenib.

Load data

```{r}
load(url('https://github.com/wviechtb/metadat/raw/master/data/dat.ursino2021.rda'))
head(dat.ursino2021)
```

Pooled model assumes all studies have the same dose effect
(reminder: `~ dose` is equivalent to `~ 1 + dose`)

```{r results='hide'}
fit_pooled <- brm(events | trials(total) ~ dose,
                  prior = c(prior(student_t(7, 0, 1.5), class='Intercept'),
                            prior(normal(0, 1), class='b')),
                  family=binomial(), data=dat.ursino2021)
```

Check the summary of the posterior and convergence

```{r}
fit_pooled
```

The dose coefficient seems to be very small. Looking at the
posterior, we see that it is positive with high probability.

```{r}
fit_pooled |>
  as_draws() |>
  subset_draws(variable='b_dose') |>
  summarise_draws(~quantile(.x, probs = c(0.025, 0.975)), ~mcse_quantile(.x, probs = c(0.025, 0.975)))
```

The dose was reported in mg, and most values are in hundreds. It is
often sensible to switch to a scale in which the range of values is
closer to unit range. In this case it is natural to use g instead
of mg.

```{r}
dat.ursino2021 <- dat.ursino2021 |>
  mutate(doseg = dose/1000) # convert mg to g
```
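
As a quick sanity check (not in the original text), we can look at
the rescaled dose values:

```{r}
# The rescaled doses should now be near unit range
summary(dat.ursino2021$doseg)
```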

Fit the pooled model again using `doseg`

```{r results='hide'}
fit_pooled <- brm(events | trials(total) ~ doseg,
                  prior = c(prior(student_t(7, 0, 1.5), class='Intercept'),
                            prior(normal(0, 1), class='b')),
                  family=binomial(), data=dat.ursino2021)
```

Check the summary of the posterior and convergence.

```{r}
fit_pooled
```
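
As an optional aside (a sketch, not part of the original text), the
coefficient on `doseg` can also be summarized as an odds ratio per
1 g increase in dose:

```{r}
# Odds ratio of a DLT per 1 g increase in dose (exponentiated logit-scale coefficient)
as_draws_df(fit_pooled) |>
  mutate_variables(odds_ratio = exp(b_doseg)) |>
  subset_draws(variable='odds_ratio') |>
  summarise_draws(~quantile(.x, probs = c(0.025, 0.5, 0.975)))
```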

Now it is easier to interpret the presented values.

The separate model assumes each study has its own dose effect. It
would be a bit complicated to set a different prior on the
study-specific intercepts and the other coefficients, so we use the
same prior for all.

```{r results='hide'}
fit_separate <- brm(events | trials(total) ~ 0 + study + doseg:study,
                    prior=prior(student_t(7, 0, 1.5), class='b'),
                    family=binomial(), data=dat.ursino2021)
```

Check the summary of the posterior and convergence.

```{r}
fit_separate
```

The hierarchical model assumes a common mean effect and study-specific variation around it, with a normal population prior
(reminder: `~ doseg + (doseg | study)` is equivalent to `~ 1 + doseg + (1 + doseg | study)`)

```{r results='hide'}
fit_hier <- brm(events | trials(total) ~ doseg + (doseg | study),
                    prior=c(prior(student_t(7, 0, 1.5), class='Intercept'),
                            prior(normal(0, 1), class='b')),
                family=binomial(), data=dat.ursino2021)
```

We see some divergences and rerun with a higher adapt_delta

```{r results='hide'}
fit_hier <- update(fit_hier, control=list(adapt_delta=0.99))
```

LOO-CV comparison

```{r}
loo_compare(loo(fit_pooled), loo(fit_separate), loo(fit_hier))
```

We get warnings about Pareto k's > 0.7 in PSIS-LOO for the separate
model. In that case the LOO-CV estimate is usually overoptimistic,
but as the separate model is the worst anyway, there is no need to
use more accurate computation.
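
If needed, the Pareto k diagnostics can be inspected directly (a
minimal check, not in the original text):

```{r}
# Pareto k diagnostics for the separate model; high k values indicate
# observations for which the PSIS-LOO approximation is unreliable
loo(fit_separate)
```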

The hierarchical model has a better elpd than the pooled one, but
the difference is negligible. However, when we look at the
study-specific parameters, we see that the Miller study has a higher
intercept (more events).

```{r}
mcmc_areas(as_draws_df(fit_hier), regex_pars='r_study\\[.*Intercept')
```


There are no differences in slopes.

```{r}
mcmc_areas(as_draws_df(fit_hier), regex_pars='r_study\\[.*doseg')
```


The population-level coefficient for the dose is clearly larger than 0

```{r}
mcmc_areas(as_draws_df(fit_hier), regex_pars='b_doseg') +
  geom_vline(xintercept=0, linetype='dashed') +
  xlim(c(0,1))
```

Make prior sensitivity analysis by powerscaling both prior and likelihood.

```{r}
powerscale_sensitivity(fit_hier, variable='b_doseg'
                       )$sensitivity |>
                         mutate(across(where(is.double),  ~num(.x, digits=2)))
```

The posterior for the probability of an event at a given dose in a new study.

```{r}
data.frame(study='new',
           doseg=seq(0.1,1,by=0.1),
           total=1) |>
  add_linpred_draws(fit_hier, transform=TRUE, allow_new_levels=TRUE) |>
  ggplot(aes(x=doseg, y=.linpred)) +
  stat_lineribbon(.width = c(.95), alpha = 1/2, color=brewer.pal(5, "Blues")[[5]]) +
  scale_fill_brewer()+
  labs(x= "Dose (g)", y = 'Probability of event') +
  theme(legend.position="none") +
  geom_hline(yintercept=0) +
  scale_x_continuous(breaks=seq(0.1,1,by=0.1))
```

If we plot individual posterior draws, we see that there is a lot of
uncertainty about the overall probability (explained by the
variation of the Intercept between studies), but less uncertainty
about the slope.

```{r}
data.frame(study='new',
           doseg=seq(0.1,1,by=0.1),
           total=1) |>
  add_linpred_draws(fit_hier, transform=TRUE, allow_new_levels=TRUE, ndraws=100) |>
  ggplot(aes(x=doseg, y=.linpred)) +
  geom_line(aes(group=.draw), alpha = 1/2, color = brewer.pal(5, "Blues")[[3]])+
  scale_fill_brewer()+
  labs(x= "Dose (g)", y = 'Probability of event') +
  theme(legend.position="none") +
  geom_hline(yintercept=0) +
  scale_x_continuous(breaks=seq(0.1,1,by=0.1))
```

Posterior predictive checking showing the observed and predicted number of events.

```{r}
pp_check(fit_hier, type = "ribbon_grouped", group="study")
```

<br />

# Licenses {.unnumbered}

* Code &copy; 2017-2024, Aki Vehtari, licensed under BSD-3.
* Text &copy; 2017-2024, Aki Vehtari, licensed under CC-BY-NC 4.0.
