
Revise compare_models() for Bayesian models #716

Open
DominiqueMakowski opened this issue Apr 23, 2024 · 5 comments · May be fixed by #780
Labels
Consistency 🍏 🍎 Expected output across functions could be more similar Enhancement 💥 Implemented features can be improved or revised

Comments

@DominiqueMakowski (Member)

The current output would benefit from some streamlining.

> performance::compare_performance(model, model2, model3)
# Comparison of Model Performance Indices

Name   |   Model |     ELPD | ELPD_SE | LOOIC (weights) | LOOIC_SE | WAIC (weights) |    R2 | R2 (adj.) |  RMSE | Sigma
-----------------------------------------------------------------------------------------------------------------------
model  | brmsfit | -104.387 |   9.532 |   208.8 (0.092) |   19.063 |  208.8 (<.001) | 0.927 |     0.926 | 0.475 | 0.487
model2 | brmsfit |  -89.212 |  10.543 |   178.4 (<.001) |   21.085 |  178.4 (<.001) | 0.941 |     0.940 | 0.425 | 0.467
model3 | brmsfit |  -64.580 |  11.569 |   129.2 (0.908) |   23.137 |  129.1 (>.999) | 0.958 |     0.957 | 0.353 | 0.363

Suggestions:

  1. Remove RMSE and Sigma by default (they are not typically useful, nor what you would expect, when comparing models)
  2. Move R2 and R2 (adj.) first, right after the model column
  3. My current understanding is that LOO and WAIC are both methods to estimate the ELPD, which in turn is what is used to compare models, in particular via the ELPD difference (see easystats/report#419 for the report support recently added). So, in the interest of conciseness, I would directly display ELPD_DIFF (+ SE), computed via LOO by default (although we could add an option to use WAIC for faster computation), and that's it: drop LOOIC, WAIC, and the raw ELPD, which are redundant indices and simply add noise.
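The ELPD difference suggested above can be obtained directly from the loo package; a minimal sketch, assuming `model` and `model2` are fitted brmsfit objects as in the output above:

```r
# Sketch: ELPD difference (and its SE) via PSIS-LOO.
# `model` and `model2` are assumed to be fitted brmsfit objects.
loo1 <- loo::loo(model)
loo2 <- loo::loo(model2)
comp <- loo::loo_compare(loo1, loo2)

# The first row is the best model (elpd_diff = 0); subsequent rows give
# each model's difference to the best model and the SE of that difference.
comp[, c("elpd_diff", "se_diff")]
```

The SE of the difference is what makes the comparison interpretable, which is why displaying ELPD_DIFF together with its SE (rather than the raw ELPDs) would be the informative default.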

What do you think?

@bwiernik (Contributor)

I agree, except that I think we should keep LOOIC + SE (rather than ELPD). LOOIC is -2 * ELPD, which puts it on the deviance scale also used for frequentist models with AIC, etc., and this lets us show the stacking weights with our existing formatting functions.
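As a quick numeric check of the deviance-scale relationship, using the ELPD values from the output above:

```r
# LOOIC is ELPD on the deviance scale: LOOIC = -2 * ELPD.
elpd <- c(model = -104.387, model2 = -89.212, model3 = -64.580)
looic <- -2 * elpd
# model: 208.774, model2: 178.424, model3: 129.160
# (matching the LOOIC column printed above, up to rounding)
looic
```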

@DominiqueMakowski (Member, Author)

Fair enough.
So we'd go with ELPD_DIFF (SE) and LOOIC (weights) + SE? Or just the latter? Is the SE for LOOIC really necessary, given that we have the weights?

Should we also add some details in the documentation about how to interpret the weights? I've been asked by students and wasn't too clear in my answer... any rules of thumb?
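For the documentation question, one common framing is that a model's weight is (roughly) the probability that it is the best model in the candidate set, conditional on that set. As a hypothetical illustration, Akaike-type weights can be computed from any IC on the deviance scale; note that the LOOIC column above shows stacking weights, which are computed differently and need not agree with these:

```r
# Akaike-type weights from IC values on the deviance scale.
# NOTE: illustrative only; the stacking weights printed in the table
# above are estimated differently and can diverge from these.
ic <- c(m1 = 208.8, m2 = 178.4, m3 = 129.2)  # hypothetical LOOIC values
delta <- ic - min(ic)                         # difference to the best model
weights <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))
round(weights, 3)  # essentially all weight on m3 here
```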

@DominiqueMakowski DominiqueMakowski changed the title from "Reviser compare_models() for Bayesian models" to "Revise compare_models() for Bayesian models" on Apr 23, 2024
@DominiqueMakowski (Member, Author)

DominiqueMakowski commented Apr 23, 2024

Moreover, in easystats/report#419, we compute the z-transformed ELPD difference (i.e., normalized by its SE). But I think we might as well convert it directly to a p-value, given its asymptotically normal distribution; that would be the easiest and most "intuitive" to read (rather than the z-value). To avoid duplicated code, I suggest we add a parameters::parameters() method for brms::loo_compare() outputs that would simply format and compute the columns we need; we could then use that in report as well as here. I'll open a PR

EDIT: forget what I said about the duplicated code; it wouldn't work
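The z-to-p conversion is just two lines of base R; a minimal sketch, plugging in the `m1` row from the reprex further down:

```r
# Convert a z-transformed ELPD difference into a two-sided p-value.
# Values taken from the m1 row of the reprex output below.
elpd_diff <- -2.393738
se_diff <- 2.318327

z <- elpd_diff / se_diff        # z-transformed difference
p <- 2 * pnorm(-abs(z))         # two-sided p under asymptotic normality
round(p, 3)  # 0.302, matching the printed p column
```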

@DominiqueMakowski (Member, Author)

Here is a skeleton; maybe it would make more sense to add it as a method of test_performance() rather than compare_performance()?

  • maybe test_performance() for Bayesian models could be a watered-down version of compare_performance(), mostly focusing on the differences, while compare_performance() would return the IC values, R2, etc.
.test_performance_bayesian <- function(objects, criterion = "loo") {

  # Compute differences ----
  if (criterion %in% c("loo", "looic")) {
    diff <- brms::loo_compare(lapply(objects, loo::loo))
    col <- "looic"
  } else if (criterion == "waic") {
    diff <- brms::loo_compare(lapply(objects, loo::waic))
    col <- "waic"
  } else {
    stop("`criterion` must be one of \"loo\", \"looic\" or \"waic\".")
  }

  # Format ----
  out <- as.data.frame(diff)
  out$Name <- rownames(out)
  # Select columns
  out <- out[, c("Name", col, "elpd_diff", "se_diff")]

  # Compute p-value ----
  # The ELPD difference is asymptotically normal, hence the z-score
  # (diff / SE) can be converted into a two-sided p-value
  z <- out$elpd_diff[-1] / out$se_diff[-1]
  out$p <- c(NA, 2 * stats::pnorm(-abs(z)))

  # Return
  out
}


m1 <- brms::brm(wt ~ mpg, data = mtcars, refresh = 0)
#> Compiling Stan program...
#> Start sampling
m2 <- brms::brm(wt ~ mpg + hp, data = mtcars, refresh = 0)
#> Compiling Stan program...
#> Start sampling
m3 <- brms::brm(wt ~ mpg + hp + qsec, data = mtcars, refresh = 0)
#> Compiling Stan program...
#> Start sampling
objects <- insight::ellipsis_info(m1, m2, m3, only_models = TRUE)

.test_performance_bayesian(objects)
#> Warning: Found 1 observations with a pareto_k > 0.7 in model 'X[[i]]'. We
#> recommend to set 'moment_match = TRUE' in order to perform moment matching for
#> problematic observations.

#> Warning: Found 1 observations with a pareto_k > 0.7 in model 'X[[i]]'. We
#> recommend to set 'moment_match = TRUE' in order to perform moment matching for
#> problematic observations.
#>    Name    looic elpd_diff  se_diff          p
#> m3   m3 45.99388  0.000000 0.000000         NA
#> m1   m1 50.78136 -2.393738 2.318327 0.30182476
#> m2   m2 53.35469 -3.680406 2.208913 0.09568128

Created on 2024-04-23 with reprex v2.0.2

@bwiernik (Contributor)

I suggest we do LOOIC (with weights) and SE. If folks want ELPD or WAIC, they can request those instead.

The SE is useful in the same way as for ELPD, whereas the weights are more like a transformed point estimate.

@strengejacke strengejacke added Enhancement 💥 Implemented features can be improved or revised Consistency 🍏 🍎 Expected output across functions could be more similar labels May 20, 2024
@strengejacke strengejacke self-assigned this Nov 24, 2024
@strengejacke strengejacke linked a pull request Nov 24, 2024 that will close this issue