
More details about equivocal zones and the new class of factor predictions #5

Closed · topepo opened this issue Apr 25, 2024 · 2 comments

topepo (Member) commented Apr 25, 2024:

I'll need to add a Details section to explain this, as well as what to look out for. The new class predictions have a different class (similar to a factor) that, to date, works with our tidymodels functions.

Originally posted by @topepo in #1 (comment)

simonpcouch (Contributor) commented:

Related TODO:

new_data[[est_nm]] <- cls_pred # todo convert to factor?

simonpcouch (Contributor) commented:

For context, here's the issue:

library(tailor)
library(dplyr)
library(modeldata)

head(two_class_example)
#>    truth      Class1       Class2 predicted
#> 1 Class2 0.003589243 0.9964107574    Class2
#> 2 Class1 0.678621054 0.3213789460    Class1
#> 3 Class2 0.110893522 0.8891064779    Class2
#> 4 Class1 0.735161703 0.2648382969    Class1
#> 5 Class2 0.016239960 0.9837600397    Class2
#> 6 Class1 0.999275071 0.0007249286    Class1

# `predicted` gives hard class predictions based on probabilities
two_class_example %>% count(predicted)
#>   predicted   n
#> 1    Class1 277
#> 2    Class2 223

# when probabilities are within (.25, .75), consider them equivocal
tlr <-
  tailor() %>%
  adjust_equivocal_zone(value = 1 / 4)

tlr
#> 
#> ── tailor ──────────────────────────────────────────────────────────────────────
#> A binary postprocessor with 1 adjustment:
#> 
#> • Add equivocal zone of size 0.25.

# fit by supplying column names. situate in a modeling workflow
# with `workflows::add_tailor()` to avoid having to do so manually
tlr_fit <- fit(
  tlr,
  two_class_example,
  outcome = c(truth),
  estimate = c(predicted),
  probabilities = c(Class1, Class2)
)

tlr_fit
#> 
#> ── tailor ──────────────────────────────────────────────────────────────────────
#> A binary postprocessor with 1 adjustment:
#> 
#> • Add equivocal zone of size 0.25. [trained]

# adjust hard class predictions
predict(tlr_fit, two_class_example) %>% count(predicted)
#> # A tibble: 3 × 2
#>    predicted     n
#>   <clss_prd> <int>
#> 1       [EQ]    86
#> 2     Class1   229
#> 3     Class2   185

predict(tlr_fit, two_class_example) %>% pull(predicted) %>% head()
#> [1] Class2 [EQ]   Class2 [EQ]   Class2 Class1
#> Levels: Class1 Class2
#> Reportable: 66.7%

Created on 2024-12-11 with reprex v2.1.1

The resulting object is a `class_pred` (a subclass of `vctrs_vctr`) that gives us a factor-ish thing with the original factor levels as well as a possible `[EQ]` entry (which is not a level).

The question is whether we use this object as-is or convert it to a factor where `[EQ]` is a level "in between" Class1 and Class2.

There's an interesting "tidymodels prediction guarantee" question here: the `class_pred` object is not the same type returned in other binary prediction contexts, but neither would a 3-level factor be. The latter would introduce issues with yardstick metrics, whereas it sounds like this `class_pred` object doesn't?
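For illustration, here's a minimal sketch of the two options being weighed, assuming the `class_pred` vector comes from the probably package (the function names below are from that package's API, so treat this as a hedged example rather than a settled design):

```r
library(probably)

# Build a small class_pred vector by hand; `which` marks the
# equivocal positions (here, the second element becomes [EQ])
x <- class_pred(
  factor(c("Class1", "Class2", "Class1"), levels = c("Class1", "Class2")),
  which = 2
)

# Option 1: keep the class_pred object. [EQ] is displayed but is
# not one of the factor levels:
levels(x)

# Option 2: convert to a factor. In probably, equivocal values
# become NA rather than a third "in between" level:
as.factor(x)
```

If conversion is the route taken, the distinction matters downstream: `NA` entries are dropped by most yardstick metrics by default, while a genuine third level would change the metric's view of the outcome's class structure.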
