
More details about equivocal zones and the new class of factor predictions #5

Closed · topepo opened this issue Apr 25, 2024 · 2 comments

topepo (Member) commented Apr 25, 2024:

I'll need to add a Details section to explain this, as well as what to look out for. The new class predictions have a different class (similar to a factor) that, to date, works with our tidymodels functions.

Originally posted by @topepo in #1 (comment)

simonpcouch (Contributor) commented:

Related TODO:

new_data[[est_nm]] <- cls_pred # todo convert to factor?

simonpcouch (Contributor) commented:

For context, here's the issue:

library(tailor)
library(dplyr)
library(modeldata)

head(two_class_example)
#>    truth      Class1       Class2 predicted
#> 1 Class2 0.003589243 0.9964107574    Class2
#> 2 Class1 0.678621054 0.3213789460    Class1
#> 3 Class2 0.110893522 0.8891064779    Class2
#> 4 Class1 0.735161703 0.2648382969    Class1
#> 5 Class2 0.016239960 0.9837600397    Class2
#> 6 Class1 0.999275071 0.0007249286    Class1

# `predicted` gives hard class predictions based on probabilities
two_class_example %>% count(predicted)
#>   predicted   n
#> 1    Class1 277
#> 2    Class2 223

# when probabilities are within (.25, .75), consider them equivocal
tlr <-
  tailor() %>%
  adjust_equivocal_zone(value = 1 / 4)

tlr
#> 
#> ── tailor ──────────────────────────────────────────────────────────────────────
#> A binary postprocessor with 1 adjustment:
#> 
#> • Add equivocal zone of size 0.25.

# fit by supplying column names. situate in a modeling workflow
# with `workflows::add_tailor()` to avoid having to do so manually
tlr_fit <- fit(
  tlr,
  two_class_example,
  outcome = c(truth),
  estimate = c(predicted),
  probabilities = c(Class1, Class2)
)

tlr_fit
#> 
#> ── tailor ──────────────────────────────────────────────────────────────────────
#> A binary postprocessor with 1 adjustment:
#> 
#> • Add equivocal zone of size 0.25. [trained]

# adjust hard class predictions
predict(tlr_fit, two_class_example) %>% count(predicted)
#> # A tibble: 3 × 2
#>    predicted     n
#>   <clss_prd> <int>
#> 1       [EQ]    86
#> 2     Class1   229
#> 3     Class2   185

predict(tlr_fit, two_class_example) %>% pull(predicted) %>% head()
#> [1] Class2 [EQ]   Class2 [EQ]   Class2 Class1
#> Levels: Class1 Class2
#> Reportable: 66.7%

Created on 2024-12-11 with reprex v2.1.1

The resulting object is a `class_pred` (a subclass of `vctrs_vctr`) that gives us a factor-ish thing with the original factor levels as well as a possible `[EQ]` entry (which is not a level).

The question is whether we use this object as-is or convert it to a factor where `[EQ]` is a level "in between" Class1 and Class2.

There's an interesting "tidymodels prediction guarantee" question here: the `class_pred` object is not the same type returned in other binary prediction contexts, but neither would a 3-level factor be. The latter would introduce issues with yardstick metrics, whereas it sounds like this `class_pred` object doesn't?
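For illustration, here's a minimal sketch of the two options being weighed, assuming the `class_pred` vector comes from the probably package (the function names below are from that package's API, so treat this as a hedged example rather than a settled design):

```r
library(probably)

# Build a small class_pred vector by hand; `which` marks the
# equivocal positions (here, the second element becomes [EQ])
x <- class_pred(
  factor(c("Class1", "Class2", "Class1"), levels = c("Class1", "Class2")),
  which = 2
)

# Option 1: keep the class_pred object. [EQ] is displayed but is
# not one of the factor levels:
levels(x)

# Option 2: convert to a factor. In probably, equivocal values
# become NA rather than a third "in between" level:
as.factor(x)
```

If conversion is the route taken, the distinction matters downstream: `NA` entries are dropped by most yardstick metrics by default, while a genuine third level would change the metric's view of the outcome's class structure.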
