From c874ff3a3cfbeae35640353ed504f83be8a4589a Mon Sep 17 00:00:00 2001
From: simonpcouch
Date: Thu, 12 Sep 2024 20:30:00 +0000
Subject: [PATCH] Deploying to gh-pages from @ tidymodels/tailor@9d5c87c82a29fd39552cc6ed2495b7c3870ede65 🚀
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 authors.html                  |  4 +-
 index.html                    | 90 ++++++++++++++++++++++++++++++++---
 pkgdown.yml                   |  2 +-
 reference/tailor-package.html |  6 +--
 search.json                   |  2 +-
 5 files changed, 91 insertions(+), 13 deletions(-)

diff --git a/authors.html b/authors.html
index 9b8973d..431aa07 100644
--- a/authors.html
+++ b/authors.html
@@ -60,11 +60,11 @@

Citation

Source: DESCRIPTION

Couch S, Frick H, HvitFeldt E, Kuhn M (2024).
-tailor: Sandbox for a postprocessor object.
+tailor: Iterative Steps for Postprocessing Model Predictions.
R package version 0.0.0.9001, https://tailor.tidymodels.org, https://github.com/tidymodels/tailor.

@Manual{,
-  title = {tailor: Sandbox for a postprocessor object},
+  title = {tailor: Iterative Steps for Postprocessing Model Predictions},
   author = {Simon Couch and Hannah Frick and Emil HvitFeldt and Max Kuhn},
   year = {2024},
   note = {R package version 0.0.0.9001, https://tailor.tidymodels.org},
diff --git a/index.html b/index.html
index fbc6561..b090172 100644
--- a/index.html
+++ b/index.html
@@ -5,14 +5,14 @@
 
 
 
-Sandbox for a postprocessor object • tailor
+Iterative Steps for Postprocessing Model Predictions • tailor
 
 
 
 
 
 
@@ -50,8 +50,14 @@
 
 
 
-The goal of tailor is to provide a tailor for postprocessing.

-This is going to undergo massive changes (especially the name), so please treat it as experimental and don’t depend on the syntax staying the same.

+Postprocessors refine predictions output by machine learning models to improve predictive performance or to better satisfy distributional constraints. This package introduces ‘tailor’ objects, which compose iterative adjustments to model predictions. In addition to utilities for creating new adjustments, the package provides a number of pre-written ones:

+Tailors are tightly integrated with the tidymodels framework. For greatest ease of use, situate tailors in model workflows with add_tailor().

+The package is under active development; please treat it as experimental and don’t depend on the syntax staying the same.

Installation

@@ -59,6 +65,78 @@

Installation
 pak::pak("tidymodels/tailor")

+Example

+The two_class_example dataset from modeldata gives the true value of an outcome variable truth as well as predicted probabilities (Class1 and Class2). The hard class predictions, in predicted, are "Class1" if the probability assigned to "Class1" is above .5, and "Class2" otherwise.
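The .5 rule can be sketched in base R. This is a minimal illustration, not tailor code: it rebuilds the predicted column from Class1 using the six rows printed by head(two_class_example) in this example, so it needs neither modeldata nor tailor. The data frame name toy is hypothetical.

```r
# Class1 probabilities and true classes copied from the six rows printed by
# head(two_class_example); Class2 is simply 1 - Class1.
toy <- data.frame(
  truth = factor(c("Class2", "Class1", "Class2", "Class1", "Class2", "Class1"),
                 levels = c("Class1", "Class2")),
  Class1 = c(0.003589243, 0.678621054, 0.110893522,
             0.735161703, 0.016239960, 0.999275071)
)
toy$Class2 <- 1 - toy$Class1

# Hard class prediction: "Class1" when its probability is above .5,
# "Class2" otherwise. Levels are fixed so both classes are always defined.
toy$predicted <- factor(ifelse(toy$Class1 > 0.5, "Class1", "Class2"),
                        levels = c("Class1", "Class2"))

toy
```

The rebuilt predicted column agrees with the one shown by head(), which is consistent with the .5 rule described above.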

+library(tailor)
+library(dplyr)
+#> 
+#> Attaching package: 'dplyr'
+#> The following objects are masked from 'package:stats':
+#> 
+#>     filter, lag
+#> The following objects are masked from 'package:base':
+#> 
+#>     intersect, setdiff, setequal, union
+library(modeldata)
+
+head(two_class_example)
+#>    truth      Class1       Class2 predicted
+#> 1 Class2 0.003589243 0.9964107574    Class2
+#> 2 Class1 0.678621054 0.3213789460    Class1
+#> 3 Class2 0.110893522 0.8891064779    Class2
+#> 4 Class1 0.735161703 0.2648382969    Class1
+#> 5 Class2 0.016239960 0.9837600397    Class2
+#> 6 Class1 0.999275071 0.0007249286    Class1
+The model predicts "Class1" more often than it does "Class2".

+two_class_example %>% count(predicted)
+#>   predicted   n
+#> 1    Class1 277
+#> 2    Class2 223
+If we wanted the model to predict "Class2" more often, we could raise the probability threshold for "Class1", that is, the probability above which the hard class prediction is "Class1". In the tailor package, this adjustment is implemented in adjust_probability_threshold(), which can be situated in a tailor object.
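The effect of raising the threshold can be sketched with a hypothetical helper, threshold_classes(), which is not part of tailor. It reuses the six Class1 probabilities printed by head(two_class_example) and, following the text's "above", uses a strict > comparison; how tailor itself treats a probability exactly at the threshold is not shown here.

```r
# Hypothetical helper (not tailor's API): turn the probability of the first
# class level into hard class predictions using a given threshold.
threshold_classes <- function(prob_first, threshold,
                              class_levels = c("Class1", "Class2")) {
  factor(ifelse(prob_first > threshold, class_levels[1], class_levels[2]),
         levels = class_levels)
}

# Class1 probabilities from the six rows printed by head(two_class_example).
prob_class1 <- c(0.003589243, 0.678621054, 0.110893522,
                 0.735161703, 0.016239960, 0.999275071)

table(threshold_classes(prob_class1, threshold = 0.5))  # Class1: 3, Class2: 3
table(threshold_classes(prob_class1, threshold = 0.9))  # Class1: 1, Class2: 5
```

Moving the threshold from .5 to .9 turns two of the three "Class1" predictions into "Class2", the same direction of change as the shift from 277 to 180 "Class1" predictions on the full dataset.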

+post_obj <-
+  tailor() %>%
+  adjust_probability_threshold(threshold = .9)
+
+post_obj
+#> 
+#> ── tailor ──────────────────────────────────────────────────────────────────────
+#> A postprocessor with 1 adjustment:
+#> 
+#> • Adjust probability threshold to 0.9.
+Tailors must be fitted before they can predict on new data. For adjustments like adjust_probability_threshold(), no training actually happens at the fit() step beyond recording the names and types of the relevant variables. For other adjustments, like probability calibration with adjust_probability_calibration(), parameters are estimated at the fit() step, so separate data should be used to train the postprocessor and to evaluate its performance.
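Why trained adjustments need separate data can be sketched in base R. Everything here is an assumption for illustration: the data are simulated, and logistic (Platt-style) recalibration with glm() stands in for a trained adjustment. This is not a claim about how adjust_probability_calibration() works internally. The calibrator is estimated on one half of the rows and scored with the Brier score on the other half.

```r
set.seed(1)

# Simulated data (hypothetical): true probability of "Class1" and a model
# whose probabilities are systematically too low (true_p^2 < true_p).
n <- 2000
true_p <- runif(n)
is_class1 <- as.integer(runif(n) < true_p)
model_p <- true_p^2

# Split the rows: one half trains the calibrator, the other half is held
# out so that performance is not measured on the data used for fitting.
train_rows <- sample(n, n / 2)
cal <- data.frame(is_class1 = is_class1[train_rows], p = model_p[train_rows])
held_out <- data.frame(is_class1 = is_class1[-train_rows],
                       p = model_p[-train_rows])

# Platt-style calibration: logistic regression of the outcome on the logit
# (qlogis) of the model's probability, estimated on the calibration half only.
calibrator <- glm(is_class1 ~ qlogis(p), data = cal, family = binomial)

# Apply the trained calibrator to the held-out half.
held_out$p_cal <- predict(calibrator, newdata = held_out, type = "response")

# Brier score (mean squared error of the probabilities); lower is better.
brier <- function(p, y) mean((p - y)^2)
scores <- c(
  uncalibrated = brier(held_out$p, held_out$is_class1),
  calibrated   = brier(held_out$p_cal, held_out$is_class1)
)
scores
```

Scoring the calibrator on the same rows used to fit it would tend to overstate its benefit, which is why the held-out half is kept separate.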

+In this case, though, we can fit() on the whole dataset. The resulting object is still a tailor, but is now flagged as trained.

+post_res <- fit(
+  post_obj,
+  two_class_example,
+  outcome = c(truth),
+  estimate = c(predicted),
+  probabilities = c(Class1, Class2)
+)
+
+post_res
+#> 
+#> ── tailor ──────────────────────────────────────────────────────────────────────
+#> A binary postprocessor with 1 adjustment:
+#> 
+#> • Adjust probability threshold to 0.9. [trained]
+When a tailor is used in a model workflow via add_tailor(), the arguments to fit() are set automatically, as is the data splitting needed for postprocessors that require training.

+Now, when passed new data, the trained tailor determines the predicted class based on whether the probability assigned to the level "Class1" is above .9, resulting in more predictions of "Class2" than before.

+predict(post_res, two_class_example) %>% count(predicted)
+#> # A tibble: 2 × 2
+#>   predicted     n
+#>   <fct>     <int>
+#> 1 Class1      180
+#> 2 Class2      320
+Tailors compose adjustments; when several adjust_*() functions are called iteratively, tailors will apply them in order at fit() and predict() time.
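Applying adjustments in order can be sketched with plain R functions and Reduce(). The two adjustments below, shift_up() and cap_at_100(), and the helper apply_in_order() are hypothetical stand-ins, not tailor's adjust_*() API. The point is only that each step's output feeds the next, so order can matter.

```r
# Hypothetical adjustments (plain functions, not tailor's API): each takes a
# data frame of predictions and returns a modified copy.
shift_up   <- function(d) { d$.pred <- d$.pred + 10; d }
cap_at_100 <- function(d) { d$.pred <- pmin(d$.pred, 100); d }

# Apply a list of adjustments in order, feeding each one's output to the next.
apply_in_order <- function(data, adjustments) {
  Reduce(function(d, adjust) adjust(d), adjustments, data)
}

preds <- data.frame(.pred = c(50, 95))

apply_in_order(preds, list(shift_up, cap_at_100))$.pred  # 60 100
apply_in_order(preds, list(cap_at_100, shift_up))$.pred  # 60 105
```

Swapping the order changes the result for the second row, so the sequence in which adjustments are added can affect the final predictions.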
