# Reduce argument clutter with an options object {#sec-argument-clutter}
```{r}
#| include = FALSE
source("common.R")
```
## What's the problem?
If you have a large number of optional arguments that control the fine details of the operation of a function, it might be worth lumping them all together into a separate "options" object created by a helper function.
Having a large number of less important arguments makes it harder to see the most important ones.
By moving rarely used and less important arguments to a secondary function, you can more easily draw attention to what is most important.
## What are some examples?
- Many base R modelling functions like `loess()`, `glm()`, and `nls()` have a `control` argument that is paired with a helper function like `loess.control()`, `glm.control()`, or `nls.control()`.
These allow you to modify rarely used defaults, including the number of iterations, the stopping criteria, and some debugging options.
`optim()` uses a less formal version of this structure --- while it has a `control` argument, it doesn't have a matching `optim.control()` helper.
Instead, you supply a named list with components described in `?optim`.
A helper function is more convenient than a named list because it checks the argument names for free and gives nicer autocomplete to the user.
- This pattern is common in other modelling packages, e.g. `tune::fit_resamples()` + `tune::control_resamples()`, `tune::tune_bayes()` + `tune::control_bayes()`, `tune::tune_grid()` + `tune::control_grid()`, and `caret::train()` + `caret::trainControl()`.
- `readr::read_delim()` and friends take a `locale` argument which is paired with the `readr::locale()` helper.
This object bundles together a bunch of options related to parsing numbers, dates, and times that vary from country to country.
- `readr::locale()` itself has a `date_names` argument that's paired with `readr::date_names()` and `readr::date_names_lang()` helpers.
You typically use the argument by supplying a two-letter language code (which `date_names_lang()` uses to look up common languages), but if your language isn't supported you can use `readr::date_names()` to individually supply full and abbreviated month and day-of-week names.
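To make the base R pairing concrete, here is a minimal sketch using `glm()` and `glm.control()`. The model (`am ~ wt` on the built-in `mtcars` data) and the particular option values are arbitrary choices for illustration only.

```{r}
# glm.control() is an ordinary function, so its arguments are documented,
# auto-completed, and checked by R itself.
ctrl <- glm.control(epsilon = 1e-10, maxit = 50)
str(ctrl)

# The resulting list is handed to the main function via `control`.
fit <- glm(am ~ wt, data = mtcars, family = binomial(), control = ctrl)

# A misspelled option name fails immediately instead of being carried along;
# the error message is captured here so the chunk keeps running.
typo <- tryCatch(
  glm.control(maxiter = 50),
  error = function(e) conditionMessage(e)
)
typo
```

The main call stays short, and the tuning details live in one clearly labelled place.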
On the other hand, some functions with many arguments that would benefit from this technique include:
- `readr::read_delim()` has a lot of options that control rarely needed details of file parsing (e.g. `escape_backslash`, `escape_double`, `quoted_na`, `comment`, `trim_ws`).
These make the function specification very long and might well be better in a details object.
- `ggplot2::geom_smooth()` fits a smooth line to your data.
Most of the time you only want to pick the `method` and `formula` used, but `geom_smooth()` (via `ggplot2::stat_smooth()`) also provides `n`, `fullrange`, `span`, `level`, and `method.args` to control details of the fit.
I think these would be better in their own details object.
## How do I use this pattern?
The simplest implementation is just to write a helper function that returns a list:
```{r}
my_fun_opts <- function(opt1 = 1, opt2 = 2) {
  list(
    opt1 = opt1,
    opt2 = opt2
  )
}
```
This alone is nice because you can document the individual arguments, you get name checking for free, and auto-complete will remind the user what these less important options include.
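As a rough sketch of how the main function might consume such a helper, the chunk below pairs `my_fun_opts()` with a hypothetical `my_fun_sketch()`. Both the name `my_fun_sketch()` and the arithmetic it performs are invented purely for illustration.

```{r}
my_fun_opts <- function(opt1 = 1, opt2 = 2) {
  list(
    opt1 = opt1,
    opt2 = opt2
  )
}

# Hypothetical main function: scales `x` by `opt1`, then shifts it by `opt2`.
my_fun_sketch <- function(x, opts = my_fun_opts()) {
  x * opts$opt1 + opts$opt2
}

# Most callers never mention the options at all
my_fun_sketch(10)

# Callers who need to can override a single option and keep the other default
my_fun_sketch(10, opts = my_fun_opts(opt2 = 0))
```

Because `my_fun_opts()` is a regular function, a misspelled name such as `my_fun_opts(opt3 = 1)` is rejected by R as an unused argument.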
### Better error messages
An optional extra is to add a unique class to the list:
```{r}
my_fun_opts <- function(opt1 = 1, opt2 = 2) {
  structure(
    list(
      opt1 = opt1,
      opt2 = opt2
    ),
    class = "mypackage_my_fun_opts"
  )
}
```
This then allows you to create more informative error messages:
```{r}
#| error: true
my_fun <- function(..., opts = my_fun_opts()) {
  if (!inherits(opts, "mypackage_my_fun_opts")) {
    cli::cli_abort("{.arg opts} must be created by {.fun my_fun_opts}.")
  }
}

my_fun(opts = 1)
```
If you accept this options object in many functions, you should consider pulling out the repeated check into a `check_my_fun_opts()` function.
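One possible shape for such a checker is sketched below. It uses base `stop()` only to keep the example free of dependencies; in a package you would more likely call `cli::cli_abort()` as above. The callers `my_fun_a()` and `my_fun_b()` are hypothetical.

```{r}
my_fun_opts <- function(opt1 = 1, opt2 = 2) {
  structure(
    list(opt1 = opt1, opt2 = opt2),
    class = "mypackage_my_fun_opts"
  )
}

# Shared validator: errors if `opts` was not created by my_fun_opts().
# `arg` is the argument name to show in the message.
check_my_fun_opts <- function(opts, arg = "opts") {
  if (!inherits(opts, "mypackage_my_fun_opts")) {
    stop("`", arg, "` must be created by `my_fun_opts()`.", call. = FALSE)
  }
  invisible(opts)
}

# Hypothetical functions that share the same options object
my_fun_a <- function(x, opts = my_fun_opts()) {
  check_my_fun_opts(opts)
  x + opts$opt1
}
my_fun_b <- function(x, opts = my_fun_opts()) {
  check_my_fun_opts(opts)
  x * opts$opt2
}

my_fun_a(1)

# Capture the error message so the chunk keeps running
bad <- tryCatch(my_fun_b(1, opts = 1), error = function(e) conditionMessage(e))
bad
```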
## How do I remediate past mistakes?
Typically you notice this problem only after you have created too many options, so you'll need to carefully remediate by introducing a new options argument and paired helper function.
For example, suppose your existing function looks like this:
```{r}
my_fun <- function(x, y, opt1 = 1, opt2 = 2) {
}
```
If you want to keep the existing function specification, you could add a new `opts` argument that uses the values of `opt1` and `opt2`:
```{r}
my_fun <- function(x, y, opts = NULL, opt1 = 1, opt2 = 2) {
  opts <- opts %||% my_fun_opts(opt1 = opt1, opt2 = opt2)
}
```
However, that introduces a dependency between the arguments: if you specify both `opts` and `opt1`/`opt2`, `opts` will win.
You could certainly add extra code to pick up on this problem and warn the user, but I think it's just cleaner to deprecate the old arguments so that you can eventually remove them:
```{r}
my_fun <- function(x,
                   y,
                   opts = my_fun_opts(),
                   opt1 = lifecycle::deprecated(),
                   opt2 = lifecycle::deprecated()) {
  # If a caller still supplies an old argument, warn and carry its value over
  if (lifecycle::is_present(opt1)) {
    lifecycle::deprecate_warn("1.0.0", "my_fun(opt1)", "my_fun_opts(opt1)")
    opts$opt1 <- opt1
  }
  if (lifecycle::is_present(opt2)) {
    lifecycle::deprecate_warn("1.0.0", "my_fun(opt2)", "my_fun_opts(opt2)")
    opts$opt2 <- opt2
  }
}
```
Then you can remove the old arguments in a future release.
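For comparison, the "extra code" alternative mentioned above, which keeps the old arguments and warns when they conflict with `opts`, might look roughly like the sketch below. The function name `my_fun_compat()` and the wording of the warning are assumptions made for illustration, and the function simply returns the resolved options so that the effect is visible.

```{r}
my_fun_opts <- function(opt1 = 1, opt2 = 2) {
  structure(
    list(opt1 = opt1, opt2 = opt2),
    class = "mypackage_my_fun_opts"
  )
}

# Hypothetical variant that keeps `opt1`/`opt2` but flags ambiguous calls
my_fun_compat <- function(x, y, opts = NULL, opt1 = 1, opt2 = 2) {
  if (!is.null(opts) && (!missing(opt1) || !missing(opt2))) {
    warning("`opt1` and `opt2` are ignored when `opts` is supplied.", call. = FALSE)
  }
  if (is.null(opts)) {
    opts <- my_fun_opts(opt1 = opt1, opt2 = opt2)
  }
  opts
}

# Old-style call: the individual arguments are used
my_fun_compat(1, 2, opt1 = 9)$opt1

# Mixed call: `opts` wins and the caller is warned
my_fun_compat(1, 2, opts = my_fun_opts(opt1 = 5), opt1 = 9)$opt1
```

This works, but the conflict check has to stay in the function for as long as both interfaces exist, which is part of why deprecating the old arguments tends to be cleaner.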
## See also
- @sec-strategy-objects is a similar pattern when you have multiple options functions that each encapsulate a different strategy.