# Debugging {targets} with new access panels
In her highly recommended talk [**Object of type 'closure' is not subsettable**](https://www.youtube.com/watch?v=vgYS-F8opgE), Jenny Bryan discusses
leaving yourself 'access panels': options or arguments that turn on features that help your future self in debugging endeavours. As we shall see, `{targets}` gives us powerful new debugging access panels.
First we look at how problems present themselves, then we review a spectrum of increasingly powerful debugging techniques made available by `{targets}`.
## What it looks like when things go bad
### Errors
By default, when an error occurs in a target, the pipeline stops. It should be pretty clear from `{targets}`' output which target threw the error:
```
▶ dispatched target species_classification_model
✖ errored target species_classification_model
✖ errored pipeline [0.184 seconds]
Error:
! Error running targets::tar_make()
Error messages: targets::tar_meta(fields = error, complete_only = TRUE)
Debugging guide: https://books.ropensci.org/targets/debugging.html
How to ask for help: https://books.ropensci.org/targets/help.html
Last error message:
I'm broken
Last error traceback:
fit_final_species_classification_model(training_data = training(test_tra...
stop("I'm broken")
.handleSimpleError(function (condition) { state$error <- build_mess...
h(simpleError(msg, call))
```
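To see this for yourself without touching a real project, here is a minimal sketch that assumes only that `{targets}` is installed. The pipeline and its target names (`input_values`, `broken_target`) are hypothetical. `tar_dir()` runs everything in a throwaway temporary directory, and the final call is the `tar_meta()` query suggested in the console output above.

```{r}
#| eval: false
library(targets)

# tar_dir() evaluates the code in a fresh temporary directory,
# so this sketch never touches your real _targets/ store.
error_meta <- tar_dir({
  # Write a tiny, hypothetical _targets.R whose second target always fails.
  tar_script({
    list(
      tar_target(input_values, 1:10),
      tar_target(broken_target, {
        if (length(input_values) > 0) stop("I'm broken")
        input_values
      })
    )
  }, ask = FALSE)

  # tar_make() signals an R error when a target fails; catch it so we can
  # carry on and inspect the metadata afterwards.
  try(tar_make(reporter = "silent"), silent = TRUE)

  # The query suggested in the console output: which targets errored, and why?
  tar_meta(fields = error, complete_only = TRUE)
})

print(error_meta)
```

Because `complete_only = TRUE` drops rows with a missing `error` field, only the target that actually failed should appear.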
### Warnings
If your problem produces warnings, they won't stop the pipeline. Instead, you'll see something like:
```
▶ dispatched target species_classification_model
● completed target species_classification_model [3.249 seconds]
✔ skipped target species_model_validation_data
✔ skipped target base_plot_model_roc_object
✔ skipped target gg_species_class_accuracy_hexes
✔ skipped target report
▶ ended pipeline [5.439 seconds]
Warning messages:
1: I'm warning you
2: 1 targets produced warnings. Run targets::tar_meta(fields = warnings, complete_only = TRUE) for the messages.
NULL
```
So although we don't immediately see which target threw the warnings, `{targets}` does tell us how to find that out. If we run the suggested code:
```{r}
#| eval: false
targets::tar_meta(fields = warnings, complete_only = TRUE)
```
we get precisely the metadata we need:
```
# A tibble: 1 × 2
  name                         warnings
  <chr>                        <chr>
1 species_classification_model I'm warning you
```
### If we get neither
If we just get nonsense results, we may have to work a bit harder to
figure out where to start looking for the problem. Working through the pipeline
the way we did when peer reviewing it in the 'targets plan' section is an
efficient way to track down the logic problem.
## The debugging arsenal
### Call `tar_load()` and tinker
You can quickly populate all of a target's inputs in your global environment using `tar_load()`.
- This is why it pays to give your functions the same argument names as the targets they take as input: once the inputs are loaded, the target's command can be run verbatim.
- If this is not the case, you might try loading all the input targets and calling `debugonce()` on the target's function before manually running the problematic target's expression interactively.
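Here is a minimal, self-contained sketch of that workflow, assuming only that `{targets}` is installed. The target and function names (`raw_counts`, `min_count`, `summarise_counts()`) are hypothetical. It also uses `tar_load_globals()`, a `{targets}` helper not covered elsewhere in this section. It sources `_targets.R` so that the pipeline's functions are defined in your session.

```{r}
#| eval: false
library(targets)

tinker_result <- tar_dir({
  tar_script({
    # A hypothetical target function whose argument names match the
    # names of the targets it takes as input.
    summarise_counts <- function(raw_counts, min_count) {
      kept <- raw_counts[raw_counts >= min_count]
      c(n = length(kept), total = sum(kept))
    }
    list(
      tar_target(raw_counts, c(3, 8, 1, 12)),
      tar_target(min_count, 5),
      tar_target(count_summary, summarise_counts(raw_counts, min_count))
    )
  }, ask = FALSE)
  tar_make(reporter = "silent")

  # Define the pipeline's functions (e.g. summarise_counts()) in this session.
  tar_load_globals()
  # Populate the target's inputs under the same names as the arguments.
  tar_load(c(raw_counts, min_count))

  # Because the names match, the target's command runs verbatim. This is the
  # place to tinker, or to call debugonce(summarise_counts) first.
  summarise_counts(raw_counts, min_count)
})

print(tinker_result)  # n = 2, total = 20
```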
### Use `browser()`
R's classic interactive debugger, `browser()`, can be brought to bear!
- There's just one obstacle: targets are typically built in a separate R session that we don't have interactive access to!
- Fortunately, we can run the pipeline in the current interactive R session instead.
- Just make sure the session is pretty 'fresh', or you may create more problems than you solve.
By way of example:
1. Put `browser()` on the first line of `fit_final_species_classification_model()`.
2. Build the pipeline by running `tar_make(callr_function = NULL)`.
   - We're saying "Don't use `{callr}`", the package `{targets}` uses to create a child R session for pipeline execution.
You'll end up interactively debugging the target:
```
✔ skipped target gg_species_distribution_hexes
✔ skipped target gg_species_distribution_months
✔ skipped target test_train_split
✔ skipped target species_classification_model_training_summary
▶ dispatched target species_classification_model
Called from: fit_final_species_classification_model(training_data = training(test_train_split),
species_classification_model_training_summary)
Browse[1]>
```
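If you want to convince yourself that `callr_function = NULL` really does build targets in your current session (which is why `browser()` can reach you), here is a small sketch that assumes only that `{targets}` is installed. The single hypothetical target `builder_pid` records the process ID of whichever R session builds it. `tar_invalidate()` forces it to rebuild on the second run.

```{r}
#| eval: false
library(targets)

pids <- tar_dir({
  tar_script({
    list(
      # Record the process ID of whichever R session builds this target.
      tar_target(builder_pid, Sys.getpid())
    )
  }, ask = FALSE)

  # No {callr}: the target is built in *this* session, so browser() reaches us.
  tar_make(callr_function = NULL, reporter = "silent")
  in_session <- tar_read(builder_pid)

  # Default behaviour: a fresh {callr} child session builds the pipeline.
  tar_invalidate(builder_pid)
  tar_make(reporter = "silent")
  in_child <- tar_read(builder_pid)

  c(current = Sys.getpid(), in_session = in_session, in_child = in_child)
})

pids[["in_session"]] == pids[["current"]]  # TRUE
pids[["in_child"]] == pids[["current"]]    # FALSE
```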
### Use the 'debug' option
This behaves like using `browser()` above, but is a bit better since you don't have to make a change to your code that you could forget to undo!
- Does anyone else commit `browser()` to repos embarrassingly frequently?
If you add a `debug` setting to `tar_option_set()` in `_targets.R`:
```{r}
#| eval: false
tar_option_set(
seed = 2048,
debug = "species_classification_model"
)
```
Then you can call `tar_make(callr_function = NULL)` and the pipeline will pause for interactive debugging when `species_classification_model` is reached. As with `browser()`, the pipeline must run in your interactive session for this to work.
If you'd like to speed things up by skipping all other targets, you can run:
```{r}
#| eval: false
tar_make(species_classification_model, callr_function = NULL, shortcut = TRUE)
```
And `{targets}` will immediately begin debugging this target.[^1]
[^1]: It may be tempting to use `shortcut` more frequently to speed things up, but using `shortcut` is equivalent to running a numbered pipeline stage script without running the prior scripts in the 'classic R project' we started with. Do it too often and you'll have reproducibility debt that needs to be paid down in bulk.
Being able to name a target to debug becomes even more useful once we understand a more advanced concept called 'branching'.
### Use the `workspace` option
This is my personal go-to when things just aren't making sense. A 'workspace' is
the set of all of a target's inputs. Since targets should be pure functions,
this should be all the state we need to investigate, reproduce, and fix bugs
occurring in that target.
The first way to use workspaces is to set an option that automatically saves them on error:
```{r}
#| eval: false
tar_option_set(
seed = 2048,
workspace_on_error = TRUE
)
```
When an error occurs we will get a slightly different output:
```
✔ skipped target test_train_split
✔ skipped target species_classification_model_training_summary
▶ dispatched target species_classification_model
▶ recorded workspace species_classification_model
✖ errored target species_classification_model
✖ errored pipeline [0.215 seconds]
```
If we call `tar_workspace(species_classification_model)`, all of the dependencies of `species_classification_model`
will be loaded into the global environment. These are:
- `test_train_split`
- `species_classification_model_training_summary`
But isn't this just the same as calling `tar_load`?
- Hopefully / Mostly yes!
- But occasionally, through contrived circumstances, you may not be `tar_load`ing what you think you are. With a workspace, there's no way for that mistake to happen.
- There are also circumstances where you might not know the names of a specific target's inputs, and so cannot `tar_load` them at all.
- More on this when we talk about 'branching'
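The whole round trip can be sketched in a throwaway directory, assuming only that `{targets}` is installed. The targets `training_rows`, `summary_stats` and `model`, and the always-failing `fit_model()`, are hypothetical stand-ins for the workshop's pipeline. `tar_workspaces()` lists the recorded workspaces, and `tar_workspace()` loads one into the session.

```{r}
#| eval: false
library(targets)

workspace_check <- tar_dir({
  tar_script({
    tar_option_set(workspace_on_error = TRUE)
    # Hypothetical model-fitting function that always fails.
    fit_model <- function(training_rows, summary_stats) {
      stop("I'm broken")
    }
    list(
      tar_target(training_rows, data.frame(x = 1:5, y = 2 * (1:5))),
      tar_target(summary_stats, colMeans(training_rows)),
      tar_target(model, fit_model(training_rows, summary_stats))
    )
  }, ask = FALSE)
  try(tar_make(reporter = "silent"), silent = TRUE)

  # Load the failed target's dependencies, plus globals like fit_model(),
  # into this session.
  tar_workspace(model)

  # Reproduce the error interactively with exactly the inputs the target saw.
  reproduced <- tryCatch(
    fit_model(training_rows, summary_stats),
    error = conditionMessage
  )
  list(
    workspaces = tar_workspaces(),
    input_rows = nrow(training_rows),
    reproduced = reproduced
  )
})

str(workspace_check)
```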
There's also another way to use workspaces, for when you aren't getting an error but want to record a workspace to check on suspicious behaviour. We can instead do:
```{r}
#| eval: false
tar_option_set(
seed = 2048,
workspaces = c("species_classification_model", "occurrences_weather_hexes")
)
```
And workspaces for these targets will be recorded, whether they error or not.
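Here is a minimal sketch of recording a workspace for a target that runs without error, assuming only that `{targets}` is installed. The targets `raw_values` and `cleaned_values` are hypothetical. Imagine `cleaned_values` quietly drops more values than you expected, and you want to inspect exactly what it was given.

```{r}
#| eval: false
library(targets)

suspicious <- tar_dir({
  tar_script({
    # Record a workspace for cleaned_values even though it won't error.
    tar_option_set(workspaces = "cleaned_values")
    list(
      tar_target(raw_values, c(4, NA, 7, NA)),
      tar_target(cleaned_values, raw_values[!is.na(raw_values)])
    )
  }, ask = FALSE)
  tar_make(reporter = "silent")

  # The pipeline succeeded, but the workspace was still recorded...
  recorded <- tar_workspaces()
  # ...so we can load exactly what cleaned_values saw and poke at it.
  tar_workspace(cleaned_values)
  list(recorded = recorded, n_missing = sum(is.na(raw_values)))
})

str(suspicious)
```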
# In practice
In my personal experience, more than 90% of targets bugs can be quickly dispatched by the
'Call `tar_load()` and tinker' approach.
If that fails, I reach straight for workspaces. When I'm using this mysterious
'branching' thing I keep referring to, I rely on workspaces even more.
So if you take one thing from this section it should be:
- There's this 'workspaces' concept that will probably help if you're having a hard time debugging something.