generated from movestore/Template_R_Function_App
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.qmd
425 lines (303 loc) · 13.9 KB
/
README.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
---
format: gfm
editor: source
---
# Fetch and Merge Accelerometer data to Locations
MoveApps
Github repository: <https://github.com/dmpstats/Fetch_and_Bind_Acceleration_to_Locations>
## Description
Downloads available accelerometer data for individuals and time window comprised
in the input location data and merges acceleration measurements to each observed
location event based on recording time.
## Documentation
Given the individuals and time-span of recorded events in the input location
data, this App is tasked with downloading and merging available (non-location)
acceleration data to recorded locations. Merging is time-based whithin
animals/tracks, and the underlying data processing can be outlined as follows:
1. Extract animal/track IDs and start-end timestamps of location events for
individuals present in input data;
2. Check user's download permissions and availability of accelerometer (ACC) data
for each individual;
3. Download sought ACC data via `move2::download_study_data()`;
4. Process ACC data into a standardized format. ACC samples can be recorded in
Movebank in plain format (one ACC measurement on each active axis per row) or
nested format (burst of consecutive AAC values stored per row). In this step:
- Downloaded data is reshaped to return bursts of ACC values nested in a
list-column named `acc_bursts`. Therefore, each row of processed ACC data
represents an event at which a burst of ACC values (per active axis) was
recorded, with the timestamp column specifying the starting time of the ACC
event.
- The type (raw or non-raw) and tag origin (eobs or non-eobs) of ACC samples
are also added as data attributes.
- ACC data can optionally be thinned down to only retain a single ACC burst
event within a given time interval via the parameter `acc_timefilter` (or
details in the [Parameters](#parameters) section below). This feature can be
useful to avoid long computations, and potentially lack of memory issues,
when processing high-frequency ACC data (e.g. 1 burst every couple of
seconds).
5. Merge processed ACC events to location events of the same animal/track using
recording timestamps for alignment. Location event times act as reference
points, while ACC event starting times are applied as allocable variables. The
merging process governed by two alternative integration rules:
- `'latest'`: ACC events are allocated to the most recent location event
occurring prior to the ACC sampling start time.
- `'nearest'`: ACC events are assigned to the closest-in-time location event,
which can occur either before or after the ACC sampling start time.
ACC events are joined to corresponding location events in a list-column
named `acc_dt`. Data nesting is required here as, depending on the frequency
of ACC events relative to location events, consecutive ACC events may be
allocated to the same location event.
6. Prepare merged data for output, where `output` is a `move2` location object
containing the entire input data along with the following additions:
- *event data*: the list-column `acc_dt` comprising the downloaded and merged
ACC events
- *track data*: columns `acc_dwn_start_time`, `acc_dwn_end_time` and
`acc_in_timespan` providing information about the ACC downloading step.
- *object attributes*:
- `acc_merging_rule` specifying the selected merging rule[^1]
- `acc_track_data` providing the track data component of the merged
ACC data
[^1]: retrievable via `attr(output, "acc_merging_rule")`
<br />
::: {.callout-note}
The creation of this App was prompted by a current limitation in MoveApps, which
allows Workflows to be initiated from either location or non-location data but
not both simultaneously. MoveApps has plans to ease this restriction in the
future, such as enabling multiple data inputs on Apps. Consequently, it is
possible that this App may become obsolete in the future.
:::
<br />
### Input data
`move2` location object.
### Output data
`move2` location object. See point 6. of [Documentation](#documentation) for
additional context.
Section [Using the Output data](#using-the-output-data) demonstrates how to
access and manipulate the merged data in the output object.
`NULL` entries in column `acc_dt` indicate no ACC data associated with the
location events (given the chosen merging rule) or that there is no ACC data
collected for that animal (i.e. no accelerometer sensor on the tag).
### Artefacts
`downloaded_acc_data.rds`: a `tibble` object with downloaded and processed ACC
data, with ACC events and track data nested by animal in, respectively,
list-columns `acc_dt` and `acc_trk_dt`.
### Parameters
**Movebank username** (`usr`): character string, the Movebank username. Default: `NULL`.
**Movebank password** (`pwd`): character string, the Movebank password. Default: `NULL`.
**Merging Rule** (`merging_rule`): specify how downloaded Accelerometer
data is merged to location data. ACC events can allocated to either:
- the most recent location event recorded prior to the ACC sampling start time
(`'latest'`, the default), or
- the closest-in-time recorded location (`'nearest'`)
**Store ACC track information** (`store_acc_track_info`): check-box to choose
whether to store the track attribute table from merged ACC data as an attribute
of the output `move2` object. To reduce data redundancy, ACC track data is
dropped from the ACC data prior to the nesting step in the merging process. This
option allows to store the ACC's track attribute table for potential future use
or reference. Default: `FALSE`.
**Filter downloaded ACC data by time interval** (`acc_timefilter`): an integer
defining the time interval, in minutes, for thinning the temporal resolution of
the downloaded ACC data. Must be between 0 (no filtering) and 30. Unit:
`minutes`; default: `0`
### Most common errors
None yet
### Null or error handling
**data**: the App will skip the merging step and return a modified version of the input
data with an empty column `acc_dt` if:
- Accelerometer data is not collected for any of the animals in the input data,
- the user does not have downloading permission for the study
An informative warning is also issued.
**Movebank username** (`usr`) and **Movebank password** (`pwd`): If one of the
credentials are either NULL or connection to Movebank is voided due to invalid
log-in details, the app will return an error reminding the user to provide valid
credentials.
**Filter downloaded ACC data by time interval** (`acc_timefilter`): The App will
throw an error if the specified interval is not between 0 and 30.
### Using the output data
The purpose of this App is to retrieve and combine available non-location
Accelerometer (ACC) data with the provided input location data. The format of
the returned output data is structured with data nesting, providing a versatile
approach for handling the merged information in subsequent applications. The
trade-off for this flexibility is the slightly increased level of data
manipulation needed to work with nested datasets.
Here we suggest some sample code to inspect, manipulate and apply the output
data effectively in e.g. downstream Apps within a MoveApps Workflow. Let's start
by loading the dependencies required to run this demonstration.
```{r}
#| message: false
#| warning: false
# load dependencies
library(move2)
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)
library(stringr)
library(gganimate)
source("RFunction.R")
source(file.path("src/common/logger.R"))
source(file.path("src/io/app_files.R"))
source(file.path("src/io/io_handler.R"))
```
```{r}
#| echo: false
#| include: false
options(pillar.width = 500)
# movebank credentials stored in HOME .Renviron file, set up via `usethis::edit_r_environ()`
# CAUTION: DO NOT USE YOUR ACTUAL LOGIN DETAILS HERE, OTHERWISE YOU MAY EXPOSE YOUR CREDENTIALS INADVERTENTLY
usr <- Sys.getenv("vult_usr")
pwd <- Sys.getenv("vult_pwd")
data <- readRDS("data/raw/gps_vult_sa.rds") |> # Data created via "dev/generating_example_gps_datasets.r"
filter_track_data(individual_local_identifier == "Bateleur_8887") |>
slice(1:10)
```
Now we run the App's underpinning `rFunction` for a given input `data`
using the `'latest'` merging rule.
```{r}
output <- rFunction(data, usr = usr, pwd = pwd, merging_rule = 'latest')
```
The output data is a `move2` location object, where the merged Accelerometer
(ACC) data is provided as `tibble` objects nested in list-column `acc_dt`.
```{r}
output <- output |>
dplyr::select(where(~all(!is.na(.)))) # dropping redundant columns
output
```
#### Accessing nested ACC data
ACC events recorded e.g. between the second and third location points can
retrieved by
```{r}
output |>
as_tibble() |>
slice(2) |>
dplyr::select(acc_dt) |>
unnest(acc_dt)
```
Acceleration measurements recorded in each ACC event are nested in list-column
`acc_burst`. Thus, the burst of ACC values in each active axis sampled in the
first ACC event between the second and third location points is accessible via
```{r}
output |>
as_tibble() |>
slice(2) |>
dplyr::select(acc_dt) |>
unnest(acc_dt) |>
slice(1) |>
mutate(acc_burst = map(acc_burst, as_tibble)) |> # required for nicer unnesting compared to the original matrix format
dplyr::select(acc_burst)|>
unnest(acc_burst)
```
#### `'latest'` versus `'nearest'` merging
@fig-merging-rule provide a graphical representation of the difference between
the two available options for merging the ACC data to location data (via
parameter `merging_rule`).
```{r}
#| echo: false
#| include: false
dt <- readRDS("data/raw/gps_vult_gaia.rds") |>
mt_filter_per_interval(unit = "3 secs") |>
filter_track_data(individual_local_identifier == "V051") |>
slice(1:10)
# merging via the nearest rule
output_nearest <- rFunction(dt, usr = usr, pwd = pwd, merging_rule = 'nearest') |>
dplyr::select(where(~all(!is.na(.)))) |> # dropping redundant columns
as_tibble() |>
mutate(merging = 'nearest')
output_latest <- rFunction(dt, usr = usr, pwd = pwd, merging_rule = 'latest') |>
dplyr::select(where(~all(!is.na(.)))) |> # dropping redundant columns
as_tibble() |>
mutate(merging = 'latest')
output_merg <- bind_rows(output_nearest, output_latest) |>
mutate(event_id = str_sub(event_id,-5,-1)) |>
group_by(merging)
```
```{r}
#| code-fold: true
#| out-width: 100%
#| fig-width: 10
#| fig-height: 7
#| label: fig-merging-rule
#| fig-cap: Comparing the `'latest'` and `'nearest'` rules for merging ACC data with location data. Vertical lines mark the timestamps of six location events, while points represent acceleration values sampled in ACC events. Points are colored according to their assigned location given the selected merging rule.
output_merg_unpacked <- output_merg |>
as_tibble() |>
mutate(acc_dt = map(acc_dt, \(acc_events){
if(!is.null(acc_events)){
acc_events |>
mutate(
acc_burst = pmap(
list(eobs_acceleration_sampling_frequency_per_axis, timestamp, acc_burst),
\(freq, start_time, acc){
# downsize to only first 5 measurements on each acc event, for plotting clarity
n <- 5
acc |>
as_tibble() |>
slice(floor(seq(1, nrow(acc), length.out = n))) |>
mutate(
acc_timestamp = start_time,
#acc_timestamp = seq.POSIXt(from = start_time, by = 1/freq, length.out = n),
.before = 1
)
})
) |>
unnest(acc_burst) |>
rename(acc_event_id = event_id) |>
dplyr::select(acc_event_id, acc_timestamp, matches("acc_[xyz]"))
} else NULL
})) |>
unnest(acc_dt)
location_times <- output_merg |>
ungroup() |>
distinct(event_id, geometry, timestamp) |>
mutate(timestamp = timestamp + 0.15) # tiny nudge to ease visualization (checked that merging is correct)
p <- output_merg_unpacked |>
pivot_longer(cols = c(acc_x, acc_y, acc_z), names_to = "acc_axis", values_to = "acc_value") |>
ggplot() +
geom_point(aes(x = acc_timestamp, y = acc_value, col = factor(event_id), group = merging), size = 2) +
facet_wrap(~acc_axis, ncol = 1, scales = "free_y") +
geom_vline(aes(xintercept = timestamp, col = factor(event_id)), data = location_times, linewidth = 1) +
labs(title = "merging_rule: 'latest'", y = "acceleration", x = "timestamp",
col = "Location event ID") +
scale_color_brewer(type = "qual", palette = "Paired") +
theme_light() +
theme(legend.position = "bottom")
p +
transition_states(merging, transition_length = 1, state_length = 3) +
ggtitle("Merging Rule: '{closest_state}'") +
enter_fade() +
exit_shrink() +
ease_aes('sine-in-out')
```
#### Summarising ACC measurements relative to location events
Summarizing acceleration values between consecutive location points may be
required in downstream applications such as animal behavior classifiers.
For instance, the variance of all acceleration values recorded
between each location event (per active ACC axis) can be computed as follows.
```{r}
loc_acc_var <- output |>
mutate(
var = map(acc_dt, \(acc_events){
if(!is.null(acc_events$acc_burst)){
# bind list of matrices (one per ACC event) by row
all_bursts <- do.call(rbind, acc_events$acc_burst)
# variance in each active ACC axis
apply(all_bursts, 2, var)
} else{
NULL
}
})
) |>
unnest_wider(var, names_sep = "_")
loc_acc_var
```
Note that `loc_acc_var` is not a `move2` object anymore. This is a by-product of
the un-nesting process, which drops the `move2` and `sf` classes and their
attributes. However, back-conversion is straightforward using `{move2}`
functionality and the original `output` object, as e.g.
```{r}
loc_acc_var <- loc_acc_var |>
mt_as_move2(
time_column = mt_time_column(output),
track_id_column = mt_track_id_column(output)
)
mt_is_move2(loc_acc_var)
```