---
title: "Provenance retrieval"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Runs are linked via common files
- The following listing shows all script runs that have been recorded.
- Some of the runs are for scripts that are part of a processing workflow.
- Some of the runs are for scripts that are standalone, in that they have no relation
to any other script, for example the script 'EmCoverage.R' and the console log.
```
rc <- new("Recordr")
```
This listing shows all script executions that have been recorded, displayed in
reverse chronological order.
```
listRuns(rc)
Seq Script Tag Start Time Run Id Published Time
12 estimate_by_year.R ohi, estimate by year 2017-05-01 16:41:33 PDT ...1c5559afbe NA
11 Console log: console.log second console run 2017-03-01 14:53:26 PST ...a5dee3c795 NA
10 EmCoverage.R forth recordr test 2017-03-01 15:53:08 PST ...2e94aa9f49 NA
9 combine_inland_and_offshore.R ohi, combine 2017-05-01 12:41:32 PDT ...82ece83285 NA
8 EmCoverage.R third recordr test 2017-05-01 10:40:49 PST ...bf1c2750c5 NA
7 summarize_zonal_stats.R ohi, summarize zonal stats 2017-04-30 12:41:30 PDT ...b92a5bc6e1 NA
6 EmCoverage.R second recordr test 2017-04-30 10:37:15 PST ...a1d0a54060 NA
5 lsp_zonal_stats.R ohi, lsp zonal stats 2017-04-28 12:41:28 PDT ...9317a45564 NA
4 EmCoverage.R first recordr test 2017-04-28 10:41:48 PST ...f01176ae80 NA
3 rasterize_HS_WDPA_and_PEP.R ohi, rasterize, WDPA 2017-04-26 12:41:07 PDT ...b5143ff913 NA
2 get_analysis_rasters.R ohi, get reasters, ohibc 2017-04-26 12:41:12 PDT ...e7c8fa91f7 NA
1 setup_water_shed_raster.R ohi, setup 2017-04-26 10:41:15 PDT ...1b4496098d NA
```
We are going to use `listRuns` to retrieve executions that are part of a processing workflow.
First, the linked runs can be viewed using the plotRuns command. This command begins the trace at the
specified run (seq=1) and proceeds in both a forward (downstream) and reverse (upstream) direction. A forward
search causes the trace to proceed to runs that are descendants of the current run: each file generated by
a run is inspected, and runs that used the file are considered for inclusion in the trace
as potential descendants. A reverse search looks for ancestors of the current execution. These two trace
methods can be combined by specifying a trace direction of "both".
```
plotRuns(rc, seq=1, direction="both")
```
![Provenance trace example.](./trace_06.pdf)
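The trace described above can be sketched in base R. This is a minimal illustration, not recordr's implementation: the `prov` table, its file names, and the helpers `neighbours` and `trace_runs` are all hypothetical, and the file links were chosen only so that the results agree with the listings in this document. The sketch also assumes that a direction of "both" follows links upstream and downstream at every step of the trace.

```r
# Toy provenance records: one row per (run, file, role).
# File names are hypothetical placeholders, not the real OHI files.
prov <- data.frame(
  seq  = c(1, 2, 3, 5, 5, 5, 5, 7, 7, 9, 9, 12, 4),
  file = c("a.tif", "b.tif", "c.tif", "a.tif", "b.tif", "c.tif", "d.csv",
           "d.csv", "e.csv", "e.csv", "f.csv", "f.csv", "x.csv"),
  role = c("generated", "generated", "generated", "used", "used", "used",
           "generated", "used", "generated", "used", "generated", "used",
           "generated"),
  stringsAsFactors = FALSE
)

# Runs one step downstream ("forward") or upstream ("reverse") of `runs`.
# Forward: files generated by `runs` -> runs that used those files.
# Reverse: files used by `runs` -> runs that generated those files.
neighbours <- function(runs, direction) {
  from <- if (direction == "forward") "generated" else "used"
  to   <- if (direction == "forward") "used" else "generated"
  files <- prov$file[prov$seq %in% runs & prov$role == from]
  unique(prov$seq[prov$file %in% files & prov$role == to])
}

# Repeatedly expand the set of linked runs until no new run is found.
trace_runs <- function(start, direction = c("forward", "reverse", "both")) {
  direction <- match.arg(direction)
  steps <- if (direction == "both") c("forward", "reverse") else direction
  found <- start
  frontier <- start
  while (length(frontier) > 0) {
    nxt <- unique(unlist(lapply(steps, function(d) neighbours(frontier, d))))
    frontier <- setdiff(nxt, found)
    found <- c(found, frontier)
  }
  sort(found, decreasing = TRUE)
}

trace_runs(1, "forward")
trace_runs(1, "both")
```

With these toy links, the forward trace from run 1 reaches runs 5, 7, 9 and 12, while the "both" trace also picks up runs 2 and 3 because they generated files used by run 5. The standalone run 4 shares no file with the workflow and is never reached.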
Next, the listRuns command will be used to retrieve and display the provenance trace, but tracing only in
the 'forward' direction. Notice that because we are searching in the forward direction, the runs
"get_analysis_rasters.R" and "rasterize_HS_WDPA_and_PEP.R" are not included.
```
listRuns(rc, seq=1, direction="forward")
Seq Script Tag Start Time Run Id Published Time
12 estimate_by_year.R ohi, estimate by year 2017-05-01 16:41:33 PDT ...1c5559afbe NA
9 combine_inland_and_offshore.R ohi, combine 2017-05-01 12:41:32 PDT ...82ece83285 NA
7 summarize_zonal_stats.R ohi, summarize zonal stats 2017-04-30 12:41:30 PDT ...b92a5bc6e1 NA
5 lsp_zonal_stats.R ohi, lsp zonal stats 2017-04-28 12:41:28 PDT ...9317a45564 NA
1 setup_water_shed_raster.R ohi, setup 2017-04-26 10:41:15 PDT ...1b4496098d NA
```
To see all runs for the OHI workflow, the trace is started at run #1 and proceeds in "both" directions,
as was done with the 'plotRuns' example:
```
listRuns(rc, seq=1, direction="both")
Seq Script Tag Start Time Run Id Published Time
12 estimate_by_year.R ohi, estimate by year 2017-05-01 16:41:33 PDT ...1c5559afbe NA
9 combine_inland_and_offshore.R ohi, combine 2017-05-01 12:41:32 PDT ...82ece83285 NA
7 summarize_zonal_stats.R ohi, summarize zonal stats 2017-04-30 12:41:30 PDT ...b92a5bc6e1 NA
5 lsp_zonal_stats.R ohi, lsp zonal stats 2017-04-28 12:41:28 PDT ...9317a45564 NA
3 rasterize_HS_WDPA_and_PEP.R ohi, rasterize, WDPA 2017-04-26 12:41:07 PDT ...b5143ff913 NA
2 get_analysis_rasters.R ohi, get reasters, ohibc 2017-04-26 12:41:12 PDT ...e7c8fa91f7 NA
1 setup_water_shed_raster.R ohi, setup 2017-04-26 10:41:15 PDT ...1b4496098d NA
```
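As the trace proceeds, each ancestor or descendant encountered is considered a "level", analogous to a
"generation". The levels argument limits how many levels the trace follows; with levels=1, only the
immediate descendants of run #1 are listed: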
```
listRuns(rc, seq=1, direction="forward", levels=1)
Seq Script Tag Start Time Run Id Published Time
5 lsp_zonal_stats.R ohi, lsp zonal stats 2017-04-28 12:41:28 PDT ...9317a45564 NA
1 setup_water_shed_raster.R ohi, setup 2017-04-26 10:41:15 PDT ...1b4496098d NA
```
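The notion of a level can be illustrated with a small base R sketch that assigns a generation number to every run reached by a forward trace. The `edges` table and the function `forward_levels` are hypothetical and are not part of recordr; the parent and child links are assumed purely so that the result is consistent with the listings shown here.

```r
# Hypothetical parent -> child links between run sequence numbers.
edges <- data.frame(parent = c(1, 2, 3, 5, 7, 9),
                    child  = c(5, 5, 5, 7, 9, 12))

# Level (generation) of each run reached by a forward trace from `start`.
# The starting run is level 0, its direct descendants are level 1, and so on.
forward_levels <- function(start, max_levels = Inf) {
  level_of <- setNames(0, start)
  frontier <- start
  level <- 0
  while (length(frontier) > 0 && level < max_levels) {
    level <- level + 1
    children <- unique(edges$child[edges$parent %in% frontier])
    # Keep only runs that have not been assigned a level yet.
    frontier <- setdiff(children, as.numeric(names(level_of)))
    level_of[as.character(frontier)] <- level
  }
  level_of
}

forward_levels(1, max_levels = 1)  # start run plus its direct descendants
forward_levels(1)                  # every descendant with its level
```

Limiting `max_levels` to 1 stops the trace after the first generation, which mirrors the two-row listing above.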
Detailed listings for each run in a trace can be viewed with viewRuns, using the same arguments as for listRuns.
The following command retrieves all linked runs and displays each run in turn:
```
viewRuns(rc, seq=1, direction="forward")
[details]: Run details
----------------------
“/Users/slaughter/R/x86_64-darwin-...y/OHI-Scienct/setup_watershed_raster.R” was executed on 2017-04-26 10:41:15 PST
Tag: “ohi, setup”
Run sequence #: 1
Publish date: Not published
Published to: NA
Published Id: NA
View at: NA
Run by user: slaughter
Account subject: NA
Run Id: urn:uuid:cfca952c-16f1-4e41-b022-1b4496098d
Data package Id: urn:uuid:2d2420c4-3fb4-4dce-b299-131d10851753
HostId: Peters-MBP.domain
Operating system: x86_64-apple-darwin13.4.0
R version: R version 3.3.2 (2016-10-31)
Dependencies: stats, graphics, grDevices, utils, datasets, methods, base, hash_2.2.6, Rcpp_0.12.9, knitr_1.15.1, magrittr_1.5, roxygen2_5.0.1, rappdirs_0.3.1, munsell_0.4.3, uuid_0.1-2, colorspace_1.3-2, R6_2.2.0, stringr_1.2.0, httr_1.2.1, plyr_1.8.4, tools_3.3.2, grid_3.3.2, redland_1.0.17-9, gtable_0.2.0, parsedate_1.1.1, DBI_0.5-1, htmltools_0.3.5, assertthat_0.1, lazyeval_0.2.0, yaml_2.1.14, rprojroot_1.2, digest_0.6.12, tibble_1.2, base64enc_0.1-3, datapack_1.1.0.9000, evaluate_0.10, memoise_1.0.0, RSQLite_1.1-2, rmarkdown_1.3, stringi_1.1.2, scales_0.4.1, backports_1.0.5, XML_3.98-1.5, jsonlite_1.2, ggplot2_2.2.1, recordr_1.0.3.9000, EML_1.0.1.1, dataone_2.0.1.9000
Run start time: 2017-04-26 10:41:15 PST
Run end time: 2017-04-26 10:41:48 PST
[used]: 1 items used by this run
-----------------------------------
Location Size (kb) Modified time
/Users/slaughter/R/x86_64-da...tdata/OHI-Science/ohibc_rgn_raster_500m.tif 138365 2017-02-25 14:58:19
[generated]: 1 items generated by this run
-----------------------------------------
Location Size (kb) Modified time
/private/var/folders/zb/y107...gGjbo/OHI-Science/howe_sound_watershed_500m.tif 8052 2017-02-26 10:41:48
enter <return> to view next run, or "q"<return> to quit.
```
Executions from a trace can be published to a repository using the publishRuns command. In the following example,
the arguments passed to publishRuns perform the same provenance trace as the plotRuns command shown earlier,
so that all runs for the OHI processing are published. These runs are published one at a time, so
seven separate packages will be uploaded.
```
publishRuns(rc, seq=1, direction="both", quiet=FALSE)
Published run "...1c5559afbe" (ohi, estimate by year ) to https://mn-dev-ucsb-1.test.dataone.org
Published run "...82ece83285" (ohi, combine) to https://mn-dev-ucsb-1.test.dataone.org
Published run "...b92a5bc6e1" (ohi, summarize zonal stats) to https://mn-dev-ucsb-1.test.dataone.org
Published run "...9317a45564" (ohi, lsp zonal stats) to https://mn-dev-ucsb-1.test.dataone.org
Published run "...b5143ff913" (ohi, rasterize, WDPA) to https://mn-dev-ucsb-1.test.dataone.org
Published run "...e7c8fa91f7" (ohi, get reasters, ohibc) to https://mn-dev-ucsb-1.test.dataone.org
Published run "...1b4496098d" (ohi, setup) to https://mn-dev-ucsb-1.test.dataone.org
```
Alternatively, all runs can be combined in a single package:
```
publishRuns(rc, seq=1, direction="both", quiet=FALSE, combinePackages=TRUE)
Publishing all runs into a single package:
Including run "...1c5559afbe" (ohi, estimate by year ) to https://mn-dev-ucsb-1.test.dataone.org
Including run "...82ece83285" (ohi, combine) to https://mn-dev-ucsb-1.test.dataone.org
Including run "...b92a5bc6e1" (ohi, summarize zonal stats) to https://mn-dev-ucsb-1.test.dataone.org
Including run "...9317a45564" (ohi, lsp zonal stats) to https://mn-dev-ucsb-1.test.dataone.org
Including run "...b5143ff913" (ohi, rasterize, WDPA) to https://mn-dev-ucsb-1.test.dataone.org
Including run "...e7c8fa91f7" (ohi, get reasters, ohibc) to https://mn-dev-ucsb-1.test.dataone.org
Including run "...1b4496098d" (ohi, setup) to https://mn-dev-ucsb-1.test.dataone.org
Publishing Package blah to https://mn-dev-ucsb-1.test.dataone.org
```