-
Notifications
You must be signed in to change notification settings - Fork 17
/
README.Rmd
294 lines (213 loc) · 8.21 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
hook_output <- knitr::knit_hooks$get("output")
knitr::knit_hooks$set(output = function(x, options) {
if (!is.null(n <- options$out.lines)) {
x <- xfun::split_lines(x)
if (length(x) > n) {
if (x[length(x)] == "") {
x <- c(head(x, n / 2), "....", tail(x, n / 2 + 1))
} else {
x <- c(head(x, n / 2), "....", tail(x, n / 2))
}
}
}
hook_output(x, options)
})
```
# paleobioDB
<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/paleobioDB)](https://CRAN.R-project.org/package=paleobioDB)
[![rstudio mirror downloads](https://cranlogs.r-pkg.org/badges/paleobioDB)](https://github.com/r-hub/cranlogs.app)
[![R-CMD-check](https://github.com/ropensci/paleobioDB/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ropensci/paleobioDB/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
`paleobioDB` is a package for downloading, visualizing and processing
data from the [Paleobiology Database](https://paleobiodb.org/).
## Installation
Install the latest release from CRAN:
```r
install.packages("paleobioDB")
```
Alternatively, you can install the development version of paleobioDB
from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("ropensci/paleobioDB")
```
## General overview
`paleobioDB` has 19 functions to wrap most endpoints of the PaleobioDB
API, plus 8 functions to visualize and process the fossil data. The
API documentation for the Paleobiology Database can be found
[here](https://paleobiodb.org/data1.2/).
## Download fossil occurrences from PaleobioDB
### `pbdb_occurrences`
Here is an example of how to download all fossil occurrences that
belong to the family Canidae in the Quaternary:
```{r canidae}
library(paleobioDB)
canidae <- pbdb_occurrences(
base_name = "canidae",
interval = "Quaternary",
show = c("coords", "classext"),
vocab = "pbdb",
limit = "all"
)
dim(canidae)
head(canidae, 3)
```
Note that if the plotting and analysis functions of this package are
going to be used (as demonstrated in the sections below), it is
necessary to specify the parameter `show = c("coords", "classext")` in
the `pbdb_occurrences()` function. This returns taxonomic and
geographic information for the occurrences that is required by these
functions.
### Caution with the raw data
Beware of synonyms and errors, they could twist your estimations about
species richness, evolutionary and extinction rates, etc. `paleobioDB`
users should be critical about the raw data downloaded from the
database and filter the data before analyzing it.
For instance, when using `base_name` for downloading information with
the function `pbdb_occurrences()`, check out the synonyms and errors
that could appear in `accepted_name`, `genus`, etc. If they are not
corrected or eliminated, they will increase the richness of genera.
## Map the fossil records
### `pbdb_map`
Plots a map showing fossil occurrences and invisibly returns a data
frame with the number of occurrences per coordinate.
```{r pbdb_map, out.lines = 12}
(pbdb_map(canidae))
```
### `pbdb_map_occur`
Returns a map and a raster object with the sampling effort (number of
fossil records per cell). The user can change the resolution of the
cells.
```{r pbdb_map_occur}
pbdb_map_occur(canidae, res = 5)
```
### `pbdb_map_richness`
Returns a map and a raster object with the number of different
species, genera, family, etc. per cell. As with `pbdb_map_occur()`,
the user can change the resolution of the cells.
```{r pbdb_map_richness}
pbdb_map_richness(canidae, res = 5, rank = "species")
```
If you do not need the plot and you are only interested in obtaining a
richness raster for some other purposes, you could use the argument
`do_plot = FALSE`. For instance, this returns the same raster object
as above (a `SpatRaster` object from the `terra` package) without
plotting it:
```{r pbdb_map_richness_no_plot}
pbdb_map_richness(canidae, res = 5, rank = "species", do_plot = FALSE)
```
The `do_plot` argument is available in all the functions in the
package that produce a plot and return an object. This means that the
plot is optional in all the other plotting functions that are
described here. Check their documentation for more details.
## Explore your fossil data
### `pbdb_temp_range`
Returns a data frame and a plot with the time span of the species,
genera, families, etc. in your query. Make sure that enough vertical
space is provided in the graphics device used to do the plotting if
there are many taxa of the specified rank in your data set.
```{r pbdb_temp_range, fig.dim = c(7, 10), out.lines = 12}
pbdb_temp_range(canidae, rank = "species")
```
### `pbdb_richness`
Returns a data frame and a plot with the number of species (or genera,
families, etc.) across time. You should set the temporal extent and
the temporal resolution for the steps.
```{r pbdb_richness, out.lines = 12}
pbdb_richness(canidae, rank = "species", temporal_extent = c(0, 5), res = 0.5)
```
### `pbdb_orig_ext`
Returns a data frame and a plot with the number of appearances and
dissapearances of taxa between consecutive time intervals in the data
you provide. These time intervals are defined by the temporal extent
(`temporal_extent`) and resolution (`res`) arguments. This is another
way of visualizing the same information that is shown in the
`pbdb_temp_range()` plot. `orig_ext = 1` plots new appearances:
```{r pbdb_orig_ext_1, out.lines = 12}
pbdb_orig_ext(
canidae,
rank = "species",
orig_ext = 1, temporal_extent = c(0, 5), res = 0.5
)
```
And `orig_ext = 2` plots disappearances of taxa between time intervals
in the provided data frame.
```{r pbdb_orig_ext_2, out.lines = 12}
pbdb_orig_ext(
canidae,
rank = "species",
orig_ext = 2, temporal_extent = c(0, 5), res = 0.5
)
```
### `pbdb_subtaxa`
Returns a plot and a data frame with the number of species, genera,
families, etc. in your dataset.
```{r pbdb_subtaxa}
pbdb_subtaxa(canidae)
```
### `pbdb_temporal_resolution`
Returns a plot and a list with a summary of the temporal resolution of
the fossil records.
```{r pbdb_temporal_resolution, out.lines = 12}
pbdb_temporal_resolution(canidae)
```
## Docker
We include a Dockerfile to ease working on the package as it fulfills
all its system dependencies.
How to load the package with Docker:
1. Install Docker. Reference here: https://docs.docker.com/get-started/
2. Build the docker image. From the root folder of this
repository, type:
```bash
docker build -t rpbdb Docker
```
This command will create a docker image in your system based on some
of the [rocker/tidyverse](https://hub.docker.com/r/rocker/tidyverse/)
images. You can see the new image with `docker image ls`.
3. Start a container for this image. Type the following command
picking some `<password>` of your choice.
```bash
docker run -d --name="rpbdb_rstudio" --rm -p 8787:8787 \
-e PASSWORD=<password> -v $PWD:/home/rstudio rpbdb
```
This will start a container with access to your current folder where
all the code of the package is. Inside the container, the code will
be located in `/home/rstudio`. It also exposes the port 8787 of the
container so you may access the RStudio web application which is
bundled in the Rocker base image.
4. Then, you can either:
- Navigate to `http://localhost:8787`. Enter with username
"rstudio" and the password you used in the command above.
- Or you can enter the container via console with:
```bash
docker exec -ti -u rstudio -w /home/rstudio rpbdb_rstudio R
```
Either from RStudio or from within the container you can install the
package from source with:
```r
install.packages(".", repos = NULL, type = "source")
```
5. When you are finished, you can stop the container:
```bash
docker container stop rpbdb_rstudio
```
## Meta
Please report any [issues or bugs](https://github.com/ropensci/paleobioDB/issues).
License: GPL-2
```{r, echo = FALSE}
citation("paleobioDB")
```
---
This package is part of the [rOpenSci](https://ropensci.org/packages) project.
[![](https://ropensci.org/public_images/github_footer.png)](https://ropensci.org)