-
Notifications
You must be signed in to change notification settings - Fork 19
/
01-spatial-data.Rmd
410 lines (273 loc) · 16.4 KB
/
01-spatial-data.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
```{r, echo=FALSE, purl=FALSE, message=FALSE}
knitr::opts_chunk$set(results='hide', comment = "#>", purl = FALSE)
## libraries needed for R code examples
library(terra)
library(sf)
library(tidyverse)
```
# Introduction to spatial data in R {#intro}
> Learning Objectives
>
> * Read table with geo coordinates into `sf` object
> * Read shapefiles into `sf` object
> * Examine `sf` objects
> * Use base plot with `sf` objects and attribute data
> * Read GeoTiff single and multiband into a `SpatRaster` object
> * Examine `SpatRaster` objects
--------------
## The `sf` package
The [`sf`](https://cran.r-project.org/package=sf)[^3] package was first released on CRAN in late October 2016, and has in the mean time superseded the original R Package for storing and manipulating spatial data, [`sp`](https://CRAN.R-project.org/package=sp ), which was first released in 2005. `sp` is still actively maintained, but less often used now, so you should be aware of it, but we will not teach it here.
```{r spsf-dl, echo=FALSE, results='asis', out.width='100%', fig.cap='sf vs sp downloads on CRAN'}
knitr::include_graphics('img/spsf_downloads.png')
```
`sf` implements a formal standard called ["Simple Features"](https://en.wikipedia.org/wiki/Simple_Features) that specifies a storage and access model of spatial geometries (point, line, polygon). A feature geometry is called simple when it consists of points connected by straight line pieces, and does not intersect itself. This standard has been adopted widely, not only by spatial databases such as PostGIS, but also more recent standards such as GeoJSON.
[^3]: E. Pebesma & R. Bivand (2016)[Spatial data in R: simple features and
future perspectives](http://pebesma.staff.ifgi.de/pebesma_sfr.pdf)
If you work with PostGis or GeoJSON you may have come across the [WKT (well-known text)](https://en.wikipedia.org/wiki/Well-known_text) format (Fig 1.1 and 1.2)
```{r wkt-primitives, echo=FALSE, results='asis', out.width='100%', fig.cap='Well-Known-Text Geometry primitives (wikipedia)'}
knitr::include_graphics('img/wkt_primitives.png')
```
```{r wkt-multipart, echo=FALSE, results='asis', out.width='100%', fig.cap='Well-Known-Text Multipart geometries (wikipedia)'}
knitr::include_graphics('img/wkt_multipart.png')
```
`sf` implements this standard natively in R. In `sf` spatial objects are stored as a tabular format (data frame) with a special column that contains the information for the geometry coordinates. That special column holds a list with the same length as the number of rows in the data frame. Each of the individual list elements then can be of any length needed to hold the coordinates that correspond to an individual feature.
`sf` objects are built up using the following structures:
1. `sfg` - simple feature geometry (one feature)
2. `sfc` - simple feature collection (a collection of `sfg`)
3. `sf` - simple feature object (`sfc` with data attributes)
So to create a spatial `sf` object manually the basic steps would be:
> **I. Create geometric objects (topology)**
Geometric objects (simple features) can be created from a numeric vector, matrix or a list with the coordinates. They are called `sfg` objects for Simple Feature Geometry.b There are functions that help create simple feature geometries, like `st_point()`, `st_linestring()`, `st_polygon()` and more.
> **II. Combine all individual single feature objects for the special column.**
The feature geometries are then combined into a Simple Feature Collection with `st_sfc()`. which is nothing other than a simple feature geometry list-column. The `sfc` object also holds the bounding box and the projection information.
> **III. Add attributes**.
Lastly, we add the attributes to the the simple feature collection with the `st_sf()` function. This function extends the well known data frame in R with a column that holds the simple feature collection.
To create a network of highways we would first generate LINESTRINGs as simple feature geometries out of a matrix with coordinates:
```{r sfg-line, results='show'}
lnstr_sfg1 <- st_linestring(matrix(runif(6), ncol=2))
lnstr_sfg2 <- st_linestring(matrix(runif(6), ncol=2))
class(lnstr_sfg1)
```
We would then combine this into a simple feature collection :
```{r sfc-line, results='show'}
(lnstr_sfc <- st_sfc(lnstr_sfg1, lnstr_sfg2))
```
And lastly create a data frame from above to generate the `sf` object:
```{r sf-line, results='show'}
dfr <- data.frame(id = c("hwy1", "hwy2"),
cars_per_hour = c(78, 22))
(lnstr_sf <- st_sf(dfr , lnstr_sfc))
```
There are many methods available in the `sf` package, to find out use
```{r sf-methods, results='show', warning=FALSE}
methods(class="sf")
```
Here are some of the other highlights of `sf` you might be interested in:
- provides **fast** I/O, particularly relevant for large files
- spatial functions that rely on GEOS and GDAL and PROJ external libraries are directly linked into the package, so no need to load additional external packages (like in `sp`)
- `sf` objects can be plotted directly with `ggplot`
- `sf` directly reads from and writes to spatial **databases** such as PostGIS
- `sf` is compatible with the [`tidyvderse` approach](https://www.tidyverse.org/), (but see some [pitfalls here](https://geocompr.github.io/geocompkg/articles/tidyverse-pitfalls.html))
Note that `sp` and `sf` are not the only way spatial objects are conceptualized in R. Other spatial packages may use their own class definitions for spatial data (for example `spatstat`).
There are packages specifically for the [GeoJSON](https://tools.ietf.org/html/rfc7946) and for that reason are more lightweight, for example [`geojson`](https://cran.r-project.org/package=geojson)
Usually you can find functions that convert objects to and from these formats.
> <h3>Challenge</h3>
>
> Generate an `sf` point object.
>
> 1. Create a matrix `pts` of random numbers with two columns and as many rows as you like. These are your points.
> 2. Create a dataframe `attrib_df` with the same number of rows as your `pts` matrix and a column that holds an attribute. You can make up any attribute.
> 3. Use the appropriate commands and `pts` an `sf` object with a gemoetry column of class `sfc_POINT`.
> 4. Try to subset your spatial object using the attribute you have added and the way you are used to from regular data frames.
> 5. How do you determine the bounding box of your spatial object?
```{r eval=FALSE, echo=FALSE}
# Answer
pts <- matrix(runif(20), ncol=2) # the matrix with the points
attrib_df <- data.frame(an_attribute = rep(LETTERS[1:5], each = 2)) # attribute table
## sf approach ##
pts_sfg_list <- lapply(seq_len(nrow(pts)), function(i) st_point(pts[i,])) # a simple feature geometry
pts_sfc <- st_sfc(pts_sfg_list) # a simple feature collection
pts_sf <- st_sf(pts_sfc, attrib_df) # an sf object
pts_sf
# Subsetting
pts_sf$an_attribute # column with attribute only -- this is a vecor
subset(pts_sf, an_attribute == "A") # subset with attribute A -- this is an SF object
# bounding box:
st_bbox(pts_sf)
```
## Creating a spatial object from a lat/lon table
Often in your research might have a spreadsheet that contains latitude, longitude and perhaps some attribute values. You know how to read the spreadsheet into a tabular format (tibble) with `dplyr::read_table` or `dplyr::read_csv`. We can then very easily convert the table into a spatial object in R.
An `sf` object can be created from a data frame in the following way. We take advantage of the `st_as_sf()` function which converts any foreign object into an `sf` object. Similarly to above, it requires an argument `coords`, which in the case of point data needs to be a vector that specifies the data frame's columns for the longitude and latitude (x,y) coordinates.
my_sf_object <- st_as_sf(myDataframe, coords)
`st_as_sf()` creates a new object and leaves the original data frame untouched.
We use `read_csv()` to read `philly_homicides.csv` into a tibble in R and name it `philly_homicides_df`.
```{r read-df, results='show'}
philly_homicides_df <- read_csv("data/philly_homicides.csv")
```
We convert the `philly_homicides_df ` data frame into an `sf` object with `st_as_sf()`
```{r df-to-sf, results='show'}
library(sf)
philly_homicides_sf <- st_as_sf(philly_homicides_df, coords = c("POINT_X", "POINT_Y"))
names(philly_homicides_sf)
```
Note the additional **geometry** list-column which now holds the simple feature collection with the coordinates of all the points. We now use `st_crs()` to check on the projection.
```{r sf-nocrs, results='show'}
st_crs(philly_homicides_sf)
```
To make it a complete geographical object we use `st_crs()` to assign the WGS84 projection, which has the EPSG code 4326:
```{r sf-crs, results='show'}
st_crs(philly_homicides_sf) <- 4326
st_crs(philly_homicides_sf)
```
Wow this is long. It is usually more helpful to just retrieve the proj4string:
```{r sf-crs-short, results='show'}
st_crs(philly_homicides_sf)$proj4string
```
We will save this object as a shapefile on our hard drive for later use. (Note that by default `st_write` checks if the file already exists, and if so it will not overwrite it. If you need to force it to overwrite use the option `delete_layer = TRUE`.)
```{r sf-save-homicides, eval=FALSE}
st_write(philly_homicides_sf, "data/PhillyHomicides", driver = "ESRI Shapefile")
# to force the save:
st_write(philly_homicides_sf, "data/PhillyHomicides", driver = "ESRI Shapefile", delete_layer = TRUE)
```
## Loading shape files into R
`sf` relies on the powerful [GDAL library](http://gdal.org), which is automatically linked in when loading `sf`. The GDAL provides the functionality to read and write spatial files of many formats. For shape files we can use `st_read()`, which simply takes the path of the directory with the shapefile as argument.
```{r read-shp-sf, eval=TRUE, results='show'}
# read in
philly_sf <- st_read("data/Philly/")
# take a look at what we've got
str(philly_sf) # note again the geometry column
```
Two more words about the geometry column: Though it is not recommended, you can name this column any way you wish. Secondly, you can remove this column and revert to a regular, non-spatial data frame at any dime with `st_drop_geometry()`.
The default `plot` of an `sf` object is a multi-plot of the first attributes, with a warning if not all can be plotted:
```{r plot-shp-sf, eval=TRUE, results='show'}
plot(philly_sf)
```
In order to only plot the polygon boundaries we need to directly use the geometry column. We use the `st_geometry()` function. It extracts the `sfc` object (the simple feature geometry list column):
```{r plot-shp-sfg, eval=TRUE, results='show'}
plot(st_geometry(philly_sf))
```
Let's add a subset of polygons with only the census tracts where the median houshold income (*medHHinc*) is more than $60,000. We can extract elements from an `sf` object based on attributes using the dplyr filter function (base R subsetting also works) and add the census tracts to the plot in a different color.
```{r pipe-subset-plot-shp-sfg, eval=TRUE}
plot(st_geometry(philly_sf))
philly_sf %>%
filter(medHHinc > 60000) %>% # filter for high income
st_geometry() %>% # extract the geometry for plotting
plot(col="red", add=T) # add to the plot
```
## Raster data in R
Raster files, as you might know, have a more compact data structure than vectors. Because of their regular structure the coordinates do not need to be recorded for each pixel or cell in the rectangular extent. A raster is defined by:
- a CRS
- coordinates of its origin
- a distance or cell size in each direction
- a dimension or numbers of cells in each direction
- an array of cell values
Given this structure, coordinates for any cell can be computed and don't need to be stored.
`terra` was first released in 2020 and now replaces the `raster` package which was first released in 2010. `terra` has greater functionality, is faster and easier to use.
```{r raster-terraDownloads, echo=FALSE, results='asis', out.width='100%', fig.cap='raster vs terra downloads on CRAN'}
knitr::include_graphics('img/raster_terraDownloads.png')
```
The `terra` package has functions for creating, reading, manipulating, and writing raster data. The package also implements raster algebra and many other functions for raster data manipulation.
The package works with `SpatRaster` objects. The `rast()` function is used to create these objects. For example, to create a raster object from scratch we would do the following:
```{r tidy=F, results='show'}
library(terra)
r <- rast(nrows=20, ncols=20, # number of cells in x and y dimension
xmin=0, xmax=360) # min and max x coordinates (left-right borders)
r
```
From the output above we know that:
- the object is of class `SpatRaster`
- its dimensions are 20x20 cells
- it has one layer (band)
- the extent of the raster
- it __has a CRS defined!__ If the crs argument is missing when creating the SpatRaster object, if the x coordinates are within -360 and 360 and the y coordinates are within -90 and 90, the WGS84 projection is used by default.
Good to know.
There are functions to look at individual properties of the raster object. For examle for the number of cells:
```{r raster-cells, results='show'}
ncell(r)
```
Or we can retrieve just the number of bands using the `nlyr()` function.
```{r n-bands, results='show'}
nlyr(r)
```
We can also find out about the Coordinate Reference System (CRS) with the `crs` function. The default output looks a little messy:
```{r raster-crs, results='show'}
crs(r)
```
We can make this easier to read by setting the proj argument:
```{r raster-crsprj, results='show'}
crs(r, proj = TRUE) # return the PROJ-string notation
```
Let's try and plot this.
```{r tidy=F, eval=FALSE}
plot(r)
```
The canvas is empty! This is because even though we have a layer, the cells do not have any values.
```{r tidy=F, eval=FALSE}
values(r)
```
To add some random values to the cells we can take advantage of the `ncells()` function and do this:
```{r tidy=F, results='show'}
values(r) <- runif(ncell(r))
r
```
In addition to the output above, we now see:
- the source, which indicates where the cell values are stored (here they are in memory)
- the range of the cell values (min value adn max value) now added.
- the name, of the layer which is by default lyr.1.
This now plots successfully:
```{r tidy=F, results='show'}
plot(r)
```
(The [`rasterVis` package](https://cran.r-project.org/web/packages/rasterVis/index.html) provides a set of methods for enhanced visualization and interaction for more advanced plotting of raster objects.)
`SpatRaster` objects can also be created from a matrix.
```{r volcano-raster, results='show'}
class(volcano)
volcano.r <- rast(volcano)
class(volcano.r)
```
We also use the `rast()` function to read in a raster file. This raster is generated as part of the [NEON Harvard Forest field site](https://www.neonscience.org/field-sites/field-sites-map/HARV).
```{r read-raster, results='show'}
HARV <- rast("data/HARV_RGB_Ortho.tif")
```
Typing the name of the object will give us what's in there:
```{r show-raster, results='show'}
HARV
```
> <h3>Challenge</h3>
>
> Based on the output above answer the following questions:
>
> 1. How many bands?
> 2. What are the names of the bands)?
> 3. Where are the cell values stored?
> 4. What is the CRS?
We can plot the object like this:
```{r plot-raster, results='show'}
plot(HARV)
```
Or to plot a single band:
```{r plot-raster-single, results='show'}
plot(HARV, 3)
```
We can also use the `rast()` function to import one single band:
```{r one-multiband, results='show'}
HARV_Band2 <- rast("data/HARV_RGB_Ortho.tif", lyrs = 2)
plot(HARV_Band2)
```
Let's now explore the distribution of values contained within our raster using the `hist()` function which produces a histogram. Histograms are often useful in identifying outliers and bad data values in our raster data.
```{r n-hist, results='show'}
hist(HARV)
```
Notice that a warning message is produced when R creates the histogram. By default the maximum cells processed per band is 1,000,000. This maximum value is to ensure processing efficiency as our data become larger. We can force the `hist` function to use all cell values.
```{r n-hist-allvals, results='show'}
ncell(HARV)
hist(HARV, maxcell = ncell(HARV))
```
At times it may be useful to explore raster metadata before loading them into R. This can be done with the function `describe()`
```{r describe, results='show'}
describe("data/HARV_RGB_Ortho.tif")
```
For the many functions available for working with such an object see:
```{r raster-methods, results='show'}
methods(class="SpatRaster")
```