-
Notifications
You must be signed in to change notification settings - Fork 13
Pedro R. Andrade, Rodrigo Avancini
Neighborhoods describe how spatial entities are connected. They summarize (possibly complex) geospatial relations into a simple representation that makes the simulation faster. This tutorial describes how to create Generalized Proximity Matrixes (GPM) within TerraME. GPM is based on the idea that Euclidean spaces are not enough to describe the relations that take place within the geographical space GPM is composed by a set of strategies that try to capture such spatial warp, computing operations over sets of spatial data.
In the next sections, we will describe the basic structure of the implementation and present some examples of creating proximity matrixes. For more information about GPM, see Aguiar et al. (2003).
In this tutorial, we use some spatial data available within gpm
package (documentation available here):
- a set of lines, representing
roads
; - a set of points, representing the center of
communities
; - a set of polygons, representing
farms
. - a set of
cells
(created fromfarms
using scriptcells.lua
within the package itself).
import("gis")
proj = Project{
file = "gpm-test.tview",
clean = true,
roads = filePath("roads.shp", "gpm"),
communities = filePath("communities.shp", "gpm"),
farms = filePath("farms.shp", "gpm"),
cells = filePath("cells.shp", "gpm")
}
roads = CellularSpace{project = proj, layer = "roads"}
communities = CellularSpace{project = proj, layer = "communities"}
farms = CellularSpace{project = proj, layer = "farms"}
cells = CellularSpace{project = proj, layer = "cells"}
Figure below shows them.
This section presents the strategies that use Euclidean spaces to create GPM. All the code below uses gpm
package, that can be loaded using:
import("gpm")
Neighborhoods can be built from any vectorial data. Different strategies are available according to the spatial representation of the origin and destination to be connected. They are summarized in the table below.
From | To Points | To Lines | To Polygons |
---|---|---|---|
Points | Distance, Network | Distance, Network | Distance, Network |
Lines | Distance, Network | Distance, Network | Distance, Network |
Polygons | Contains, Distance, Network | Distance, Length, Network | Area, Border, Distance, Network |
The first example of GPM starts with a strategy that uses the intersection area to create relations
between polygons. Two spatial objects are connected if they have some intersection area.
We can declare a GPM
using cells
as origin
, farms
as destination
, and area
as strategy
to create relations based on the intersection area.
gpm = GPM{
origin = cells,
strategy = "area",
destination = farms
}
While gpm
is being created, TerraME shows messages on the screen indicating which object is being processed.
To avoid such messages, it is possible to set argument progress = false
.
The GPM stored in object gpm
can then be saved as a .gal
file by using save()
, presented
in more details in last section.
GPM
can also be used to create new attributes for the CellularSpace
using fill()
. For example, if we want to count how many polygons cover each cell, we can use strategy = "count"
. In this case, we will also use an optional argument max = 5
to indicate that, if a given cell has more than five neighbors, it will use five as the output value.
gpm:fill{
strategy = "count",
attribute = "quantity",
max = 5
}
This function creates an attribute named quantity
in the origin
of the gpm
(i.e., cells
).
We can then create a Map
to visualize the output.
Map{
target = cells,
select = "quantity",
min = 0,
max = 5,
slices = 6,
color = "Reds"
}
The output is shown below. Note that this attibute was not yet saved into a file. To do so, it is necessary to call CellularSpace:save() explicitly. For example:
cells:save("gpm", "quantity") -- create layer "gpm" pointing to file "gpm.shp"
Some fill
strategies have the option to copy attributes from the destination to the origin. For example, if we
want to create an attribute with the maximum
intersection area, as it is related to a single destination, we can
copy an attribute of this destination to the origin. The code below uses this strategy and copies the attribute
"id"
from the destination to the origin. This attribute will be called "farm"
in the origin.
gpm:fill{
strategy = "maximum",
attribute = "max",
copy = {farm = "id"}
}
To paint them with different colors, we use the rest of division of the id by 9. In the code below, we replace
the value of farms
with this content and then create a Map
with nine colors to visualize the output.
forEachCell(gpm.origin, function(cell)
cell.farm = tonumber(cell.farm) % 9
end)
Map{
target = gpm.origin,
select = "farm",
min = 0,
max = 8,
slices = 9,
color = "Set1"
}
The output is shown below. Note that it is possible to recognize the largest farms in this figure. If one wants to better represent the relations to smaller farms, it is necessary to use cells with a smaller resolution.
The strategy "distance"
, as default, computes the Euclidean distance between each pair of object to
create relations. It can be used with any geometry because it uses the centroids of the objects.
The example below creates relations between cells
and communities
based on their distances.
gpm = GPM{
origin = cells,
destination = communities,
strategy = "distance"
}
An attribute can then be created using the community closest to each cell using the code below.
This funcion also copies the attribute "LOCALIDADE"
of the closest community to the cells
.
gpm:fill{
strategy = "minimum",
attribute = "dist",
copy = "LOCALIDADE"
}
Map{
target = cells,
select = "dist",
slices = 8,
min = 0,
max = 7000,
color = "YlOrRd",
invert = true
}
The output is shown below.
It is also possible to visualize the closest community using the copied attribute
"LOCALIDADE"
.
Map{
target = cells,
select = "LOCALIDADE",
value = {"Palhauzinho", "Santa Rosa", "Garrafao", "Mojui dos Campos"},
color = "Set1"
}
As we created an attribute using all destinations, it is possible to create one attribute
per destination in the origin
, using the distances as values. For example, the code below
creates four attributes, "d_0"
,"d_1"
, "d_2"
, and "d_3"
, with the distance
to the respective points.
gpm:fill{
strategy = "all",
attribute = "d"
}
The code and image below shows the values of attribute "d_0"
using eight slices.
Map{
target = cells,
select = "d_0",
slices = 8,
min = 0,
max = 10000,
color = "YlOrRd",
invert = true
}
Another option to compute distance relations is to use a threshold with a given a maximum distance. The code below creates relations only when the distance is at most 4000m.
gpm = GPM{
origin = cells,
destination = communities,
distance = 4000
}
Note that now cells can have different quantities of neighbors. The code below creates an attribute counting how many neighbors each cell has.
gpm:fill{
strategy = "count",
attribute = "quantity"
}
Map{
target = cells,
select = "quantity",
min = 0,
max = 5,
slices = 6,
color = "RdPu"
}
To plot the nearest neighbor, it is necessary to compute the closest neighbor first.
The code below creates attribute "dist"
, copying attribute "LOCALIDADE"
and
using value 7000 in those cells that do not have any nearest neighbor (those that
are far from 4000m from the closest community).
gpm:fill{
strategy = "minimum",
attribute = "dist",
dummy = 7000,
copy = "LOCALIDADE"
}
As this operation can create some nil values, those cells that do not have attribute LOCALIDADE
need to be filled with some value. The code below fills them with a value called "<none>"
,
and draws a Map using "LOCALIDADE"
. The cells painted with orange have this value.
forEachCell(cells, function(cell)
if not cell.LOCALIDADE then
cell.LOCALIDADE = "<none>"
end
end)
Map{
target = cells,
select = "LOCALIDADE",
value = {"Palhauzinho", "Santa Rosa", "Garrafao", "Mojui dos Campos", "<none>"},
color = "Set1"
}
The strategy presented in this section computes neighborhoods based on the intersection between
lines and cells. Each cell is connected with the line segments that intersects it by using strategy
"length"
. The code below creates
a neighborhood between "cells" and "rodovias".
gpm = GPM{
origin = cells,
strategy = "length",
destination = roads
}
To visualize the output, the code below counts the number of neighbors, using a maximum of one (any cell with more than one neighbor will have one as attribute value).
gpm:fill{
strategy = "count",
attribute = "quantity",
max = 1
}
Map{
target = cells,
select = "quantity",
value = {0, 1},
label = {"0", "1 or more"},
color = {"gray", "blue"}
}
In this section, we present a function that computes neighborhoods between a layer of polygons and a layer of points based on spatial relation "contains". A cell is connected to the points located inside its area. The code below creates a neighborhood between the layer "cells" and the layer "comunidades".
gpm = GPM{
origin = cells,
strategy = "contains",
destination = communities
}
It then creates an attribute "quantity"
that counts the number of points within each polygon and
draws this attribute within a map.
gpm:fill{
strategy = "count",
attribute = "quantity"
}
Map{
target = cells,
select = "quantity",
value = {0, 1},
color = {"lightGray", "blue"}
}
The output is shown in the figure below.
The strategy "border"
computes relations between polygons based on their common border. The example below
uses a set of five polygons stored in file "partofbrazil.shp"
, within gpm
package, represented by object states
.
It creates a GPM
from states
to themselves by omitting the destination
.
local states = CellularSpace{
file = filePath("partofbrazil.shp", "gpm"),
geometry = true
}
local gpm = GPM{
origin = states,
strategy = "border",
progress = false
}
This example manipulates the relations stored within gpm
directly.
This object has an attribute neighbor
, a table whose indexes are
unique identifiers from the origin
, and whose value are named tables
mapping the neighbors (indexes) to the weight (values). Code below
shows how to retrieve such values from a GPM
.
forEachOrderedElement(gpm.neighbor, function(idx, neigh)
print(states:get(idx).name)
forEachOrderedElement(neigh, function(midx, weight)
print("\t"..states:get(midx).name.." ("..string.format("%.2f", weight)..")")
end)
end)
This script will produce the following output, showing some Brazilian states and their neighbors. The value between parenthesis indicates the percentage of the origin's border shared with the given destination.
MINAS GERAIS
RIO DE JANEIRO (0.10)
SAO PAULO (0.25)
ESPIRITO SANTO (0.12)
PARANA
SAO PAULO (0.32)
RIO DE JANEIRO
MINAS GERAIS (0.29)
SAO PAULO (0.14)
ESPIRITO SANTO (0.09)
SAO PAULO
MINAS GERAIS (0.38)
PARANA (0.26)
RIO DE JANEIRO (0.07)
ESPIRITO SANTO
MINAS GERAIS (0.48)
RIO DE JANEIRO (0.11)
In human-environment models, usually Euclidean spaces are not enough to describe the relations that take place in the geographic space. Networks can then be used to better compute such relations.
- The original data has to be very well represented, with the starting and ending points of two lines being connected to one another only when they share the same position in space.
- A given path from an origin to a destination enters in the
Network
only once. Because of this, theNetwork
must be fully connected.
In this type of network, it is possible to enter and leave the roads in any position.
The type Network
is used to generate the network. It takes as arguments
the destination (reference) points, the lines that will be used to represent the network, and a function that
computes the distance on the network given the length of the lines and their id. The code below creates a Network
that reduces the distance within the network by one fifth of the Euclidean distance for paved roads and
by half on the others. The attribute "STATUS"
of the table connected to the layer of lines indicates
whether the road is paved or not.
When we use Networks, the created attributes are not distances (unless the weight and outside functions return the distance itself). Usually it has a meaning of cost.
network = Network{
target = communities,
lines = roads,
weight = function(distance, cell)
if cell.STATUS == "paved" then
return distance / 5
else
return distance / 2
end
end,
outside = function(distance) return distance * 4 end
}
The algorithm to create a Network
verifies if the data is valid. Whenever there is some problem, TerraME stops and
prompts the lines with problem. It is possible to fix the Network
using software like QGIS. Some functionalities of
this QGIS are interesting to solve geometric problems. First, it is possble to visualize the features with problems.
Click in the icon Select features using an expression (Ctrl + F3), as shown below.
Then write the number of the feature using the expression $id = <number>
. The figure below selects the
feature with id 1. It is then possible to Zoom to the selected feature to investigate the problem.
It is possible to simply remove the lines with problems, as long
as they do not affect the connectivity of the Network
. To do so, use the following icons:
If the lines must be edited, sometimes it is interesting to visualize all the points of the lines, which can be computed using Extract vertices, shown below.
After editing the data, just save it and run the script again in TerraME to check if everything is now working properly.
A GPM
can then be created using the Network
as destination
.
gpm = GPM{
destination = network,
origin = cells
}
gpm:fill{
strategy = "minimum",
attribute = "dist",
copy = "LOCALIDADE"
}
Map{
target = cells,
select = "dist",
slices = 10,
min = 0,
max = 14000,
color = "YlOrBr"
}
Map{
target = cells,
select = "LOCALIDADE",
value = {"Palhauzinho", "Santa Rosa", "Garrafao", "Mojui dos Campos"},
color = "Set1"
}
Figure below shows the polygons drawn with the color of the closest point through the network.
Nighborhoods created from geospatial data usually takes a considerable time to be computed.
As it is normally considered as an exogenous property of the model and will not change
along a simulation, it is usually recommended to save them into files to be loaded in
the beginning of the simulation.
Once we have created a GPM through one of the strategies presented above, we can save it in
a file, which can be a gal file (".gal"
), a gwt file (".gwt"
), of a gpm file (".gpm"
)
through the function save()
. The only argument of this function is the file name to be saved.
gpm:save("myfile.gal")
This neighborhood can then be loaded into a simulation using CellularSpace:loadNeighborhood()
.
cs:loadNeighborhood{file = "myfile.gal"}
The three available file extensions are summarized below and described as follows.
gal | gwt | gpm | |
---|---|---|---|
Number of attributes | Zero | One | Zero or more |
How many layers does it connect? | One | One | One or two |
Author | Anselin (2005) | Anselin (2005) | TerraME team |
GAL files store connections between objects, without using weight.
These objects must belong to the same dataset, which means that a GPM
must
be created with origin
being equals to destination
.
The first line of the file contains a header with the following fields:
- Character "0" indicating that it is a neighborhood file.
- Number of objects in the data;
- Name of the layer (or file) over which GPM was created;
- Name of the object attribute used as unique identifier for the objects.
The relations are described from the second line until the end of file. Each relation uses two lines. The first contains an unique identifier of the object and then the number of neighbors of the object. The second line contains the unique identifier of each of its connections. See a general descripttion below.
0 |
Num_Objects |
Layer |
Key_Variable |
---|---|---|---|
ID_Object_1 |
Num_Neighbors |
||
ID_Neighbor_1 |
ID_Neighbor_2 |
... |
ID_Neighbor_N |
... |
A small example of a GAL file is shown below. This file was created from the output of the GPM created in section Border.
0 5 partofbrazil.shp object_id_
0 3
2 3 4
1 1
3
2 3
0 3 4
3 3
0 1 2
4 2
0 2
Gwt format describes relations using an additional value indicating the connection weight.
It also requires that the objects must belong to the same dataset (origin
equals to destination
).
The header of the gwt format is the same of gal.
From the second line until the end of file, it stores one connection by line using the following fields:
- Unique identifier of the object N-th object;
- M-th neighbor of the N-th object;
- Weight (attribute value) of the relation between the N-th object and the M-th neighbor.
The structure is presended below.
0 |
Num_Objects |
Layer |
Key_Variable |
---|---|---|---|
ID_Object_1 |
ID_Neighbor_1 |
Weight_Neighbor_1 |
|
ID_Object_1 |
ID_Neighbor_2 |
Weight_Neighbor_2 |
|
... |
|||
ID_Object_1 |
ID_Neighbor_N |
Weight_Neighbor_N |
|
ID_Object_2 |
ID_Neighbor_1 |
Weight_Neighbor_1 |
|
... |
A small example of a GWT file is shown below. This file was created from the output of the GPM created in section Border.
0 5 partofbrazil.shp object_id_
0 2 0.096047689354866
0 3 0.25358518320813
0 4 0.12346779200856
1 3 0.3231305819424
2 0 0.28661770283377
2 3 0.13545641162995
2 4 0.085637388476816
3 0 0.38457681249402
3 1 0.26322254348472
3 2 0.06884029109318
4 0 0.4839820977464
4 2 0.11249233755161
Gpm is a more general format that can connects two different layers and store any amount of attributes to each connection. The structure of the GPM file contains a header in the first line, with the following fields:
- Number of attributes associated to each relation
- Name of the first layer (origin) of GPM. If the data was loaded directly from a shapefile, it will be the name of the file.
- Name of the second layer (destination). Connections are established from the first layer to the second one. If the GPM was created from a single layer, then the name of the first layer will be repeated in this field.
- Names of the GPM attributes.
The second line until the end of the file describe the connections. The neighborhood of each object uses two lines. The first one contains:
- Unique identifier of the N-th object;
- Number of neighbors of the N-th object.
The second line contais the neighborhood of the object which ID is in the previous line, represented by the fields below.
- M-th neighbor of the N-th object;
- Value of the k-th attribute of the M-th neighbor.
The structure of a GPM file is summarized below.
Num_attributes |
Layer_1 |
Layer_2 |
Attribute_1 |
Attribute_2 |
... |
Attribute_N |
---|---|---|---|---|---|---|
ID_Object_1 |
Num_Neighbors |
|||||
ID_Neighbor_1 |
Attrib_1_Neigh_1 |
Attrib_2_Neigh_1 |
... |
Attrib_N_Neigh_1 |
ID_Neighbor_2 |
... |
ID_Object_2 |
... |
|||||
... |
An example of a GPM file with one attribute is shown below. This output is part of the file produced by the GPM created in section Network.
111 farms_cells.shp communities.shp object_id_
0 4
0 5501.9562754449 1 6153.7686145953 2 10641.235146975 3 13929.451501441
1 4
0 7365.1627505185 1 8020.7951384333 2 9759.3030416018 3 12486.174413104
10 4
0 5012.8344428588 1 3069.7189628308 2 9426.2638737062 3 15346.631146027
100 4
0 19168.291791002 1 8013.275527337 2 3320.4380158657 3 10619.245311459
101 4
0 18116.038549794 1 6568.2057569898 2 4562.1923447465 3 8744.1659540264
Aguiar et al. (2003) Modeling Spatial Relations by Generalized Proximity Matrices. Proceedings of V Brazilian Symposium in Geoinformatics (GeoInfo'03).
Anselin, L. (2005) Exploring Spatial Data with GeoDaTM : A Workbook. http://www.csiss.org/clearinghouse/GeoDa/geodaworkbook.pdf
If you have comments, doubts or suggestions related to this document, please write a feedback to pedro.andrade <at> inpe.br.
Back to wiki or terrame.org.