Two different ways of querying OSM data are available, which determine how the OSM history data is actually analyzed in a given OSHDB query:
- The snapshot view (`OSMEntitySnapshotView`) returns the state of the OSM history data at specific given points in time.
- The contribution view (`OSMContributionView`) returns all contributions (creations, modifications, and deletions) made to OSM elements within a given time period.
The snapshot view is particularly useful for analysing how the amount of OSM data changed over time. The contribution view can be used to determine the number of OSM contributors editing the OSM data.
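The following is a plain-Java illustration of the second use case. It is not OSHDB code. It counts the distinct contributors in a list of contributions. The `Contribution` record and the `ContributorCount` class are hypothetical. In an actual OSHDB query, the user id would come from the contribution objects returned by the contribution view.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ContributorCount {
    // Counts how many distinct user ids appear among the given contributions.
    static long distinctContributors(List<Contribution> contributions) {
        Set<Integer> users = contributions.stream()
                .map(Contribution::userId)
                .collect(Collectors.toSet());
        return users.size();
    }

    public static void main(String[] args) {
        List<Contribution> contributions = List.of(
                new Contribution(1, 42, 100),  // user 42 edits entity 1
                new Contribution(2, 42, 100),  // user 42 edits entity 2 in the same changeset
                new Contribution(3, 7, 101));  // user 7 edits entity 3
        System.out.println(distinctContributors(contributions)); // prints 2
    }
}

// Hypothetical stand-in for a contribution, reduced to the fields used here.
record Contribution(long entityId, int userId, long changesetId) {}
```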
Both views are used in very similar ways in the OSHDB API. They differ only in the type of data returned by the `MapReducer` object that the view's `on` method creates. The `OSMEntitySnapshotView` returns a `MapReducer` of `OSMEntitySnapshot` objects, while the `OSMContributionView` returns a `MapReducer` of `OSMContribution` objects.
```java
OSHDBDatabase oshdb = …; // connection to an OSHDB database (setup elided)
MapReducer<OSMEntitySnapshot> snapshotsMapReducer = OSMEntitySnapshotView.on(oshdb);
// or
MapReducer<OSMContribution> contributionsMapReducer = OSMContributionView.on(oshdb);
```
A `MapReducer` is conceptually very similar to a Java `Stream` object. It stores all the information about which filters, transformation functions, and aggregation methods should be applied to the data. It is executed exactly once, by calling a terminal operation such as the `reduce` method or one of the supplied specialized reducers (e.g., `count` or `sum`). The chapter “Map and Reduce” of this manual describes the ideas behind the `MapReducer` object in more detail.
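The Stream analogy can be made concrete using only the standard library. In this sketch, building the pipeline only records the filter and map steps. The terminal `reduce` call then runs them. A second terminal operation on the same stream is rejected with an `IllegalStateException`. The `StreamAnalogy` class is hypothetical and uses no OSHDB API.

```java
import java.util.List;
import java.util.stream.Stream;

class StreamAnalogy {
    // Builds a lazy pipeline (filter, then map); nothing is executed here yet.
    static Stream<Integer> pipeline(List<Integer> values, int threshold) {
        return values.stream()
                .filter(v -> v > threshold) // comparable to a MapReducer filter
                .map(v -> v * 2);           // comparable to a MapReducer map step
    }

    public static void main(String[] args) {
        Stream<Integer> s = pipeline(List.of(3, 12, 7, 25), 5);
        // The terminal reduce operation triggers the actual computation.
        int total = s.reduce(0, Integer::sum);
        System.out.println(total); // prints 88
        try {
            s.count(); // a second terminal operation on the already consumed stream
        } catch (IllegalStateException e) {
            System.out.println("stream already consumed");
        }
    }
}
```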
The `OSMEntitySnapshotView` is quite simple: it returns the state of the OSM data at one or more given points in time. In the OSHDB API, these states are called snapshots and are represented by `OSMEntitySnapshot` objects. They give access to the following properties:
- the timestamp of the snapshot
- the geometry of the queried OSM feature
- the OSM entity of this snapshot
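The snapshot semantics can be modelled as "the latest version at or before each requested timestamp". The following is a minimal plain-Java sketch under that assumption. `EntityVersion` (where a null tag map marks a deletion) and `SnapshotSketch` are hypothetical names. The sketch ignores geometries and the way the OSHDB actually stores history.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SnapshotSketch {
    // For each requested timestamp, looks up the version valid at that time.
    // Timestamps at which the entity did not exist (not yet created, or deleted)
    // produce no snapshot in this sketch.
    static Map<Long, EntityVersion> snapshots(List<EntityVersion> history, List<Long> timestamps) {
        TreeMap<Long, EntityVersion> byTime = new TreeMap<>();
        for (EntityVersion v : history) {
            byTime.put(v.timestamp(), v);
        }
        Map<Long, EntityVersion> result = new LinkedHashMap<>();
        for (long t : timestamps) {
            Map.Entry<Long, EntityVersion> valid = byTime.floorEntry(t); // latest version at or before t
            if (valid != null && valid.getValue().tags() != null) {
                result.put(t, valid.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<EntityVersion> history = List.of(
                new EntityVersion(10, Map.of("building", "yes")),
                new EntityVersion(20, Map.of("building", "house")),
                new EntityVersion(30, null)); // deleted at t = 30
        Map<Long, EntityVersion> snaps = snapshots(history, List.of(5L, 15L, 25L, 35L));
        System.out.println(snaps.keySet()); // prints [15, 25]
    }
}

// Hypothetical entity version: valid from `timestamp` on; null tags mark a deletion.
record EntityVersion(long timestamp, Map<String, String> tags) {}
```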
The `OSMContributionView` returns all modifications to matching OSM entities. This is generally more computationally intensive than using the snapshot view. However, it allows one to inspect the OSM data in more detail, especially when one is interested in how the OSM data is modified by the contributors to the OSM project.
Specifically, the OSHDB API considers every modification to a semantic OSM element to be a contribution. This includes both direct edits (e.g., tag changes) and changes caused by modifications of referenced OSM objects (e.g., certain geometry changes, such as a way whose shape changes because one of its nodes was moved). When an OSM entity is changed multiple times within a single OSM changeset, these changes are squashed into a single contribution result.
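The squashing rule can be sketched as follows, assuming a hypothetical `Edit` record that holds an entity id, a version number, and a changeset id. The sketch keeps only the last version of each run of consecutive versions that share a changeset. It illustrates the rule and is not the OSHDB implementation.

```java
import java.util.ArrayList;
import java.util.List;

class SquashSketch {
    // Keeps only the last edit of every run of consecutive versions of the same
    // entity that share a changeset id. Assumes input sorted by entity, then version.
    static List<Edit> squash(List<Edit> edits) {
        List<Edit> result = new ArrayList<>();
        for (int i = 0; i < edits.size(); i++) {
            Edit current = edits.get(i);
            boolean lastOfRun = i + 1 == edits.size()
                    || edits.get(i + 1).entityId() != current.entityId()
                    || edits.get(i + 1).changesetId() != current.changesetId();
            if (lastOfRun) {
                result.add(current);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Edit> edits = List.of(
                new Edit(1, 1, 100),
                new Edit(1, 2, 100),  // same changeset as version 1: squashed together
                new Edit(1, 3, 101),
                new Edit(2, 1, 100));
        for (Edit e : squash(edits)) {
            System.out.println(e); // three results: versions 2 and 3 of entity 1, version 1 of entity 2
        }
    }
}

// Hypothetical edit: one version of one entity and the changeset it was made in.
record Edit(long entityId, int version, long changesetId) {}
```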
Through the returned `OSMContribution` objects, one has access to the following properties:
- the timestamp of the contribution
- the geometries before and after the modification. If the contribution represents the creation of an entity, the before geometry doesn't exist and `null` is returned when it is accessed. The same is true for the after geometry of a deleted OSM object.
- the OSM entity before and after the modification
- the id of the OSM user who performed this contribution
- the id of the OSM changeset in which this contribution was performed
- the type of the contribution.
The contribution type can be either a creation, a deletion, a tag change, or a geometry change of an OSM entity.
All of these contribution types refer to the filtered set of OSM data of the current `MapReducer`. This means that an OSM feature that gained a specific tag in a version after its first one will be reported as a “creation” by the contribution view if the query filters for that particular tag. The same applies when an object is moved from outside the area of interest into the query region. The inverse cases, such as a tag being removed or an object being moved out of the region, are reported as deletions. This ensures that the number of creations minus the number of deletions in a time period matches the change in the number of matching entities that a snapshot-view query reports between the start and the end of that period.
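The following plain-Java sketch illustrates this bookkeeping under simplifying assumptions. Each entity's history is a hypothetical list of tag maps, with null marking a deletion. The filter is a tag predicate, and the spatial part of the query is ignored. An entity counts as created whenever it enters the filtered set and as deleted whenever it leaves it. Creations minus deletions therefore equals the number of matching entities at the end. `FilteredHistory` and `Counts` are hypothetical names, not OSHDB API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class FilteredHistory {
    // Each inner list holds the successive tag maps of one entity, oldest first;
    // a null entry means the entity was deleted in that version.
    static Counts count(List<List<Map<String, String>>> histories,
                        Predicate<Map<String, String>> filter) {
        int creations = 0;
        int deletions = 0;
        for (List<Map<String, String>> versions : histories) {
            boolean matchedBefore = false; // before its first version, nothing matches
            for (Map<String, String> tags : versions) {
                boolean matchesNow = tags != null && filter.test(tags);
                if (!matchedBefore && matchesNow) creations++; // entered the filtered set
                if (matchedBefore && !matchesNow) deletions++; // left the filtered set
                matchedBefore = matchesNow;
            }
        }
        return new Counts(creations, deletions);
    }

    // Snapshot-style count: entities whose latest version matches the filter.
    static int matchingAtEnd(List<List<Map<String, String>>> histories,
                             Predicate<Map<String, String>> filter) {
        int n = 0;
        for (List<Map<String, String>> versions : histories) {
            Map<String, String> last = versions.get(versions.size() - 1);
            if (last != null && filter.test(last)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Predicate<Map<String, String>> isCafe = t -> "cafe".equals(t.get("amenity"));
        List<List<Map<String, String>>> histories = List.of(
                // gains amenity=cafe in version 2: reported as a creation
                List.of(Map.of("name", "X"), Map.of("name", "X", "amenity", "cafe")),
                // created as a cafe, then deleted
                Arrays.asList(Map.of("amenity", "cafe"), null),
                // created as a cafe, then retagged: leaves the filtered set
                List.of(Map.of("amenity", "cafe"), Map.of("amenity", "restaurant")));
        Counts c = count(histories, isCafe);
        System.out.println(c.creations() + " " + c.deletions()); // prints 3 2
        System.out.println(matchingAtEnd(histories, isCafe));    // prints 1
    }
}

// Hypothetical result type: filter-relative creations and deletions.
record Counts(int creations, int deletions) {}
```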
Note that there are cases where a contribution doesn't belong to any of the mentioned contribution types, namely when a modification of an object changes neither its geometry nor its tags.
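The classification can be sketched in plain Java. This hypothetical `classify` method returns a set rather than a single value. A modification can therefore count as both a tag change and a geometry change, and an empty set covers the unclassified case just described. `ChangeKind`, `State`, and the comparison of geometries as WKT strings are all simplifying assumptions of this sketch, not the OSHDB API.

```java
import java.util.EnumSet;
import java.util.Map;

class ContributionTypes {
    // Classifies the change from `before` to `after`; null means "does not exist".
    // Returns an empty set if neither tags nor geometry changed.
    static EnumSet<ChangeKind> classify(State before, State after) {
        if (before == null && after == null) {
            throw new IllegalArgumentException("need a state before or after the change");
        }
        if (before == null) {
            return EnumSet.of(ChangeKind.CREATION);
        }
        if (after == null) {
            return EnumSet.of(ChangeKind.DELETION);
        }
        EnumSet<ChangeKind> kinds = EnumSet.noneOf(ChangeKind.class);
        if (!before.tags().equals(after.tags())) {
            kinds.add(ChangeKind.TAG_CHANGE);
        }
        if (!before.geometryWkt().equals(after.geometryWkt())) {
            kinds.add(ChangeKind.GEOMETRY_CHANGE);
        }
        return kinds;
    }

    public static void main(String[] args) {
        State path = new State(Map.of("highway", "path"), "LINESTRING (0 0, 1 1)");
        State footway = new State(Map.of("highway", "footway"), "LINESTRING (0 0, 1 1)");
        System.out.println(classify(null, path));       // prints [CREATION]
        System.out.println(classify(path, footway));    // prints [TAG_CHANGE]
        System.out.println(classify(footway, footway)); // prints []
    }
}

// Hypothetical contribution categories, mirroring the four types listed above.
enum ChangeKind { CREATION, DELETION, TAG_CHANGE, GEOMETRY_CHANGE }

// Hypothetical entity state: tags plus a geometry simplified to a WKT string.
record State(Map<String, String> tags, String geometryWkt) {}
```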
The `groupByEntity()` method of a `MapReducer` slightly changes the way the `MapReducer` receives and transforms values. Instead of iterating over each snapshot or contribution individually, in this mode all snapshots or contributions of an individual OSM entity are first collected into a list sorted by timestamp. This makes it possible to investigate the full edit history of individual OSM objects at once. This is needed, for example, when looking for contributions that were reverted at a later point in time.
It is recommended to call this method immediately after creating the `MapReducer` from a view:

```java
OSMEntitySnapshotView.on(oshdb).groupByEntity()
```
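The effect of grouping by entity can be mimicked with Java collectors. In this hypothetical sketch, `groupByEntity` collects `Change` records per entity id in timestamp order. `hasRevert` then applies a deliberately naive revert heuristic to one entity's full history: it looks for a value that is changed and then restored by the next change. Neither method is part of the OSHDB API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupByEntitySketch {
    // Collects all changes of each entity into one list, sorted by timestamp.
    static Map<Long, List<Change>> groupByEntity(List<Change> changes) {
        return changes.stream()
                .sorted(Comparator.comparingLong(Change::timestamp))
                .collect(Collectors.groupingBy(Change::entityId, TreeMap::new, Collectors.toList()));
    }

    // Naive heuristic (an assumption of this sketch): a change counts as reverted
    // if the following change restores the value that preceded it.
    static boolean hasRevert(List<Change> history) {
        for (int i = 2; i < history.size(); i++) {
            String before = history.get(i - 2).value();
            String changed = history.get(i - 1).value();
            String restored = history.get(i).value();
            if (restored.equals(before) && !changed.equals(before)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Change> changes = List.of(
                new Change(1, 300, "yes"),
                new Change(2, 100, "house"),
                new Change(1, 100, "yes"),
                new Change(1, 200, "no"));
        groupByEntity(changes).forEach((id, history) ->
                System.out.println(id + " reverted: " + hasRevert(history)));
        // prints "1 reverted: true", then "2 reverted: false"
    }
}

// Hypothetical change record: entity id, timestamp, and the value of one tag.
record Change(long entityId, long timestamp, String value) {}
```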
Note that the similarly named `aggregateBy` method of the OSHDB API does something different: it is used at a later stage of an OSHDB query to calculate aggregated results for multiple groups or partitions of the data at once.
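To contrast the two, here is a plain-Java analogy of aggregation. Results are partitioned by a key (here, the snapshot timestamp) and each partition is reduced separately. By contrast, `groupByEntity()` changes what the map functions receive. The `Snap` record and the `countPerTimestamp` method are hypothetical and are not OSHDB code.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class AggregateSketch {
    // Partitions the values by a key (the snapshot timestamp) and reduces each
    // partition to a count, producing one result per group.
    static Map<String, Long> countPerTimestamp(List<Snap> snaps) {
        return snaps.stream()
                .collect(Collectors.groupingBy(Snap::timestamp, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Snap> snaps = List.of(
                new Snap(1, "2020-01-01"), new Snap(2, "2020-01-01"),
                new Snap(1, "2021-01-01"), new Snap(2, "2021-01-01"), new Snap(3, "2021-01-01"));
        System.out.println(countPerTimestamp(snaps)); // prints {2020-01-01=2, 2021-01-01=3}
    }
}

// Hypothetical snapshot record: one entity's state at one snapshot timestamp.
record Snap(long entityId, String timestamp) {}
```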