Add instructions for using with QLever (#1)
* Add instructions for using with QLever

* Add --suffix $' .\n'

* A round of corrections

---------

Co-authored-by: Hannah Bast <[email protected]>
hannahbast and Hannah Bast authored Mar 25, 2024
1 parent 318a0c8 commit 7891b26
Showing 1 changed file with 54 additions and 0 deletions.
54 changes: 54 additions & 0 deletions README.md
@@ -33,3 +33,57 @@ $ bzcat rels.out.bz2
9 intersects 1
[...]
```

## Use with QLever and osm2rdf

One use case of `spatialjoin` is to add triples for the relations `contains` and
`intersects` to an RDF dataset with WKT literals. The following example shows
the process for the OSM data for Germany.

### Step 1: Download PBF from Geofabrik and convert to RDF

```
NAME=osm-germany
wget -O ${NAME}.pbf https://download.geofabrik.de/europe/germany-latest.osm.pbf
osm2rdf ${NAME}.pbf -o ${NAME}.ttl --simplify-wkt 0 --write-ogc-geo-triples none
```

Note: By default, `osm2rdf` computes and outputs the predicates `ogc:sfContains`
and `ogc:sfIntersects` itself. The `--write-ogc-geo-triples none` option disables
this. To have both the `osm2rdf` predicates *and* the `spatialjoin` predicates
(for comparison or debugging), omit the option.

### Step 2: Build a QLever instance, start it, and download the geometries

```
PORT=7008
echo '{ "languages-internal": [], "prefixes-external": [""], "ascii-prefixes-only": false, "num-triples-per-batch": 1000000 }' > ${NAME}.settings.json
ulimit -Sn 1048576; bzcat ${NAME}.ttl.bz2 | IndexBuilderMain -F ttl -f - -i ${NAME} -s ${NAME}.settings.json --stxxl-memory 10G | tee ${NAME}.index-log.txt
ServerMain -i ${NAME} -j 8 -p ${PORT} -m 20G -c 10G -e 3G -k 200 -s 300s
curl -s localhost:${PORT} -H "Accept: text/tab-separated-values" -H "Content-type: application/sparql-query" --data "PREFIX geo: <http://www.opengis.net/ont/geosparql#> SELECT ?osm_id ?geometry WHERE { ?osm_id geo:hasGeometry/geo:asWKT ?geometry }" | sed -E 's#<https://www.openstreetmap.org/(rel|way|node)(ation)?/([0-9]+)>\t"(.+)"\^\^<http:.*wktLiteral>#osm\1:\3\t\4#g' | sed 1d > spatialjoin.input.tsv
```

Note: The `sed` command replaces the full IRIs by shorter prefixed IRIs. Also
note that we only get the WKT literals from `geo:hasGeometry/geo:asWKT` here.
It would be nicer to fetch all WKT literals in the dataset, no matter to which
predicate they belong (for example, the predicates `osm2rdfgeom:envelope` and
`osm2rdfgeom:convex_hull` also have WKT literals as objects).
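
To see what the `sed` rewrite does in isolation, the following sketch applies
the same expression to a single hand-made line. The sample IRI, ID, and
geometry are made up for illustration, and `shorten_iris` is a hypothetical
helper name, not part of `spatialjoin` or QLever. It assumes GNU `sed`, which
understands `\t` in both the pattern and the replacement.

```shell
# Hypothetical helper wrapping the sed expression from Step 2 (GNU sed assumed).
shorten_iris() {
  sed -E 's#<https://www.openstreetmap.org/(rel|way|node)(ation)?/([0-9]+)>\t"(.+)"\^\^<http:.*wktLiteral>#osm\1:\3\t\4#g'
}

# Made-up sample line in the shape of the TSV result: IRI, tab, typed WKT literal.
printf '<https://www.openstreetmap.org/relation/51477>\t"POINT(10.4 51.1)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>\n' \
  | shorten_iris
# Prints: osmrel:51477<TAB>POINT(10.4 51.1)
```

The IRI collapses to `osmrel:51477`, and the surrounding quotes and the
datatype suffix are stripped, leaving only the bare WKT string in the second
column.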

### Step 3: Compute the spatial relations

```
cat spatialjoin.input.tsv | spatialjoin --suffix $' .\n'
```
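
The `--suffix` argument above relies on Bash's ANSI-C quoting (`$'...'`), in
which `\n` becomes a real newline. The sketch below only inspects the string
that the shell passes to `spatialjoin`; that this string is what terminates
each output line as a triple is an assumption based on the option name, not
something this README spells out.

```shell
# Bash ANSI-C quoting: $' .\n' is three characters (space, period, newline).
suffix=$' .\n'
echo "${#suffix}"               # prints 3
printf '%s' "$suffix" | od -c   # shows the individual characters
```

With plain single quotes (`' .\n'`), the shell would pass a literal backslash
followed by `n` instead of a newline.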

Alternatively, the geometries can be piped directly into `spatialjoin`, skipping the intermediate file:

```
curl -s localhost:${PORT} -H "Accept: text/tab-separated-values" -H "Content-type: application/sparql-query" --data "PREFIX geo: <http://www.opengis.net/ont/geosparql#> SELECT ?osm_id ?geometry WHERE { ?osm_id geo:hasGeometry/geo:asWKT ?geometry }" | sed -E 's#<https://www.openstreetmap.org/(rel|way|node)(ation)?/([0-9]+)>\t"(.+)"\^\^<http:.*wktLiteral>#osm\1:\3\t\4#g' | sed 1d | spatialjoin --suffix $' .\n'
```

### Step 4: Rebuild the QLever index with the added triples

```
ulimit -Sn 1048576; bzcat ${NAME}.ttl.bz2 rels.out.bz2 | IndexBuilderMain -F ttl -f - -i ${NAME} -s ${NAME}.settings.json --stxxl-memory 10G | tee ${NAME}.index-log.txt
ServerMain -i ${NAME} -j 8 -p ${PORT} -m 20G -c 10G -e 3G -k 200 -s 300s
```
