diff --git a/README.md b/README.md
index 9872ed3..ceb7db5 100644
--- a/README.md
+++ b/README.md
@@ -33,3 +33,57 @@ $ bzcat rels.out.bz2
 9 intersects 1
 [...]
 ```
+
+## Use with QLever and osm2rdf
+
+One use case of `spatialjoin` is to add triples for the relations `contains` and
+`intersects` to an RDF dataset with WKT literals. The following example shows
+the process for the OSM data for Germany.
+
+### Step 1: Download PBF from Geofabrik and convert to RDF
+
+```
+NAME=osm-germany
+wget -O ${NAME}.pbf https://download.geofabrik.de/europe/germany-latest.osm.pbf
+osm2rdf ${NAME}.pbf -o ${NAME}.ttl --simplify-wkt 0 --write-ogc-geo-triples none
+```
+
+Note: `osm2rdf` by default computes and outputs the predicates `ogc:sfContains`
+and `ogc:sfIntersects`. The `--write-ogc-geo-triples none` option disables
+this. To have both the `osm2rdf` predicates *and* the `spatialjoin` predicates
+(for comparison or debugging), omit the option.
+
+### Step 2: Build a QLever instance, start it, and download the geometries
+
+```
+PORT=7008
+echo '{ "languages-internal": [], "prefixes-external": [""], "ascii-prefixes-only": false, "num-triples-per-batch": 1000000 }' > ${NAME}.settings.json
+ulimit -Sn 1048576; bzcat ${NAME}.ttl.bz2 | IndexBuilderMain -F ttl -f - -i ${NAME} -s ${NAME}.settings.json --stxxl-memory 10G | tee ${NAME}.index-log.txt
+ServerMain -i ${NAME} -j 8 -p ${PORT} -m 20G -c 10G -e 3G -k 200 -s 300s
+curl -s localhost:${PORT} -H "Accept: text/tab-separated-values" -H "Content-type: application/sparql-query" --data "PREFIX geo: SELECT ?osm_id ?geometry WHERE { ?osm_id geo:hasGeometry/geo:asWKT ?geometry }" | sed -E 's#\t"(.+)"\^\^#osm\1:\3\t\4#g' | sed 1d > spatialjoin.input.tsv
+```
+
+Note: The `sed` command replaces the full IRIs with shorter prefixed IRIs. Also
+note that we only get the WKT literals from `geo:hasGeometry/geo:asWKT` here.
+It would be nicer to fetch all WKT literals in the dataset, no matter to which
+predicate they belong (for example, the predicates `osm2rdfgeom:envelope` and
+`osm2rdfgeom:convex_hull` also have WKT literals as objects).
+
+### Step 3: Compute the spatial relations
+
+```
+cat spatialjoin.input.tsv | spatialjoin --suffix $' .\n'
+```
+
+Note that we could feed the geometries directly into `spatialjoin` as follows:
+
+```
+curl -s localhost:${PORT} -H "Accept: text/tab-separated-values" -H "Content-type: application/sparql-query" --data "PREFIX geo: SELECT ?osm_id ?geometry WHERE { ?osm_id geo:hasGeometry/geo:asWKT ?geometry }" | sed -E 's#\t"(.+)"\^\^#osm\1:\3\t\4#g' | sed 1d | spatialjoin --suffix $' .\n'
+```
+
+### Step 4: Rebuild the QLever index with the added triples
+
+```
+ulimit -Sn 1048576; bzcat ${NAME}.ttl.bz2 rels.out.bz2 | IndexBuilderMain -F ttl -f - -i ${NAME} -s ${NAME}.settings.json --stxxl-memory 10G | tee ${NAME}.index-log.txt
+ServerMain -i ${NAME} -j 8 -p ${PORT} -m 20G -c 10G -e 3G -k 200 -s 300s
+```
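
Note on the `sed 1d` stage used in the Step 2 and Step 3 pipelines: a SPARQL result in tab-separated-values format begins with a header row naming the selected variables, and `sed 1d` deletes that first row so that only data rows are passed on. The following is a minimal sketch of that single stage, assuming two-column TSV input. The helper name `drop_tsv_header` and the sample rows are made up for illustration and are not part of `spatialjoin`, `osm2rdf`, or QLever.

```shell
# Hypothetical helper (not part of any tool above): delete the first
# line of stdin, i.e. the TSV header row, and pass the rest through.
drop_tsv_header() {
  sed 1d
}

# Made-up sample input: one header row followed by two id<TAB>WKT rows.
printf '?osm_id\t?geometry\nosmnode:1\tPOINT(7.8 48.0)\nosmnode:2\tPOINT(7.9 48.1)\n' \
  | drop_tsv_header
```

Running the sketch prints the two data rows without the header. In the pipelines above, the same stage sits between the IRI-shortening `sed -E` command and `spatialjoin`.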