From f137f88bd8c1342ee1d9fd254b16ca53267151a1 Mon Sep 17 00:00:00 2001 From: Hannah Bast Date: Wed, 3 Apr 2024 06:31:25 +0200 Subject: [PATCH] Simplify instructions for adding spatialjoin triples --- README.md | 37 ++++++++++++++----------------------- 1 file changed, 14 insertions(+), 23 deletions(-) diff --git a/README.md b/README.md index cad7c79..2eb6304 100644 --- a/README.md +++ b/README.md @@ -72,38 +72,29 @@ and `ogc:sfIntersects`. The `--write-ogc-geo-triples none` option disables this. To have both the `osm2rdf` predicates *and* the `spatiajoin` predicates (for comparison or debugging), just omit the option. -### Step 2: Build a QLever instance, start it, and download the geometries +### Step 2: Extract geometries and feed into spatialjoin ``` -PORT=7008 -echo '{ "languages-internal": [], "prefixes-external": [""], "ascii-prefixes-only": false, "num-triples-per-batch": 1000000 }' > ${NAME}.settings.json -ulimit -Sn 1048576; bzcat ${NAME}.ttl.bz2 | IndexBuilderMain -F ttl -f - -i ${NAME} -s ${NAME}.settings.json --stxxl-memory 10G | tee ${NAME}.index-log.txt -ServerMain -i ${NAME} -j 8 -p ${PORT} -m 20G -c 10G -e 3G -k 200 -s 300s -curl -s localhost:${PORT} -H "Accept: text/tab-separated-values" -H "Content-type: application/sparql-query" --data "PREFIX geo: SELECT ?osm_id ?geometry WHERE { ?osm_id geo:hasGeometry/geo:asWKT ?geometry }" | sed -E 's#\t"(.+)"\^\^#osm\1:\3\t\4#g' | sed 1d > spatialjoin.input.tsv -``` - -Note: The `sed` command replaces the full IRIs by shorter prefixed IRIs. Also -note that we only get the WKT literals from `geo:gasGeometry/geo:asWKT` here. -It would be nicer to fetch all WKT literals in the datasets, no matter to which -predicate they belong (for example, the predicates `osm2rdfgeom:envelope` or -`osm2rdfgeom:convex_hull` also have WKT literals as objects) - -### Step 3: Compute the spatial relations - +time lbzcat -n 2 ${NAME}.ttl.bz2 | \grep "^osm2rdf" | sed -En 's/^osm2rdf(geom)?:(osm_)?(node|rel|way)[a-z]*_([0-9]+) geo:asWKT "([^\"]+)".*/osm\3:\4\t\5/p' | spatialjoin --contains ' ogc:sfContainsNew ' --intersects ' ogc:sfIntersectsNew ' --suffix $' .\n' -o ${NAME}.spatialjoin-triples.ttl.bz2 ``` -cat spatialjoin.input.tsv | spatialjoin --contains ' ogc:_contains ' --intersects ' ogc:_intersects ' --suffix $' .\n' -o rels.out.bz2 -``` +Note: This reconstructs the OSM IDs from osm2rdf's geo:asWKT triples, where the +subject is of one of these forms (with prefix `osm2rdfgeom:` for most, but very +confusingly `osm2rdf:` for ways): `osm_node_(\d+)`, `osm_rel_(\d+)`, `way_(\d+)`, +`osm_wayarea_(\d+)`, `osm_relarea_(\d+)` (note the inconsistency that it's not +`osm_way_(\d+)`). -Note that we could feed the geometries directly into `spatialjoin` as follows: +### Step 4: Create SPARQL endpoint with QLever ``` -curl -s localhost:${PORT} -H "Accept: text/tab-separated-values" -H "Content-type: application/sparql-query" --data "PREFIX geo: SELECT ?osm_id ?geometry WHERE { ?osm_id geo:hasGeometry/geo:asWKT ?geometry }" | sed -E 's#\t"(.+)"\^\^#osm\1:\3\t\4#g' | sed 1d | spatialjoin --contains ' ogc:_contains ' --intersects ' ogc:_intersects ' --suffix $' .\n' -o rels.out.bz2 +ulimit -Sn 1048576; bzcat ${NAME}.ttl.bz2 ${NAME}.spatialjoin-triples.ttl.bz2 | IndexBuilderMain -F ttl -f - -i ${NAME} -s ${NAME}.settings.json --stxxl-memory 10G | tee ${NAME}.index-log.txt +ServerMain -i ${NAME} -j 8 -p ${PORT} -m 20G -c 10G -e 3G -k 200 -s 300s ``` -### Step 4: Rebuild the QLever index with the added triples +With the right QLeverfile (modify the one obtained via `qlever setup-config +osm-planet`), it's simply: ``` -ulimit -Sn 1048576; bzcat ${NAME}.ttl.bz2 rels.out.bz2 | IndexBuilderMain -F ttl -f - -i ${NAME} -s ${NAME}.settings.json --stxxl-memory 10G | tee ${NAME}.index-log.txt -ServerMain -i ${NAME} -j 8 -p ${PORT} -m 20G -c 10G -e 3G -k 200 -s 300s +qlever index +qlever start ```