gaia (geOrchestra Automated Integrity Analysis) - a geOrchestra dashboard

Summary

the problem

The SDI data admin's life looks like an endless quest for consistency. They have to deal with loosely linked and constantly changing data, metadata, services and maps published on numerous platforms. However, all this information is structured according to OGC standards, so most of the tests the admin runs manually can be automated. Most of the information, even when it comes from different platforms, can also be presented in a consolidated view that gives an ongoing evaluation of the problems and saves a lot of time on corrections.

the response

This project aims at providing a data quality assurance dashboard for geOrchestra, to make the data or map admin's life easier. Some of the benefits of GAIA:

Automated inventory: GAIA scans the catalogs, services and maps, both periodically and interactively, and displays all their contents in one place. You get a bird's-eye view of all contents.

Integrity check: GAIA analyzes content for missing or unreachable metadata, bad OGC services, HTTP errors, inconsistencies between metadata and services, and so on, reusing what was done in sdi-consistency-check.

Admin helper: when you want to fix an error, GAIA lets you instantly access the admin page, modify settings and check the resource again.

API: GAIA returns all results as JSON, so you can use this data in your own tools.

detailed features

  • clean and fine-grained URLs for all resources
  • returns results in HTML pages or JSON
  • checks for common errors
  • gives direct access to data/metadata/map previews
  • gives direct access to data/metadata/map administration pages
  • can use geOrchestra roles
  • performs scheduled scans
  • performs on demand scans

dependencies

Here are the dependencies and why they are needed :

development status

it is a work in progress, developed when spare time is available. for now it is developed in my own github account, but if enough features are developed and interest is shown, it will move to the geOrchestra organization.

installation

debian installation

GAIA is being written against the versions of python/flask/celery provided by debian 12; it should only require 'recent' versions of those:

apt install python3-flask-bootstrap python3-flask python3-celery python3-sqlalchemy python3-psycopg2 python3-owslib python3-jsonpickle python3-redis

virtualenv installation

GAIA runs in a python (>= 3.10) virtualenv with the provided requirements.txt:

python -m virtualenv venv
source ./venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
./run.sh

configuration

geOrchestra integration

the web service should be added behind geOrchestra's security-proxy/gateway, so that it knows the connected user and can display user-tailored information.

add this line to /etc/georchestra/security-proxy/target-mappings.properties to declare GAIA in the geOrchestra security proxy :

gaia=http://<hostname>:<port>/gaia/

and visit https:///gaia/, which should list for now:

  • your metadata
  • the maps & contexts you can access

if your datadir isn't in /etc/georchestra, point the georchestradatadir environment variable to the path where your datadir is located.
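As an illustration of the datadir lookup described above, here is a minimal sketch of how the path could be resolved. The function name `resolve_datadir` and the `DEFAULT_DATADIR` constant are hypothetical and not taken from GAIA's code; only the `georchestradatadir` variable name and the `/etc/georchestra` default come from this README.

```python
import os
from pathlib import Path

# Default location mentioned in this README.
DEFAULT_DATADIR = "/etc/georchestra"


def resolve_datadir(environ=None) -> Path:
    """Return the datadir path (hypothetical helper, not GAIA's actual code).

    Uses the georchestradatadir environment variable when it is set and
    non-empty, and falls back to /etc/georchestra otherwise.
    """
    env = os.environ if environ is None else environ
    return Path(env.get("georchestradatadir") or DEFAULT_DATADIR)


if __name__ == "__main__":
    print(resolve_datadir())
```

Passing a plain dict as `environ` makes the lookup easy to try without touching the real environment.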

cache

for now a redis instance is used as celery's broker and result backend; it is configured in config.py. celery can also use rabbitmq for the broker, and eventually the geOrchestra PostgreSQL database will be used to store task results.

GAIA tries as much as possible to autoconfigure itself by reading configuration files from geOrchestra's datadir.
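To give an idea of what reading datadir configuration files can look like, here is a minimal sketch that parses simple `key=value` lines such as the ones found in `target-mappings.properties`. The helpers `parse_properties` and `load_properties` are hypothetical and are not GAIA's actual implementation; the sketch deliberately ignores line continuations, escapes and `:` separators that the full properties format allows, and the sample URL is made up.

```python
from pathlib import Path


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def load_properties(path: Path) -> dict[str, str]:
    """Read a properties file from disk (hypothetical helper)."""
    return parse_properties(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Made-up excerpt; host and port are placeholders.
    sample = "# proxied applications\ngaia=http://localhost:5000/gaia/\n"
    print(parse_properties(sample))
```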

service

needs two services running (TODO)

  • the flask webapp, accessed at https://<idsurl>/gaia/
  • the celery worker, for long-running checks

for now, during development, those are started by run.sh; proper integration via gunicorn/systemd is the goal

Usage

pages

here's a quick list of pages/routes implemented so far; the goal is to have as much interlinking as possible.

the logic behind each url/route is that if you know what you want to access, be it a given OGC layer by its short name, a metadata by its uuid, or a mapstore map by its numeric id, you should be able to directly access it by typing the url in your browser.

/

lists:

  • metadata belonging to the connected user
  • maps and contexts the user is allowed to visit

/admin/

  • lists the current problems of all maps and contexts
  • allows manually triggering an integrity check of all maps/contexts

/map/<mapid>

  • displays map details & current problems
  • links to the OGC layers used by the map

/ctx/<mapid>

  • displays ctx map details & current problems
  • links to the OGC layers used by the ctx map

/ows/<{wms,wfs,wmts}>/<service url>/

  • displays contents of a given OGC service
  • shows the list of all problems for all layers in the service
  • service url is a url with slashes replaced by ~, eg ~geoserver~wms for the default WMS geoserver, ~geoserver~workspace~ows for a workspace
  • full urls such as https:~~wmts.craig.fr~pci~service can be used; if the https:~~fqdn part is omitted, the georchestra FQDN is assumed
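
The `~` convention described in the list above can be sketched as a pair of helpers. This is only an illustration under assumptions: the function names are hypothetical, `sdi.example.org` stands in for the geOrchestra FQDN, and the sketch assumes `https` for the default host. It is not GAIA's actual code.

```python
def decode_service_url(encoded: str, default_fqdn: str) -> str:
    """Turn a ~-encoded service url from a GAIA route into a real url.

    If no scheme and host are present, the given FQDN is assumed.
    """
    url = encoded.replace("~", "/")
    if url.startswith(("http://", "https://")):
        return url
    return f"https://{default_fqdn}{url}"


def encode_service_url(url: str, default_fqdn: str) -> str:
    """Inverse operation: drop the default host, then replace / with ~."""
    prefix = f"https://{default_fqdn}"
    if url.startswith(prefix + "/"):
        url = url[len(prefix):]
    return url.replace("/", "~")


if __name__ == "__main__":
    fqdn = "sdi.example.org"  # placeholder FQDN
    print(decode_service_url("~geoserver~wms", fqdn))
    print(decode_service_url("https:~~wmts.craig.fr~pci~service", fqdn))
    print(encode_service_url("https://sdi.example.org/geoserver/workspace/ows", fqdn))
```

Note that with such a simple substitution, a service url that contains a literal ~ cannot be round-tripped.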

/ows/<{wms,wfs,wmts}>/<service url>/<layername>

  • displays the details about a given layer in a given OGC service
  • links to the mapstore maps & contexts that use this layer
  • links to the metadata page
  • allows previewing the layer in geoserver, or opening it in mapstore
  • links to the geoserver layer edit page
  • shows the list of all problems for all layers in the service

/csw/<portal>

  • displays the list of metadata records in a given CSW portal, eg /csw/srv for all metadata; if you've created an opendata portal, /csw/opendata lists the metadata in that CSW endpoint.

/csw/<portal>/<uuid>

  • displays the details about a given metadata in a given CSW portal
  • allows viewing the metadata in datahub/geonetwork
  • links to the editor view in geonetwork
  • links to the OGC:W{M,F}S layers listed in the metadata