The Metadata Enhancer Service is a Python-based web application designed to enrich metadata for datasets with additional information obtained from external resources. The service supports multiple enhancers, each responsible for enriching specific types of metadata fields. A deployed version of this service can be found here.
- Introduction
- Features
- Setup
- Makefile Commands
- Usage
- Enhancers
- API Endpoints
- Continuous Integration with GitHub Actions
The Metadata Enhancer Service aims to improve the quality and discoverability of dataset metadata by adding relevant, meaningful information to certain fields. It uses look-up tables built from external vocabularies to enrich the existing metadata. Currently, all endpoints of this service expect JSON metadata formatted for Dataverse.
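To illustrate the look-up-table idea, here is a minimal Python sketch. It assumes a look-up table that maps lower-cased terms to concept URIs; the table contents, the example.org URIs and the enrich_terms function are hypothetical and not taken from the service's code, which builds its tables from external vocabularies and operates on full Dataverse JSON rather than a plain list of terms.

```python
from typing import Dict, List

# Hypothetical look-up table: lower-cased term -> concept URI.
# The real service builds such tables from external vocabularies
# (via SPARQL or the SKOSMOS API); these entries are made up.
LOOKUP_TABLE: Dict[str, str] = {
    "unemployment": "https://example.org/vocab/unemployment",
    "income": "https://example.org/vocab/income",
}


def enrich_terms(terms: List[str], table: Dict[str, str]) -> List[dict]:
    """Pair each term with the concept URI found in the look-up table, if any."""
    enriched = []
    for term in terms:
        # Normalise the term so the match is case-insensitive.
        uri = table.get(term.strip().lower())
        entry = {"term": term}
        if uri is not None:
            entry["uri"] = uri  # only matched terms get a URI
        enriched.append(entry)
    return enriched


print(enrich_terms(["Income", "Housing"], LOOKUP_TABLE))
```

Terms without a match are passed through unchanged, so enrichment only ever adds information and never drops existing values.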
- Enrich metadata fields with external data sources.
- Create look-up tables by querying external vocabularies through SPARQL or the SKOSMOS API.
- Support for different types of enhancers (e.g., ELSSTEnhancer, FrequencyEnhancer, VariableEnhancer).
- FastAPI-based RESTful API for easy integration with other applications.
- Runs in a Docker container, set up with docker-compose.
- Automated testing and Docker image publishing through GitHub Actions.
- Copy the dot_env_example file to .env and set the environment variables appropriately for your setup.
- Use make build to build the Docker images and start the application.
- Access the application at the port specified in the .env file. If you copied dot_env_example unchanged, this is http://localhost:7070.
- Use make stop to gracefully stop the application when done.
- make build: Builds the Docker image and starts the project. Use this when setting up the project for the first time or after changing the Dockerfile.
- make start: Starts the project in non-detached (foreground) mode.
- make stop: Gracefully stops the running metadata-enhancer container.
- make test: Runs the unit tests inside the Docker container. Note: the container must be running for this command to work.
The service provides RESTful API endpoints to enhance metadata. Clients send the dataset metadata as the body of a POST request to one of these endpoints and receive the enriched metadata in the response.
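As a minimal client sketch, the Python code below posts Dataverse JSON to an endpoint using only the standard library. It assumes the service listens on http://localhost:7070 (the address from dot_env_example) and that the response body is JSON; build_enrich_request and enrich are hypothetical helper names, not part of the service.

```python
import json
import urllib.request

# Assumption: the service runs locally with the port from dot_env_example.
BASE_URL = "http://localhost:7070"


def build_enrich_request(path: str, metadata: dict,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request carrying Dataverse-formatted JSON metadata."""
    body = json.dumps(metadata).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def enrich(path: str, metadata: dict, base_url: str = BASE_URL,
           timeout: float = 30.0) -> dict:
    """Send metadata to an enrichment endpoint and return the parsed JSON reply."""
    request = build_enrich_request(path, metadata, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the enriched metadata is returned as a JSON body.
        return json.loads(response.read().decode("utf-8"))
```

For example, enrich("/enrich/frequency", metadata) would send a metadata dictionary loaded from a Dataverse JSON export. Keeping request construction separate from sending makes it possible to inspect the request without a running service.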
The Metadata Enhancer Service currently supports the following enhancers:
- VocabularyEnhancer: Enriches terms with concepts in any given vocabulary.
- FrequencyEnhancer: Enhances CBS metadata with frequency information from an external table.
- VariableEnhancer: Enriches CBS dataset variables with additional attributes.
- POST /enrich/elsst/{language}: Enrich metadata terms using ELSST vocabulary.
- POST /enrich/cbs-concepts: Enrich metadata terms using CBS concepts.
- POST /enrich/cbs-taxonomy: Enrich metadata terms using the CBS taxonomy.
- POST /enrich/frequency: Enrich metadata terms with frequency information.
- POST /enrich/variable: Enrich metadata terms related to dataset variables.
All endpoints expect JSON metadata formatted for Dataverse. Examples of this metadata can be found here. For more information about Dataverse, see the Dataverse documentation.
The /enrich/elsst/{language} endpoint also requires a language. Currently, the options are 'nl' for Dutch and 'en' for English.
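A client can validate the language code before calling the ELSST endpoint. The elsst_path helper below is a hypothetical sketch of such a client-side check; how the service itself responds to an unsupported language is not described here.

```python
# The two language codes the ELSST endpoint currently accepts.
ELSST_LANGUAGES = {"nl", "en"}


def elsst_path(language: str) -> str:
    """Return the ELSST enrichment path for a supported language code."""
    if language not in ELSST_LANGUAGES:
        raise ValueError(
            f"Unsupported ELSST language {language!r}; "
            f"expected one of {sorted(ELSST_LANGUAGES)}"
        )
    return f"/enrich/elsst/{language}"


print(elsst_path("en"))  # /enrich/elsst/en
```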
The Metadata Enhancer Service has continuous integration set up with GitHub Actions. Automated tests run whenever a pull request is created or changes are pushed to the main branch. Docker images are published to Docker Hub automatically when a tag is pushed.