
Service to enhance metadata with terms or URIs retrieved from external vocabularies.


odissei-data/metadata-enhancer


Metadata Enhancer Service

The Metadata Enhancer Service is a Python-based web application designed to enrich metadata for datasets with additional information obtained from external resources. The service supports multiple enhancers, each responsible for enriching specific types of metadata fields. A deployed version of this service can be found here.


Introduction

The Metadata Enhancer Service aims to improve the quality and discoverability of dataset metadata by adding relevant information to certain fields. It uses look-up tables created from external vocabularies to enrich the existing metadata. Currently, the endpoints of this service expect JSON metadata formatted for Dataverse.

Features

  • Enriches metadata fields with information from external data sources.
  • Builds look-up tables by querying external vocabularies through SPARQL or the SKOSMOS API.
  • Support for different types of enhancers (e.g., ELSSTEnhancer, FrequencyEnhancer, VariableEnhancer).
  • FastAPI-based RESTful API for easy integration with other applications.
  • Docker and docker-compose setup for running the service in a container.
  • Automated testing and Docker image publishing.
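The look-up step behind these features can be pictured with a minimal sketch. It assumes a table that maps a normalized term to a concept URI; the table contents, the `example.org` URIs, and the function names `match_term` and `enrich_terms` are hypothetical and not taken from the service's code.

```python
from typing import Dict, List, Optional

# Hypothetical look-up table: lower-cased term -> concept URI.
# The URIs are placeholders, not real vocabulary entries.
LOOKUP_TABLE: Dict[str, str] = {
    "employment": "https://example.org/vocab/employment",
    "education": "https://example.org/vocab/education",
}


def match_term(term: str, table: Dict[str, str]) -> Optional[str]:
    """Return the URI for a term, or None when the table has no entry."""
    return table.get(term.strip().lower())


def enrich_terms(terms: List[str], table: Dict[str, str]) -> List[dict]:
    """Pair each term with its matched URI; unmatched terms keep uri=None."""
    return [{"term": t, "uri": match_term(t, table)} for t in terms]


result = enrich_terms(["Employment", "housing"], LOOKUP_TABLE)
```

In the real service the table would be filled from a SPARQL or SKOSMOS query rather than hard-coded; the sketch only shows how a pre-built table turns a free-text term into a URI.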

Setup

  1. Copy the dot_env_example file to .env and set the environment variables appropriately for your specific setup.
  2. Use make build to build the Docker images and start the application.
  3. Access the application at the port specified in the .env file. This will be http://localhost:7070 if you copied the dot_env_example.
  4. Use make stop to gracefully stop the application when done.

Makefile Commands

  1. make build: Builds the Docker image and starts the project. This command can be used when setting up the project for the first time or when changes have been made to the Dockerfile.

  2. make start: Starts the project in non-detached (foreground) mode.

  3. make stop: Gracefully stops the running metadata-enhancer container.

  4. make test: Runs the unit tests inside the Docker container. Note: The container needs to be running to be able to execute this command.

Usage

The service provides RESTful API endpoints to enhance metadata. Clients can make POST requests to these endpoints with the relevant dataset metadata as input to receive enriched metadata as output.
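A client call might look like the following sketch, which uses only the Python standard library. It assumes the service runs at `http://localhost:7070` (the default from dot_env_example) and that an endpoint accepts the Dataverse JSON directly as the request body; the exact request schema is not stated here, so check the service's API documentation before relying on it. The helper names `build_enrich_request` and `enrich` are illustrative.

```python
import json
import urllib.request

# Assumed base URL: the default port from dot_env_example.
BASE_URL = "http://localhost:7070"


def build_enrich_request(path: str, metadata: dict) -> urllib.request.Request:
    """Build a POST request carrying the metadata as a JSON body."""
    body = json.dumps(metadata).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def enrich(path: str, metadata: dict) -> dict:
    """Send the request and decode the JSON response (needs a running service)."""
    request = build_enrich_request(path, metadata)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the first function easy to inspect and test without a running container.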

Enhancers

The Metadata Enhancer Service currently supports the following enhancers:

  • VocabularyEnhancer: Enriches terms with concepts in any given vocabulary.
  • FrequencyEnhancer: Enhances CBS metadata with frequency information from an external table.
  • VariableEnhancer: Enriches CBS dataset variables with additional attributes.

API Endpoints

  • POST /enrich/elsst/{language}: Enrich metadata terms using ELSST vocabulary.
  • POST /enrich/cbs-concepts: Enrich metadata terms using CBS concepts.
  • POST /enrich/cbs-taxonomy: Enrich metadata terms using the CBS taxonomy.
  • POST /enrich/frequency: Enrich metadata terms with frequency information.
  • POST /enrich/variable: Enrich metadata terms related to dataset variables.

All endpoints expect JSON metadata formatted for Dataverse. Examples of this metadata can be found here. For more information about Dataverse, see the Dataverse documentation.

The /enrich/elsst endpoint also requires a language in its path. Currently, the options are 'nl' for Dutch and 'en' for English.
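The endpoint paths and the language restriction can be captured in a small client-side helper. This is a sketch only: the `ENDPOINTS` keys and the `endpoint_path` function are hypothetical names, and the validation mirrors the two language codes listed above rather than anything enforced by the service itself.

```python
from typing import Optional

# Language codes accepted by the ELSST endpoint, as listed above.
SUPPORTED_LANGUAGES = {"nl", "en"}

# Path templates copied from the endpoint list; the short keys are illustrative.
ENDPOINTS = {
    "elsst": "/enrich/elsst/{language}",
    "cbs-concepts": "/enrich/cbs-concepts",
    "cbs-taxonomy": "/enrich/cbs-taxonomy",
    "frequency": "/enrich/frequency",
    "variable": "/enrich/variable",
}


def endpoint_path(name: str, language: Optional[str] = None) -> str:
    """Return the path for an endpoint, filling in the language when required."""
    template = ENDPOINTS[name]
    if "{language}" in template:
        if language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"language must be one of {sorted(SUPPORTED_LANGUAGES)}")
        return template.format(language=language)
    return template
```

For example, `endpoint_path("elsst", "nl")` yields `/enrich/elsst/nl`, while an unsupported code raises a `ValueError` before any request is sent.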

Continuous Integration with GitHub Actions

The Metadata Enhancer Service has continuous integration set up with GitHub Actions. Automated tests run whenever a pull request is created or a commit is pushed to the main branch. Docker images are published to Docker Hub automatically when a tag is pushed.
