Extracting competency questions from the Wikidata knowledge graph
-
The process of developing ontologies - a formal, explicit specification of a shared conceptualisation - is addressed by well-known methodologies. As in any engineering process, its fundamental basis is the collection of requirements, which includes the elicitation of competency questions. Competency questions are defined by interacting with domain and application experts or by investigating existing datasets that may be used to populate the ontology, i.e. its knowledge graph. The rise in popularity and accessibility of knowledge graphs provides an opportunity to support this phase with automatic tools. In this work, we explore the possibility of extracting competency questions from a knowledge graph. We describe in detail RevOnt, an approach that extracts and abstracts triples from a knowledge graph, generates questions based on triple verbalisations, and filters the questions to guarantee that they are competency questions. The approach is implemented using the Wikidata knowledge graph as a use case. The implementation results in a set of core competency questions from 20 domains present in the dataset representing the knowledge graph, together with their respective templates mapped to SPARQL query templates. We evaluate the resulting competency questions by calculating the BLEU score against human-annotated references. The results for the abstraction and question generation components of the approach show good to high quality, while the accuracy of the filtration component is above 86%, which is comparable to state-of-the-art classification models.
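The BLEU evaluation mentioned above can be reproduced with NLTK. The snippet below is a minimal sketch, assuming whitespace tokenisation and a single human-annotated reference per generated question; the function name and example strings are illustrative and not part of the RevOnt code or dataset.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu_for_question(generated: str, reference: str) -> float:
    """Smoothed sentence-level BLEU for one generated question.

    Assumes simple whitespace tokenisation; the evaluation in the paper
    may tokenise differently.
    """
    hypothesis = generated.lower().split()
    references = [reference.lower().split()]
    return sentence_bleu(
        references,
        hypothesis,
        smoothing_function=SmoothingFunction().method1,
    )

# Illustrative example, not taken from the RevOnt dataset.
print(bleu_for_question(
    "What is the capital of a country?",
    "Which city is the capital of a given country?",
))
```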
An overview of the RevOnt framework. The first stage, Verbalisation Abstraction, generates the abstraction of a triple verbalisation. The abstraction is used as input in the second stage, Question Generation, to generate three questions per triple and perform a grammar check. Lastly, the third stage, Question Filtration, filters the questions by applying several filtering techniques.
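As a rough illustration of how the three stages chain together, the sketch below wires hypothetical stage functions into one pipeline. The function names (`abstract_verbalisation`, `generate_questions`, `filter_questions`) and their signatures are placeholders, not the actual RevOnt API; in the repository each stage is backed by a language model rather than the stubs shown here.

```python
from typing import List

def abstract_verbalisation(verbalisation: str) -> str:
    """Stage 1 (placeholder): abstract the triple verbalisation,
    e.g. by replacing specific entities with more general terms."""
    raise NotImplementedError

def generate_questions(abstraction: str, n: int = 3) -> List[str]:
    """Stage 2 (placeholder): generate n candidate questions from the
    abstraction and run a grammar check over them."""
    raise NotImplementedError

def filter_questions(questions: List[str]) -> List[str]:
    """Stage 3 (placeholder): keep only the candidates that qualify
    as competency questions."""
    raise NotImplementedError

def revont_pipeline(verbalisations: List[str]) -> List[str]:
    """Chain the three stages over a batch of triple verbalisations."""
    competency_questions: List[str] = []
    for verbalisation in verbalisations:
        abstraction = abstract_verbalisation(verbalisation)
        candidates = generate_questions(abstraction)
        competency_questions.extend(filter_questions(candidates))
    return competency_questions
```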
This project requires several packages to be installed for the use of the language models and the Wikidata query service.
```bash
pip install -U sentence-transformers
pip install happytransformer
pip3 install qwikidata
```
The functions import several packages and require downloading the WordNet and OMW-1.4 corpora, as shown below.
```python
from qwikidata.sparql import return_sparql_query_results
from IPython.core.debugger import skip_doctest
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModel, AutoModelForTokenClassification, pipeline
from happytransformer import HappyTextToText, TTSettings
from sklearn.metrics.pairwise import cosine_similarity
import re
import json
import time
import torch
import torch.nn.functional as F
import nltk
from nltk.corpus import wordnet

nltk.download('wordnet')
nltk.download('omw-1.4')
```
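With `return_sparql_query_results`, triples can be pulled directly from the Wikidata query service. The query below is only an illustrative sketch: the property `wdt:P31` and the five-result limit are chosen for the example and are not dictated by RevOnt.

```python
from qwikidata.sparql import return_sparql_query_results

# Illustrative SPARQL query: fetch a few (item, instance-of class) pairs
# with their English labels. Adjust the pattern to the triples of interest.
query = """
SELECT ?item ?itemLabel ?class ?classLabel WHERE {
  ?item wdt:P31 ?class .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""

results = return_sparql_query_results(query)
for binding in results["results"]["bindings"]:
    print(binding["itemLabel"]["value"], "-", binding["classLabel"]["value"])
```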
The repository contains a separate script for each component. This separation makes it possible to opt out of using a component or to change the order in which the components are executed. The scripts also allow the use of a different language model than the default one. The language models used in the scripts are state-of-the-art models that showed good to high results in the first evaluation of the method.
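For example, swapping the sentence-embedding model used by a script typically only requires changing the name passed to `SentenceTransformer`. The sketch below is a minimal, hypothetical illustration; the model name and example sentences are not prescribed by RevOnt.

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical example: any model from the sentence-transformers hub can be
# substituted here, e.g. 'all-MiniLM-L6-v2' or 'all-mpnet-base-v2'.
model = SentenceTransformer("all-MiniLM-L6-v2")

questions = [
    "What is the capital of a country?",
    "Which city serves as the capital of a country?",
]
embeddings = model.encode(questions)

# Cosine similarity between the two question embeddings.
print(cosine_similarity([embeddings[0]], [embeddings[1]])[0][0])
```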
- First implementation of the RevOnt method using data from the Wikidata knowledge graph
- Second implementation of the RevOnt method using data from an AMR graph built from a textual corpus.
Fiorela Ciroku - @ciroku_fiorela - [email protected]
Project Link: [https://github.com/FiorelaCiroku/RevOnt](https://github.com/FiorelaCiroku/RevOnt)