Repository for Large Language Models for Knowledge Engineering (LLM4KE).
Original idea:
Investigate how much an LLM could co-contribute to the knowledge engineering process, alongside the usual methodology (competency questions, ontology re-use, authoring tests, etc.).
Set of questions we could investigate:
- Could an LLM reverse engineer an ontology and identify which good competency questions could be derived from it?
- Could an LLM take CQs as input and generate parts of an ontology?
- Could an LLM take CQs as input and extend an existing ontology?
- Could an LLM take CQs as input and generate abstract patterns?
- Could an LLM write an authoring test (a SPARQL query) given the ontology and a CQ? (A minimal sketch of such a test follows this list.)
- Given a dataset and an ontology, is an LLM able to generate an adequate set of RML rules for data ingestion?
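To make the authoring-test idea concrete, here is a minimal sketch of how a CQ-derived SPARQL test could be run with rdflib. This code is not part of the repository; the ontology path, the CQ, and the query itself are hypothetical examples.

```python
from rdflib import Graph

# Hypothetical CQ: "Which classes in the ontology model smell-related concepts?"
# An authoring test turns the CQ into a SPARQL query that must succeed on the
# ontology for the CQ to count as answerable.
AUTHORING_TEST = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
ASK {
  ?cls a owl:Class ;
       rdfs:label ?label .
  FILTER (CONTAINS(LCASE(STR(?label)), "smell"))
}
"""

g = Graph()
g.parse("data/Odeuropa/dm/ontology.ttl", format="turtle")  # hypothetical path

result = g.query(AUTHORING_TEST)
assert result.askAnswer, "CQ is not answerable by the ontology"
```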
The content of this code repository accompanies the research project described in the following papers:
```
@inproceedings{llm4ke-2024,
  title     = {{Can LLMs Generate Competency Questions?}},
  author    = {Youssra Rebboud and Lionel Tailhardat and Pasquale Lisena and Rapha\"el Troncy},
  booktitle = {Semantic Web -- 21st International Conference (ESWC), LLMs for KE track, Hersonissos, Crete, Greece, May 26--30, 2024},
  year      = {2024}
}

@inproceedings{llm4ke-bench-2024,
  title     = {{Benchmarking LLM-based Ontology Conceptualization: A Proposal}},
  author    = {Youssra Rebboud and Pasquale Lisena and Lionel Tailhardat and Rapha\"el Troncy},
  booktitle = {ISWC 2024, 23rd International Semantic Web Conference, 11--15 November 2024, Baltimore, USA},
  year      = {2024}
}
```
See the repository structure below to navigate this repository:
```
llm4ke
├── data                 <Reference data models with their related components>
│   └── [DataModelName]
│       ├── dm           <data model implementation>
│       ├── rq           <set of queries>
│       └── ...
├── src                  <Processing pipeline code>
└── ...
```
The following steps address the first research question listed above: could an LLM reverse engineer an ontology and identify potential competency questions?
The pipeline is built with LangChain and relies on Ollama to run LLMs locally.
- Install Ollama from its website.
- Install requirements
pip install -r requirements.txt
- Download the desired LLM (full list of available LLMs)
ollama pull llama2
- Run the pipeline to generate Competency Questions for a given ontology:

```
# Canonical form:
# python src/main.py <task> --name <OntologyName> --input <OntologyFolder> --llm <ModelName>
# Basic example for the Odeuropa ontology:
python src/main.py all_classes --name Odeuropa --input ./data/Odeuropa/ --llm llama2
```

Then browse the results in the out/Odeuropa/ directory. You can get the full list of available parameters with python src/main.py --help.
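Under the hood, the pipeline prompts a local Ollama model through LangChain. The sketch below illustrates that interaction in isolation; it is not the actual src/main.py code, and the langchain_community import path, the prompt wording, and the ontology fragment are all assumptions.

```python
from langchain_community.llms import Ollama

# Sketch only: ask a local Ollama model (here llama2, pulled above) to derive
# competency questions from a small ontology fragment. The real pipeline may
# structure prompts and iterate over the ontology differently.
llm = Ollama(model="llama2")

ontology_fragment = """
:SmellExperience a owl:Class ;
    rdfs:label "Smell Experience" ;
    rdfs:comment "The experience of perceiving an odour." .
"""

prompt = (
    "Given the following ontology fragment, write three competency questions "
    "that this ontology should be able to answer:\n" + ontology_fragment
)

print(llm.invoke(prompt))
```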
With the output data from the Competency Question generation step above:

- Run the evaluation pipeline to compute similarity scores for all ontologies or for a given ontology:

```
# Canonical form:
# python src/eval.py <all|OntologyName>
# Basic example for the Odeuropa ontology with a 0.8 similarity threshold and verbose logging:
python3 ./src/eval.py Odeuropa -t 0.8 --log 10
```

Then browse the results in the ./results_<all|OntologyName>.json file.
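For intuition, similarity between generated and reference CQs can be computed with sentence embeddings. The sketch below illustrates one such approach; it is not the actual src/eval.py implementation, and the sentence-transformers dependency, the embedding model, and the example CQs are all assumptions.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative inputs: CQs generated by the LLM vs. reference CQs.
generated = ["What smells are associated with a place?"]
reference = ["Which odours can be experienced at a given location?"]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
gen_emb = model.encode(generated, convert_to_tensor=True)
ref_emb = model.encode(reference, convert_to_tensor=True)

# Cosine similarity matrix: rows = generated CQs, columns = reference CQs.
scores = util.cos_sim(gen_emb, ref_emb)

threshold = 0.8  # same spirit as the -t option above
for i, cq in enumerate(generated):
    best = scores[i].max().item()
    verdict = "match" if best >= threshold else "no match"
    print(f"{cq!r}: best score = {best:.2f} -> {verdict}")
```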
Copyright (c) 2023-2024, EURECOM. All rights reserved.