
llm4ke

Repository for Large Language Models for Knowledge Engineering (LLM4KE).

Objectives

Original idea:

To what extent could an LLM contribute to the knowledge engineering process alongside our usual methodology (competency questions, ontology re-use, authoring tests, etc.)?

Set of questions we could investigate:

  1. Could an LLM reverse engineer an ontology and identify potential competency questions?
  2. Could an LLM take CQs as input and generate parts of an ontology?
  3. Could an LLM take CQs as input and extend an existing ontology?
  4. Could an LLM take CQs as input and generate abstract patterns?
  5. Could an LLM write an authoring test (a SPARQL query) given the ontology and a CQ?
  6. Given a dataset and an ontology, is an LLM able to generate an adequate set of RML rules for data ingestion?

The content of this repository accompanies the research project described in the following papers:

@inproceedings{llm4ke-2024,
  title     = {{Can LLMs Generate Competency Questions?}},
  author    = {Rebboud, Youssra and Tailhardat, Lionel and Lisena, Pasquale and Troncy, Rapha\"el},
  booktitle = {The Semantic Web: 21st International Conference (ESWC 2024), LLMs for KE track, Hersonissos, Crete, Greece, May 26-30, 2024},
  year      = {2024}
}

@inproceedings{llm4ke-bench-2024,
  title     = {{Benchmarking LLM-based Ontology Conceptualization: A Proposal}},
  author    = {Rebboud, Youssra and Lisena, Pasquale and Tailhardat, Lionel and Troncy, Rapha\"el},
  booktitle = {ISWC 2024, 23rd International Semantic Web Conference, 11-15 November 2024, Baltimore, USA},
  year      = {2024}
}

Usage

See the repository structure below for navigating this repository:

llm4ke
├───data <Reference data models with their related components>
│   └─[DataModelName]
│     ├─dm <data model implementation>
│     ├─rq <set of queries>
│     └─...
├───src <Processing pipeline code>
└───...

Generating Competency Questions

This section addresses research question 1 above: could an LLM reverse engineer an ontology and identify potential competency questions?

The pipeline uses LangChain and, in particular, Ollama to run LLMs locally.
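
For orientation, here is a minimal sketch of calling an Ollama-served model through LangChain. It is illustrative only, not the repository's pipeline code: the model name matches the llama2 example below, and the prompt (including the class and property names) is invented.

# Minimal sketch: querying a local Ollama model via LangChain
# (illustrative only; the real pipeline in src/ builds its prompts
# from the ontology files under data/).
from langchain_community.llms import Ollama

llm = Ollama(model="llama2")  # assumes `ollama pull llama2` was run

# Hypothetical prompt; the class and property names are made up.
prompt = (
    "The ontology class 'SmellEvent' has properties 'hasSource' and 'atTime'. "
    "Propose three competency questions this class could help answer."
)
print(llm.invoke(prompt))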

  • Install Ollama from its website.
  • Install requirements
    pip install -r requirements.txt
  • Download the desired LLM (full list of available LLMs)
    ollama pull llama2
  • Run the pipeline to generate Competency Questions for a given ontology
    # Canonical form:
    # python src/main.py <task> --name <OntologyName> --input <OntologyFolder> --llm <ModelName>
    
    # Basic example for the Odeuropa ontology:
    python src/main.py all_classes --name Odeuropa --input ./data/Odeuropa/ --llm llama2
    Then browse the results in the out/Odeuropa/ directory. You can get the full list of available parameters with python src/main.py --help.

Evaluating the LLM's Competency Questions

With the output data from the above Generating Competency Questions step,

  • Run the evaluation pipeline to compute similarity scores for all ontologies or a given ontology
    # Canonical form:
    # python src/eval.py <all|OntologyName>
    
    # Basic example for the Odeuropa ontology with a 0.8 similarity threshold and verbose logging:
    python3 ./src/eval.py Odeuropa -t 0.8 --log 10
    Then browse the results in the ./results_<all|OntologyName>.json file.
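
As background on these similarity scores, below is a minimal sketch of embedding-based matching between a generated CQ and a reference CQ. The approach and model choice are assumptions for illustration, not the actual src/eval.py implementation; the 0.8 threshold mirrors the -t flag above, and the example CQs are made up.

# Illustrative sketch of embedding-based CQ similarity
# (assumed approach; not the actual src/eval.py implementation).
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

generated = "What are the sources of a smell event?"  # made-up example CQs
reference = "Which source produced a given smell?"

embeddings = model.encode([generated, reference])
score = util.cos_sim(embeddings[0], embeddings[1]).item()

# A pair counts as a match when the score clears the threshold (cf. -t 0.8).
print(f"similarity={score:.2f}, match={score >= 0.8}")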

Copyright

Copyright (c) 2023-2024, EURECOM. All rights reserved.

License

Apache License.

Maintainer