Create the conda environment:

```shell
conda create -n promptkg python=3.8
conda activate promptkg
```

Install the dependencies:

```shell
pip install -r requirements.txt
```
Download our preprocessed datasets and put them into the `dataset` folder.
| Dataset (KGC) | Google Drive | Baidu Cloud |
|---|---|---|
| WN18RR | google drive | baidu drive (axo7) |
| FB15k-237 | google drive | baidu drive (ju9t) |
| MetaQA | google drive | baidu drive (hzc9) |
| KG20C | google drive | baidu drive (stnh) |
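After downloading, a layout with one sub-folder per dataset is assumed (the folder names here are illustrative; each folder holds the five files described in the data-format section below):

```
dataset/
├── WN18RR/
│   ├── train.tsv
│   ├── dev.tsv
│   ├── test.tsv
│   ├── entity2text.txt
│   └── relation2text.txt
└── FB15k-237/
    └── ...
```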
We provide four tasks in our toolkit: Knowledge Graph Completion (KGC), Question Answering (QA), Recommendation (REC), and LAnguage Model Analysis (LAMA).
- **KGC** is our basic task to learn knowledge graph embeddings and evaluate the models. You can run the scripts under the `kgc` folder to train a model and get the KG embeddings (taking `simkgc` as an example):

  ```shell
  bash ./scripts/kgc/simkgc.sh
  ```
- For the **QA** task, you can run the script files under `metaqa`. We suggest using a generative model to solve the QA task, as below:

  ```shell
  bash ./scripts/metaqa/run.sh
  ```
- For the **REC** task, you first need to get the KG embeddings and then train the recommendation models. Use the two-stage scripts below:

  ```shell
  bash ./scripts/kgrec/pretrain_item.sh
  bash ./scripts/kgrec/ml20m.sh
  ```
- For the **LAMA** task, you can use the files under `lama`. We provide `BERT` and `RoBERTa` PLMs to evaluate their performance with our KG embeddings (`plet`):

  ```shell
  bash ./scripts/lama/lama_roberta.sh
  ```
| Models | Knowledge Graph Completion | Question Answering | Recommendation | LAnguage Model Analysis |
|---|---|---|---|---|
| KG-BERT | ✔ | ✔ | | |
| GenKGC | ✔ | | | |
| KGT5 | ✔ | ✔ | | |
| kNN-KGE | ✔ | ✔ | ✔ | |
| SimKGC | ✔ | | | |
For each knowledge graph, we have five files:

- `train.tsv`, `dev.tsv`, `test.tsv`: triples listed as (h, r, t), using entity and relation ids (starting from 0).
- `entity2text.txt`: lines of (entity_id, entity description).
- `relation2text.txt`: lines of (relation_id, relation description).
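As a minimal sketch of this format (using toy in-memory stand-ins for the real tab-separated files; the helper names are ours, not part of the toolkit), a triple of ids can be decoded back to text like this:

```python
import csv
import io

def load_mapping(f):
    # entity2text.txt / relation2text.txt: one "id<TAB>description" pair per line
    return {row[0]: row[1] for row in csv.reader(f, delimiter="\t") if len(row) >= 2}

def load_triples(f):
    # train.tsv / dev.tsv / test.tsv: one "h<TAB>r<TAB>t" id triple per line
    return [tuple(row[:3]) for row in csv.reader(f, delimiter="\t")]

# Toy stand-ins for the real files (ids start from 0, tab-separated);
# replace with open("dataset/<name>/train.tsv") etc. for a downloaded dataset.
entities = load_mapping(io.StringIO("0\tdog\n1\tanimal\n"))
relations = load_mapping(io.StringIO("0\thypernym\n"))
triples = load_triples(io.StringIO("0\t0\t1\n"))

h, r, t = triples[0]
print(f"{entities[h]} --{relations[r]}--> {entities[t]}")  # dog --hypernym--> animal
```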
- add ensemble models
- add more KGC models based on pretrained language models
- add soft prompts to the base models
- add more datasets in the same format