Injection of knowledge graph embedding (RotatE) into BERT for biomedical Relation Extraction (RE).
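A minimal sketch of the general idea, assuming a simple concatenation-based fusion; the class name, argument names, and fusion mechanism below are illustrative, not the repository's exact implementation:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class KGEnrichedBERT(nn.Module):
    """Fuse BERT's sentence representation with pre-trained RotatE
    embeddings of the two candidate entities, then classify the relation."""

    def __init__(self, model_card: str, kge_dim: int, num_labels: int):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_card)
        hidden = self.bert.config.hidden_size
        self.classifier = nn.Linear(hidden + 2 * kge_dim, num_labels)

    def forward(self, input_ids, attention_mask, head_kge, tail_kge):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] representation
        # head_kge / tail_kge: RotatE vectors looked up for the two entities
        fused = torch.cat([cls, head_kge, tail_kge], dim=-1)
        return self.classifier(fused)      # relation logits
```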
🟢 Setup
- Refer to the README file in /preprocessing/ to prepare your own data.
- Check all available options with:
python3 main.py --help
- Add the class weights of your dataset to "class_weights" in /preprocessing/utils.py. For each of the K classes $c_i$, its weight should be $\frac{\sum_j N_j}{N_i}$, where $N_i$ is the number of training examples labeled $c_i$.
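For example, a minimal sketch of this computation (assuming the training labels are available as a list; the exact format expected by "class_weights" in utils.py may differ):

```python
from collections import Counter

def compute_class_weights(train_labels):
    """Weight of class c_i = (sum_j N_j) / N_i."""
    counts = Counter(train_labels)   # N_i for each class
    total = sum(counts.values())     # sum_j N_j
    return {c: total / n for c, n in counts.items()}

# compute_class_weights(["Lives_In", "Lives_In", "NoRelation"])
# -> {"Lives_In": 1.5, "NoRelation": 3.0}
```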
- Add the corresponding config.json under the "config" folder.
- Add the model's Hugging Face model card name to "model_download_shortcuts" in /preprocessing/utils.py (see the example below).
- Set --model_type {bert_model_name}, e.g. --model_type biobert.
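A hypothetical entry (check the existing entries in /preprocessing/utils.py for the exact dict layout):

```python
# /preprocessing/utils.py
model_download_shortcuts = {
    # --model_type shortcut : Hugging Face model card
    "biobert": "dmis-lab/biobert-base-cased-v1.1",
}
```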
- Install dependencies:
pip install -r requirements.txt
⚪ Training
- Follow the instructions in /preprocessing/ to prepare pre-trained RotatE graph embeddings, or download the pre-trained graph embeddings of DSMZ+Genbank+Cirm (~3.3 GB) into /data/{corpus_name}/:
gdown --folder https://drive.google.com/drive/folders/1zLzaMO9f_1qHTAxh4CZtR0yELKshWU6g?usp=sharing
- Generate data
sbatch process.slurm {corpus_name} false true
- Train without KB information (trains a single model seeded with 61; change hyper-parameters in run_no_kb.slurm)
sbatch run_no_kb.slurm
- Train with KB information (trains a single model seeded with 61; change hyper-parameters in run_with_kb.slurm)
sbatch run_with_kb.slurm
- ❕(recommended) Set --dry_run to do a quick pass and make sure the code executes without errors.
🔴 Inference only
- (optional) Same as the first step for training: obtain pre-trained RotatE graph embeddings. (❗If you skip this step, entities that do not exist in training will be initialized randomly, which may degrade performance.)
- Download a model pre-trained on BB-Rel into /models/ (10 models to choose from; links in pretrained_download_links.csv):
gdown --folder https://drive.google.com/drive/folders/1kVoBsKMBQ3ghTalfirP9uxjYZx45yHH0?usp=drive_link
- Generate data (without pre-trained KG embeddings; to use them, set {pretrained_kge} to true)
sbatch process.slurm {corpus_name} true false {file_name}
- Run inference with a chosen model. Note that the checkpoint weights must be saved under {checkpoint_path}/model/
sbatch inference.slurm {corpus_name} {file_name} {checkpoint_path} {no_kb / with_kb}
- For training, make sure that train.csv, dev.csv, and test.csv exist under /data/{corpus_name}/.
- For inference only, make sure that test.csv exists under /data/{corpus_name}/.
⭐ For training, the output path is set to ./models/{corpus_name}_{mode}_{model_type}_{learning_rate}_{seed}. The expected output includes:
- predictions on the validation set (dev_preds.npy), in the form of labels
- predictions on the test set (test_preds.csv), in the form of probabilities
- weights of the best checkpoint saved in the folder "model"
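To inspect these outputs, a minimal sketch (the output directory below is a hypothetical example; substitute your actual run folder):

```python
import numpy as np
import pandas as pd

out_dir = "./models/BB-Rel_with_kb_biobert_2e-05_61"  # hypothetical run folder

dev_preds = np.load(f"{out_dir}/dev_preds.npy")        # labels on the dev set
test_preds = pd.read_csv(f"{out_dir}/test_preds.csv")  # probabilities on the test set

print(dev_preds[:5])
print(test_preds.head())
```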
⭐ For inference only, the output path is the same as {checkpoint_path}.
- Make sure that the folder {checkpoint_path}/model/ exists and contains the pre-trained model weights.
💡 If you use the slurm files
- Set the following values in the slurm files (both run_no_kb.slurm and run_with_kb.slurm): number of labels (nl); number of training epochs (ne); corpus name (corpus); learning rate (lr).
- inference.slurm takes four parameters, matching the invocation above: {corpus_name} ($1); {file_name} ($2); {checkpoint_path} ($3); {mode} ($4, no_kb or with_kb).
- process.slurm takes four parameters: {corpus_name} ($1); {inference_only} ($2, true or false); {pretrained_kge} ($3, true or false); {file_name} ($4, only needed when {inference_only} is true).