This repository provides the source code for our paper [BERT-GT: Cross-sentence n-ary relation extraction with BERT and Graph Transformer] and the BERT-GT implementation from our paper [BioRED: A Rich Biomedical Relation Extraction Dataset].
- GPU: NVIDIA Tesla V100 SXM2
- Anaconda: Anaconda3
- Python: python3.6
- Tensorflow: tensorflow-gpu==2
pip install -r requirements.txt
We assume that you use an Anaconda environment, so the above command uses "pip" to install tensorflow-gpu and the other required packages.
Because generating the input datasets involves extra setup (scispacy must be installed), we provide dataset_format_convert.py, which converts the CDR, n-ary, and BioRED datasets into the input format of BERT-GT.
bash build_cdr_dataset.sh
bash build_nary_dataset.sh
bash build_biored_dataset.sh
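The three build scripts above can also be run in one pass. The sketch below is a hypothetical helper (`run_all` is not part of the repository) that stops at the first script that is missing or fails. It assumes the scripts are in the current directory.

```shell
# Hypothetical helper, not part of the repository: runs each script given
# as an argument with bash and stops at the first one that is missing or fails.
run_all() {
  for script in "$@"; do
    if [ ! -f "$script" ]; then
      echo "Missing script: $script" >&2
      return 1
    fi
    if ! bash "$script"; then
      echo "Script failed: $script" >&2
      return 1
    fi
  done
}

# Usage:
#   run_all build_cdr_dataset.sh build_nary_dataset.sh build_biored_dataset.sh
```

Stopping at the first failure avoids starting an experiment on a partially built dataset.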
BERT-GT uses BioBERT's pre-trained model because it supports longer text (a sequence length of 512).
After downloading the model, use the command below to extract it.
tar -xvzf biobert_v1.1_pubmed.tar.gz
bash run_cdr_exp.sh <CUDA_VISIBLE_DEVICES>
Please replace <CUDA_VISIBLE_DEVICES> above with your GPU IDs, e.g., '0,1' for GPU devices 0 and 1.
For example
bash run_cdr_exp.sh 0,1
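A malformed GPU ID list is an easy mistake to make, so it can help to check the argument before launching a run. The sketch below uses two hypothetical helpers, `validate_gpu_ids` and `count_gpu_ids`, which are not part of the repository. It assumes only that the script expects a comma-separated list of non-negative integers, as in the example above.

```shell
# Hypothetical helpers, not part of the repository.

# Succeeds if the argument is a comma-separated list of non-negative
# integers such as "0" or "0,1"; fails on empty input, stray characters,
# or leading, trailing, or doubled commas.
validate_gpu_ids() {
  case "$1" in
    ''|*[!0-9,]*|,*|*,|*,,*) return 1 ;;
    *) return 0 ;;
  esac
}

# Prints how many GPU IDs the list contains (number of commas plus one).
count_gpu_ids() {
  commas=$(printf '%s' "$1" | tr -cd ',' | wc -c)
  echo $((commas + 1))
}

GPU_IDS="0,1"
if validate_gpu_ids "$GPU_IDS"; then
  echo "Using $(count_gpu_ids "$GPU_IDS") GPU(s): $GPU_IDS"
  # bash run_cdr_exp.sh "$GPU_IDS"   # uncomment to launch the experiment
else
  echo "Invalid GPU ID list: $GPU_IDS" >&2
fi
```

The same check applies to the other experiment scripts, which take the GPU ID list as their first argument.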
bash run_nary_exp.sh <CUDA_VISIBLE_DEVICES> <TASK_NAME> <INPUT_DATASET_DIR>
Please replace <CUDA_VISIBLE_DEVICES> above with your GPU IDs, e.g., '0,1' for GPU devices 0 and 1.
Please replace <TASK_NAME> above with one of "nary_dgv_bin", "nary_dgv_mul", "nary_dv_bin", or "nary_dv_mul". Here "dgv" and "dv" mean DRUG-GENE-MUTATION and DRUG-MUTATION, respectively; similarly, "bin" and "mul" mean two classes and multiple classes, respectively.
Please replace the above <INPUT_DATASET_DIR> with either "datasets/nary/processed/all" or "datasets/nary/processed/only_single_sent". The "datasets/nary/processed/all" includes intra-sentence and inter-sentence instances. The "datasets/nary/processed/only_single_sent" only includes intra-sentence instances.
For example
bash run_nary_exp.sh 0,1 nary_dgv_bin "datasets/nary/processed/all"
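The task naming scheme described above can be checked before a run. The sketch below uses two hypothetical helpers, `is_valid_task` and `describe_task`, which are not part of the repository. They encode only the four task names and their meanings as listed above.

```shell
# Hypothetical helpers, not part of the repository.

# Succeeds only for the four task names accepted by run_nary_exp.sh.
is_valid_task() {
  case "$1" in
    nary_dgv_bin|nary_dgv_mul|nary_dv_bin|nary_dv_mul) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the entity combination and label scheme encoded in a task name.
describe_task() {
  # dgv = DRUG-GENE-MUTATION, dv = DRUG-MUTATION
  case "$1" in
    nary_dgv_*) entities="DRUG-GENE-MUTATION" ;;
    nary_dv_*)  entities="DRUG-MUTATION" ;;
    *) return 1 ;;
  esac
  # bin = two classes, mul = multiple classes
  case "$1" in
    *_bin) labels="two classes" ;;
    *_mul) labels="multiple classes" ;;
    *) return 1 ;;
  esac
  echo "$entities, $labels"
}

TASK_NAME="nary_dgv_bin"
if is_valid_task "$TASK_NAME"; then
  echo "$TASK_NAME: $(describe_task "$TASK_NAME")"
else
  echo "Unknown task name: $TASK_NAME" >&2
fi
```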
bash run_biored_exp.sh <CUDA_VISIBLE_DEVICES>
For example
bash run_biored_exp.sh 0,1
- Lai P. T. and Lu Z. BERT-GT: Cross-sentence n-ary relation extraction with BERT and Graph Transformer. Bioinformatics. 2021.
@article{lai2021bertgt,
author = {Po-Ting Lai and Zhiyong Lu},
title = {BERT-GT: Cross-sentence n-ary relation extraction with BERT and Graph Transformer},
journal = {Bioinformatics},
year = {2021},
publisher = {Oxford University Press}
}
- Luo L., Lai P. T., Wei C. H., Arighi C. N. and Lu Z. BioRED: A Rich Biomedical Relation Extraction Dataset. Briefings in Bioinformatics. 2022.
@article{luo2022biored,
author = {Luo, Ling and Lai, Po-Ting and Wei, Chih-Hsuan and Arighi, Cecilia N and Lu, Zhiyong},
title = {BioRED: A Rich Biomedical Relation Extraction Dataset},
journal = {Briefings in Bioinformatics},
year = {2022},
publisher = {Oxford University Press}
}
We are grateful to the authors of AGGCN, BERT, BioBERT, GS LSTM, and NCBI BlueBERT for making their data and code publicly available. We would like to thank Dr. Zhijiang Guo for helping us reproduce the results of AGGCN on the n-ary dataset. We thank Dr. Chih-Hsuan Wei for his assistance in revising the manuscript.
This tool shows the results of research conducted in the Computational Biology Branch, NCBI. The information produced on this website is not intended for direct diagnostic use or medical decision-making without review and oversight by a clinical professional. Individuals should not change their health behavior solely on the basis of information produced on this website. NIH does not independently verify the validity or utility of the information produced by this tool. If you have questions about the information produced on this website, please see a health care professional. More information about NCBI's disclaimer policy is available.