Label Deconvolution for Node Representation Learning on Large-scale Attributed Graphs against Learning Bias
This repository is an implementation of LD (arXiv).
We propose an efficient and effective label regularization technique, namely Label Deconvolution (LD), to alleviate the learning bias that arises when node encoders are trained separately from GNNs rather than jointly with them.
The core packages are as follows.
- python=3.8
- ogb=1.3.3
- numpy=1.19.5
- dgl=0.8.0
- pytorch=1.10.2
- pyg=2.0.3
- hydra-core==1.3.1
To reproduce our exact environment, install it with:

```
conda env create -f environment.yml
```
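To sanity-check the installed packages against the version list above, a quick stdlib sketch (package names are those used on PyPI; this only prints what is present, it does not enforce versions):

```python
from importlib.metadata import version, PackageNotFoundError

# PyPI names corresponding to the conda packages listed above
for pkg in ["ogb", "numpy", "dgl", "torch", "torch-geometric", "hydra-core"]:
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```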
Performance on ogbn-arxiv (5 runs):

| Methods | Validation accuracy | Test accuracy |
|---|---|---|
| | 76.84 ± 0.09 | 76.22 ± 0.10 |
| | 77.62 ± 0.08 | 77.26 ± 0.17 |
Performance on ogbn-products (5 runs):

| Methods | Validation accuracy | Test accuracy |
|---|---|---|
| | 94.15 ± 0.03 | 86.45 ± 0.12 |
| | 93.99 ± 0.02 | 87.18 ± 0.04 |
Performance on ogbn-proteins (5 runs):

| Methods | Validation accuracy | Test accuracy |
|---|---|---|
| | 95.27 ± 0.07 | 89.42 ± 0.07 |
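The reported numbers are the mean ± sample standard deviation over 5 runs. As a purely illustrative sketch of how such a summary is computed (the accuracy values below are placeholders, not actual runs):

```python
import numpy as np

# hypothetical test accuracies from 5 runs (placeholder values)
accs = np.array([0.7722, 0.7731, 0.7719, 0.7728, 0.7730])

# mean ± sample standard deviation (ddof=1), reported in percent
print(f"{accs.mean() * 100:.2f} ± {accs.std(ddof=1) * 100:.2f}")  # → 77.26 ± 0.05
```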
We provide extracted node features for each dataset at Features.
Before starting the training process, a few preparations are needed:
- Generate tokens from the original node attributes following GLEM and the `protein` folder. For convenience, we provide the LM tokens of each dataset at Token. The tokenizers come from the following pre-trained models on Hugging Face:
  - ogbn-arxiv: deberta-base (REVGAT)
  - ogbn-products: deberta-base (GAMLP) and bert-base-uncased (SAGN)
  - ogbn-proteins: esm2-t33-650M-UR50D (GAT)
- Download the LM corresponding to each dataset from the Hugging Face official website.
- Modify the `path` and `token_folder` values in the `transformer/conf/LM/*.yaml` files.
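As a rough illustration, the fields to edit in each `transformer/conf/LM/*.yaml` file might look like the following (the filename and values are placeholders; only the `path` and `token_folder` key names come from the repo):

```yaml
# transformer/conf/LM/example.yaml (illustrative fragment, not a complete config)
path: /data/models/deberta-base        # local directory of the downloaded LM
token_folder: /data/tokens/ogbn-arxiv  # folder holding the pre-generated tokens
```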
Run the training script corresponding to each GNN backbone:

- REVGAT:
  ```
  cd transformer
  bash scripts/shell_arxiv_revgat.sh
  ```
- GAMLP:
  ```
  cd transformer
  bash scripts/shell_product_gamlp.sh
  ```
- SAGN:
  ```
  cd transformer
  bash scripts/shell_product_sagn.sh
  ```
- GAT:
  ```
  cd transformer
  bash scripts/shell_protein_gat.sh
  ```