LMDisorder is a fast and accurate protein disorder predictor that uses embeddings generated by unsupervised pretrained language models as features. We showed that LMDisorder substantially outperformed single-sequence-based methods, by more than 6.0% and 18.0% in AUROC on two independent test sets, respectively. Furthermore, LMDisorder showed comparable or even better performance than the state-of-the-art profile-based technique SPOT-Disorder2.
python 3.7.9
numpy 1.19.1
pandas 1.1.0
pytorch 1.10.0
sentencepiece 0.1.96
transformers 4.18.0
tqdm 4.48.2
LMDisorder requires the pretrained language model ProtTrans. Download the pretrained ProtT5-XL-UniRef50 model (guide) before running it. The model takes about 11.3 GB (download: 5.3 GB).
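For readers who want to see what the ProtT5 features look like, the following is a minimal sketch of extracting per-residue embeddings with the transformers library. It is not the authors' code, and LMDisorder_predict.py presumably does its own feature extraction. The function names and the `model_dir` argument (a local directory holding the downloaded ProtT5-XL-UniRef50 files) are assumptions made for illustration. The preprocessing (mapping the rare residues U, Z, O, B to X and separating residues with spaces) follows the usual ProtTrans convention.

```python
import re
from typing import List


def prepare_for_prott5(sequence: str) -> str:
    """Map rare residues (U, Z, O, B) to X and put a space between residues."""
    cleaned = re.sub(r"[UZOB]", "X", sequence.strip().upper())
    return " ".join(cleaned)


def embed_sequences(sequences: List[str], model_dir: str, device: str = "cpu"):
    """Return one (sequence_length x hidden_size) tensor per input sequence.

    model_dir is a hypothetical local path to the downloaded ProtT5 files.
    """
    # Imported lazily so prepare_for_prott5 works without torch installed.
    import torch
    from transformers import T5EncoderModel, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_dir, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(model_dir).to(device).eval()

    batch = tokenizer(
        [prepare_for_prott5(s) for s in sequences],
        add_special_tokens=True,
        padding="longest",
        return_tensors="pt",
    )
    input_ids = batch["input_ids"].to(device)
    attention_mask = batch["attention_mask"].to(device)

    with torch.no_grad():
        output = model(input_ids=input_ids, attention_mask=attention_mask)
    hidden = output.last_hidden_state

    # Keep only the residue positions: drop padding and the trailing </s> token.
    return [hidden[i, : len(seq.strip())].cpu() for i, seq in enumerate(sequences)]
```

Calling `embed_sequences(["MKVLAAGIVG"], model_dir=...)` with the path to your local copy of the model should yield a list with one tensor whose first dimension equals the sequence length.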
Simply run:
python LMDisorder_predict.py --fasta ./example/demo.fasta --device 'cpu' --model_path ./model/model.pkl
The prediction results will be saved in ./example/result.
We also provide the corresponding canonical prediction results in ./example/demo_result for your reference.
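Before running the predictor on your own sequences, it can help to sanity-check the input FASTA file. The sketch below is an independent helper, not part of LMDisorder, and how LMDisorder_predict.py itself parses FASTA files is not documented here. The function names and the set of accepted residue letters are assumptions for illustration.

```python
from typing import Dict, List

# Assumed alphabet: the 20 standard residues plus X and the rare codes U, Z, O, B.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYXUZOB")


def read_fasta(path: str) -> Dict[str, str]:
    """Read a FASTA file into {header: sequence}, joining wrapped lines."""
    records = {}  # type: Dict[str, str]
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].strip()
                if name in records:
                    raise ValueError("duplicate header: {}".format(name))
                records[name] = ""
            elif name is None:
                raise ValueError("sequence data before the first '>' header")
            else:
                records[name] += line.upper()
    return records


def check_records(records: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    for name, seq in records.items():
        if not seq:
            problems.append("{}: empty sequence".format(name))
            continue
        bad = sorted(set(seq) - VALID_RESIDUES)
        if bad:
            problems.append("{}: unexpected characters {}".format(name, "".join(bad)))
    return problems
```

For example, `check_records(read_fasta("./example/demo.fasta"))` returns a list of problems found in the demo input, which is empty when every record passes these checks.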
We provide the datasets and the trained LMDisorder models here for those interested in reproducing our paper. The datasets used in this study are stored in ./datasets/.
The trained LMDisorder models can be found under ./model/.
Yidong Song ([email protected])
Yuedong Yang ([email protected])