run_test_pred.sh can't run on CPU #8
Comments
Hi @michaelanekson, Thanks for bringing this error to our attention. It could be due to either an environment issue or a problem with the input PubTator file. I've updated requirements.txt to address potential environment-related issues. As for the input file, would you mind sharing yours with me? Alternatively, you could try the bc8_biored_task1_val.pubtator file from the BC8-BioRED dataset, which is in this ZIP archive: https://ftp.ncbi.nlm.nih.gov/pub/lu/BC8-BioRED-track/BC8_BioRED_Subtask1_PubTator.zip. Give it a try and see whether the program runs without errors. Let me know if you need any further assistance!
Here is my input file; you can download it. It is already in PubTator format, but I don't know why it doesn't work. You can see it in my uploaded file. I recently worked around the problem by loading your LLM manually with transformers: I tried loading the PubTator3 BioLinkBERT model, and that works for me. Would you take a look at my modified code? Maybe you can correct me if I am using your model in the wrong way. One more thing about the environment issue: I think the latest versions of transformers, tensorflow, or biospacy do not work properly, so I agree with your updated requirements.
Hi @michaelanekson, Thank you for sharing the files. I've taken a look at your sample file, and I noticed you're using PubTator NE types instead of BioRED NE types. If you've downloaded your sample file from PubTator, you can use the converter mentioned in this GitHub issue comment to convert it to BioRED NE types. Since the NE type tags are different, it might negatively impact the script you provided. I'd recommend using our leaderboard's dataset and evaluation to test it before applying it to your own dataset. Also, if you're only interested in results for PubMed abstracts, we've already processed the entire PubMed database. You can find that data here.
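For readers who want to see what the NE type difference looks like in practice, here is a minimal sketch of rewriting the entity-type column of a PubTator file. The mapping table is an assumption based on the commonly used PubTator and BioRED type names, and `convert_line` and `convert_text` are hypothetical helpers, not the converter referenced above; please check the result against that converter before relying on it.

```python
# Assumed mapping from PubTator entity types to BioRED entity types.
# This table is illustrative; verify it against the official converter.
PUBTATOR_TO_BIORED = {
    "Gene": "GeneOrGeneProduct",
    "Disease": "DiseaseOrPhenotypicFeature",
    "Chemical": "ChemicalEntity",
    "Species": "OrganismTaxon",
    "CellLine": "CellLine",
    "Mutation": "SequenceVariant",
    "DNAMutation": "SequenceVariant",
    "ProteinMutation": "SequenceVariant",
    "SNP": "SequenceVariant",
}


def convert_line(line: str) -> str:
    """Rewrite the entity type of one PubTator annotation line.

    Annotation lines are tab-separated: PMID, start, end, mention, type, id.
    Title/abstract lines ("PMID|t|...", "PMID|a|...") and relation lines are
    returned unchanged because their second and third columns are not offsets.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 5 and fields[1].isdigit() and fields[2].isdigit():
        # Unknown types are left as they are rather than guessed.
        fields[4] = PUBTATOR_TO_BIORED.get(fields[4], fields[4])
    return "\t".join(fields)


def convert_text(pubtator_text: str) -> str:
    """Apply convert_line to every line of a PubTator document."""
    return "\n".join(convert_line(line) for line in pubtator_text.split("\n"))
```

Types that are not in the table pass through untouched, so a file that already uses BioRED types is not altered.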
Hello, I have already tried your BioRED dataset. The annotation part works, but the relation extraction part does not. The model I use is pretrained_model_biolinkbert. Here's the error description:
@ptlai is it possible to use the annotations created by AIONER?
Sorry for the late reply. It looks like you were attempting to run BioREx with the pretrained BioLinkBERT model to predict on the BC8_BioRED_Subtask1_PubTator/bc8_biored_task1_val.pubtator file, and you then encountered the error you mentioned. I tried replicating your setup, following the installation steps from https://github.com/ncbi/BioREx/?tab=readme-ov-file#installation. I'm running Windows 11 with WSL2 (Linux subsystem). Interestingly, I was able to run the prediction without any issues, as you can see in the attached screenshot. To help troubleshoot, could you provide some more details, such as the complete list of Python package versions and your operating system? Thank you
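One way to gather the requested details is a short script like the sketch below. It uses only the Python standard library; the package list is an assumption drawn from the names that appear in the traceback and in this thread, so adjust it to your environment.

```python
import platform
import sys
from importlib import metadata

# Packages named in the traceback or the discussion; adjust as needed.
PACKAGES = ["tensorflow", "keras", "transformers", "pandas", "numpy"]


def collect_environment(packages=PACKAGES) -> dict:
    """Return OS, Python, and installed package versions as a flat dict."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Running it inside the activated environment and pasting the output into the issue gives the maintainers the versions in one place.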
@ptlai Thank you for your continuous support. The problem is not with setting up the project but with an input mismatch (PS: which I found 1-2 days ago), which you already clarified today in another thread (Link: #9 (comment)). Thanks.
Thank you for the reply. I have already solved the problem, which turned out to be about the input: I tried the BioRED format and it works fine. The conclusion of this issue for readers is that the author's code (run_test_pred.sh) runs fine on CPU (when a CUDA- or GPU-related error appears, just ignore it), and the input must be converted from PubTator format into BioRED format before the analysis. I am closing this issue. I now have questions about the output and need your advice.
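For anyone who wants to make the CPU-only run explicit rather than relying on TensorFlow falling back by itself, a minimal sketch follows. `CUDA_VISIBLE_DEVICES` and `TF_CPP_MIN_LOG_LEVEL` are standard environment variables; `cpu_only_env` and `run_on_cpu` are hypothetical helpers, and the script path in the usage note is only an example.

```python
import os
import subprocess


def cpu_only_env(base_env=None) -> dict:
    """Copy an environment and hide all CUDA devices from child processes."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = "-1"  # TensorFlow then sees no GPU
    env["TF_CPP_MIN_LOG_LEVEL"] = "2"   # filter TensorFlow INFO and WARNING logs
    return env


def run_on_cpu(script_path: str) -> int:
    """Run a bash script with the CPU-only environment; return its exit code."""
    completed = subprocess.run(["bash", script_path], env=cpu_only_env())
    return completed.returncode
```

A call such as `run_on_cpu("scripts/run_test_pred.sh")` would then launch the script without any visible GPU. The cuFFT/cuDNN/cuBLAS registration lines are logged at error level, so they may still be printed with this setting.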
Hello, I am trying to run your code for the "predicting new data" part. I followed all your commands, from creating a new environment and installing the requirements from your txt file to running the bash script. However, when I ran the bash script, I got this error:
(biorex) [michaela95@BLOOM BioREx]$ bash scripts/run_biorex_new.sh
Converting the dataset into BioREx input format
2024-07-04 14:00:33.031276: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-04 14:00:33.054684: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-04 14:00:33.054743: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-04 14:00:33.071248: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-04 14:00:34.221688: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
number_unique_YES_instances 0
Generating RE predictions
2024-07-04 14:00:38.500149: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-04 14:00:38.524087: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-04 14:00:38.524137: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-04 14:00:38.540547: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-04 14:00:39.539068: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[INFO|training_args.py:804] 2024-07-04 14:00:42,112 >> using `logging_steps` to initialize `eval_steps` to 10
[INFO|training_args.py:1023] 2024-07-04 14:00:42,112 >> PyTorch: setting up devices
[INFO|training_args.py:885] 2024-07-04 14:00:42,113 >> The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:189] 2024-07-04 14:00:42,114 >> Tensorflow: setting up strategy
07/04/2024 14:00:42 - INFO - __main__ - n_replicas: 1, distributed training: False, 16-bits training: False
07/04/2024 14:00:42 - INFO - __main__ - Training/evaluation parameters TFTrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=biorex_model/runs/Jul04_14-00-42_BLOOM,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=10.0,
optim=OptimizerNames.ADAMW_HF,
output_dir=biorex_model,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=32,
per_device_train_batch_size=16,
poly_power=1.0,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=biorex_model,
save_on_each_node=False,
save_steps=10,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
tpu_metrics_debug=False,
tpu_name=None,
tpu_num_cores=None,
tpu_zone=None,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xla=False,
xpu_backend=None,
)
[INFO|tokenization_utils_base.py:1776] 2024-07-04 14:00:42,124 >> loading file pretrained_model/vocab.txt
[INFO|tokenization_utils_base.py:1776] 2024-07-04 14:00:42,124 >> loading file pretrained_model/tokenizer.json
[INFO|tokenization_utils_base.py:1776] 2024-07-04 14:00:42,124 >> loading file pretrained_model/added_tokens.json
[INFO|tokenization_utils_base.py:1776] 2024-07-04 14:00:42,124 >> loading file pretrained_model/special_tokens_map.json
[INFO|tokenization_utils_base.py:1776] 2024-07-04 14:00:42,124 >> loading file pretrained_model/tokenizer_config.json
=======================>label2id {'None': 0, 'Association': 1, 'Bind': 2, 'Comparison': 3, 'Conversion': 4, 'Cotreatment': 5, 'Drug_Interaction': 6, 'Negative_Correlation': 7, 'Positive_Correlation': 8, 'None-CID': 9, 'CID': 10, 'None-PPIm': 11, 'PPIm': 12, 'None-AIMED': 13, 'None-DDI': 14, 'None-BC7': 15, 'None-phargkb': 16, 'None-GDA': 17, 'None-DISGENET': 18, 'None-EMU_BC': 19, 'None-EMU_PC': 20, 'None-HPRD50': 21, 'None-PHARMGKB': 22, 'ACTIVATOR': 23, 'AGONIST': 24, 'AGONIST-ACTIVATOR': 25, 'AGONIST-INHIBITOR': 26, 'ANTAGONIST': 27, 'DIRECT-REGULATOR': 28, 'INDIRECT-DOWNREGULATOR': 29, 'INDIRECT-UPREGULATOR': 30, 'INHIBITOR': 31, 'PART-OF': 32, 'PRODUCT-OF': 33, 'SUBSTRATE': 34, 'SUBSTRATE_PRODUCT-OF': 35, 'mechanism': 36, 'int': 37, 'effect': 38, 'advise': 39, 'AIMED-Association': 40, 'HPRD-Association': 41, 'EUADR-Association': 42, 'None-EUADR': 43, 'Indirect_conversion': 44, 'Non_conversion': 45}
=======================>positive_label
=======================>use_balanced_neg False
=======================>max_neg_scale 2
07/04/2024 14:00:42 - INFO - __main__ - pos_label_ids
07/04/2024 14:00:42 - INFO - __main__ - [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 16, 17, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45]
[INFO|configuration_utils.py:652] 2024-07-04 14:00:42,153 >> loading configuration file pretrained_model/config.json
[INFO|configuration_utils.py:690] 2024-07-04 14:00:42,154 >> Model config BertConfig {
"_name_or_path": "pretrained_model",
"architectures": [
"BertForSequenceClassification"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"finetuning_task": "text-classification",
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"id2label": {
"0": "None",
"1": "Association",
"2": "Bind",
"3": "Comparison",
"4": "Conversion",
"5": "Cotreatment",
"6": "Drug_Interaction",
"7": "Negative_Correlation",
"8": "Positive_Correlation",
"9": "None-CID",
"10": "CID",
"11": "None-PPIm",
"12": "PPIm",
"13": "None-AIMED",
"14": "None-DDI",
"15": "None-BC7",
"16": "None-phargkb",
"17": "None-GDA",
"18": "None-DISGENET",
"19": "None-EMU_BC",
"20": "None-EMU_PC",
"21": "None-HPRD50",
"22": "None-PHARMGKB",
"23": "ACTIVATOR",
"24": "AGONIST",
"25": "AGONIST-ACTIVATOR",
"26": "AGONIST-INHIBITOR",
"27": "ANTAGONIST",
"28": "DIRECT-REGULATOR",
"29": "INDIRECT-DOWNREGULATOR",
"30": "INDIRECT-UPREGULATOR",
"31": "INHIBITOR",
"32": "PART-OF",
"33": "PRODUCT-OF",
"34": "SUBSTRATE",
"35": "SUBSTRATE_PRODUCT-OF",
"36": "mechanism",
"37": "int",
"38": "effect",
"39": "advise",
"40": "AIMED-Association",
"41": "HPRD-Association",
"42": "EUADR-Association",
"43": "None-EUADR",
"44": "Indirect_conversion",
"45": "Non_conversion"
},
"initializer_range": 0.02,
"intermediate_size": 3072,
"label2id": {
"ACTIVATOR": 23,
"AGONIST": 24,
"AGONIST-ACTIVATOR": 25,
"AGONIST-INHIBITOR": 26,
"AIMED-Association": 40,
"ANTAGONIST": 27,
"Association": 1,
"Bind": 2,
"CID": 10,
"Comparison": 3,
"Conversion": 4,
"Cotreatment": 5,
"DIRECT-REGULATOR": 28,
"Drug_Interaction": 6,
"EUADR-Association": 42,
"HPRD-Association": 41,
"INDIRECT-DOWNREGULATOR": 29,
"INDIRECT-UPREGULATOR": 30,
"INHIBITOR": 31,
"Indirect_conversion": 44,
"Negative_Correlation": 7,
"Non_conversion": 45,
"None": 0,
"None-AIMED": 13,
"None-BC7": 15,
"None-CID": 9,
"None-DDI": 14,
"None-DISGENET": 18,
"None-EMU_BC": 19,
"None-EMU_PC": 20,
"None-EUADR": 43,
"None-GDA": 17,
"None-HPRD50": 21,
"None-PHARMGKB": 22,
"None-PPIm": 11,
"None-phargkb": 16,
"PART-OF": 32,
"PPIm": 12,
"PRODUCT-OF": 33,
"Positive_Correlation": 8,
"SUBSTRATE": 34,
"SUBSTRATE_PRODUCT-OF": 35,
"advise": 39,
"effect": 38,
"int": 37,
"mechanism": 36
},
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.18.0",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 28933
}
[INFO|modeling_tf_utils.py:1776] 2024-07-04 14:00:42,177 >> loading weights file pretrained_model/tf_model.h5
/media/data/biorex/lib/python3.10/site-packages/keras/src/layers/layer.py:1331: UserWarning: Layer 'tf_bert_for_sequence_classification' looks like it has unbuilt state, but Keras is not able to trace the layer `call()` in order to build it automatically. Possible causes:
1. The `call()` method of your layer may be crashing. Try to `__call__()` the layer eagerly on some test input first to see if it works. E.g. `x = np.random.random((3, 4)); y = layer(x)`
2. If the `call()` method is correct, then you may need to implement the `def build(self, input_shape)` method on your layer. It should create all variables used by the layer (e.g. by calling `layer.build()` on all its children layers).
Exception encountered: ''Exception encountered when calling TFBertMainLayer.call().
'NoneType' object has no attribute 'shape'
Arguments received by TFBertMainLayer.call():
• input_ids=tf.Tensor(shape=(3, 5), dtype=int32)
• attention_mask=None
• token_type_ids=None
• position_ids=None
• head_mask=None
• inputs_embeds=None
• encoder_hidden_states=None
• encoder_attention_mask=None
• past_key_values=None
• use_cache=None
• output_attentions=False
• output_hidden_states=False
• return_dict=True
• training=False''
warnings.warn(
/media/data/biorex/lib/python3.10/site-packages/keras/src/layers/layer.py:372: UserWarning: `build()` was called on layer 'tf_bert_for_sequence_classification', however the layer does not have a `build()` method implemented and it looks like it has unbuilt state. This will cause the layer to be marked as built, despite not being actually built, which may cause failures down the line. Make sure to implement a proper `build()` method.
warnings.warn(
Traceback (most recent call last):
File "/home/michaela95/NLP_immune_checkpoint_gene/BERT_testing/BioREx/src/run_ncbi_rel_exp.py", line 884, in <module>
main()
File "/home/michaela95/NLP_immune_checkpoint_gene/BERT_testing/BioREx/src/run_ncbi_rel_exp.py", line 687, in main
model = TFAutoModelForSequenceClassification.from_pretrained(
File "/media/data/biorex/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 446, in from_pretrained
return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
File "/media/data/biorex/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 1803, in from_pretrained
model(model.dummy_inputs) # build the network with dummy inputs
File "/media/data/biorex/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/media/data/biorex/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 383, in run_call_with_unpacked_inputs
return func(self, **unpacked_inputs)
File "/media/data/biorex/lib/python3.10/site-packages/transformers/models/bert/modeling_tf_bert.py", line 1633, in call
outputs = self.bert(
File "/media/data/biorex/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 383, in run_call_with_unpacked_inputs
return func(self, **unpacked_inputs)
File "/media/data/biorex/lib/python3.10/site-packages/transformers/models/bert/modeling_tf_bert.py", line 850, in call
encoder_outputs = self.encoder(
File "/media/data/biorex/lib/python3.10/site-packages/optree/ops.py", line 594, in tree_map
return treespec.unflatten(map(func, *flat_args))
AttributeError: Exception encountered when calling TFBertMainLayer.call().
'NoneType' object has no attribute 'shape'
Arguments received by TFBertMainLayer.call():
• input_ids=tf.Tensor(shape=(3, 5), dtype=int32)
• attention_mask=None
• token_type_ids=None
• position_ids=None
• head_mask=None
• inputs_embeds=None
• encoder_hidden_states=None
• encoder_attention_mask=None
• past_key_values=None
• use_cache=None
• output_attentions=False
• output_hidden_states=False
• return_dict=True
• training=False
cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory
2024-07-04 14:00:46.035576: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-04 14:00:46.059361: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-04 14:00:46.059424: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-04 14:00:46.075918: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-04 14:00:47.150909: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "/home/michaela95/NLP_immune_checkpoint_gene/BERT_testing/BioREx/src/utils/run_pubtator_eval.py", line 1557, in <module>
dump_pred_2_pubtator_file(in_pubtator_file = in_test_pubtator_file,
File "/home/michaela95/NLP_immune_checkpoint_gene/BERT_testing/BioREx/src/utils/run_pubtator_eval.py", line 206, in dump_pred_2_pubtator_file
add_relation_pairs_dict(
File "/home/michaela95/NLP_immune_checkpoint_gene/BERT_testing/BioREx/src/utils/run_pubtator_eval.py", line 83, in add_relation_pairs_dict
testdf = pd.read_csv(in_gold_tsv_file, sep="\t", index_col=0)
File "/media/data/biorex/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
return _read(filepath_or_buffer, kwds)
File "/media/data/biorex/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 620, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/media/data/biorex/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
self._engine = self._make_engine(f, self.engine)
File "/media/data/biorex/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1898, in _make_engine
return mapping[engine](f, **self.options)
File "/media/data/biorex/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "parsers.pyx", line 581, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
Could you tell me what I should do to deal with this situation?
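For context on the last error in the log: pandas raises EmptyDataError when the file it is given has nothing to parse, which would be consistent with the earlier "number_unique_YES_instances 0" line and the failed copy of biorex_model/test_results.tsv. A small pre-flight check along these lines can show which intermediate file is missing or empty before the evaluation step runs. This is only a sketch: `tsv_status` is a hypothetical helper, and which paths to check depends on your copy of the scripts.

```python
import os


def tsv_status(path: str) -> str:
    """Classify a TSV file as 'missing', 'empty', or 'ok'."""
    if not os.path.isfile(path):
        return "missing"
    with open(path, encoding="utf-8") as handle:
        # A file containing only blank lines is treated as empty.
        has_content = any(line.strip() for line in handle)
    return "ok" if has_content else "empty"
```

For example, `tsv_status("biorex_model/test_results.tsv")` returning "missing" would point at the prediction step, while an "empty" converted input file would point at the input format.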