I don't understand the error message I am getting. Has anybody experienced anything similar?
I am running the training script provided in the examples on my own dataset, making sure to pass the data in the same format as in the examples. I was getting a CUDA error before, but after lowering the batch size, this is the error that prevents me from running the code.
Here is the whole error log:
File "/disk/ocean/zein/neural/BERT_4_doc_class_training.py", line 141, in <module>
model.fit((train_documents, train_labels), (dev_documents,dev_labels))
File "/disk/ocean/zein/venv/lib64/python3.6/site-packages/bert_document_classification/document_bert.py", line 185, in fit
batch_document_sequence_lengths, device=self.args['device'])
File "/disk/ocean/zein/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/disk/ocean/zein/venv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/disk/ocean/zein/venv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/disk/ocean/zein/venv/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/disk/ocean/zein/venv/lib64/python3.6/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in replica 3 on device 3.
Original Traceback (most recent call last):
File "/disk/ocean/zein/venv/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/disk/ocean/zein/venv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
TypeError: forward() missing 2 required positional arguments: 'document_batch' and 'document_sequence_lengths'
@sjmielke @AndriyMulyar could this have anything to do with a package version? Could you provide a requirements file with the exact package versions used to run the code successfully?
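To make a version comparison easier, here is a minimal sketch for listing the versions installed in an environment. It assumes Python 3.8 or newer, where `importlib.metadata` is in the standard library (the traceback above is from Python 3.6, where `pip freeze` gives the same information). The helper name `installed_versions` is hypothetical, and the distribution name `bert-document-classification` is a guess at how the package is registered.

```python
from importlib import metadata  # standard library from Python 3.8 onwards


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# "bert-document-classification" is an assumed distribution name; adjust as needed.
for name, version in installed_versions(["torch", "bert-document-classification"]).items():
    if version is None:
        print(f"# {name}: not installed")
    else:
        print(f"{name}=={version}")
```

The printed lines use the `name==version` form, so the output from a working setup can be pasted into a requirements file.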
For more context, I was initially facing the same issue as the one described in #18 and downgraded torch to version 1.4.0 as suggested in the comments there.
I managed to get the code to run by reducing the number of GPUs used. I was initially using all 4 of the GPUs I had available, but after limiting it to only 1 GPU the code runs. I don't know what might be causing the issue with using multiple GPUs.
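One possible explanation, not confirmed in this thread: `DataParallel` splits tensor inputs along the batch dimension, while a non-tensor keyword argument such as `device=...` is copied to every device. If a batch holds fewer documents than there are GPUs, the last replicas would then receive the keyword argument but no positional tensors, which matches the `forward() missing 2 required positional arguments` message raised in replica 3. The sketch below only simulates that splitting in plain Python. It assumes chunk-style splitting into pieces of size ceil(batch / devices), and the helper names are hypothetical.

```python
import math


def chunk_sizes(batch_size, num_devices):
    """Sizes of the pieces when a batch is split into chunks of ceil(batch / devices)."""
    if batch_size <= 0:
        return []
    size = math.ceil(batch_size / num_devices)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(size, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


def replicas_without_inputs(batch_size, num_devices):
    """Indices of replicas that would get keyword arguments but no tensor chunk.

    Assumption: non-tensor kwargs are replicated to all devices, while tensors
    yield only as many chunks as chunk_sizes() returns.
    """
    return list(range(len(chunk_sizes(batch_size, num_devices)), num_devices))


print(replicas_without_inputs(3, 4))  # batch of 3 on 4 GPUs
print(replicas_without_inputs(8, 4))  # batch of 8 on 4 GPUs
```

Under these assumptions, a batch of 3 on 4 GPUs leaves replica 3 without inputs, while a batch of 8 leaves none. Note that the last batch of an epoch can be smaller than the configured batch size. Restricting the process to a single GPU, for example through the `CUDA_VISIBLE_DEVICES` environment variable, avoids the split entirely, which would be consistent with the single-GPU run working.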