Resource exhausted #12

Open
rajae-Bens opened this issue Dec 23, 2020 · 5 comments

@rajae-Bens

Hi,

First, thank you for sharing your code with us.

I am trying to further pretrain a BERT model on my own corpus on a Colab GPU, but I am getting a resource-exhausted error. Can someone tell me how to fix this?

Also, what are the expected outputs of this further pretraining? Are they the BERT TensorFlow files that we can use for fine-tuning (checkpoint, config, and vocab)?

Thank you

@chen3082

Hey, I'm hitting the same issue. Were you able to resolve it?
I keep getting this:
OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[768,768].
I already reduced the batch size to 3, but it didn't help.

@xuyige
Owner

xuyige commented Feb 19, 2021

In reply to @rajae-Bens's original question above:

Sorry for the late answer!
I am not very familiar with TensorFlow, but here are some suggestions:

  1. Check your TensorFlow version and make sure it is 1.1x.
  2. If you hit OOM problems, reduce your batch size or your max sequence length; the official BERT repo gives an example of settings that fit in memory (see the sketch after this list).
     [image: batch size / sequence length memory table from the official BERT README]
  3. We do not provide resources for fine-tuning with TensorFlow; you can check the official BERT repo if you want.
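As a concrete illustration (a minimal sketch, assuming the standard run_pretraining.py flags from the official google-research/bert repo; all paths and step counts below are placeholders, not values from this issue), lowering --train_batch_size and --max_seq_length looks roughly like this:

```bash
# Hedged example: flags are those of the official BERT run_pretraining.py;
# paths, checkpoint names, and step counts are placeholders.
python run_pretraining.py \
  --input_file=/path/to/tf_examples.tfrecord \
  --output_dir=/path/to/pretraining_output \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=uncased_L-12_H-768_A-12/bert_model.ckpt \
  --train_batch_size=8 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=100000 \
  --learning_rate=2e-5
```

Note that --max_seq_length (and --max_predictions_per_seq) must match the values used when the pretraining tfrecords were created.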

@xuyige
Owner

xuyige commented Feb 19, 2021

In reply to @chen3082's comment above:

Sorry for the late answer!
If you have OOM problems, please reduce your batch size and max sequence length. The official BERT repo gives example settings that fit on a 12GB GPU.
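If you shorten the sequence length, the pretraining data also has to be regenerated to match. A hedged sketch, assuming the standard create_pretraining_data.py from the official BERT repo (file names are placeholders):

```bash
# Regenerate the tfrecords with the shorter sequence length, then pass the
# same --max_seq_length to run_pretraining.py. --max_predictions_per_seq is
# usually set to roughly max_seq_length * masked_lm_prob.
python create_pretraining_data.py \
  --input_file=/path/to/my_corpus.txt \
  --output_file=/path/to/tf_examples_seq128.tfrecord \
  --vocab_file=uncased_L-12_H-768_A-12/vocab.txt \
  --do_lower_case=True \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --masked_lm_prob=0.15 \
  --dupe_factor=5
```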

@rajae-Bens
Author

Hi,

Thank you for answering.

I reduced train_batch_size to 8 and max_seq_length to 40, but I still get the resource-exhausted error.

I am running the code on a Colab GPU with 12GB of RAM. Any ideas, please?

Thank you

@xuyige
Owner

xuyige commented Mar 31, 2021

In reply to @rajae-Bens's follow-up above:

From your description, a few things to check:

- Does your model contain other NN modules besides BERT?
- Does your Colab GPU have to be shared with other processes?
- Do you have enough CPU memory (i.e., could it be a CPU OOM rather than a GPU OOM)? A quick way to tell the two apart is sketched below.
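One hedged way to check this on Colab (standard Linux/NVIDIA tools, not part of this repo): watch GPU memory with nvidia-smi and host RAM with free while training runs. In a notebook cell, prefix each command with "!".

```bash
# GPU side: if "Memory-Usage" is pinned near the ~12GB cap when the error
# occurs, it is a GPU OOM.
nvidia-smi

# CPU side: if "available" drops to near zero, the exhaustion is host RAM.
free -h
```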
