
OOM when batchSize=1 #13

Open
chen3082 opened this issue Jan 30, 2021 · 3 comments

@chen3082

Hi, thanks for your great work.
While running run_pretraining.py, I kept getting OOM no matter what sizes I tried.
I already reduced the batch size to 1, but it didn't help.
I'm using a GTX 960M, tensorflow-gpu 1.10, and CUDA Toolkit 9.0.
I'm wondering which version of TensorFlow you are using. Any thoughts on this issue?
Thanks in advance.

@addiu

addiu commented Feb 3, 2021

Hi,
I tried run_pretraining.py recently and it works fine for me.
I'm using tensorflow-gpu=1.15.0 and cudatoolkit=10.0.
First, the 960M has very limited VRAM, which could be causing your issue.
Second, make sure you use the same settings when running create_pretraining_data.py and run_pretraining.py. I once set max_seq_length=512 in create_pretraining_data.py but max_seq_length=128 in run_pretraining.py; that will also break the code, though not with an OOM, I think. See the sketch below.
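For reference, a minimal sketch of matching invocations. Flag names follow the official BERT repo, so this repo's scripts may differ slightly, and all paths are placeholders:

```bash
# max_seq_length and max_predictions_per_seq must agree between the two steps.
python create_pretraining_data.py \
  --input_file=./data/corpus.txt \
  --output_file=./data/train.tfrecord \
  --vocab_file=./vocab.txt \
  --max_seq_length=128 \
  --max_predictions_per_seq=20

python run_pretraining.py \
  --input_file=./data/train.tfrecord \
  --output_dir=./pretrain_out \
  --bert_config_file=./bert_config.json \
  --do_train=True \
  --train_batch_size=1 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20
```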

@xuyige (Owner)

xuyige commented Feb 19, 2021

Sorry for the late reply.

As noted above, the 960M has very limited memory. A GPU with 12 GB of memory can only fit a batch size of 6 when max_seq_length=512, so please reduce your maximum sequence length or use a GPU with more memory. Thank you!
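To illustrate why sequence length matters so much: attention memory grows quadratically with it, so cutting max_seq_length from 512 to 128 shrinks the attention matrices by 16x. A rough back-of-envelope sketch, assuming BERT-base shapes (real usage is several times higher once activations, gradients, and optimizer state are counted):

```python
# Estimate the memory held by attention-score matrices alone for BERT-base
# (12 layers, 12 heads, fp32). This is a sketch, not an exact accounting.
def attn_scores_bytes(batch_size, seq_len, layers=12, heads=12, dtype_bytes=4):
    # one [batch, heads, seq, seq] score matrix per layer
    return batch_size * heads * seq_len * seq_len * layers * dtype_bytes

for seq in (128, 512):
    gb = attn_scores_bytes(batch_size=1, seq_len=seq) / 2**30
    print(f"seq_len={seq}: ~{gb:.2f} GB for attention scores alone")
```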

@xuyige (Owner)

xuyige commented Feb 19, 2021

> (quoting @addiu's reply above)

Thanks for raising this issue.

Could you please share more detail about your error?
Also, I forget which version of TensorFlow we used, but following the official BERT repo, I suggest trying to downgrade your TensorFlow version (the official repo requires tensorflow-gpu >= 1.11.0, so 1.11 or 1.12 may solve your problem).
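While diagnosing this, a minimal TF 1.x sanity check may help (a sketch, assuming tensorflow-gpu 1.x as discussed above): it prints the version, confirms the GPU is visible, and enables on-demand allocation so the failing op is easier to spot in the OOM log.

```python
# Minimal TF 1.x sanity check: print versions and confirm the GPU is visible.
import tensorflow as tf

print("TensorFlow:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("GPU available:", tf.test.is_gpu_available())

# Letting the allocator grow on demand instead of grabbing all VRAM up front
# can make the actual OOM point (and the failing op) easier to see in the log.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
    print(sess.run(tf.constant("GPU session OK")))
```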
