cuda out of memory #10

Open
wlhgtc opened this issue Mar 24, 2018 · 6 comments

wlhgtc commented Mar 24, 2018

Thanks for your code, it has helped me a lot. I tried to write my own version, but I ran into a problem.
When I rewrite the loss function as follows:
```python
import torch
import torch.nn as nn
from torch.autograd import Variable


class Custom_Loss(nn.Module):
    def __init__(self):
        super(Custom_Loss, self).__init__()

    def loss_function(self, data, labels):
        loss = Variable(torch.zeros(1))
        for d, l in zip(data, labels):
            loss -= torch.log(d[l]).cpu()
        loss /= data.size(0)
        return loss

    def forward(self, p1, p2, S, E):
        """
        N is the batch size and T is the context length.

        :param p1: A (N, T) tensor with the probability of choosing each word as the answer start
        :param p2: A (N, T) tensor with the probability of choosing each word as the answer end
        :param S: A tensor with each query's start position
        :param E: A tensor with each query's end position
        :return: Loss of the BiDAF model
        """
        l1 = self.loss_function(p1, S)
        l2 = self.loss_function(p2, E)
        loss = l1 + l2
        return loss
```

I get the error "cuda out of memory". I have checked my code but could not find the reason. Can you help me?

jojonki commented Mar 24, 2018

@wlhgtc Thank you for your report. I am not sure about the exact cause, but "cuda out of memory" means your GPU has run out of memory. Is your GPU memory sufficient? If you accumulate Variables directly, this problem can happen, because every summed Variable keeps its autograd graph alive. If so, .detach() or .data may be helpful.
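For example, something like this when tracking the running loss (a rough sketch only; `model`, `criterion`, and `optimizer` here are placeholders for your own objects, not code from this repo):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

model = nn.Linear(10, 2)                       # stand-in model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

running_loss = 0.0
for step in range(100):
    inputs = Variable(torch.randn(32, 10))     # fake batch
    targets = Variable(torch.LongTensor(32).random_(0, 2))

    loss = criterion(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Keep only the Python number. Accumulating `loss` itself would hold
    # every batch's autograd graph (and its GPU buffers) in memory.
    running_loss += loss.data[0]               # on PyTorch >= 0.4: loss.item()
```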

wlhgtc commented Mar 25, 2018

@jojonki Thanks for your reply. I spent the whole day debugging my code, testing the model layer by layer (with the backward and optimizer steps commented out). The "out of memory" error occurs when I compute the similarity matrix S (S = W[H; U; H ∘ U]) with a batch size of 60 (as in the paper), but I see your config uses 20. The model runs fine with a batch size of 30.
With batch size 30 I also noticed that memory usage starts at about 9 GB, then drops to about 7.2 GB and stays steady. I don't know how you handle the data. I use the torchtext package, which automatically pads each batch's contexts to the longest context in that batch. I think some batches contain contexts that are so long that memory runs out.
So I wonder whether you pad the context per batch the same way, and why you set the batch size to 20?
By the way, I use a GTX 1080 Ti with PyTorch 0.3.
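For reference, here is a rough sketch of why that step is so memory-hungry (the sizes below are made up for illustration, not the actual config of this repo): the concatenated tensor behind S has shape (N, T, J, 6d), so it grows with both the batch size and the context length.

```python
import torch

# Assumed sizes: batch N, context length T, query length J, 2*hidden = d2
N, T, J, d2 = 60, 400, 30, 200

H = torch.randn(N, T, d2)   # context encoding
U = torch.randn(N, J, d2)   # query encoding
w = torch.randn(3 * d2)     # trilinear weight vector

# Expand so every (context word, query word) pair can be concatenated.
H_exp = H.unsqueeze(2).expand(N, T, J, d2)
U_exp = U.unsqueeze(1).expand(N, T, J, d2)
cat = torch.cat([H_exp, U_exp, H_exp * U_exp], dim=3)  # (N, T, J, 3*d2)

S = cat.matmul(w)           # (N, T, J) similarity matrix

# The intermediate `cat` alone holds N*T*J*3*d2 floats:
print(cat.numel() * 4 / 1024**3, "GiB")  # ~1.6 GiB at these sizes
```

And that is just the forward pass; autograd keeps such intermediates around for the backward pass as well.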

Vimos commented Jun 25, 2018

Is the memory increasing in your case? Mine runs out of memory in the middle of training.

```
[20180625-174613] Epoch 0 74.2%, loss_p1: 3.338, loss_p2: 2.325
p1 acc: 9.000% (6077/65000), p2 acc: 10.000% (6521/65000)
 75%|████████████████████████████████████████████▋               | 3266/4379 [07:33<02:34,  7.21it/s]THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524590031827/work/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "main.py", line 237, in <module>
    train(model, train_data, optimizer, ema, start_epoch=args.start_epoch)
  File "main.py", line 153, in train
    (loss_p1+loss_p2).backward()
  File "/home/vimos/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/vimos/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524590031827/work/aten/src/THC/generic/THCStorage.cu:58
```
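One way to check whether allocated memory really grows between batches is to log the allocator counters every few hundred steps (a small sketch, not code from this repo; these functions exist in PyTorch 0.4+ as far as I know):

```python
import torch

def log_gpu_memory(step):
    # Print the allocator's view of currently used GPU memory in MiB.
    if torch.cuda.is_available():
        used = torch.cuda.memory_allocated() / 1024**2
        peak = torch.cuda.max_memory_allocated() / 1024**2
        print("step %d: allocated %.0f MiB (peak %.0f MiB)" % (step, used, peak))

# If the "allocated" number climbs steadily over steps, some tensor
# (often an accumulated loss) is still attached to the autograd graph.
```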

wlhgtc commented Jun 25, 2018

@Vimos Some contexts have length > 500. You'd better cap them at a fixed length (for me, 300).
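If you load the data with (legacy) torchtext, one way to do that is the `fix_length` argument of `Field`, which pads or truncates every context to the same length instead of the batch maximum (a sketch only; the field names and tokenizer here are placeholders):

```python
from torchtext import data  # legacy torchtext API (0.2/0.3 era)

# Pad/truncate every context to 300 tokens instead of the longest in the batch.
CONTEXT = data.Field(sequential=True, tokenize=str.split, fix_length=300)
QUESTION = data.Field(sequential=True, tokenize=str.split)
```

With `fix_length` set, a single very long context can no longer blow up the memory of its whole batch.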

Vimos commented Jun 25, 2018

@wlhgtc Thanks for the advice.
If I keep the default length, I have to drop to a smaller batch size of 10, which still requires 7709 MiB of memory.

wlhgtc commented Jun 26, 2018

If the memory usage stays steady, it's fine. It's a large model.
