
Tips for training MTL on large dataset #43

Open · negacy opened this issue Feb 27, 2019 · 7 comments
@negacy commented Feb 27, 2019

Are there any tips on how to train the MTL model on large datasets with millions of trainable parameters? I am trying to train it on a machine with 1 TB of memory, but I am still hitting the memory limit.

Thanks.

@nreimers (Member)
How large are your train/dev/test datasets (in terms of file size)? The architecture loads the complete datasets into memory. If they are too large, your machine will run out of memory. In that case you would need to change the code so that the data is streamed from disk rather than read into memory (see the sketch below).

If your datasets are small (say, smaller than 10 GB), the issue is somewhere else.
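
If you do need to go that route, here is a minimal sketch of a disk-streaming reader. The file layout (one tab-separated token/label pair per line, blank lines between sentences) is only an assumption for illustration, not necessarily the exact format this repository uses:

```python
# Minimal sketch: stream sentences lazily from disk instead of
# loading the whole dataset into memory.
# Assumes (hypothetically) one "token<TAB>label" pair per line and
# a blank line between sentences.
def stream_sentences(path):
    sentence = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:            # blank line ends a sentence
                if sentence:
                    yield sentence
                    sentence = []
            else:
                token, label = line.split('\t')
                sentence.append((token, label))
    if sentence:                    # last sentence without a trailing blank line
        yield sentence

# Only one sentence is held in memory at a time.
for sent in stream_sentences('train.txt'):
    pass  # feed the sentence (or a batch of sentences) to the trainer here
```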

@negacy (Author) commented Feb 27, 2019
The dataset is small, less than 3 MB per task. I have seen training fail due to the memory limit for any model with more than 1 million trainable parameters. Training goes smoothly for models with fewer than 1 million trainable parameters.

@nreimers (Member)
That is strange. How many tasks are you training?

It should be no problem to train a model with more than 1 million parameters, even with far less memory. I personally have about 16 GB of RAM and training runs smoothly on larger networks and datasets.

Are you using Python 3.6 (or newer) and a recent Linux system?

@negacy (Author) commented Feb 27, 2019
Yes, I am using Python 3.6 on CentOS 7. I am having this issue even with just two tasks.

@nreimers (Member)
Unfortunately, I have no idea why this happens. It should work fine.

You could also test this implementation:
https://github.com/UKPLab/elmo-bilstm-cnn-crf

It works similarly to this repository, but it also allows you to use ELMo representations. Maybe the issue does not occur there?

@negacy (Author) commented Feb 28, 2019
Still the same issue, even with the ELMo implementation. Here is the error:

    Training: 0 Batch [00:00, ? Batch/s]/tmp/slurmd/job1924456/slurm_script: line 18: 21081 Segmentation fault (core dumped) python Train_multitask.py

@nreimers (Member) commented Mar 1, 2019
Is Python actually allocating that much memory? Maybe the OS imposes a limit on the memory / heap / stack size, so that the script crashes even if only, e.g., 4 GB of RAM are allocated.

Maybe this thread helps:
https://stackoverflow.com/questions/10035541/what-causes-a-python-segmentation-fault
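
For example, you could print the relevant limits from inside the training script before it crashes. This is just a diagnostic sketch using Python's standard resource module (Linux/Unix only):

```python
# Diagnostic sketch: print the OS resource limits that can make a
# process die even though physical RAM is still free.
# A value of resource.RLIM_INFINITY (-1) means "unlimited".
import resource

for name in ('RLIMIT_AS', 'RLIMIT_DATA', 'RLIMIT_STACK'):
    soft, hard = resource.getrlimit(getattr(resource, name))
    print(f'{name}: soft={soft}, hard={hard}')
```

Since the traceback mentions a Slurm batch script, the job's requested memory (e.g. the --mem setting) could also impose this kind of cap.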
