Validation step uses 2x memory and 12x compute time #558
-
The validation stage of running an experiment consistently uses about 2x the memory of the training stage, and experiments that fail during validation are treated as failures and (in the UI at least) cannot be used for inference. This restriction is very limiting, since it means I can only train with half my available memory, leaving the rest for validation. (Might this be related to the default settings, since 512 tokens as the max length is 2x 256, the max number of response tokens?)

Additionally, validation takes an extraordinary amount of time, almost 12x longer than training! In my configuration I have validation size = 0.01 and data sample = 0.3, but for training the logs say […]

Is there a way for me to skip the validation step of the experiment? I just want to finetune a model via LoRA.
-
I was using the BLEU metric, and switching to Perplexity resulted in greatly reduced memory usage (comparable to training memory usage) and reduced compute time (it still takes a long time, but not as bad as BLEU).
-
You can sample the validation set to reduce the time further, or create a custom validation dataset with only very few samples (a sketch of one way to do this follows below).
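For example, a small custom validation split can be carved out of an existing dataset before uploading it. This is a minimal sketch using pandas; the file names and the 50-row sample size are arbitrary assumptions for illustration, not settings of the tool itself:

```python
import pandas as pd

# Load the full dataset (hypothetical file name).
df = pd.read_csv("train_full.csv")

# Keep a tiny, reproducible validation split; 50 rows is an
# arbitrary choice -- just enough to track validation quality.
val_df = df.sample(n=50, random_state=42)
train_df = df.drop(val_df.index)

train_df.to_csv("train.csv", index=False)
val_df.to_csv("validation.csv", index=False)
```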
It is expected that the validation metrics that rely on generating new output (BLEU and GPT) are slower than training; Perplexity should run at about the same speed as your training, as the sketch below illustrates.
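The reason is that perplexity can be computed from the same single forward pass that produces the training loss, whereas generation-based metrics must decode autoregressively, running one forward pass per new token and holding the generated continuation in memory. A rough sketch with Hugging Face transformers (the model name and `max_new_tokens=256` are placeholder assumptions, not any tool's defaults):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("An example validation prompt", return_tensors="pt")

# Perplexity: one forward pass, same cost profile as a training step.
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss
perplexity = torch.exp(loss)

# Generation-based metrics (BLEU, GPT): up to max_new_tokens sequential
# forward passes per sample, plus memory for the generated tokens --
# hence the extra validation time and memory.
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=256)
```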