Avoid saving checkpoints in diffusers convergence test #1038

yeounoh · 2024-01-09T22:27:39Z

Description

Saving checkpoints during training with the full 833 batch size ran into an out-of-memory (OOM) error, so this change skips checkpoint saving in the diffusers convergence test.
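As a rough illustration of the idea (not the actual change in this PR, which lives in the test configuration), a training loop can make checkpoint saving optional so that a convergence test can turn it off. Every name below (`TrainConfig`, `checkpointing_steps`, `train`, and the two callbacks) is hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TrainConfig:
    # Hypothetical config for this sketch only.
    max_train_steps: int = 1
    # None (or 0) disables checkpoint saving entirely.
    checkpointing_steps: Optional[int] = None


def train(
    config: TrainConfig,
    train_step: Callable[[int], None],
    save_checkpoint: Callable[[int], None],
) -> List[int]:
    """Run the loop and return the steps at which a checkpoint was saved."""
    saved_at: List[int] = []
    for step in range(1, config.max_train_steps + 1):
        train_step(step)
        # Save only when checkpointing is enabled and the interval is reached.
        if config.checkpointing_steps and step % config.checkpointing_steps == 0:
            save_checkpoint(step)
            saved_at.append(step)
    return saved_at


if __name__ == "__main__":
    def noop(step: int) -> None:
        pass

    # Checkpointing disabled: the save callback is never invoked.
    print(train(TrainConfig(max_train_steps=4), noop, noop))
    # Checkpointing every 2 steps.
    print(train(TrainConfig(max_train_steps=4, checkpointing_steps=2), noop, noop))
```

With saving disabled, the loop never calls the save callback, so no model state is serialized during the run.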

Tests

Tested locally:

INFO:__main__:***** Running training *****
INFO:__main__:  Num examples = 833
INFO:__main__:  Num Epochs = 1
INFO:__main__:  Instantaneous batch size per device = 1
INFO:__main__:  Total train batch size (w. parallel, distributed & accumulation) = 1
INFO:__main__:  Gradient Accumulation steps = 1
INFO:__main__:  Total optimization steps = 1
Steps:   0%|          | 0/1 [00:00<?, ?it/s]
/usr/local/lib/python3.8/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
Steps: 100%|██████████████████████████████████████████████████████████████████████████| 1/1 [01:44<00:00, 104.08s/it, lr=1e-5, step_loss=0.0488]


INFO:root:Joining rendezvous 'accelerate.utils.wait_for_everyone'...
{'image_encoder', 'requires_safety_checker'} was not found in config. Values will be initialized to default values.
Loaded feature_extractor as CLIPImageProcessor from `feature_extractor` subfolder of CompVis/stable-diffusion-v1-4.
Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of CompVis/stable-diffusion-v1-4.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["bos_token_id"]` will be overriden.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["eos_token_id"]` will be overriden.
Loaded safety_checker as StableDiffusionSafetyChecker from `safety_checker` subfolder of CompVis/stable-diffusion-v1-4.
{'prediction_type', 'timestep_spacing'} was not found in config. Values will be initialized to default values.
Loaded scheduler as PNDMScheduler from `scheduler` subfolder of CompVis/stable-diffusion-v1-4.
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 15.89it/s]
Steps: 100%|██████████████████████████████████████████████████████████████████████████| 1/1 [05:20<00:00, 320.96s/it, lr=1e-5, step_loss=0.0488]

Copy from #1038 to r2.2

yeounoh and others added 12 commits September 5, 2023 18:08

Skip saving pretrained model for diffuser.

157c08a

Skip saving pretrained model for diffuser, in nightly

bfd1dcc

Merge branch 'GoogleCloudPlatform:master' into master

19bdaa6

Merge branch 'GoogleCloudPlatform:master' into master

49e8b9e

Merge branch 'GoogleCloudPlatform:master' into master

b383611

Merge branch 'GoogleCloudPlatform:master' into master

d736b67

Merge branch 'GoogleCloudPlatform:master' into master

b70f4f9

Merge branch 'GoogleCloudPlatform:master' into master

a895d50

Merge branch 'GoogleCloudPlatform:master' into master

0f15476

Skip checkpointing in hf-diffusers, hf-glue conv tests.

6c95050

Use the latest transformers

5f58391

Avoid saving checkpoints

0a081a0

yeounoh requested review from will-cromar and RissyRan January 9, 2024 22:27

will-cromar approved these changes Jan 9, 2024

will-cromar merged commit 5aed632 into GoogleCloudPlatform:master Jan 9, 2024
3 checks passed

zpcore added a commit that referenced this pull request Jan 9, 2024

Update hf-diffusers.libsonnet

afe10f3

Copy from #1038 to r2.2

zpcore mentioned this pull request Jan 9, 2024

Update hf-diffusers.libsonnet #1039

Merged

will-cromar pushed a commit that referenced this pull request Jan 9, 2024

Update hf-diffusers.libsonnet (#1039)

a9260e9

Copy from #1038 to r2.2
