
Cross-Validation & Regularization Technique Implemented #10

Merged (5 commits into MLSAKIIT:main, Oct 18, 2024)

Conversation

@Sanjeev-Kumar78 (Contributor) commented Oct 17, 2024

Description

File: main.py

  • Added K-Fold Cross-Validation:

    • Implemented K-fold cross-validation to train the model on different data splits.
    • Split the dataset into K folds and trained the model on K-1 folds while validating on the remaining fold.

  • Added L2 Weight Decay to the Optimizer:

    • Updated the optimizer to include L2 weight decay:
    • optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4, weight_decay=1e-5)
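The K-fold scheme above can be sketched as follows. This is an illustrative helper in plain Python, not the repository's actual main.py code; the training loop itself is assumed.

```python
# Illustrative sketch of the K-fold split described above. Each fold serves
# once as the validation set while training runs on the remaining K-1 folds.

def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) index lists for K-fold cross-validation."""
    base, extra = divmod(n_samples, k)
    indices = list(range(n_samples))
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first folds
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

# For each fold, the model and optimizer would be rebuilt and trained on
# train_idx; the optimizer with L2 weight decay is the one quoted above:
#   optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4, weight_decay=1e-5)
```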

File: lora.py

  • Added Dropout to LoRA Layers:

    • Modified the LoRALayer class to include a dropout layer.
    • Updated the forward method in LoRALayer to apply dropout to the input tensor x.
    • Modified the LoRALinear class to pass the dropout rate to the LoRALayer.
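The lora.py changes above can be sketched as follows. The class and parameter names mirror those mentioned in the description, but the initialization and scaling convention are assumptions, not the repository's exact code.

```python
import torch
import torch.nn as nn

class LoRALayer(nn.Module):
    """Low-rank update with dropout applied to the input tensor x."""
    def __init__(self, in_dim, out_dim, rank=4, alpha=1.0, dropout=0.1):
        super().__init__()
        self.A = nn.Parameter(torch.randn(in_dim, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, out_dim))  # zero init: no update at start
        self.scale = alpha / rank
        self.dropout = nn.Dropout(dropout)  # regularizes the LoRA path

    def forward(self, x):
        # Dropout on the input, then the low-rank projection A @ B
        return self.dropout(x) @ self.A @ self.B * self.scale

class LoRALinear(nn.Module):
    """Frozen base linear layer plus the LoRA update; dropout rate is passed through."""
    def __init__(self, base: nn.Linear, rank=4, alpha=1.0, dropout=0.1):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.lora = LoRALayer(base.in_features, base.out_features,
                              rank=rank, alpha=alpha, dropout=dropout)

    def forward(self, x):
        return self.base(x) + self.lora(x)
```

Because dropout is a no-op in eval mode and B starts at zero, a freshly wrapped layer behaves identically to its frozen base layer until training begins.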

File: train.py

  • Implemented Early Stopping:

    • Added early stopping based on validation loss to prevent overfitting and optimize training time.
    • Tracked the validation loss and stopped training if the loss did not improve after a set number of epochs.
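The validation-loss tracking described above can be sketched as a small helper; the class name, patience default, and integration point in train.py are illustrative assumptions.

```python
class EarlyStopping:
    """Stop training when validation loss fails to improve for `patience` epochs."""
    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.counter = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience
```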

Fixes #9, #8, #7

Type of Change


  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update
  • Other (please specify):

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes

Additional Information

@sohambuilds

@rycerzes (Contributor):

@sohambuilds please review this

@sohambuilds (Collaborator) left a comment:

Please add 10-15 generation samples post fine-tuning within your directory. It is suggested, but not compulsory, to use another set of images for your data (this could be any character or object) and fine-tune on that. If possible, include CLIP scores or TIFA scores to evaluate performance.

Ensure that you adjust your learning rate based on the number of images in the dataset: 1e-5 is recommended for small datasets of 10-20 images; 1e-4 is not appropriate.

Early stopping should not be implemented for few-shot learning/finetuning.

@Sanjeev-Kumar78 (Contributor, Author):

Please add 10-15 generation samples post fine-tuning within your directory. It is suggested, but not compulsory, to use another set of images for your data (this could be any character or object) and fine-tune on that. If possible, include CLIP scores or TIFA scores to evaluate performance.

Ensure that you adjust your learning rate based on the number of images in the dataset: 1e-5 is recommended for small datasets of 10-20 images; 1e-4 is not appropriate.

Thank you for the feedback. I will attempt to make the suggested changes. However, my laptop does not have a powerful GPU, so I've been training on the CPU, which takes roughly 40 minutes per epoch for each k-fold. Unfortunately, I've also faced issues with both Colab and Kaggle: Colab had GPU memory overflow problems, and Kaggle produced a different error.

I’ll continue troubleshooting these issues, but due to these constraints, progress may be slower.

@sohambuilds (Collaborator):

Running on colab should be possible, as we have tried it. If there is a specific issue that you need to troubleshoot, you may join the WhatsApp group for the ML contributors: https://chat.whatsapp.com/Kx8okfEdirALcC8UeFtl5j

Also, do note that implementing early stopping for few-shot learning (very few samples) is not desirable.

@Sanjeev-Kumar78 (Contributor, Author):

Got it. 👍

@Sanjeev-Kumar78 (Contributor, Author) commented Oct 17, 2024:

I'm trying to run this notebook: https://colab.research.google.com/drive/1Zv6eLFRHovlJgxumTtPozstIqT8cBJ-I?usp=sharing
But having these errors:

  • In TPU v2-8: ERROR: Unknown command line flag 'xla_latency_hiding_scheduler_rerun'

  • In T4 GPU: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 22.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 3.06 MiB is free. Process 8429 has 14.74 GiB memory in use. Of the allocated memory 14.36 GiB is allocated by PyTorch, and 253.63 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

  • In CPU: system RAM usage climbs until the process is killed.
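For the T4 failure, the error message itself points at one mitigation: enabling expandable segments in the CUDA caching allocator to reduce fragmentation. This only takes effect if set before PyTorch makes its first CUDA allocation, so a minimal sketch would place it at the very top of the notebook:

```python
import os

# Must be set before torch allocates any CUDA memory, i.e. before the first
# model/tensor is moved to the GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
```

Other common mitigations in this situation are reducing the batch size, using gradient accumulation, and training in mixed precision.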

@sohambuilds (Collaborator):

Please run the following in your Colab runtime, then restart it (Runtime > Restart runtime):

!pip uninstall -y tensorflow
!pip install tensorflow-cpu

@sohambuilds (Collaborator):

Please commit the updated code, and your dataset, if you're using a new one. You can add the samples later, I will merge it for now.

@Sanjeev-Kumar78 (Contributor, Author):

I managed to run the notebook https://colab.research.google.com/drive/1Zv6eLFRHovlJgxumTtPozstIqT8cBJ-I?usp=sharing on the T4 GPU by optimizing GPU memory management during model training in the train.py file. I haven't pushed these changes to this repository because I don't believe they're necessary here. Instead, I have updated the zip file, which is downloaded automatically when the notebook is executed.

Please review the changes and let me know if there's anything that needs improvement, @sohambuilds.

val_loss, val_clip_score = validate(val_loader, unet, text_encoder, vae, noise_scheduler, device, pipe)
print(f"Epoch {epoch+1}/{num_epochs}, Validation Loss: {val_loss:.4f}, Validation CLIP Score: {val_clip_score:.4f}")

# Check for early stopping
@sohambuilds (Collaborator) commented Oct 18, 2024:

Please remove early stopping as requested earlier. Early stopping is used with large datasets to check for overfitting, not with only 10 images; the model may need more than a patience of 5 to learn in such cases.
Good to know that you were able to make it run on Colab.

@sohambuilds (Collaborator):

Merging now. Sample generation is not working properly; it needs checking.

@sohambuilds merged commit 188db05 into MLSAKIIT:main on Oct 18, 2024