Can we apply sparsity learning to scratch model training (no pretraining)? #123
Replies: 3 comments 2 replies
-
Yes, you can apply any sparse training method to a randomly-initialized model. However, the regularization strength in our benchmark should be re-tuned, since it was designed for pre-trained models. I will update the training scripts to support sparse training from scratch.
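For illustration, here is a minimal sketch of what that might look like, assuming a standard PyTorch training loop: a generic group-lasso penalty added to the task loss with a tunable strength. The `reg_strength` value and helper names below are placeholders, not the regularizer actually used in this repository's scripts.

```python
import torch
import torch.nn as nn

def group_lasso_penalty(model: nn.Module) -> torch.Tensor:
    """Generic group-lasso term: sum of L2 norms over the output-channel
    groups of every Conv2d layer. Illustration only; not the exact
    regularizer used in this benchmark."""
    device = next(model.parameters()).device
    penalty = torch.zeros((), device=device)
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            # Treat each output filter (row of shape in*kh*kw) as one group.
            penalty = penalty + m.weight.flatten(1).norm(p=2, dim=1).sum()
    return penalty

# Placeholder strength; usually needs re-tuning when training from scratch.
reg_strength = 1e-4

def training_step(model, criterion, optimizer, images, targets):
    optimizer.zero_grad()
    loss = criterion(model(images), targets) + reg_strength * group_lasso_penalty(model)
    loss.backward()
    optimizer.step()
    return loss.item()
```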
-
Thanks, @VainF.
-
Hi @ghimiredhikura, thanks for your research.
-
Dear @VainF, @Serjio42, and @lhx0525,
I have a research question regarding the current pipeline of sparsity learning, pruning, and fine-tuning. It seems the algorithm relies on a pre-trained network for sparsity learning, so obtaining a pruned network from scratch requires three full training cycles. For example, for ResNet20 on the CIFAR10 dataset, that amounts to 200 x 3 epochs in total to prune a model from scratch.
I have attempted sparsity learning from scratch (without pretraining), but unfortunately the validation accuracy fails to improve during training.
I have come across papers such as OTO and OTOv2, which also apply group sparsity learning but manage to train and prune a model from scratch in a single training loop (a rough illustration of the single-loop idea is sketched below). Since you seem to have a good understanding of group regularization, could we borrow some ideas from these papers?
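As a rough illustration of that single-loop idea (and not OTO's actual projected-gradient algorithm): after training from scratch with a group regularizer, channel groups whose norm has been driven near zero could be identified and removed in one pass. The `threshold` value and function name below are hypothetical.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def zeroed_channel_groups(model: nn.Module, threshold: float = 1e-3):
    """List, per Conv2d layer, the output channels whose group L2 norm fell
    below `threshold` after regularized training. A thresholding sketch of
    the single-loop idea only; not OTO's actual algorithm."""
    report = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.Conv2d):
            norms = m.weight.flatten(1).norm(p=2, dim=1)  # one norm per filter
            report[name] = (norms < threshold).nonzero(as_tuple=True)[0].tolist()
    return report
```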
While this repository is excellent for physically removing neurons to obtain a slimmed network, the overall training cost also matters.
Thank you in advance.