Can we apply sparsity learning to scratch model training (no pretraining)? #123
Replies: 3 comments 2 replies
-
Yes, you can apply any sparse training method to a randomly-initialized model. However, the regularization strength in our benchmark should be re-tuned, since it was designed for pre-trained models. I will update the training scripts to support sparse training from scratch.
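For illustration, here is a minimal sketch of what that might look like, assuming a standard PyTorch training loop: a generic group-lasso penalty added to the task loss with a tunable strength. The `reg_strength` value and helper names below are placeholders, not the regularizer actually used in this repository's scripts.

```python
import torch
import torch.nn as nn

def group_lasso_penalty(model: nn.Module) -> torch.Tensor:
    """Generic group-lasso term: sum of L2 norms over the output-channel
    groups of every Conv2d layer. Illustration only; not the exact
    regularizer used in this benchmark."""
    device = next(model.parameters()).device
    penalty = torch.zeros((), device=device)
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            # Treat each output filter (row of shape in*kh*kw) as one group.
            penalty = penalty + m.weight.flatten(1).norm(p=2, dim=1).sum()
    return penalty

# Placeholder strength; usually needs re-tuning when training from scratch.
reg_strength = 1e-4

def training_step(model, criterion, optimizer, images, targets):
    optimizer.zero_grad()
    loss = criterion(model(images), targets) + reg_strength * group_lasso_penalty(model)
    loss.backward()
    optimizer.step()
    return loss.item()
```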
-
Thanks, @VainF.
-
Hi @ghimiredhikura, thanks for your research.
-
Dear @VainF, @Serjio42, and @lhx0525,
I have a research question regarding the current pipeline of sparsity learning, pruning, and fine-tuning. It seems the algorithm relies on a pre-trained network for sparsity learning, so obtaining a pruned network from scratch requires three full training cycles. For example, for ResNet20 on the CIFAR10 dataset, that amounts to 200 x 3 epochs in total to prune a model from scratch.
I have attempted sparsity learning from scratch (without pretraining), but unfortunately the validation accuracy fails to improve during training.
I have come across papers such as OTO and OTOv2, which also apply group sparsity learning but manage to train and prune a model from scratch in a single training loop (a rough illustration of the single-loop idea is sketched below). Since you seem to have a good understanding of group regularization, could we borrow some ideas from these papers?
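As a rough illustration of that single-loop idea (and not OTO's actual projected-gradient algorithm): after training from scratch with a group regularizer, channel groups whose norm has been driven near zero could be identified and removed in one pass. The `threshold` value and function name below are hypothetical.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def zeroed_channel_groups(model: nn.Module, threshold: float = 1e-3):
    """List, per Conv2d layer, the output channels whose group L2 norm fell
    below `threshold` after regularized training. A thresholding sketch of
    the single-loop idea only; not OTO's actual algorithm."""
    report = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.Conv2d):
            norms = m.weight.flatten(1).norm(p=2, dim=1)  # one norm per filter
            report[name] = (norms < threshold).nonzero(as_tuple=True)[0].tolist()
    return report
```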
While this repository is excellent for physically removing neurons to obtain a slimmed network, the overall training cost also matters.
Thank you in advance.