- J. M. Hernández-Lobato and R. P. Adams - Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks - ICML 2015 - https://arxiv.org/abs/1502.05336 - expectation propagation
- Boris Hanin - Which Neural Net Architectures Give Rise To Exploding and Vanishing Gradients? - NIPS 2018 - https://arxiv.org/abs/1801.03744 - failure modes in deep learning
- Akhilesh Gotmare, Nitish Shirish Keskar, Caiming Xiong, Richard Socher - A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation - 2018 - https://arxiv.org/abs/1810.13243 - empirical study of warmup (warmup-schedule sketch after this list)
- Ari S. Morcos, Haonan Yu, Michela Paganini, Yuandong Tian - One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers - 2019 - https://arxiv.org/abs/1906.02773 - lottery ticket discussions
- Mingxing Tan, Quoc V. Le - EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks - 2019 - https://arxiv.org/abs/1905.11946 - SOTA computer vision (compound-scaling sketch after this list)
- Sergey Levine, Aviral Kumar, George Tucker, Justin Fu - Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems - 2020 - https://arxiv.org/abs/2005.01643
- PyTorch pruning tutorial: https://pytorch.org/tutorials/intermediate/pruning_tutorial.html and https://towardsdatascience.com/5-advanced-pytorch-tools-to-level-up-your-workflow-d0bcf0603ad5 (torch.nn.utils.prune sketch after this list)
- Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, Surya Ganguli - Pruning neural networks without any data by iteratively conserving synaptic flow - 2020 (SynFlow)
- Mamba (selective state-space models)
- Adapters (Olga); a bottleneck-adapter sketch follows this list
  - Modularity paper: https://arxiv.org/pdf/2302.11529.pdf
- RAG (retrieval-augmented generation); a minimal retrieval sketch follows this list
- Descript
- Supermasks: https://arxiv.org/pdf/2006.14769.pdf (supermask-layer sketch after this list)
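
For the Gotmare et al. entry, a minimal sketch of a linear-warmup-then-cosine-restarts learning-rate schedule of the kind the paper studies. The warmup length, restart period, base learning rate, and the tiny stand-in model are all made-up illustrative values; `LambdaLR` is used only because it expresses an arbitrary per-step multiplier in a few lines.

```python
import math
import torch

model = torch.nn.Linear(10, 2)                      # stand-in model so the snippet runs
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

WARMUP_STEPS = 500        # illustrative values, not taken from the paper
RESTART_PERIOD = 2000

def lr_factor(step):
    """Linear warmup, then cosine annealing with periodic warm restarts."""
    if step < WARMUP_STEPS:
        return (step + 1) / WARMUP_STEPS
    t = (step - WARMUP_STEPS) % RESTART_PERIOD      # position inside the current cycle
    return 0.5 * (1 + math.cos(math.pi * t / RESTART_PERIOD))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

for step in range(5000):
    # in real training: forward pass and loss.backward() before the two calls below
    optimizer.step()
    scheduler.step()
```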
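
The EfficientNet entry is built around compound scaling: depth, width, and input resolution are all tied to a single coefficient phi. A small sketch of that relationship using the alpha/beta/gamma constants reported in the paper; the resolution rounding is illustrative and does not reproduce the exact published B1-B7 configurations.

```python
# Compound scaling from the EfficientNet paper:
#   depth ~ alpha**phi, width ~ beta**phi, resolution ~ gamma**phi,
#   with alpha * beta**2 * gamma**2 ≈ 2, so each +1 in phi roughly doubles FLOPs.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # constants found by grid search in the paper

def compound_scale(phi, base_resolution=224):
    return {
        "depth_multiplier": round(ALPHA ** phi, 3),
        "width_multiplier": round(BETA ** phi, 3),
        "resolution": round(base_resolution * GAMMA ** phi),
    }

for phi in range(4):
    print(phi, compound_scale(phi))
```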
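
For the PyTorch pruning tutorial linked above, a short sketch of the `torch.nn.utils.prune` workflow it covers: layer-wise L1 pruning, global magnitude pruning, and making the result permanent with `prune.remove`. The model and the sparsity levels are arbitrary illustrative choices.

```python
import torch
import torch.nn.utils.prune as prune

model = torch.nn.Sequential(
    torch.nn.Linear(784, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

# Layer-wise: zero out the 30% smallest-magnitude weights of the first layer.
prune.l1_unstructured(model[0], name="weight", amount=0.3)

# Global: prune 50% of all weights across both linear layers by global magnitude.
parameters_to_prune = [(model[0], "weight"), (model[2], "weight")]
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.5,
)

# Pruning is stored as a mask plus the original tensor; prune.remove makes it permanent.
for module, name in parameters_to_prune:
    prune.remove(module, name)

print(f"layer-0 sparsity: {(model[0].weight == 0).float().mean():.2%}")
```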
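
For the Adapters entry, a sketch of a Houlsby-style bottleneck adapter (down-projection, nonlinearity, up-projection, residual), which is an assumption about what the entry refers to; the model width and bottleneck size are illustrative. In typical use the pretrained backbone is frozen and only the adapter parameters are trained.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, nonlinearity, up-project, add back the input (residual)."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        # Near-identity initialization so the adapter starts as (almost) a no-op.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

# Illustrative usage on transformer-shaped hidden states.
d_model = 768
adapter = BottleneckAdapter(d_model)
hidden_states = torch.randn(2, 16, d_model)   # (batch, sequence, d_model)
print(adapter(hidden_states).shape)           # torch.Size([2, 16, 768])
```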
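
For the RAG entry, a bare-bones retrieval-augmented-generation loop: embed a corpus once, retrieve the top-k documents by cosine similarity at query time, and stuff them into the prompt. The `embed` and `generate` functions here are hypothetical placeholders standing in for a real embedding model and LLM, since the entry names no specific stack.

```python
import numpy as np

# Placeholders: in a real system these would be an embedding model and an LLM call.
def embed(text: str) -> np.ndarray:
    """Hypothetical embedding function (e.g. a sentence-transformer in practice)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def generate(prompt: str) -> str:
    """Hypothetical generator; a real system would call a language model here."""
    return f"[answer conditioned on a prompt of {len(prompt)} characters]"

# The RAG skeleton: embed the corpus, retrieve top-k, build the augmented prompt.
corpus = ["Doc about pruning.", "Doc about adapters.", "Doc about state-space models."]
index = np.stack([embed(d) for d in corpus])

def rag_answer(question: str, k: int = 2) -> str:
    q = embed(question)
    scores = index @ q                           # cosine similarity: vectors are unit norm
    top = np.argsort(scores)[::-1][:k]
    context = "\n".join(corpus[i] for i in top)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)

print(rag_answer("How does pruning work?"))
```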
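
The supermasks link concerns training binary masks over frozen weights rather than training the weights themselves. A minimal sketch of one masked layer in that spirit, using edge-popup-style per-weight scores with a straight-through estimator; the sparsity level and initialization scales are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """Weights are frozen at their random initialization; only per-weight scores
    are trained, and the forward pass keeps the top-scoring fraction of weights."""
    def __init__(self, in_features, out_features, sparsity=0.5):
        super().__init__()
        self.weight = nn.Parameter(0.05 * torch.randn(out_features, in_features),
                                   requires_grad=False)
        self.scores = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.sparsity = sparsity

    def forward(self, x):
        k = int(self.scores.numel() * (1 - self.sparsity))        # weights to keep
        threshold = self.scores.flatten().kthvalue(self.scores.numel() - k + 1).values
        hard_mask = (self.scores >= threshold).float()
        # Straight-through estimator: the hard mask is used in the forward pass,
        # but gradients pass directly to the scores.
        mask = hard_mask + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask)

layer = MaskedLinear(32, 16)
out = layer(torch.randn(4, 32))
out.sum().backward()
print(layer.scores.grad is not None, layer.weight.grad is None)   # True True
```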