This repo hosts the material and references covered in the DLG learning group in Fall 2021. This README
will be updated regularly before each meeting, so please check back for more information and resources. Contributions to this repo are also greatly appreciated, so please feel free to fork the repo and open a pull request with any updates you find necessary. Please star ⭐️ the repository to stay up to date with changes.
Click here to jump to the reading list
The main objective of the learning group is to dive deeper into the fundamentals of Machine Learning (Deep Learning in particular) through a mathematical lens. The group will provide a forum for graduate students, postdocs, and faculty interested in Deep Learning to learn about the fundamentals and advances in the field while fostering broader discussions and collaborations. This group is organized by Ali Heydari.
The group will meet Fridays from 11:00am-12:30pm in ACS 362B, beginning September 3rd.
The first half hour of each session will be dedicated to reading, followed by a discussion of the topics. This format provides dedicated time for participants to read about the specific topics during each session, so that everyone can engage in fruitful discussions and contribute to the group’s collective learning. Given the time constraints, however, participants are expected to have a working knowledge of machine learning and other relevant prerequisites.
Given the broad range of topics and the limited time, we would like to tailor each session to the interests of the audience within the following meta-topics:
- Universal Approximation Theorem
- Evaluation of ML/DL Models
- Optimization in Deep Learning
- Deep Learning in Computer Vision
- Deep Learning in Natural Language Processing
Here we provide a list of hand-picked resources for each meeting session, along with references for the prerequisites needed before attending. Per UC Merced's policy, the links will expire in 90 days. However, we have included the citations for each paper in the Acknowledgment section.
Meeting Date | Meeting Topic | Reading Resources | Prerequisites | Additional Notes |
---|---|---|---|---|
Sep. 3rd, 2021 | Universal Approximation Theorem | Ref 1, Ref 2 | Ref i, Ref ii, Ref iv | |
Sep. 10th, 2021 | Universal Approximation Theorem (Continuation of last week) | Ref iii (Main), Ref 3, Ref 4 | Ref i, Ref ii, Ref iv | |
Sep. 17th, 2021 | Universal Approximation Theorem of Operators | Ref 4 | Ref v, Ref vi | Chapter 11 of the Applied Analysis book by Hunter and Nachtergaele provides useful background on distributions (specifically tempered distributions). |
Sep. 24th, 2021 | Universal Approximation Theorem of Operators (DeepONets) | Ref 4 | Ref v, Ref vi | Chapter 11 of the Applied Analysis book by Hunter and Nachtergaele provides useful background on distributions (specifically tempered distributions). |
Oct. 1st, 2021 | Appropriate Metrics and Model Evaluation | Ref 5 (Main), Ref 6 | Ref vii, Ref viii | |
Oct. 8th, 2021 | Evaluation with Imbalanced Data (Fairness and Bias in AI) | Ref 7 (Section 5) | | |
Oct. 15th, 2021 | Optimization in DL (Basics) | Ref 8 | Ref ix | The Nocedal and Wright textbook on optimization is a great reference for preliminary background on gradient descent |
Oct. 22nd, 2021 | On the Convergence of Adaptive Optimizers | Ref 9 (Main), Ref 10 | Ref ix | The Nocedal and Wright textbook on optimization is a great reference for preliminary background on gradient descent |
Oct. 29th, 2021 | DL in Natural Language Processing | Ref 12 | Ref xii, Ref xiii | |
Nov. 5th, 2021 | Additive Attention for Neural Language Translation | Ref 13 | Ref xiv, Ref xv | Supplementary references provide a quick introduction to encoders and decoders, as well as a source on RNNs and their applications. |
Nov. 12th, 2021 | Attention is All You Need | Ref 14 | Ref xvi, Ref xvii | Supplementary references are provided for condensed and visual descriptions of Transformers to aid in understanding the primary reference this week. |
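The first several sessions center on the Universal Approximation Theorem (Refs 1-4 and Ref iii). As an optional warm-up that is not part of the assigned readings, the minimal sketch below fits a single-hidden-layer sigmoid network to a continuous function on [0, 1] with plain full-batch gradient descent; the target function, hidden width, learning rate, and iteration count are arbitrary illustrative choices.

```python
# Illustrative sketch only: a one-hidden-layer sigmoid network approximating a
# continuous function on the compact domain [0, 1]. All hyperparameters below
# are arbitrary choices, not values taken from the assigned readings.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def target(x):
    # A continuous function on [0, 1] to approximate.
    return np.sin(2.0 * np.pi * x)

x = np.linspace(0.0, 1.0, 200).reshape(-1, 1)   # inputs, shape (200, 1)
y = target(x)                                   # values to approximate

n_hidden = 30                                   # number of "building blocks"
W1 = rng.normal(scale=5.0, size=(1, n_hidden))  # input-to-hidden weights
b1 = rng.normal(scale=5.0, size=(1, n_hidden))  # hidden biases
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))  # hidden-to-output weights
b2 = np.zeros((1, 1))

lr = 0.05
for step in range(10000):
    # Forward pass: y_hat = sigmoid(x W1 + b1) W2 + b2
    h = sigmoid(x @ W1 + b1)
    y_hat = h @ W2 + b2
    err = y_hat - y

    # Gradients of the mean-squared error, computed by hand
    grad_out = 2.0 * err / x.shape[0]
    gW2 = h.T @ grad_out
    gb2 = grad_out.sum(axis=0, keepdims=True)
    grad_h = (grad_out @ W2.T) * h * (1.0 - h)   # back through the sigmoid
    gW1 = x.T @ grad_h
    gb1 = grad_h.sum(axis=0, keepdims=True)

    # Plain full-batch gradient descent step
    W1 -= lr * gW1
    b1 -= lr * gb1
    W2 -= lr * gW2
    b2 -= lr * gb2

print("final training MSE:", float(np.mean(err ** 2)))
```

Increasing `n_hidden` generally tightens the fit, which mirrors the trade-off between approximation accuracy and the number of "building blocks" discussed in Ref 1.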
@article{Ref1,
abstract = {This paper deals with the approximation behaviour of soft computing techniques. First, we give a survey of the results of universal approximation theorems achieved so far in various soft computing areas, mainly in fuzzy control and neural networks. We point out that these techniques have common approximation behaviour in the sense that an arbitrary function of a certain set of functions (usually the set of continuous function, C) can be approximated with arbitrary accuracy ε on a compact domain. The drawback of these results is that one needs unbounded numbers of ``building blocks'' (i.e. fuzzy sets or hidden neurons) to achieve the prescribed ε accuracy. If the number of building blocks is restricted, it is proved for some fuzzy systems that the universal approximation property is lost, moreover, the set of controllers with bounded number of rules is nowhere dense in the set of continuous functions. Therefore it is reasonable to make a trade-off between accuracy and the number of the building blocks, by determining the functional relationship between them. We survey this topic by showing the results achieved so far, and its inherent limitations. We point out that approximation rates, or constructive proofs can only be given if some characteristic of smoothness is known about the approximated function.},
author = {Domonkos Tikk and L{\'a}szl{\'o} T. K{\'o}czy and Tam{\'a}s D. Gedeon},
doi = {10.1016/S0888-613X(03)00021-5},
issn = {0888-613X},
journal = {International Journal of Approximate Reasoning},
keywords = {Universal approximation performed by fuzzy systems and neural networks, Kolmogorov's theorem, Approximation behaviour of soft computing techniques, Curse of dimensionality, Nowhere denseness, Approximation rates, Constructive proofs},
number = {2},
pages = {185-202},
title = {A survey on universal approximation and its limits in soft computing techniques},
url = {https://www.sciencedirect.com/science/article/pii/S0888613X03000215},
volume = {33},
year = {2003},
}
@article{Ref2,
abstract = {In this paper, we present a review of some recent works on approximation by feedforward neural networks. A particular emphasis is placed on the computational aspects of the problem, i.e. we discuss the possibility of realizing a feedforward neural network which achieves a prescribed degree of accuracy of approximation, and the determination of the number of hidden layer neurons required to achieve this accuracy. Furthermore, a unifying framework is introduced to understand existing approaches to investigate the universal approximation problem using feedforward neural networks. Some new results are also presented. Finally, two training algorithms are introduced which can determine the weights of feedforward neural networks, with sigmoidal activation neurons, to any degree of prescribed accuracy. These training algorithms are designed so that they do not suffer from the problems of local minima which commonly affect neural network learning algorithms.},
author = {Franco Scarselli and Ah {Chung Tsoi}},
doi = {10.1016/S0893-6080(97)00097-X},
issn = {0893-6080},
journal = {Neural Networks},
keywords = {Approximation by neural networks, Approximation of polynomials, Constructive approximation, Feedforward neural networks, Multilayer neural networks, Radial basis functions, Universal approximation},
number = {1},
pages = {15-37},
title = {Universal Approximation Using Feedforward Neural Networks: A Survey of Some Existing Methods, and Some New Results},
url = {https://www.sciencedirect.com/science/article/pii/S089360809700097X},
volume = {11},
year = {1998},
}
@article{Ref3,
author={Bianchini, Monica and Scarselli, Franco},
journal={IEEE Transactions on Neural Networks and Learning Systems},
title={On the Complexity of Neural Network Classifiers: A Comparison Between Shallow and Deep Architectures},
year={2014},
volume={25},
number={8},
pages={1553-1565},
doi={10.1109/TNNLS.2013.2293637}}
@misc{Ref4,
title={DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators},
author={Lu Lu and Pengzhan Jin and George Em Karniadakis},
year={2020},
eprint={1910.03193},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{Ref5,
title={DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators},
author={Lu Lu and Pengzhan Jin and George Em Karniadakis},
year={2020},
eprint={1910.03193},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@inproceedings{Ref6,
abstract = {Statistical learning algorithms often rely on the Euclidean distance. In practice, non-Euclidean or non-metric dissimilarity measures may arise when contours, spectra or shapes are compared by edit distances or as a consequence of robust object matching [1,2]. It is an open issue whether such measures are advantageous for statistical learning or whether they should be constrained to obey the metric axioms.},
address = {Berlin, Heidelberg},
author = {P{\k{e}}kalska, El{\.{z}}bieta and Harol, Artsiom and Duin, Robert P. W. and Spillmann, Barbara and Bunke, Horst},
booktitle = {Structural, Syntactic, and Statistical Pattern Recognition},
editor = {Yeung, Dit-Yan and Kwok, James T. and Fred, Ana and Roli, Fabio and de Ridder, Dick},
isbn = {978-3-540-37241-7},
pages = {871--880},
publisher = {Springer Berlin Heidelberg},
title = {Non-Euclidean or Non-metric Measures Can Be Informative},
year = {2006}}
@article{Ref7,
author = {Mehrabi, Ninareh and Morstatter, Fred and Saxena, Nripsuta and Lerman, Kristina and Galstyan, Aram},
title = {A Survey on Bias and Fairness in Machine Learning},
year = {2021},
issue_date = {July 2021},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {54},
number = {6},
issn = {0360-0300},
url = {https://doi.org/10.1145/3457607},
doi = {10.1145/3457607},
journal = {ACM Comput. Surv.},
month = jul,
articleno = {115},
numpages = {35},
keywords = {machine learning, deep learning, representation learning, natural language processing, Fairness and bias in artificial intelligence}
}
@article{Ref8,
author = {Sebastian Ruder},
title = {An overview of gradient descent optimization algorithms},
journal = {CoRR},
volume = {abs/1609.04747},
year = {2016},
url = {http://arxiv.org/abs/1609.04747},
eprinttype = {arXiv},
eprint = {1609.04747},
timestamp = {Mon, 13 Aug 2018 16:48:10 +0200},
biburl = {https://dblp.org/rec/journals/corr/Ruder16.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@misc{Ref9,
title={On the Convergence of Adam and Beyond},
author={Sashank J. Reddi and Satyen Kale and Sanjiv Kumar},
year={2019},
eprint={1904.09237},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{Ref10,
title={On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes},
author={Xiaoyu Li and Francesco Orabona},
year={2019},
eprint={1805.08114},
archivePrefix={arXiv},
primaryClass={stat.ML}
}
@inproceedings{Ref11,
title = {Multi-Task Learning with User Preferences: Gradient Descent with Controlled Ascent in Pareto Optimization},
author = {Mahapatra, Debabrata and Rajan, Vaibhav},
booktitle = {Proceedings of the 37th International Conference on Machine Learning},
pages = {6597--6607},
year = {2020},
editor = {III, Hal Daumé and Singh, Aarti},
volume = {119},
series = {Proceedings of Machine Learning Research},
month = {13--18 Jul},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v119/mahapatra20a/mahapatra20a.pdf},
url = {https://proceedings.mlr.press/v119/mahapatra20a.html},
abstract = {Multi-Task Learning (MTL) is a well established paradigm for jointly learning models for multiple correlated tasks. Often the tasks conflict, requiring trade-offs between them during optimization. In such cases, multi-objective optimization based MTL methods can be used to find one or more Pareto optimal solutions. A common requirement in MTL applications, that cannot be addressed by these methods, is to find a solution satisfying userspecified preferences with respect to task-specific losses. We advance the state-of-the-art by developing the first gradient-based multi-objective MTL algorithm to solve this problem. Our unique approach combines multiple gradient descent with carefully controlled ascent to traverse the Pareto front in a principled manner, which also makes it robust to initialization. The scalability of our algorithm enables its use in large-scale deep networks for MTL. Assuming only differentiability of the task-specific loss functions, we provide theoretical guarantees for convergence. Our experiments show that our algorithm outperforms the best competing methods on benchmark datasets.}
}
@book{Refi-Refii,
title={Deep Learning},
author={Ian Goodfellow and Yoshua Bengio and Aaron Courville},
publisher={MIT Press},
note={\url{http://www.deeplearningbook.org}},
year={2016}
}
@article{Refiii,
abstract = {In this paper we demonstrate that finite linear combinations of compositions of a fixed, univariate function and a set of affine functionals can uniformly approximate any continuous function ofn real variables with support in the unit hypercube; only mild conditions are imposed on the univariate function. Our results settle an open question about representability in the class of single hidden layer neural networks. In particular, we show that arbitrary decision regions can be arbitrarily well approximated by continuous feedforward neural networks with only a single internal, hidden layer and any continuous sigmoidal nonlinearity. The paper discusses approximation properties of other possible types of nonlinearities that might be implemented by artificial neural networks.},
author = {Cybenko, G. },
doi = {10.1007/BF02551274},
issn = {1435-568X},
journal = {Mathematics of Control, Signals and Systems},
number = {4},
pages = {303--314},
title = {Approximation by superpositions of a sigmoidal function},
url = {https://doi.org/10.1007/BF02551274},
volume = {2},
year = {1989},
}
@book{Refiv,
title={Infinite Dimensional Analysis: A Hitchhiker’s Guide},
author={Charalambos D. Aliprantis and Kim C. Border},
publisher={Springer},
note={\url{https://link.springer.com/book/10.1007%2F3-540-29587-9}},
year={2006}
}
@article{Refv,
author={Tianping Chen and Hong Chen},
journal={IEEE Transactions on Neural Networks},
title={Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems},
year={1995},
volume={6},
number={4},
pages={911-917},
doi={10.1109/72.392253}}