The code in this package computes the entropy, mutual information and minimum mean-squared error (MMSE) of multi-layer generalized linear models (GLMs) given orthogonally-invariant matrices of arbitrary spectrum. More details are available in arXiv:1805.09785 and in our published NeurIPS 2018 paper.
Install package
First make sure you have all the requirements installed:
- Python 3.x
- Cython
- Numpy
- Matplotlib
- Scipy
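As a quick sanity check before installing, the sketch below reports which of the requirements above cannot be imported. The import names (`Cython`, `numpy`, `matplotlib`, `scipy`) are assumed to be the usual module names for these packages, and the helper `missing_requirements` is a hypothetical name, not part of this package.

```python
import importlib.util

# Import names assumed for the requirements listed above.
REQUIREMENTS = ("Cython", "numpy", "matplotlib", "scipy")


def missing_requirements(names=REQUIREMENTS):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_requirements()
    print("Missing:", ", ".join(missing) if missing else "none")
```

An empty result only means the modules can be located; it does not check their versions.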
Then type:
python setup.py install --user
You can then try the Demo.ipynb notebook and the scripts in the examples folder.
Adding new priors/activations/ensembles
To add new priors, activations or ensembles, write new classes and add them to the dnner/{priors, activations, ensembles} folders. Look at the files already present in these folders for examples; the methods in them (iter_a, iter_v, iter_theta, eval_i, ...) should be reimplemented.
For priors, one must implement: iter_v, eq. 55 in the paper; eval_i, eq. 34a; and eval_rho, defined between eqs. 32 and 33. For outputs: iter_a, eq. 52, and eval_i, eq. 34b. For interfaces: iter_a, eq. 51; iter_v, eq. 54; eval_i, eq. 33; and eval_rho, see the footnote on page 19. Finally, for ensembles: iter_theta, eq. 50a; iter_llmse, eq. 60; and eval_f, eq. 6.
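The method lists above can be turned into a small completeness check for a new class. This is only a sketch: `REQUIRED_METHODS`, `missing_methods` and the `MyPrior` skeleton are hypothetical names, and the argument lists in the skeleton are assumptions; the real signatures should be copied from the existing classes in the corresponding folder.

```python
# Required methods per component kind, as listed in the text above.
REQUIRED_METHODS = {
    "prior": ("iter_v", "eval_i", "eval_rho"),
    "output": ("iter_a", "eval_i"),
    "interface": ("iter_a", "iter_v", "eval_i", "eval_rho"),
    "ensemble": ("iter_theta", "iter_llmse", "eval_f"),
}


def missing_methods(cls, kind):
    """Return the required method names that `cls` does not define as callables."""
    return [name for name in REQUIRED_METHODS[kind]
            if not callable(getattr(cls, name, None))]


class MyPrior:
    """Hypothetical skeleton of a new prior; argument lists are assumptions."""

    def iter_v(self, a):
        # To be filled in with eq. 55 of the paper.
        raise NotImplementedError

    def eval_i(self, a):
        # To be filled in with eq. 34a.
        raise NotImplementedError

    def eval_rho(self):
        # To be filled in with the definition between eqs. 32 and 33.
        raise NotImplementedError


print(missing_methods(MyPrior, "prior"))      # []
print(missing_methods(MyPrior, "interface"))  # ['iter_a']
```

The check only confirms that the methods exist; it says nothing about whether they implement the equations correctly.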
After implementing the new class, you can add it to the __init__.py inside priors/activations/ensembles so that it can be more easily imported.
Currently the following priors are available:
- Normal
- Bimodal
- SpikeSlab
as well as the following activations:
- Linear
- Probit
- ReLU
- LeakyReLU
- HardTanh
and the following ensembles:
- Gaussian
- Empirical
I keep getting warnings throughout the iteration; should I be worried?
The numerical integrations performed for the (leaky) ReLU and the hard tanh are a bit tricky: the integrator might occasionally complain about a lack of precision. Despite that, the final result has been consistent in our experience. In any case, a deeper study of the integration procedure should be performed at some point.
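To judge whether such warnings matter for a given run, it can help to record them instead of letting them scroll by, and then compare results across runs. The sketch below uses only Python's standard `warnings` module; `run_and_collect_warnings` is a hypothetical helper and `noisy_step` is a stand-in for a step that triggers a warning, not a function of this package.

```python
import warnings


def run_and_collect_warnings(func, *args, **kwargs):
    """Call func and return (result, list of warning messages it emitted)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        result = func(*args, **kwargs)
    return result, [str(w.message) for w in caught]


def noisy_step(x):
    # Hypothetical stand-in for one iteration step whose integrator complains.
    warnings.warn("integration may be inaccurate")
    return 0.5 * x


value, messages = run_and_collect_warnings(noisy_step, 2.0)
print(value, messages)
```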
Do I need to have noise in my activations?
It is essential to have noise in the outermost activation so that quantities do not diverge. Usually one can get by with zero noise in the inner layers; however, if the variables in the model are discrete, noise should also be added there.
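A scalar illustration of why the outermost noise matters (a textbook Gaussian channel, not code from this package): for y = x + sqrt(Delta) z with x and z independent standard normals, the mutual information is I(x; y) = 1/2 log(1 + 1/Delta) nats, which grows without bound as Delta goes to zero. The function name below is hypothetical.

```python
import math


def gaussian_channel_mi(noise_var):
    """Mutual information (nats) of y = x + sqrt(noise_var) * z, x, z ~ N(0, 1)."""
    if noise_var <= 0:
        raise ValueError("noise_var must be positive; the value diverges at zero")
    return 0.5 * math.log1p(1.0 / noise_var)


# The mutual information keeps growing as the noise variance shrinks.
for noise_var in (1.0, 1e-2, 1e-4, 1e-8):
    print(f"Delta = {noise_var:g}: I = {gaussian_channel_mi(noise_var):.3f} nats")
```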
Can it happen that the iteration does not converge?
As described in the Supplementary Material of our paper, the fixed-point iteration we use depends on a solution to a particular equation being found at each step, and occasionally this might not happen. In that case, using the ML-VAMP state evolution instead of the fixed-point iteration should lead to better results.
The ML-VAMP state evolution is in general more stable, but in rare cases its variances/precisions may become negative. In our experience, however, one of the two schemes will always work.
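The "try one scheme, fall back to the other" advice can be sketched as a small wrapper. Everything here is hypothetical: `first_valid_result`, `fixed_point` and `ml_vamp_se` are placeholder names, and treating an exception or a non-finite value as failure is an assumption about how a failed run shows up. Replace the stand-ins with the actual calls used in your script.

```python
import math


def first_valid_result(schemes):
    """Try each (name, callable) in order and return the first finite result.

    A scheme counts as failed if it raises one of the listed exceptions
    or returns a non-finite number (both are assumptions of this sketch).
    """
    for name, run in schemes:
        try:
            value = run()
        except (ArithmeticError, RuntimeError, ValueError):
            continue
        if math.isfinite(value):
            return name, value
    raise RuntimeError("no scheme produced a finite result")


# Hypothetical stand-ins for the two schemes.
def fixed_point():
    raise RuntimeError("no solution found at this step")


def ml_vamp_se():
    return 0.42


print(first_valid_result([("fixed-point", fixed_point), ("ML-VAMP SE", ml_vamp_se)]))
# ('ML-VAMP SE', 0.42)
```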
- M. Gabrié, A. Manoel, C. Luneau, J. Barbier, N. Macris, F. Krzakala and L. Zdeborová, Entropy and mutual information in models of deep neural networks, Advances in Neural Information Processing Systems 31 (NeurIPS 2018).