Memory usage #29

Open

momchilmm opened this issue Nov 7, 2019 · 1 comment

Comments

@momchilmm
Member

It might be possible to significantly improve the memory usage, especially when computing gradients, but how exactly requires some thought.

@momchilmm
Member Author

The issue comes from the fact that many intermediate matrices are constructed while building the final matrix for diagonalization. In a plain forward computation, some of those intermediates are dropped as the program progresses, but gradient computation is particularly bad, because every intermediate result is stored for the backward pass. Here are some thoughts on what could be improved:

  • In the computation of the matrix elements, some matrices are constructed from vectors using broadcasting (or np.outer). These are rank-1 updates that create rank-2 intermediate results. Some of them could be wrapped in a primitive whose gradient depends only on the input vectors, which should reduce the memory needed for the backward pass (see the first sketch after this list).

  • Also in that computation, if a layer is homogeneous, as is very often the case for the claddings, the corresponding matrix is non-zero only on the main diagonal (because of the element-wise multiplication by eps_inv_mat). However, the other matrices are still computed in their entirety, even though only their diagonals end up being used (see the second sketch after this list).

  • Maybe something can also be improved in the way the final matrix itself is assembled.
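
For the first point, here is a minimal sketch of the mechanism, assuming an autograd-style primitive/defvjp backend (the name outer_term is hypothetical). In practice the primitive would probably wrap a larger chunk of the matrix-element computation so that its internal rank-2 intermediates are never put on the tape, but the bare outer product already shows the idea:

```python
import autograd.numpy as np
from autograd.extend import primitive, defvjp

# Hypothetical wrapper: treat the rank-1 term as a single primitive so that
# autograd does not trace (or store) anything computed inside it.
@primitive
def outer_term(u, v):
    # Forward pass: build the rank-2 intermediate as before.
    return np.outer(u, v)

# Custom vector-Jacobian products written directly in terms of the input
# vectors: for f = u v^T, the backward pass only needs u, v and the incoming
# gradient g, rather than extra N x N arrays.
defvjp(outer_term,
       lambda ans, u, v: lambda g: g @ v,    # gradient w.r.t. u
       lambda ans, u, v: lambda g: g.T @ u)  # gradient w.r.t. v
```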
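For the second point, a possible sketch (the helper and argument names are hypothetical): when a layer is homogeneous, eps_inv_mat reduces to a scalar times the identity, so only the diagonals of the other factors need to be formed at all:

```python
import autograd.numpy as np

# Hypothetical helper: for a homogeneous layer, the element-wise product
# eps_inv_mat * dense_block keeps only the main diagonal of dense_block,
# so it is enough to build that diagonal directly.
def homogeneous_block(eps_inv, block_diag):
    # block_diag holds just the diagonal entries of the would-be dense factor,
    # so only O(N) data is constructed instead of the full N x N matrices.
    return np.diag(eps_inv * block_diag)
```

The result is the same matrix as in the dense path, but both the forward computation and the gradient tape then only hold vectors for the homogeneous layers.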
