This repository has been archived by the owner on Feb 25, 2019. It is now read-only.

basis set computation doesn't seem to work right #8

Open
franknoe opened this issue Jul 30, 2015 · 8 comments

Comments

@franknoe
Contributor

This code:

import variational
from variational.basissets.ramachandran import RamachandranBasis
alabasis = RamachandranBasis('A', radians=False)
import numpy as np
atraj = np.array([[-120, 60],[120, 120]])
alabasis.map(atraj)

leads to this output:

array([[ 1.        ,  0.2007158 , -0.79413052],
       [-0.        ,  0.        , -0.        ]])

which can't be right: the last row shouldn't be all zeros. At the very least, the first column must always be 1.0.
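
The expectation stated above can be written as a quick check. This is a minimal sketch using only NumPy; the array is copied by hand from the output reported in this issue, and nothing here calls `variational` itself.

```python
import numpy as np

# Output of alabasis.map(atraj) as reported above (copied by hand).
mapped = np.array([[1.0, 0.2007158, -0.79413052],
                   [-0.0, 0.0, -0.0]])

# The first basis function is expected to be the constant function 1,
# so every entry of the first column should be 1.
first_column_ok = np.isclose(mapped[:, 0], 1.0)

# Frames for which every basis function evaluates to zero.
all_zero_rows = np.flatnonzero(~mapped.any(axis=1))

print(first_column_ok)  # per-frame result of the first-column check
print(all_zero_rows)    # indices of frames mapped to all zeros
```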

@fvitalini

Hi Frank,

no, it does make sense.
The basis functions of the capped amino acids are evaluated on a 36x36 grid, but not all of the microstates are actually populated.

[Image: ac_a_nhme_rev_0 (https://cloud.githubusercontent.com/assets/13469315/8985400/7e792fd4-36d7-11e5-99e5-8ab21dd7fb85.jpg)]

The microstate corresponding to the phi/psi combination [120, 120] is simply never visited.

What I have tested were the functions within ramachandran.py.
I checked that, given an np-array containing the phi/psi time series of all residues, the function constructs a matrix containing the trajectory projected onto the basis functions, and that it produces the same results as my old code.
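
To make the grid picture concrete, here is a minimal sketch of how a phi/psi pair could be assigned to a microstate on a 36x36 grid. The helper `grid_index` is hypothetical, and the bin layout (equal 10-degree bins covering [-180, 180)) is an assumption for illustration; the indexing actually used in ramachandran.py may differ.

```python
import numpy as np

def grid_index(phi_psi_deg, n_bins=36):
    """Map (phi, psi) angles in degrees to (row, col) bins of an n_bins x n_bins grid.

    Assumes equal-width bins covering [-180, 180). This layout is an
    illustration only and is not taken from ramachandran.py.
    """
    angles = np.asarray(phi_psi_deg, dtype=float)
    width = 360.0 / n_bins
    # Wrap into [-180, 180) so that +180 and -180 land in the same bin.
    wrapped = (angles + 180.0) % 360.0 - 180.0
    return np.floor((wrapped + 180.0) / width).astype(int)

atraj = np.array([[-120, 60], [120, 120]])
bins = grid_index(atraj)
print(bins)
```

Under this assumed layout, a frame whose bin was never visited in the parametrization has no tabulated basis-function values, which is consistent with the all-zero row reported above.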

Francesca

@fnueske

fnueske commented Jul 30, 2015

This is a non-trivial point, isn't it? The single amino-acid eigenvectors
are undefined for unpopulated states, but in principle these states
might show up in simulations of more complicated systems. Francesca,
have you encountered this before?
We can at least modify the first eigenvector to be equal to one everywhere.
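
The proposed fix for the first eigenvector could look roughly like this. It is a sketch on a made-up table of basis-function values (microstates in rows, functions in columns), not the data structure used in `variational`.

```python
import numpy as np

# Toy table: 4 microstates (rows) x 3 basis functions (columns).
# Microstate 2 is unpopulated, so all of its entries are zero.
# All values are made up for illustration.
basis_table = np.array([[1.0,  0.5, -0.3],
                        [1.0, -0.2,  0.8],
                        [0.0,  0.0,  0.0],
                        [1.0,  0.1, -0.4]])

patched = basis_table.copy()
# Force the first function to be the constant 1 on every microstate,
# including the unpopulated one. The other columns are left untouched.
patched[:, 0] = 1.0

states = np.array([0, 2, 3])  # toy discretized trajectory
print(patched[states])        # basis functions evaluated along the trajectory
```

This only repairs the constant function; the remaining columns are still zero on unpopulated microstates.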


@franknoe
Contributor Author

I agree


Prof. Dr. Frank Noe
Head of Computational Molecular Biology group
Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354
Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

@franknoe
Contributor Author

Please still provide an example (a trajectory chunk) and demonstrate the
use of

  • Single Ramachandran Basis
  • Product Basis
  • Estimating correlation matrix from the result
  • Solving the generalized eigenproblem

Each of these should exclusively use code from variational, and each should
be just a few lines of code.

I guess that's addressed to both Francesca and Feliks.
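
For orientation, the last three steps in the list above can be sketched generically with NumPy. This is not the `variational` API, which the requested examples are meant to demonstrate; the per-residue basis evaluations are random toy data, and the symmetrized estimator and Cholesky whitening are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-residue basis evaluations (frames x functions). In practice these
# would come from mapping each residue's phi/psi series onto its basis.
n_frames = 2000
chi_a = np.column_stack([np.ones(n_frames), rng.standard_normal(n_frames)])
chi_b = np.column_stack([np.ones(n_frames), rng.standard_normal(n_frames)])

# Product basis: all pairwise products of the per-residue functions per frame.
chi = np.einsum('ti,tj->tij', chi_a, chi_b).reshape(n_frames, -1)

# Time-lagged correlation matrices at lag tau (symmetrized estimator).
tau = 1
x0, xt = chi[:-tau], chi[tau:]
c0 = (x0.T @ x0 + xt.T @ xt) / (2 * len(x0))
ct = (x0.T @ xt + xt.T @ x0) / (2 * len(x0))

# Generalized eigenproblem ct v = lambda c0 v, solved by whitening with the
# Cholesky factor of c0 (this requires c0 to be positive definite).
chol = np.linalg.cholesky(c0)
chol_inv = np.linalg.inv(chol)
eigvals, w = np.linalg.eigh(chol_inv @ ct @ chol_inv.T)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = chol_inv.T @ w[:, order]
print(eigvals)
```

With this symmetrized estimator, a constant function in the basis yields a leading eigenvalue of exactly 1. If the first column is zero for some frames, the basis no longer contains the constant function, which is one reason the first column matters.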


@fnueske

fnueske commented Jul 30, 2015

Ok, but today I don't have the time. I'll try tomorrow, ok?


@fvitalini

Hi,

The microstates where the first eigenvector is zero are states that are not part of the largest connected set in the MSM of the amino acid.
In theory, the same amino acid in a different sequence might indeed have a "slightly" different distribution.
However, the hypothesis underlying this basis set definition is that the differences in the dynamics of X between Ac-X-NHMe and Y-X-Z should be irrelevant.
The basis functions I used for the paper are zero on those microstates that are not visited by the trajectory.
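
As an illustration of the connected-set argument, the sketch below finds the largest strongly connected set of a small transition count matrix. The helper `largest_connected_set` and the count matrix are made up for this example; this is not the code used to build the basis functions.

```python
import numpy as np

def largest_connected_set(counts):
    """Return the indices of the largest strongly connected set of a count matrix."""
    n = counts.shape[0]
    reach = (counts > 0) | np.eye(n, dtype=bool)
    # Transitive closure by repeated boolean squaring (fine for small matrices).
    for _ in range(max(1, int(np.ceil(np.log2(n))))):
        reach = (reach.astype(int) @ reach.astype(int)) > 0
    mutual = reach & reach.T  # states i and j can reach each other
    # Each row of `mutual` is the connected set of that state; pick the largest.
    sizes = mutual.sum(axis=1)
    return np.flatnonzero(mutual[np.argmax(sizes)])

# Toy count matrix on 4 microstates; microstate 2 is never visited.
counts = np.array([[10, 2, 0, 1],
                   [ 3, 8, 0, 0],
                   [ 0, 0, 0, 0],
                   [ 1, 0, 0, 5]])
lcs = largest_connected_set(counts)
print(lcs)
```

Eigenvectors of an MSM estimated on this set are only defined for the states in `lcs`; all other microstates receive the zero entries discussed above. For a full 36x36 grid, a graph routine such as `scipy.sparse.csgraph.connected_components` with `connection='strong'` would be a more efficient choice than the dense closure used here.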

I have already encountered a case where there was an obvious difference between the capped amino acid and the amino acid in the sequence.
For example, alanine's distribution in Ac-AP-NHMe is very different from that in Ac-A-NHMe. We ended up defining a new basis function in that case.
I haven't checked whether any of the other amino acids populate states that are not populated in the corresponding residue-based functions, but this has not been an issue for me so far.

I will provide an example of how to use the functions "Single Ramachandran Basis" and "Product Basis". Is it OK if I add a folder, e.g. EXAMPLE, and provide scripts and files inside it to try the functions?

Francesca


@franknoe
Contributor Author

sure


@franknoe
Contributor Author

On 30/07/15 at 17:01, fvitalini wrote:

> The microstates where the first eigenvector is zero are states that
> are not part of the largest connected set in the MSM of the amino acid.
> Theoretically it is true that the same amino acid in a different
> sequence might have a "slightly" different distribution.
> However, the hypothesis at the basis of such basis set definition is
> that the differences in the dynamics of X between Ac-X-NHMe and Y-X-Z
> should be irrelevant.

If you encounter a new system that visits points that have not been
visited in your parametrization, one still needs to do something
reasonable with them. At the least the first column must be 1, otherwise
subsequent algorithms such as Feliks' one will simply break down.
But also for the other columns I think we have to do some reasonable
interpolation.

I'm sure that in large peptides or proteins you will not only have
slight differences, but you can lock amino acids in phi/psi values that
are practically forbidden for separate amino acids. So this is an issue.
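
One possible form of such an interpolation is a nearest-neighbour fill on the periodic phi/psi grid. This is a sketch under that assumption; the helper `fill_unpopulated` and the toy values are hypothetical, and the thread does not settle on an interpolation scheme.

```python
import numpy as np

def fill_unpopulated(values, populated):
    """Fill unpopulated grid cells with the value of the nearest populated cell.

    `values` and `populated` are (n, n) arrays on a periodic phi/psi grid.
    Distance is measured in grid cells with wrap-around in both angles.
    At least one cell must be populated.
    """
    n = values.shape[0]
    filled = values.copy()
    pop_idx = np.argwhere(populated)
    for i, j in np.argwhere(~populated):
        d = np.abs(pop_idx - np.array([i, j]))
        d = np.minimum(d, n - d)  # periodic distance along each axis
        nearest = pop_idx[np.argmin((d ** 2).sum(axis=1))]
        filled[i, j] = values[tuple(nearest)]
    return filled

# Toy 4x4 grid with two populated cells; all values are made up.
values = np.zeros((4, 4))
values[0, 0] = 5.0
values[2, 2] = 7.0
populated = values != 0

filled = fill_unpopulated(values, populated)
print(filled)
```

Cell (3, 0) takes its value from cell (0, 0) because the grid wraps around. Ties between equally distant populated cells are resolved by whichever comes first in row-major order, which a real scheme would probably want to handle more carefully.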

> The basis functions I have used for the paper have zeros for those
> microstates that are not visited by the trajectory.
>
> I have encountered already a case where there was an obvious
> difference between the capped amino acid and the amino acid in the
> sequence.
> For example, Alanine's distribution in Ac-AP-NHMe is very different
> from Ac-A-NHMe. We ended up defining a new basis function in that case.
> I haven't checked if any of the other amino acids populates states
> that are not populated in the corresponding residue-based functions,
> but this has not been an issue for me so far.
>
> I will provide an example on how to use the functions "Single
> Ramachandran Basis" and "Product Basis". Is it ok if I add a folder,
> e.g. EXAMPLE, and inside provide scripts and files to try the functions?

OK, add such a folder, named examples, at the top level of the repository.
If you add data, again make sure to use binary data, ideally compressed.

