Skip to content

Commit

Permalink
Merge branch 'master' of https://github.com/drorlab/pensa
Browse files Browse the repository at this point in the history
  • Loading branch information
martinvoegele committed Dec 1, 2024
2 parents 07c8420 + 6ce6108 commit dd9ede9
Show file tree
Hide file tree
Showing 32 changed files with 142 additions and 63 deletions.
Empty file modified docs/create_docs.sh
100644 → 100755
Empty file.
Binary file added docs/images/Density.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/JSD_pdb.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/SSI_pdb.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/Torsions.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/WaterFeatures.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/Waters_and_Density.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/bb-clusts.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/bb-dists.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file not shown.
Binary file not shown.
Binary file added docs/images/receptor_sc-torsions_jsd.pdf
Binary file not shown.
Binary file added docs/images/receptor_sc-torsions_ssi.pdf
Binary file not shown.
Binary file added docs/images/sc-jsd.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/sc-ssi.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
6 changes: 5 additions & 1 deletion docs/tut-2-preprocessing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -169,6 +169,10 @@ relatively rigid with sites that are spatially static, for example internal
water cavities in membrane proteins. Here we demonstrate the preprocessing for
water density, however the same procedure would be used for ions.

.. image:: images/Density.png
:height: 300px
:align: center
:alt: Density of protein

Files and Directories
---------------------
Expand Down Expand Up @@ -236,7 +240,7 @@ This helps us avoid memory errors with large python arrays.
out_name_water_a+".gro", out_name_water_a+"_aligned.xtc",
out_name_water_b+".gro", out_name_water_b+".xtc",
atomgroup="OH2", write_grid_as="TIP3P",
out_name="traj/water_grid_ab_",
out_name="ab_grid_",
use_memmap=True, memmap='traj/combined.mymemmap'
)
Expand Down
24 changes: 18 additions & 6 deletions docs/tut-3-featurization.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,12 @@ Note that all reader functions load the names of the features and
their values separately.


.. image:: images/Torsions.jpg
:height: 300px
:align: center
:alt: Torsion angles.


Basic Example
*************

Expand Down Expand Up @@ -133,14 +139,20 @@ molecules occupy that are of interest. Water pocket featurization extracts
a distribution that represents whether or not a specific protein cavity is occupied
by a water molecule, and what that water molecule's orientation (polarisation) is.


.. image:: images/WaterFeatures.jpg
:width: 300px
:align: center
:alt: Water features derived from density.

.. code:: python
from pensa.features import read_water_features
For the pdb visualisation, the trajectory needs to be fit to the first frame of the simulation
so that the density and protein align with each other.

Here we featurize the top 3 most probable water sites (top_waters = 3).
Here we featurize the top 2 most probable water sites (top_waters = 2).
Orientation of the waters (water_data - spherical coordinates [radians]) is a
timeseries distribution. When water is not present at the site, the orientation
is recorded as 10000.0 to represent an empty state. If write=True, we can
Expand All @@ -161,7 +173,7 @@ water
water_feat, water_data = read_water_features(
structure_input = struc,
xtc_input = xtc,
top_waters = 1,
top_waters = 2,
atomgroup = "OH2",
write = True,
write_grid_as="TIP3P",
Expand All @@ -176,11 +188,11 @@ This way, sites are the same across both ensembles and can be compared.
struc = "traj/condition-a_water.gro"
xtc = "traj/condition-a_water_aligned.xtc"
grid = "traj/water_grid_ab_OH2_density.dx"
grid = "ab_grid_OH2_density.dx"
water_feat, water_data = read_water_features(
structure_input = struc,
xtc_input = xtc,
top_waters = 5,
top_waters = 2,
atomgroup = "OH2",
grid_input = grid
)
Expand All @@ -206,10 +218,10 @@ written (write=True) using the default density conversion "Angstrom^{-3}" in MDA
atom_feat, atom_data = read_atom_features(
structure_input = struc,
xtc_input = xtc,
top_atoms = 1,
top_atoms = 2,
atomgroup = "SOD",
element = "Na",
write = True,
out_name = "features/11426_dyn_151_sodium"
)
33 changes: 33 additions & 0 deletions docs/tut-4-comparison.rst
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,19 @@ distance per residue in the "B factor" field of a PDB file.
y_label='max. JS dist. of BB torsions'
)
.. image:: images/JSD_pdb.png
:height: 300px
:align: center
:alt: JSD pbd b-factor visualisaton file.


.. image:: images/sc-jsd.png
:height: 300px
:align: center
:alt: Jensen-Shannon Distance PDF output.


Let's now save the resulting data in CSV files.

.. code:: python
Expand Down Expand Up @@ -159,6 +172,18 @@ We can plot the results in the same way as we did for the backbone analysis.
)
.. image:: images/SSI_pdb.png
:height: 300px
:align: center
:alt: SSI pbd b-factor visualisaton file.

.. image:: images/sc-ssi.png
:height: 300px
:align: center
:alt: State-Specific Information PDF output.



Comparing Distances
-------------------

Expand Down Expand Up @@ -194,3 +219,11 @@ divergence, Kolmogorov-Smirnov statistic etc. instead).
names_bbdist, jsd_bbdist, "plots/receptor_jsd-bbdist.pdf",
vmin = 0.0, vmax = 1.0, cbar_label='JSD'
)
.. image:: images/bb-dists.png
:height: 300px
:align: center
:alt: JSD distances pbf plot.

8 changes: 8 additions & 0 deletions docs/tut-5-dimensionality.rst
Original file line number Diff line number Diff line change
Expand Up @@ -199,6 +199,14 @@ We now perform the actual clustering on the combined data.
)
cidx, cond, oidx, wss, centroids = cc
.. image:: images/bb-clusts.png
:height: 300px
:align: center
:alt: BB torsion cluster pbf plot.

... and save the results to a CSV file.

.. code:: python
Expand Down
2 changes: 1 addition & 1 deletion pensa/features/hbond_features.py
Original file line number Diff line number Diff line change
Expand Up @@ -483,7 +483,7 @@ def read_water_site_h_bonds_quickly(structure_input, xtc_input, atomgroups, site
else:
g = Grid(grid_input)
elif grid_input is None:
g = generate_grid(u, atomgroups)
g = generate_grid(u, atomgroups[0])
else:
g = Grid(grid_input)

Expand Down
2 changes: 1 addition & 1 deletion pensa/features/water_features.py
Original file line number Diff line number Diff line change
Expand Up @@ -202,7 +202,7 @@ def read_water_features(structure_input, xtc_input, atomgroup, top_waters=10,
water_information.append([water_ID, list(atom_location), water_pocket_occupation_frequency])

# Write data out and visualize water sites in pdb
if write is True:
if out_name is not None:
data_out(out_name + '_' + water_ID + '.txt', water_out)
data_out(out_name + '_WaterSummary.txt', water_information)
write_atom_to_pdb(pdb_outname, atom_location, water_ID, atomgroup)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,45 +22,43 @@
"""

struc = "mor-data/11426_dyn_151.pdb"
xtc = "mor-data/11423_trj_151.xtc"
water_feat, water_data = read_water_features(
structure_input=struc,
xtc_input=xtc,
top_waters=1,
atomgroup="OH2",
write=True,
write_grid_as="TIP3P",
out_name="11426_dyn_151"
)
# struc = "mor-data/11426_dyn_151.pdb"
# xtc = "mor-data/11423_trj_151.xtc"
# water_feat, water_data = read_water_features(
# structure_input=struc,
# xtc_input=xtc,
# top_waters=1,
# atomgroup="OH2",
# write_grid_as="TIP3P",
# out_name="11426_dyn_151"
# )


# We can use the get_atom_features, which provides the same
# functionality but ignores orientations as atoms are considered spherically symmetric.
# # We can use the get_atom_features, which provides the same
# # functionality but ignores orientations as atoms are considered spherically symmetric.

struc = "mor-data/11426_dyn_151.pdb"
# struc = "mor-data/11426_dyn_151.pdb"

xtc = "mor-data/11423_trj_151.xtc"
# xtc = "mor-data/11423_trj_151.xtc"

# Here we locate the sodium site which has the highest probability
# The density grid is written (write=True) using the default density conversion "Angstrom^{-3}" in MDAnalysis
# # Here we locate the sodium site which has the highest probability
# # The density grid is written (write=True) using the default density conversion "Angstrom^{-3}" in MDAnalysis

atom_feat, atom_data = read_atom_features(
structure_input=struc,
xtc_input=xtc,
top_atoms=1,
atomgroup="SOD",
element="Na",
write=True,
out_name="11426_dyn_151"
)
# atom_feat, atom_data = read_atom_features(
# structure_input=struc,
# xtc_input=xtc,
# top_atoms=1,
# atomgroup="SOD",
# element="Na",
# out_name="11426_dyn_151"
# )


# If we have already obtained the grid, we can speed up featurization by reading it in.

struc = "mor-data/11426_dyn_151.pdb"
xtc = "mor-data/11423_trj_151.xtc"
grid = "dens/11426_dyn_151OH2_density.dx"
grid = "11426_dyn_151_OH2_density.dx"
water_feat, water_data = read_water_features(
structure_input=struc,
xtc_input=xtc,
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
from pensa.features import read_cavity_bonds, read_h_bonds
from pensa.features import hbond_features, atom_features

# Featurize water cavity hydrogen bonds

Expand All @@ -8,26 +8,24 @@
pdb_file_a = root_dir + '/11426_dyn_151.pdb'
trj_file_a = root_dir + '/11423_trj_151.xtc'

names, data = read_cavity_bonds(
names, data = hbond_features.read_water_site_h_bonds_quickly(
ref_file_a, trj_file_a,
atomgroups=['OH2', 'H1', 'H2'],
site_IDs=[1, 2],
grid_input=None,
write=True,
write_grid_as='TIP3P',
out_name='11423_trj_169'
out_name='11423_trj_151'
)

# Faturize ligand-protein hydrogen bonds
# Featurize ligand-protein hydrogen bonds

ref_file_a = root_dir + '/11580_dyn_169.psf'
pdb_file_a = root_dir + '/11579_dyn_169.pdb'
trj_file_a = root_dir + '/11578_trj_169.xtc'

names, data = read_h_bonds(
names, data = hbond_features.read_h_bonds(
ref_file_a, trj_file_a,
fixed_group='resname 4VO',
dyn_group='protein',
write=True,
out_name='4VO_hbonds'
)
38 changes: 18 additions & 20 deletions tutorial/PENSA_Tutorial_SSI.py → scripts/state_specific_info.py
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@
if not os.path.exists(subdir):
os.makedirs(subdir)

# Extract the coordinates of the receptor from the trajectory
# # # Extract the coordinates of the receptor from the trajectory
extract_coordinates(
ref_file_a, pdb_file_a, trj_file_a,
out_name_a + "_receptor", sel_base_a
Expand All @@ -62,7 +62,7 @@
out_name_b + "_receptor", sel_base_b
)

# Extract the features from the beginning (start_frame) of the trajectory
# # Extract the features from the beginning (start_frame) of the trajectory
start_frame = 0
a_rec = read_structure_features(
out_name_a + "_receptor.gro",
Expand All @@ -81,9 +81,9 @@
out_name_a = "condition-a"
out_name_b = "condition-b"

# Extract the multivariate torsion coordinates of each residue as a
# timeseries from the trajectory and write into subdirectory
# output = [[torsion 1 timeseries], [torsion 2 timeseries], ..., [torsion n timeseries]]
# # Extract the multivariate torsion coordinates of each residue as a
# # timeseries from the trajectory and write into subdirectory
# # output = [[torsion 1 timeseries], [torsion 2 timeseries], ..., [torsion n timeseries]]
sc_multivar_res_feat_a, sc_multivar_res_data_a = get_multivar_res_timeseries(
a_rec_feat, a_rec_data, 'sc-torsions', write=True, out_name=out_name_a
)
Expand All @@ -96,16 +96,16 @@
discretize='gaussian', pbc=True
)

# We can calculate the State Specific Information (SSI) shared between the
# ensemble switch and the combined ensemble residue conformations. As the ensemble
# is a binary change, SSI can exist within the range [0, 1] units=bits.
# 0 bits = no information, 1 bits = maximum information, i.e. you can predict the state of the ensemble
# with certainty from the state of the residue.
# Set write_plots = True to generate a folder with all the clustered states for each residue.
# # We can calculate the State Specific Information (SSI) shared between the
# # ensemble switch and the combined ensemble residue conformations. As the ensemble
# # is a binary change, SSI can exist within the range [0, 1] units=bits.
# # 0 bits = no information, 1 bits = maximum information, i.e. you can predict the state of the ensemble
# # with certainty from the state of the residue.
# # Set write_plots = True to generate a folder with all the clustered states for each residue.
data_names, data_ssi = ssi_ensemble_analysis(
sc_multivar_res_feat_a['sc-torsions'], sc_multivar_res_feat_b['sc-torsions'],
sc_multivar_res_data_a['sc-torsions'], sc_multivar_res_data_b['sc-torsions'],
discrete_states_ab, verbose=True, write_plots=False
discrete_states_ab, verbose=True, write_plots=True
)

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
Expand Down Expand Up @@ -149,26 +149,25 @@

# Extract the combined density of the waters in both ensembles a and b
extract_combined_grid(
out_name_a + ".gro", "dens/cond-a_wateraligned.xtc",
out_name_a + ".gro", "traj/cond-a_water_aligned.xtc",
out_name_b + ".gro", out_name_b + ".xtc",
atomgroup="OH2",
write_grid_as="TIP3P",
out_name="ab_grid_",
use_memmap=True
)

grid_combined = "dens/ab_grid_OH2_density.dx"
grid_combined = "ab_grid_OH2_density.dx"

# Then we featurize the waters common to both simulations
# We can do the same analysis for ions using the get_atom_features featurizer.

water_feat_a, water_data_a = read_water_features(
structure_input=out_name_a + ".gro",
xtc_input="dens/cond-a_wateraligned.xtc",
xtc_input="traj/cond-a_water_aligned.xtc",
top_waters=2,
atomgroup="OH2",
grid_input=grid_combined,
write=True,
out_name="cond_a"
)

Expand All @@ -178,15 +177,14 @@
top_waters=2,
atomgroup="OH2",
grid_input=grid_combined,
write=True,
out_name="cond_b"
)

# Calculating SSI is then exactly the same as for residues
# Calculating SSI is then exactly the same as for residues, with the h2o argument set to True.
discrete_states_ab1 = get_discrete_states(
water_data_a['WaterPocket_Distr'],
water_data_b['WaterPocket_Distr'],
discretize='gaussian', pbc=True
discretize='gaussian', pbc=True, h2o=True, write_plots=True
)

# SSI shared between waters and the switch between ensemble conditions
Expand All @@ -198,7 +196,7 @@

# Alternatively we can see if the pocket occupancy (the presence/absence of water at the site) shares SSI
# Currently this is only enabled with ssi_ensemble_analysis. We need to turn off the periodic boundary conditions
# as the distributions are no longer periodic.
# as the distributions are no longer periodic.

discrete_states_ab2 = get_discrete_states(
water_data_a['WaterPocket_OccupDistr'],
Expand Down
Empty file modified tutorial/0-download.sh
100644 → 100755
Empty file.
Empty file modified tutorial/1-preprocessing-gpcrmd.sh
100644 → 100755
Empty file.
Empty file modified tutorial/2-comparison-of-feature-distributions.sh
100644 → 100755
Empty file.
Empty file modified tutorial/3-principal-component-analysis.sh
100644 → 100755
Empty file.
Empty file modified tutorial/4-clustering.sh
100644 → 100755
Empty file.
Loading

0 comments on commit dd9ede9

Please sign in to comment.