
Get Started

Shriyaa Mittal edited this page Jun 22, 2018 · 7 revisions

This page details the steps to follow before you run the Optimal Probes program.

Datasets

Your dataset should consist of Molecular Dynamics (MD) simulation trajectory files, in a format that MDTraj can read, and a topology file. We recommend stripping your MD dataset to include only the Cα atoms of your protein to ensure fast processing.

Input File

The Optimal Probes program is configured via an input file, which must be set up before running the program. The typical sections of the file are described below. An example input file is available here.
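The parameter lines in the sections below use Python syntax, so one way to catch typos before a long run is to execute your input file with Python's standard `runpy` module and check that every expected name is defined. This is a minimal sketch that assumes the input file is a plain Python script; it is not how Optimal Probes itself reads the file, and the helper `check_input_file` and its list of required names are hypothetical.

```python
import runpy

# Hypothetical set of names this check expects; adjust it to your input file.
REQUIRED_NAMES = ("jobname", "N_ITERATIONS", "populationSize", "DEER_low",
                  "DEER_up", "min_probes", "max_probes", "topology_file",
                  "traj_path", "trajectory_format", "clusters", "lagtime")


def check_input_file(path, required=REQUIRED_NAMES):
    """Execute the input file as Python and return the expected settings.

    Raises KeyError naming every expected variable the file does not define.
    """
    settings = runpy.run_path(path)
    missing = [name for name in required if name not in settings]
    if missing:
        raise KeyError("input file is missing: " + ", ".join(missing))
    return {name: settings[name] for name in required}
```

For example, `check_input_file("input.py")["lagtime"]` returns the lag time defined in a file named `input.py`, or raises a `KeyError` listing what is missing.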

Source code path

sourceCodePath='/Users/Softwares/optimalProbes'

Job Identifiers

jobname = "test-01"

Genetic Algorithm Parameters

The program uses an optimization approach based on a genetic algorithm, so the following parameters may affect the results. However, over a large number of iterations, the scores of the selected probes tend to converge.

N_ITERATIONS=20
populationSize=20
percentMutation=50
percentCrossover=20
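To make the role of these four parameters concrete, here is a toy genetic algorithm in pure Python that selects sets of probe positions. It is not the Optimal Probes implementation: the fitness function `toy_score`, the reading of `percentMutation` and `percentCrossover` as the share of the population produced by each operator per iteration, and the elitist selection are all assumptions made for illustration.

```python
import itertools
import random


def toy_score(probes):
    """Toy fitness (assumption): total pairwise separation of residue indices."""
    return sum(abs(a - b) for a, b in itertools.combinations(probes, 2))


def mutate(probes, pool, rng):
    """Replace one probe position with a randomly chosen unused residue."""
    unused = [r for r in pool if r not in probes]
    child = list(probes)
    child[rng.randrange(len(child))] = rng.choice(unused)
    return tuple(sorted(child))


def crossover(a, b, rng):
    """Build a child of the same size from residues found in either parent."""
    return tuple(sorted(rng.sample(sorted(set(a) | set(b)), len(a))))


def run_ga(pool, n_probes, N_ITERATIONS=20, populationSize=20,
           percentMutation=50, percentCrossover=20, seed=0):
    """Return the best probe set found and the best score per iteration."""
    rng = random.Random(seed)
    population = [tuple(sorted(rng.sample(pool, n_probes)))
                  for _ in range(populationSize)]
    # Assumption: the percentages give the number of children per operator.
    n_mut = round(populationSize * percentMutation / 100)
    n_cross = round(populationSize * percentCrossover / 100)
    best_scores = []
    for _ in range(N_ITERATIONS):
        children = [mutate(rng.choice(population), pool, rng)
                    for _ in range(n_mut)]
        children += [crossover(*rng.sample(population, 2), rng)
                     for _ in range(n_cross)]
        # Elitist selection: keep the best populationSize distinct probe sets,
        # so the best score can never decrease between iterations.
        ranked = sorted(set(population + children), key=toy_score, reverse=True)
        population = ranked[:populationSize]
        best_scores.append(toy_score(population[0]))
    return population[0], best_scores


best, scores = run_ga(list(range(81)), n_probes=4)
print(best, scores[-1])
```

With the example values, each iteration creates 10 mutated and 4 crossover candidates. Plotting `scores` against the iteration number shows the best score levelling off as iterations accumulate.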

Experiment Constraints

Please ensure these values are set according to the instrument with which you will conduct your experiment and to your protein of interest. The minimum and maximum number of probes specify how many cysteine mutations for site-directed labelling you would be able to perform, given your resources and the amount of protein required for the experiment.

NB: distance values (DEER_low and DEER_up, the lower and upper limits of the measurable DEER distance range) are in nm (10⁻⁹ m).

DEER_low = 1.8
DEER_up = 6.0
min_probes=2
max_probes=10
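As a rough illustration of how these constraints narrow the search, the sketch below takes per-residue coordinates in nm, keeps only the residue pairs whose distance lies within [DEER_low, DEER_up], and counts the distinct probe pairs, n(n−1)/2, available for an allowed number of probes n. It is a simplification, not the program's scoring: it compares single-frame positions, and the coordinates and helper names are hypothetical.

```python
import itertools
import math

# Values from the example input file (distances in nm).
DEER_low, DEER_up = 1.8, 6.0
min_probes, max_probes = 2, 10


def pairs_in_deer_window(coords, low=DEER_low, up=DEER_up):
    """Return (i, j, distance) for residue pairs with low <= distance <= up.

    coords is a list of (x, y, z) positions in nm, one per residue index.
    """
    pairs = []
    for i, j in itertools.combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        if low <= d <= up:
            pairs.append((i, j, d))
    return pairs


def n_probe_pairs(n_probes):
    """Number of distinct probe pairs (n choose 2) for an allowed probe count."""
    if not min_probes <= n_probes <= max_probes:
        raise ValueError("n_probes must lie between min_probes and max_probes")
    return n_probes * (n_probes - 1) // 2


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 7.0, 0.0)]
print([(i, j) for i, j, _ in pairs_in_deer_window(coords)])  # [(0, 2), (1, 2)]
print(n_probe_pairs(max_probes))  # 45
```

Here the 1.0 nm pair falls below DEER_low and every pair involving the fourth residue lies above DEER_up, so only two pairs remain.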

Trajectory Information

topology_file = "/Users/Dataset/test.pdb"
traj_path = "/Users/Dataset"
trajectory_format = "dcd"
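traj_path and trajectory_format together identify the trajectory files. Assuming the program picks up every file in traj_path whose extension matches trajectory_format (an assumption; check the source code for the actual rule), you can list what would be found with the standard library before starting a run. `find_trajectories` is a hypothetical helper.

```python
import glob
import os


def find_trajectories(traj_path, trajectory_format):
    """List files in traj_path ending in .<trajectory_format>, sorted by name."""
    pattern = os.path.join(traj_path, "*." + trajectory_format)
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(
            "no '*.%s' files in %s" % (trajectory_format, traj_path))
    return files
```

If you want to inspect the data, the returned list can be passed to `mdtraj.load(files, top=topology_file)`, which loads the files and joins them into a single trajectory.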

Protein Topology Information

Each of the following settings is important for the analysis to run quickly; without them, the analysis may take a very long time.

  • Elements: These are the secondary structural elements of your protein of interest. We recommend using the zero-based residue numbering used by MDTraj, which can be determined by loading one of the trajectories in MDTraj, for example:
~ $ ipython
In [1]: import mdtraj as md
In [2]: t=md.load('testtrajectory-01.dcd',top='test.pdb')
In [3]: t
Out[3]: <mdtraj.Trajectory with 1001 frames, 1136 atoms, 284 residues, and unitcells at 0x7f31bf15b550>
In [4]: for resid in range(t.n_residues):
  ....:     print(resid, t.topology.residue(resid))
  ....:
0 ASP23
1 VAL24
2 THR25
3 GLN26
4 GLN27
5 ARG28
6 ASP29
7 GLU30
...
  • not_allowed: As the name indicates, this is a list of residues that cannot be labelled with the MTSSL probe, either because labelling them would cause a loss of function or because they are inaccessible in the protein.
  • intra, extra: For membrane proteins, you must indicate which regions of the protein lie on the intracellular side and which on the extracellular side, by drawing an imaginary line through the middle of the protein. Incorrect residue specifications may lead to meaningless results. For cytoplasmic proteins, these can simply be left as intra=[] and extra=[].
elements=[list(range(0,5)),list(range(5,37)),list(range(37,44)),list(range(44,73)),list(range(73,81))]
not_allowed=list(range(8,34))+list(range(47,70))
intra=list(range(21,58))+list(range(97,137))
extra=list(range(0,21))+list(range(58,97))
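Because mistakes in these lists can silently distort the results, a quick consistency check before a run may help. The sketch below uses the example values, written with list(range(...)) so that the lists can be concatenated under Python 3 as well as Python 2. The checks themselves (no residue in two elements, no residue on both sides of the membrane) are suggestions, not rules known to be enforced by Optimal Probes, and `check_topology_lists` is a hypothetical helper.

```python
def check_topology_lists(elements, not_allowed, intra, extra):
    """Run simple sanity checks and return the residue indices left to label."""
    covered = [r for element in elements for r in element]
    problems = []
    if len(covered) != len(set(covered)):
        problems.append("elements overlap")
    if set(intra) & set(extra):
        problems.append("intra and extra share residues")
    if problems:
        raise ValueError("; ".join(problems))
    blocked = set(not_allowed)
    # Labelable residues: inside some element and not excluded by not_allowed.
    return [r for r in covered if r not in blocked]


elements = [list(range(0, 5)), list(range(5, 37)), list(range(37, 44)),
            list(range(44, 73)), list(range(73, 81))]
not_allowed = list(range(8, 34)) + list(range(47, 70))
intra = list(range(21, 58)) + list(range(97, 137))
extra = list(range(0, 21)) + list(range(58, 97))

labelable = check_topology_lists(elements, not_allowed, intra, extra)
print(len(labelable))  # 32: 81 residues in elements minus 49 in not_allowed
```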

Osprey Parameters

A Markovian lag time is required to build a Markov State Model (MSM) and must be determined before using Optimal Probes. Specify the lag time as a number of frames, not as simulation time. Currently, the other MSM-building parameters are preset, and the user can only define the number of clusters used in the state decomposition.

clusters=200
lagtime=500
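Because lagtime is entered in frames, a lag time chosen in physical units (for example from an implied-timescales analysis) has to be converted using the interval at which frames were saved. The sketch below performs this conversion and also checks that each trajectory is longer than the lag time, since a trajectory of n frames yields only n − lagtime lagged frame pairs. The function names and the 0.1 ns frame interval are hypothetical.

```python
def lagtime_in_frames(lag_time, frame_interval):
    """Convert a lag time into a number of saved frames.

    lag_time and frame_interval (simulation time between saved frames)
    must be given in the same unit, e.g. ns.
    """
    frames = lag_time / frame_interval
    if abs(frames - round(frames)) > 1e-6:
        raise ValueError("lag time is not a whole number of frames")
    return int(round(frames))


def transition_counts_per_trajectory(lagtime, n_frames):
    """Number of lagged (t, t + lagtime) frame pairs in one trajectory."""
    if not 0 < lagtime < n_frames:
        raise ValueError("lagtime must be at least 1 and shorter than the trajectory")
    return n_frames - lagtime


lagtime = lagtime_in_frames(50.0, 0.1)  # hypothetical: 50 ns lag, 0.1 ns per frame
print(lagtime, transition_counts_per_trajectory(lagtime, 1001))  # 500 501
```

With the 1001-frame trajectory from the MDTraj example, a lag of 500 frames leaves 501 lagged pairs per trajectory.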