`RiD-kit` uses a JSON-format file (typically `rid.json`) to configure simulations. Here we explain these parameters one by one.
- `name` (str): the name of this task.
- `numb_walkers` (int): the number of parallel walkers used for exploration. The `Exploration` step runs `numb_walkers` parallel trajectories simultaneously, and the data from all walkers are collected together to train the neural networks.
- `numb_iters` (int): the maximum number of iterations of the RiD workflow. Since it is easy to continue and rerun the RiD workflow, this value does not need to be set precisely. A value greater than 5 is recommended, typically around 10 for a first attempt.
- `trust_lvl_1` (int) and `trust_lvl_2` (int) (called $e_0$ and $e_1$ in published papers, and $\epsilon_0$ and $\epsilon_1$ below): two thresholds that control the bias forces and the data selection. In biased simulations, the bias force is scaled according to the model deviation $\epsilon$:

  $$
  F(r) = -\nabla_{r_i} U(r) + \sigma\big(\epsilon(s(r))\big)\,\nabla_{r_i} A(r)
  $$

  $$
  \sigma(\epsilon) = \begin{cases} 1, & \epsilon < \epsilon_0 \\ \frac{1}{2} + \frac{1}{2}\cos\left(\pi \frac{\epsilon - \epsilon_0}{\epsilon_1 - \epsilon_0}\right), & \epsilon_0 < \epsilon < \epsilon_1 \\ 0, & \epsilon > \epsilon_1 \end{cases}
  $$

  In data selection, data are collected if their model deviation is greater than `trust_lvl_1`. In the adaptive version of RiD, these two values are the initial trust levels and are adjusted during the simulations according to the number of clusters.
- `init_models` (List[str]): the initial guesses of the neural networks. Usually nothing is known about the system in advance, so this is set to `[]`.
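The bias switching factor $\sigma(\epsilon)$ and the selection rule above can be sketched in plain Python. This is a minimal illustration written for this page, not RiD-kit's implementation; `sigma` and `select_frames` are hypothetical names, and `e0`/`e1` stand for `trust_lvl_1`/`trust_lvl_2`.

```python
import math

# Hypothetical helpers illustrating the formulas above; not RiD-kit's API.

def sigma(eps: float, e0: float, e1: float) -> float:
    """Factor multiplying the bias force for model deviation eps."""
    if eps < e0:
        return 1.0          # trusted region: full bias
    if eps > e1:
        return 0.0          # untrusted region: no bias
    # Cosine switch going smoothly from 1 at e0 to 0 at e1
    return 0.5 + 0.5 * math.cos(math.pi * (eps - e0) / (e1 - e0))


def select_frames(deviations, trust_lvl_1):
    """Indices of frames whose model deviation exceeds trust_lvl_1."""
    return [i for i, eps in enumerate(deviations) if eps > trust_lvl_1]
```

With the example values `trust_lvl_1 = 2` and `trust_lvl_2 = 3`, a frame with model deviation 2.5 keeps half of the bias force and is selected for labeling, while a frame with deviation 3.5 feels no bias at all.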
```json
"name": "test",
"numb_walkers": 2,
"numb_iters": 20,
"trust_lvl_1": 2,
"trust_lvl_2": 3,
"init_models": [],
```
This section configures collective variables (CVs). `RiD-kit` provides three modes to configure CVs: `"torsion"`, `"distance"` and `"custom"`.
In torsion mode, `RiD-kit` uses torsions (dihedral angles) of proteins as collective variables. Set `"mode": "torsion"`. `selected_resid`, `angular_mask` and `weights` must not be `none` or empty if you use torsion mode.

- `selected_resid` (List[int]): residue ids (starting from 1) of the selected residues. Two dihedral angles of each selected residue, $\phi$ and $\psi$, are used. Note that the first residue of a chain (N terminus) has no $\phi$ and the last residue of a chain (C terminus) has no $\psi$.
- `angular_mask` (List[int]): the mask of angular (periodic) CVs, 1 for periodic and 0 for non-periodic. In torsion mode all CVs are periodic, so set a list of 1s whose length equals the number of CVs.
- `weights` (List[int]): weights that scale the CV values. They are used in clustering to calculate the Euclidean distances between CVs, which keeps a CV whose range is much larger than that of the others from dominating the distance.
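To see why `weights` and `angular_mask` matter in clustering, here is a minimal sketch of a weighted Euclidean distance between two CV vectors in which periodic components are wrapped into $[-\pi, \pi)$ before weighting. Two points are assumptions rather than documented RiD-kit behavior: that each CV difference is multiplied by its weight, and that angles are in radians. `weighted_cv_distance` is a hypothetical helper.

```python
import math

def weighted_cv_distance(a, b, weights, angular_mask):
    """Hypothetical weighted distance between CV vectors a and b.

    Assumes angles in radians and that each difference is scaled by its weight.
    """
    total = 0.0
    for x, y, w, periodic in zip(a, b, weights, angular_mask):
        d = x - y
        if periodic:
            # Shortest signed angular difference, in [-pi, pi)
            d = (d + math.pi) % (2 * math.pi) - math.pi
        total += (w * d) ** 2
    return math.sqrt(total)
```

Without the wrap, two torsions at 3.0 and -3.0 rad would look 6 rad apart although they differ by only about 0.28 rad across the periodic boundary, which would split one basin into separate clusters.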
In distance mode, `RiD-kit` uses distances between atoms of the system as collective variables. Set `"mode": "distance"`. `selected_atomid`, `angular_mask` and `weights` must not be `none` or empty if you use distance mode.

- `selected_atomid` (List[List[int]]): ids (starting from 1) of the selected pairs of atoms.
- `angular_mask` (List[int]): the mask of angular (periodic) CVs, 1 for periodic and 0 for non-periodic. In distance mode all CVs are non-periodic, so set a list of 0s whose length equals the number of CVs.
- `weights` (List[int]): weights that scale the CV values. They are used in clustering to calculate the Euclidean distances between CVs, which keeps a CV whose range is much larger than that of the others from dominating the distance.
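In distance mode each atom pair in `selected_atomid` gives one CV, so `angular_mask` and `weights` need one entry per pair. The sketch below shows that bookkeeping and how the 1-based ids map onto a coordinate list. The helpers `distance_cvs` and `distance_mode_masks` are hypothetical, unit weights are assumed as in the example configuration, and periodic boundary conditions are ignored for simplicity.

```python
import math

# Hypothetical helpers; periodic boundary conditions are NOT handled here.

def distance_cvs(coords, selected_atomid):
    """One distance CV per atom pair; atom ids are 1-based as in rid.json."""
    return [math.dist(coords[i - 1], coords[j - 1]) for i, j in selected_atomid]


def distance_mode_masks(selected_atomid):
    """angular_mask of 0s and unit weights, one entry per atom pair."""
    n_cv = len(selected_atomid)
    return [0] * n_cv, [1] * n_cv
```

For the example `"selected_atomid": [[2,5],[5,7]]`, `distance_mode_masks` returns `([0, 0], [1, 1])`, matching the `angular_mask` and `weights` in the distance-mode configuration.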
`RiD-kit` also supports user-defined collective variables. Set `"mode": "custom"` to use a custom CV of your own design. `cv_file`, `angular_mask` and `weights` must not be `none` or empty if you use custom mode. The CV file is in PLUMED2 format; add your own CV to the `PRINT` line in the CV file.

Note that if you use only one CV file, its name should not end with `".pdb"`. If you use multiple files to define your own CV, the file that defines the CV should not end with `".pdb"`, and the other files should end with `".pdb"`.

- `"cv_file"` (List[str]): list of paths to CV files. The files define collective variables in PLUMED2 format. In principle, any CV supported by PLUMED2 can be used in `RiD-kit`.
- `"angular_mask"` (List[int]): the mask of angular (periodic) CVs, 1 for periodic and 0 for non-periodic. In custom mode, identify the periodic CVs and set 1 at the corresponding positions in the list.
- `"weights"` (List[int]): the same as above.

```json
"CV": {
    "mode": "torsion",
    "selected_resid": [ 1, 2 ],
    "angular_mask": [ 1, 1 ],
    "weights": [ 1, 1 ],
    "cv_file":[""]
}
```

```json
"CV": {
    "mode": "distance",
    "selected_atomid": [[2,5],[5,7]],
    "angular_mask": [0,0],
    "weights": [1,1],
    "cv_file":[""]
}
```

```json
"CV": {
    "mode": "custom",
    "selected_atomid":[[161,165],[124,156],[124,161],[156,165]],
    "angular_mask": [0,0,0,0],
    "weights": [1,1,1,1],
    "cv_file": ["colvar", "plmpath.pdb"]
}
```
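The file-naming rule for `cv_file` in custom mode can be checked with a small helper before launching a run. `split_cv_files` is hypothetical: it only encodes the rule stated above (exactly one file, the CV definition, does not end with `.pdb`; every other file ends with `.pdb`) and does not parse any PLUMED2 syntax.

```python
def split_cv_files(cv_file):
    """Hypothetical check of the cv_file naming rule.

    Returns (cv_definition_file, list_of_pdb_files) or raises ValueError.
    """
    defs = [f for f in cv_file if not f.endswith(".pdb")]
    pdbs = [f for f in cv_file if f.endswith(".pdb")]
    if len(defs) != 1:
        raise ValueError(f"expected exactly one non-.pdb CV file, got {defs}")
    return defs[0], pdbs
```

For the custom-mode example, `split_cv_files(["colvar", "plmpath.pdb"])` returns `("colvar", ["plmpath.pdb"])`.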
This section configures the parameters of the `Exploration` step. Currently, rid-kit supports two types of sampler, `"gmx"` and `"lmp"`, which stand for Gromacs and Lammps respectively.

Set `"type": "gmx"` to use Gromacs as the sampler. `nsteps`, `temperature`, `ref-t`, `output_freq`, `dt` and `output_mode` must not be `none` or empty if you use the gmx type. You can also set other parameters of the `mdp` files for your own use (this is actually recommended, since the default settings in rid-kit may not be the best fit for your system).
- `nsteps` (int): number of MD steps in the exploration step.
- `type` (str): type of sampler in the `Exploration` step. Currently rid-kit supports `gmx` (Gromacs) and `lmp` (Lammps).
- `temperature` (int): temperature of the MD simulations. Make sure this value is the same as the temperature in the `Label` step unless you intend to keep them different.
- `ref-t` (str): reference temperature for each group of the system; the default is `300 300` for the water and non-water groups.
- `output_freq` (int): frame output frequency of the MD simulations, in steps between frames. A recommended value is `nsteps/1000`, which makes sure at least 1000 frames are generated during exploration.
- `dt` (float): time step of the MD simulations in ps. 0.002 is recommended for normal simulations. A larger time step, e.g. 0.004, may be used when heavy hydrogens are used in Gromacs.
- `output_mode` (str): one of `"both"`, `"single"`, `"double"` and `"none"`.
  - `"both"`: generate both full-precision `.trr` and compressed-precision `.xtc` trajectories during MD simulations.
  - `"single"`: only generate compressed-precision `.xtc` trajectories during MD simulations.
  - `"double"`: only generate full-precision `.trr` trajectories during MD simulations.
  - `"none"`: do not generate trajectory files (used for tasks that only need PLUMED2 output).
- `ntmpi` (int): number of thread-MPI ranks to start (0 means guess). See the Gromacs manual for details.
- `nt` (int): total number of threads to start (0 means guess). See the Gromacs manual for details.
- `max_warning`: maximum number of warnings allowed in the `gmx grompp` step. See the Gromacs manual for details.
Set `"type": "lmp"` to use Lammps as the sampler. `output_freq`, `dt` and `inputfile` must not be `none` or empty if you use the lmp type.

- `inputfile` (str): name of the Lammps input file that defines the simulation parameters.

```json
"ExploreMDConfig": {
    "nsteps": 50000,
    "type": "gmx",
    "temperature": 300,
    "output_freq": 50,
    "ref-t": "300 300",
    "verlet-buffer-tolerance":"-1",
    "rlist": 1,
    "rvdw": 0.9,
    "rvdw-switch": 0,
    "rcoulomb": 0.9,
    "rcoulomb-switch": 0,
    "epsilon-r":1,
    "epsilon-rf":80,
    "dt": 0.002,
    "fourierspacing": "0.12",
    "output_mode": "single",
    "ntmpi": 1,
    "nt": 8,
    "max_warning": 2
}
```

```json
"ExploreMDConfig": {
    "type":"lmp",
    "inputfile":"input_explore.lammps",
    "dt": 0.002,
    "output_freq": 50,
    "ntmpi": 1,
    "nt": 8,
    "max_warning": 2
}
```
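The `output_freq` recommendation is simple arithmetic on `nsteps` and `dt`. With the gmx example above (`nsteps` 50000, `dt` 0.002 ps, `output_freq` 50), each walker simulates 100 ps and writes 1000 frames, not counting the frame that MD engines usually also write at step 0. The helper names below are hypothetical.

```python
# Hypothetical helpers for sizing an exploration run.

def exploration_summary(nsteps, dt, output_freq):
    """Simulated time in ps and number of frames written by one walker."""
    return {"time_ps": nsteps * dt, "n_frames": nsteps // output_freq}


def recommended_output_freq(nsteps, min_frames=1000):
    """Largest interval giving at least min_frames frames (1 if nsteps is smaller)."""
    return max(1, nsteps // min_frames)
```

Integer division keeps `output_freq` an integer, and rounding down guarantees the "at least 1000 frames" target rather than falling just short of it.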
This section configures the parameters of the `Selection` step. In the `Selection` step, all CV values are clustered; then data with high model deviation are selected, collected and sent to the `Label` step.
- `"cluster_threshold"` (int): initial guess of the cluster threshold. Note that the actual cluster threshold is derived from this guess.
- `numb_cluster_lower` (int) and `numb_cluster_upper` (int): these two values form a closed interval `[numb_cluster_lower, numb_cluster_upper]` used to find a proper cluster threshold. Starting from the initial guess, the threshold is adjusted until the number of clusters falls into the interval. This process only happens in the first iteration. In the following iterations the threshold is fixed, and in the adaptive version of RiD the trust level is adjusted instead. See the published paper for details.
- `"max_selection"` (int): the maximum number of clusters selected in the `Selection` step. If the number of clusters is greater than this value, the first `max_selection` clusters are selected.
- `numb_cluster_threshold` (int): if the number of clusters of the MD trajectories in the exploration step of the current iteration is less than this value, the trust level is adjusted. See the published paper for details. A recommended value is half of `numb_cluster_lower`.
- `slice_mode` (str): optional values are `"gmx"` and `"mdtraj"`. `RiD-kit` extracts the selected frames from the MD trajectories. The `gmx` mode uses Gromacs `gmx trjconv` to slice trajectories, while the `mdtraj` mode uses the `mdtraj` Python interface. We highly recommend the `gmx` mode because of a known bug in `mdtraj` (#Issue1514) that changes `.gro` topology names.

```json
"SelectorConfig": {
    "cluster_threshold": 1,
    "numb_cluster_lower": 16,
    "numb_cluster_upper": 26,
    "max_selection": 30,
    "numb_cluster_threshold": 8,
    "slice_mode": "gmx"
},
```
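The first-iteration threshold search can be pictured as a loop that rescales the threshold until the cluster count lands in `[numb_cluster_lower, numb_cluster_upper]`. This is a hypothetical sketch: RiD-kit's actual clustering algorithm and update rule are not described on this page, so `count_clusters` stands in for whatever clustering is run on the CV values, and the multiplicative `factor` is an assumption.

```python
# Hypothetical sketch of the threshold search and the max_selection cut.

def tune_cluster_threshold(count_clusters, threshold, lower, upper,
                           factor=1.1, max_iter=200):
    """Rescale threshold until count_clusters(threshold) is in [lower, upper].

    count_clusters is assumed to decrease as the threshold grows.
    """
    for _ in range(max_iter):
        n = count_clusters(threshold)
        if n < lower:
            threshold /= factor   # too few clusters: use a finer threshold
        elif n > upper:
            threshold *= factor   # too many clusters: use a coarser threshold
        else:
            return threshold, n
    raise RuntimeError("no threshold found; widen the interval or change factor")


def select_clusters(clusters, max_selection):
    """Keep at most the first max_selection clusters."""
    return clusters[:max_selection]
```

A wider interval makes the search converge faster; if `numb_cluster_upper - numb_cluster_lower` is small relative to how sharply the count changes with the threshold, a smaller `factor` avoids overshooting.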
This section configures the parameters of the `Label` step. Currently, rid-kit supports two labeling methods, `"restrained"` and `"constrained"`, which stand for restrained MD and constrained MD respectively. Most settings are similar to those of the `Exploration` step. The simulation time usually differs between the `Exploration` and `Label` steps: the simulation time for `Exploration` can be chosen more freely, whereas the simulation time for the `Label` step has to be chosen with care (long enough to ensure convergence and short enough to avoid wasting computation). In our experience, 100 ps is enough for torsion mode with the restrained MD method, and 1 ns is enough for distance mode with the constrained MD method.
Set `"method": "restrained"` to use restrained MD as the mean force calculator. The only parameters that differ from the `Exploration` step are `kappas` and `std_threshold`.

- `kappas` (List[int]): a list of force constants ($\kappa$) of the harmonic restraints. The length of the list equals the number of CVs.
- `std_threshold` (float): (default 5.0, in the same unit as the mean force) the threshold on the standard deviation of the mean force. A mean force whose standard deviation exceeds this value is discarded and not used in the dataset for training the free energy model. Test labeling MD on your own system to determine an appropriate value.
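For a harmonic restraint $\frac{\kappa}{2}(s - s_0)^2$, the free energy gradient at the restraint center is approximately $\kappa(s_0 - \langle s \rangle)$ when $\kappa$ is large, and the mean force is its negative. The sketch below estimates this for one CV from sampled values and applies a `std_threshold`-style filter. It is an illustration, not RiD-kit's code: `restrained_mean_force` and `keep_label` are hypothetical, and the statistic compared with the threshold is assumed to be the plain sample standard deviation, whereas RiD-kit may use a different estimator.

```python
import math
from statistics import mean, stdev

# Hypothetical helpers; the estimator RiD-kit actually uses may differ.

def restrained_mean_force(cv_traj, center, kappa, periodic=False):
    """Estimate dA/ds at `center` from CV samples of one restrained run.

    Uses dA/ds0 ~= kappa * (s0 - <s>); returns (estimate, sample std).
    """
    samples = []
    for s in cv_traj:
        d = center - s
        if periodic:  # wrap angular differences into [-pi, pi)
            d = (d + math.pi) % (2 * math.pi) - math.pi
        samples.append(kappa * d)
    return mean(samples), stdev(samples)


def keep_label(force_std, std_threshold=5.0):
    """Keep a label only if its spread does not exceed std_threshold."""
    return force_std <= std_threshold
```

A stiffer `kappa` keeps $\langle s \rangle$ closer to $s_0$ and so reduces the bias of the estimate, but it also amplifies the fluctuations of $\kappa(s_0 - s)$, which is why the threshold has to be tested per system.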
Set `"method": "constrained"` to use constrained MD as the mean force calculator. Currently rid-kit only supports distance CVs with this method, and only the `gmx` type can perform constrained MD simulations. The other parameters are the same as in the `Exploration` step.

Note that if you want to use constrained MD as the mean force calculator, apart from setting `method` to `constrained` in the label configuration (`LabelMDConfig`), you should add a `[ constraints ]` section for the corresponding `[ moleculetype ]` in your input topology file yourself, since Gromacs specifies constraint information per `[ moleculetype ]`.

Also note that, since Gromacs only supports constrained MD for distance CVs, the constrained MD simulation in rid-kit only supports distance CVs at the moment.

```json
"LabelMDConfig": {
    "nsteps": 50000,
    "temperature":300,
    "method": "restrained",
    "type": "gmx",
    "output_freq": 100,
    "ref-t": "300 300",
    "rlist": 1,
    "verlet-buffer-tolerance":"-1",
    "rvdw": 0.9,
    "rvdw-switch": 0,
    "rcoulomb": 0.9,
    "rcoulomb-switch": 0,
    "epsilon-r":1,
    "epsilon-rf":80,
    "dt": 0.002,
    "fourierspacing": "0.12",
    "output_mode": "single",
    "ntmpi": 1,
    "nt": 8,
    "max_warning": 2,
    "kappas": [ 500, 500 ],
    "std_threshold": 2.0
}
```

```json
"LabelMDConfig": {
    "nsteps": 50000,
    "temperature":300,
    "method": "constrained",
    "type": "gmx",
    "output_freq": 100,
    "ref-t": "300 300",
    "rlist": 1,
    "verlet-buffer-tolerance":"-1",
    "rvdw": 0.9,
    "rvdw-switch": 0,
    "rcoulomb": 0.9,
    "rcoulomb-switch": 0,
    "epsilon-r":1,
    "epsilon-rf":80,
    "dt": 0.001,
    "fourierspacing": "0.12",
    "output_mode": "both",
    "ntmpi": 1,
    "nt": 8,
    "max_warning": 2,
    "std_threshold": 10.0
}
```
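For the `[ constraints ]` section that constrained MD requires in the topology, a small generator can help avoid typos. This is a hypothetical sketch, not part of rid-kit: `constraints_section` is an invented helper, the atom indices must be numbered as inside the `[ moleculetype ]` (they coincide with `selected_atomid` only when that molecule's atoms are numbered from 1 in the system, e.g. a single-molecule system), the distances `b0` are placeholders in nm, and the function type 2 default is an assumption; check the GROMACS manual for the difference between constraint types 1 and 2 before using it.

```python
# Hypothetical helper; verify the constraint function type in the GROMACS manual.

def constraints_section(pairs, lengths_nm, funct=2):
    """Render a GROMACS [ constraints ] section for the given atom pairs.

    pairs: (ai, aj) atom indices as numbered inside the [ moleculetype ].
    lengths_nm: constrained distance b0 for each pair, in nm.
    """
    if len(pairs) != len(lengths_nm):
        raise ValueError("one length is needed per atom pair")
    lines = ["[ constraints ]", ";   ai    aj funct       b0"]
    for (ai, aj), b0 in zip(pairs, lengths_nm):
        lines.append(f"{ai:6d}{aj:6d}{funct:6d}{b0:9.4f}")
    return "\n".join(lines)
```

The returned text would be pasted after the other sections of the relevant `[ moleculetype ]` in the topology file.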
This section configures the parameters of the `Train` step. `RiD-kit` is based on `Tensorflow`.
- `numb_models` (int): number of models trained in the `Train` step. `RiD-kit` uses the model deviation (the standard deviation of the outputs of these models) to evaluate the quality of the free energy surface, so `numb_models` must be greater than 1.
- `neurons` (List[int]): the number of neurons in each layer. `RiD-kit` uses an MLP as the basic neural network structure. The number of elements in the list is the number of hidden layers, and each element defines the number of nodes in that layer. For example, `[ 50, 50, 50, 50 ]` means there are 4 hidden layers with 50 neurons each.
- `resnet` (bool): whether to use residual connections between layers. If `true`, all layers must have the same number of nodes.
- `epoches` (int): number of epochs.
- `init_lr` (float): initial learning rate. It decays exponentially during training.
- `decay_steps` (int): decay steps of the learning rate. See the TensorFlow API docs for details.
- `decay_rate` (float): decay rate of the learning rate. See the TensorFlow API docs for details.
- `drop_out_rate` (float): dropout rate of the dropout layers.
- `numb_threads` (int): number of threads used for training.

```json
"Train": {
    "numb_models": 4,
    "neurons": [ 50, 50, 50, 50 ],
    "resnet": true,
    "batch_size": 32,
    "epoches": 2000,
    "init_lr": 0.0008,
    "decay_steps": 120,
    "decay_rate": 0.96,
    "drop_out_rate": 0.1,
    "numb_threads": 8,
    "use_mix": false,
    "restart": false
}
```
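Two ideas from the `Train` step can be illustrated without TensorFlow. `model_deviation` shows why `numb_models` must exceed 1: with a single model the spread is always zero. Its exact formula is an assumption here, taken as the root-mean-square deviation of the force vectors predicted by the models. `learning_rate` follows the formula of TensorFlow's `tf.keras.optimizers.schedules.ExponentialDecay` with `staircase=False`; whether RiD-kit uses the staircase variant is not stated on this page. Both helpers are hypothetical.

```python
import math

# Hypothetical helpers; not RiD-kit's training code.

def model_deviation(predictions):
    """Spread of force predictions from several models at one CV point.

    predictions: one force vector (list of floats) per trained model.
    Returns sqrt(mean over models of |F_m - <F>|^2); 0.0 if all models agree.
    """
    n_models = len(predictions)
    dim = len(predictions[0])
    mean_force = [sum(p[k] for p in predictions) / n_models for k in range(dim)]
    sq_dev = [sum((p[k] - mean_force[k]) ** 2 for k in range(dim))
              for p in predictions]
    return math.sqrt(sum(sq_dev) / n_models)


def learning_rate(step, init_lr, decay_steps, decay_rate):
    """Continuous exponential decay: init_lr * decay_rate ** (step / decay_steps)."""
    return init_lr * decay_rate ** (step / decay_steps)
```

With the example `init_lr` 0.0008, `decay_steps` 120 and `decay_rate` 0.96, the learning rate falls to 0.000768 after 120 steps and to roughly half its initial value after about 2000 steps.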
You can find full examples of `rid.json` in `rid-kit/rid/template`.
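Several parameters described above must agree with each other, so a quick consistency check before submitting a run can save a failed iteration. The sketch below covers only rules stated on this page: one `angular_mask` and `weights` entry per CV, one atom pair per CV in distance mode, one `kappas` entry per CV for restrained labeling, `trust_lvl_1` below `trust_lvl_2`, and `numb_models` greater than 1. `check_rid_config` is a hypothetical helper, not part of rid-kit, and it assumes the top-level keys used in the examples.

```python
# Hypothetical consistency checks derived from this page; not part of rid-kit.

def check_rid_config(cfg):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    cv = cfg["CV"]
    n_cv = len(cv["angular_mask"])  # angular_mask has one entry per CV
    if len(cv["weights"]) != n_cv:
        problems.append("CV.weights and CV.angular_mask differ in length")
    if cv["mode"] == "distance" and len(cv["selected_atomid"]) != n_cv:
        problems.append("distance mode needs one angular_mask entry per atom pair")
    label = cfg.get("LabelMDConfig", {})
    if label.get("method") == "restrained" and len(label.get("kappas", [])) != n_cv:
        problems.append("LabelMDConfig.kappas needs one force constant per CV")
    if not cfg["trust_lvl_1"] < cfg["trust_lvl_2"]:
        problems.append("trust_lvl_1 should be smaller than trust_lvl_2")
    if cfg["Train"]["numb_models"] < 2:
        problems.append("Train.numb_models must be greater than 1")
    return problems
```

A complete file can be checked with `check_rid_config(json.load(open("rid.json")))` after `import json`.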